ABSTRACT

Title of Dissertation: EVALUATION OF SELECTED SIDE-CHANNEL ANALYSIS METHODS FOR RANSOMWARE CLASSIFICATION AND DETECTION

Jennie E. Hill
Doctor of Philosophy, 2023

Dissertation Directed by: Professor Martin C. Peckerar
Department of Electrical and Computer Engineering

The physical implementation of computer hardware necessarily produces physical behavior when a computer operates. This behavior has measurable characteristics, many of which become channels of information leakage observable by an unintended receiver, posing a serious threat to computer security. These “side-channels” of computer operation, such as current usage and power consumption, generation of heat and electromagnetic radiation, and events at the micro-architectural level, can be exploited to compromise the confidentiality of a system.

This work considers side-channel analysis techniques for the temperature, power, and micro-architectural side-channels for the purpose of classifying state-of-the-art ransomware on real-world, non-virtualized Windows systems. Over three thousand ransomware and benign trials were collected to generate training and testing data sets. This required development of a process to synchronize collection of on-system (e.g. performance counters) and off-system (e.g. power) measurements, safely transfer trial data from the encrypted system, and restore the system to a “clean” state without the use of virtualization techniques, which negatively impact the validity of side-channel measurements. Side-channels were evaluated on their effectiveness in accurately differentiating, within a given time duration, between ransomware and benign operations such as background operating system activity, 7zip encryption, and SPEC benchmarks, with Matthews’ Correlation Coefficient (MCC) used to measure the overall performance of five machine learning classification algorithms.
The temperature side-channel, accessed through thermal imaging, was found to be unsuitable for the ransomware detection/classification application due to its sensitivity to thermal noise, its significant pre-processing requirements, and its slow response times caused by the loss of signal components above the low-kHz range. These limitations prevented it from identifying ransomware before encryption operations typically begin (within 2 seconds of execution, on average).

The power side-channel, accessed by monitoring the current drawn by a solid state drive, produced a best-case classification accuracy of 96% (0.92 MCC) with 15 seconds of current data, and ≥ 90% accuracy (MCC ≥ 0.8) for all five classifiers tested with at least 5 seconds of data. Tests demonstrated that at least 4 seconds of data were required to attain a best-case classification accuracy greater than 90%; at 2 seconds, the best-performing classifier attained an MCC of just 0.66 with 83.3% accuracy.

The micro-architectural side-channel was accessed through hardware performance counters, which provided the highest MCC and accuracy results in the shortest period of time. Hardware performance counters are registers built into a CPU’s Performance Monitoring Unit that measure events related to processor and memory system operations (e.g. CPU clock cycles, total instructions retired, memory accesses, cache hits/misses, and branches taken). Over 230 hardware events were collected, tested, and ranked by their contribution to overall classifier performance. Each classification algorithm was found to have a distinct performance counter feature ranking, and the selected features could be further optimized for the desired detection window duration.
Examination of the results showed that, despite the quantity of features collected, classifier performance improved only marginally beyond 6 features for windows of ≤ 2 seconds: 3 of the 5 classifiers tested achieved MCC ≥ 0.9 with 1 second of data and just 4-6 performance counter features, with a best-case MCC of 0.98 at 1 second of data and 4 performance counter features. MCC results for the shortest duration event window (0.1 s) were found to be within 0-7% of the best-case MCC result window (1-2 s) for each classifier, indicating that ransomware can be classified with greater than 90% accuracy for four of five classifiers tested using only a tenth of a second of 4-6 performance event measurements, which makes the implementation of this approach in a real-time ransomware detector feasible. With the financial impact of ransomware estimated to exceed $30 billion globally this year, new detection techniques for non-virtualized computer systems have significant real-world implications.

EVALUATION OF SELECTED SIDE-CHANNEL ANALYSIS METHODS FOR RANSOMWARE CLASSIFICATION AND DETECTION

by

Jennie Elizabeth Hill

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2023

Advisory Committee:
Professor Martin C. Peckerar, Chair/Advisor
Professor Bruce L. Jacob, Co-Advisor
Professor Ankur Srivastava
Professor T. Owens Walker, III
Professor Donald Yeung
Professor Amitabh Varshney, Dean’s Representative

Acknowledgements

This was absolutely not a solo effort. Many people deserve my sincere gratitude; I could not possibly capture them all... So here are some.

My advisors: Dr. Jacob and Dr. Peckerar, for being available to answer my texts at all hours and for pushing the pace when I was on the verge of procrastinating. Your guidance and mentorship have been both insightful and essential in helping me to get to this point.
My “CECSR” (Computer Engineering and Cyber Security Research) team - James, Justin, Dane, Rob, Owens, Ryan, Hau - all of whom were very generous, extremely patient mentors: For teaching me, letting me borrow equipment that I may or may not have returned, and reminding me that research is a journey rather than a destination. Your bald-faced humor, insistence on the insertion of randomly-chosen words (e.g. “bald”) into papers, and ability to not take yourselves too seriously made the long days in the lab tolerable. Special thanks to ENS Brendan Farmer for far outperforming all expectations as a lab assistant, and to my small-but-mighty UMD crew of Devesh and Ananth for the selfless gifts of their time and guidance.

My Naval Academy colleagues, particularly: Jeff, for his technical expertise, critical lab support, and many years of friendship. Mike and Chris, for numerous hours of EI, well outside the scope of their normal duties. Hatem, Ethan, and Ann, for their unwavering support and encouragement.

The ECE Department and PMP program leadership, for their support in making this opportunity a possibility in the first place, commitment to finding a program that was a realistic fit, and then ensuring I had the resources to finish the job.

Nick, Jon, and Sam: For adapting to living through a global pandemic in a single-parent household where mom turned the dining room into a lab for her PhD research, and for all the additional responsibilities they assumed in the process (and especially to Sam for his many hours of company when we finally had a real lab to go to). Together, they made sure I never got too much sleep or quiet time, and provided me with an endless supply of stories to tell my research group when I should have been working.
My family: My parents for making multiple trips to Annapolis to assist with cooking/driving/childcare, my mom for doubling as a copy-editor, my brothers for their consistent phone support availability, and Amanda for being like-a-sister for longer than either of us cares to admit (unless we met when we were 4 or younger).

My CW crew, QuaranTEAM (Alana, Sara, Chad), and friends: Cindy, Ora, Meredith, Kristin, & Kristen, to name just a few. If you ever stumble across this and think maybe you should be part of this list, you’re right. Thank you all, seriously, for your support. I really couldn’t have done it without you. And Adderall. It was prescribed. But mainly you.

Last and least, to generative AI for producing a solid Acknowledgements section draft that incorporated humor and sarcasm. And for the warning to not use it.

Table of Contents

Acknowledgements  ii
Table of Contents  iii
List of Tables  vii
List of Figures  viii
List of Abbreviations  xviii

Chapter 1: Introduction  1
1.1 A Brief History of Side-channels  1
1.1.1 Considering Side-channels in the framework of Electronic Warfare  3
1.2 Side-channel Analysis to detect Ransomware  5
1.3 Contributions  8
1.4 Organization  9

Chapter 2: Side-channels  10
2.1 Side-channels  10
2.1.1 Power Side-channel  11
2.1.2 Electromagnetic Side-channel  11
2.1.3 Acoustic Side-channel  12
2.1.4 Optical Side-channel  12
2.1.5 Temperature Side-channel  12
2.1.6 Micro-architectural Side-channel  13
2.2 Side-channel Attack, Analysis, and Defense  13
2.3 Side-channel Metrics  15
2.3.1 Statistical Tests  15
2.3.2 Entropy, Conditional Entropy, & Guessing Entropy  16
2.3.3 Mutual Information  17
2.3.4 Success Rate  17
2.3.5 Welch’s T-Test  18
2.3.6 Side-channel Vulnerability Factor (SVF)  18
2.3.7 Spatial Thermal Side-channel Factor (STSF)  18
2.3.8 Cache Side-channel Vulnerability (CSV)  19
2.3.9 Signal Available to the Attacker (SAVAT)  19
2.3.10 Thermal-Security-in-Multi-Processors (TSMP)  19
2.3.11 Maximal Leakage  19
2.3.12 Information Leakage Rate  20
2.3.13 Local Differential Privacy  20
2.3.14 Trust Coverage  20
2.3.15 Holistic Assessment Criterion  21
2.3.16 Machine Learning Metrics  21

Chapter 3: The Temperature Side-channel  22
3.1 Overview  22
3.2 Characteristics of the Temperature Side-channel  22
3.3 Sensing the Thermal Channel  23
3.3.1 Internal Temperature Sensing Methods  24
3.3.2 External Temperature Sensing Methods  25
3.4 Temperature Attacks  28
3.4.1 Temperature Side-channel Attacks  28
3.4.2 Covert Channels  29
3.4.3 Physical Attacks  30
3.5 Temperature Side-channel Defense  30
3.5.1 Design-time Defense  31
3.5.2 Run-time Defense  31

Chapter 4: The Micro-architectural Side-channel  33
4.1 Micro-architectural Side-channel  33
4.1.1 Timing Side-channel  34
4.1.2 Memory (Access) Side-channel  34
4.2 Hardware Performance Counters for the Micro-architectural Side-channel  34
4.2.1 Profiling Tools  35
4.2.2 Intel® VTune™ Profiler Primer  36
4.3 Considerations When Using Performance Counters for Security Applications  39

Chapter 5: Malware and Ransomware  42
5.1 Malware Overview  42
5.1.1 Ransomware  44
5.2 Malware Detection via Side-Channel Analysis  45
5.2.1 Malware Detection with the Power Side-channel  45
5.2.2 Malware Detection with the EM Side-channel  45
5.2.3 Micro-architectural Side-Channel for Malware Detection (via Hardware Performance Counters)  46
5.2.4 HPC-based Machine Learning Classification  48
5.3 Ransomware Detection  49
5.3.1 Ransomware Detection with Hardware Sensors and Performance Counters  50

Chapter 6: Temperature Side-channel Analysis Experiments  54
6.1 TSCA Study 1: Temperature Side-channel Analysis to detect File Operations on an SSD with Thermal Imaging  54
6.1.1 Experimental Set-up  54
6.1.2 Classification Results  56
6.1.3 Conclusions and Future Work  60
6.2 TSCA Study 2: Temperature Side-channel Analysis to detect Simulated Malware on a Single Board Computer  61
6.2.1 TSCA Study 2 Methodology  62
6.2.2 Thermal Signature of Write Operations  66
6.3 TSCA Experiment Conclusions and Future Research Directions  70

Chapter 7: Investigating Vtune HPCs for Ransomware Detection  72
7.1 Proof-of-Concept A: Vtune Data Collection of Ransomware Trials  72
7.1.1 Hardware Setup  72
7.1.2 Data Collection Procedure  75
7.1.3 System Restore Process  76
7.1.4 Initial Observations  76
7.2 Proof-of-Concept B: Simultaneous Current and Vtune Data Collection of Ransomware in Non-virtualized and Virtualized Environments  78
7.2.1 Non-Virtualized Hardware Setup  79
7.2.2 Virtualized System Set-up  81
7.2.3 Data Collection Procedure  83
7.2.4 Vtune Performance Counter Analysis  84
7.2.5 Proof-of-Concept B Observations  84
7.2.6 Drive Restore Procedure Using Image for Linux  85
7.3 Study 1: Automate, Expedite, and Expand Vtune Collection  86
7.3.1 Data Collection Procedure  86
7.3.2 Experiment Results  90
7.4 Study 2: Expanded Vtune HPC Collection with Additional Ransomware Samples  90
7.4.1 Experiment Results  91
7.5 Analysis of Performance Counters for Classification of Ransomware Operations for Studies 1 (23-class) and 2 (31-class)  92
7.5.1 Feature Generation  93
7.5.2 Classification Algorithm Identification  94
7.5.3 HPC Feature Selection  100
7.5.4 Single-HPC F1-based Feature Reduction  118
7.6 Summary  124

Chapter 8: Selecting Vtune HPCs for Ransomware Detection  126
8.1 Study 3: 35-class, 90 HPC Feature Collection with Networked Database Transfer and Parallel Report Generation  126
8.1.1 Training Data Verification  127
8.2 Training Data Validation  130
8.2.1 Matthews’ Correlation Coefficient (MCC)  131
8.2.2 HPC Feature Selection Process  133
8.2.3 Cross-Validation of Training Data  136
8.2.4 Classifier Training  139
8.3 Classifier Evaluation Results  142
8.3.1 Test Data Set  142
8.4 Towards Start-time Ransomware Detection  148
8.4.1 Identification of Operation Start Time  150
8.4.2 Cross Validation and Testing Results  151
8.4.3 Results  153
8.5 Top Performance Counter Feature Comparison  158
8.5.1 25 second Start Time Feature Set  158
8.5.2 Performance Counter Ranking Discussion  159
8.6 MASC Experiments Summary and Future Research Directions  159

Chapter 9: Current-based Power Side-channel Analysis Experiments  167
9.1 Proof of Concept: Power Side-channel Analysis via Current Measurements for Detection of SSD File Operations  167
9.1.1 Method  168
9.1.2 Results  172
9.1.3 Follow-on Power Side-channel Analysis Work  176
9.2 Study: Classification of Read, Write, and Idle Operations with Current-based Power Side-channel Analysis  177
9.2.1 Experimental Setup  177
9.2.2 Analysis  179
9.2.3 Results  183
9.3 Proof of Concept: Accessing the Power Side-channel via Current Draw for Ransomware Identification  188
9.3.1 Method  188
9.3.2 Current Side-Channel for Ransomware Detection  196
9.4 Study: 35-class Current-draw-based Side-channel Analysis for Ransomware Classification  197
9.4.1 Part 1: Power Feature Generation and Classifier Identification  199
9.4.2 Part 2: Current-based Power Side-channel Analysis for Ransomware vs. Benign Operations  201
9.4.3 Study 2 Results and Initial Conclusions  212
9.5 PSCA Experiment Summary and Future Research Directions  223

Chapter 10: Conclusions and Future Work  227
10.1 Future Work  229

Appendix A: Table of Vtune Hardware Performance Counter Names, Descriptions, and associated Feature Numbers for each Study, adapted from [1]  232

Bibliography  258

List of Tables

6.1 Comparison between the thermal cameras used in experiments.  64
8.1 Accuracy, Precision, Recall, F1, and MCC metrics for the Confusion Matrix shown in Figure 8.7.  133
9.1 Solid State Drives Tested  169
9.2 Solid State Drives Tested  178
9.3 Current data from trials were converted to Power Spectral Density using Welch’s Method with varying combinations of data duration, window size, and maximum frequency.
200 9.4 Current data from all training and testing trials were converted to Power Spectral Density using Welch’s Method with varying combinations of data duration, window size, and maximum frequency. All power features were generated in both Watts and dBW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 vii List of Figures 1.1 Electromagnetic wave, reprinted from [2]. . . . . . . . . . . . . . . . . . . . . . 3 1.2 Electromagnetic Spectrum, adapted from [3]. . . . . . . . . . . . . . . . . . . . 4 1.3 Relationship between Electronic Warfare terminology and Side-Channel terminology. 6 6.1 Experimental set up to capture thermal images of SSDs with a FLIR A325sc thermal camera. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 6.2 Thermal images of a Samsung 970 EVO Plus SSDs during idle, read, write, operations. All images were captured with a FLIR A325sc thermal camera. . . . 56 6.3 Thermal images of a Samsung 970 EVO Plus SSDs during idle, read, write, operations. All images were captured with a FLIR A325sc thermal camera. . . . 57 6.4 Consolidated Confusion Matrix showing the ability to differentiate activity (read/write) from non-activity (idle) for all SSD, operation, and file size combinations for the 3 classifiers trained and tested. . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 6.5 10-fold Cross Validation Results for 3 classifiers trained with read, write, and idle operations at file sizes of 100, 500, and 1000MB. Each row shows results for a single classifier at increasing read/write sizes. . . . . . . . . . . . . . . . . . . . 59 6.6 Test Data Results for 3 classifiers trained with read, write, and idle operations at file sizes of 100, 500, and 1000MB. Each row shows results for a single classifier at increasing read/write sizes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 6.7 System diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
63 6.8 Measurement system setup. The green (upper) box in the left image indicates the thermal camera, while the white (lower) box highlights the single board computer. The right images show the top and side views of the camera and board set up. . . 63 6.9 Comparison of images captured with (a) FLIR A325sc, (b) Seek Thermal CompactPro, and (c) FLIR T530 cameras. FLIR image temperature scaled to 75-135 °F with FLIR Research Studio software. SeeK CompactPRO scaled to 75-135 °F with SeeK Thermal iPhone app. FLIR T530 scaled from 80-110 °F in FLIR Tools. Blue in SeeK image indicates temperatures < 75°. . . . . . . . . . . . . . . . . . 65 6.10 File operations performed on a BeagleBone Black, as observed with a SeeK CompactPRO thermal camera. . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 6.11 Thermal signature of a 100 MB file write over time. The left image shows a BeagleBone Black with three components highlighted: (1) Power Management Integrated Chip, (2) Embedded Multi Media Card, and (3) Processor, all of which produce distinct heat signatures over the course of the write operation. . . . . . . 66 viii 6.12 Accuracy of SeeK Compact Pro and FLIR A325sc cameras, using both automated and manual detection techniques. . . . . . . . . . . . . . . . . . . . . . . . . . . 68 6.13 Precision of SeeK Compact Pro and FLIR A325sc cameras, using both automated and manual detection techniques. . . . . . . . . . . . . . . . . . . . . . . . . . . 69 6.14 Recall of SeeK Compact Pro and FLIR A325sc cameras, using both automated and manual detection techniques. . . . . . . . . . . . . . . . . . . . . . . . . . . 69 7.1 Data Collection Summary for Proof of Concept A . . . . . . . . . . . . . . . . . 73 7.2 Collection apparatus. Includes the data recorder and current probe/amplifier to measure current supplied to the drive, but those capabilities were not incorporated into the proof-of-concept Vtune collection. . . . . . . . . . . . . . . . . . . . . . 
73 7.3 Data Collection and System Restore Procedure for manually executed 7zip and Sodinokibi randomized trials. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 7.4 Mean Event Rate for a 7zip encryption trial vs. REvil/Sodinokibi ransomware, from analysis of VTune microarchitectural hardware performance counters. . . . 77 7.5 Data Collection Summary for Proof of Concept B . . . . . . . . . . . . . . . . . 78 7.6 Experimental setup for monitoring power supplied to the test OS SSD. Current is measured with a current probe, amplified, and acquired and stored with the data recorder. Vtune microarchitectural hardware performance counters are collected from Vtune Profiler running on-system and then transferred off-system for additional analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 7.7 Data Collection and System Restore Procedure for manually executed trials. . . . 83 7.8 Vtune Hardware Performance Counters collected solely in 20 of 20 non-idle trials. 84 7.9 Vtune Hardware Performance Counters collected solely in 15 of 15 non-idle trials. 85 7.10 Data Collection Summary for Study 1 . . . . . . . . . . . . . . . . . . . . . . . 86 7.11 SPECworkstation 3.1 Benchmarks incorporated into data collection. . . . . . . . 88 7.12 Data Collection and System Restore Procedure Study 1. . . . . . . . . . . . . . . 89 7.13 Data Collection Summary for Study 2. . . . . . . . . . . . . . . . . . . . . . . . 90 7.14 Data Collection, System Restore, and Report Generation Procedure for 31-class data set (Study 2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 7.15 Start Time - Window Size - Stop Time Combinations for Study 1. Each column indicates a time interval for which HPC feature sets were generated (12 total). . . 93 7.16 Start Time - Window Size - Stop Time Combinations for Study 2. Each column indicates a time interval for which HPC feature sets were generated (12 total). . . 
94 7.17 MATLAB Classification Models Evaluated in Classification Learning Application, adapted from [4]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 7.18 MATLAB Classification Learning Application Cross-Validation Accuracy for 23- class data set (Study 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 7.19 MATLAB Classification Learning Application Cross-Validation Accuracy for 31- class data set (Study 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 7.20 Five Classification Algorithms were selected based on their performance across both data sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 ix 7.21 Weighted importance score calculation example for the 31-class (220 HPC) data set and the MRMR algorithm. Feature ranking rows 11-210 are hidden for the 31-operation classification ranking results, while feature rows 11-220 are hidden for the binary classification ranking results. On the right hand side, weighted importance scores are summed across each row for each HPC to get the total importance score for each HPC and ranking. Importance scores are then summed by column to get a weighted importance score for each HPC for all rankings and data sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 7.22 Weighted importance scores by algorithm for the to 10 HPCs in the 31-class experiment, ranked from highest to lowest importance score value. . . . . . . . . 103 7.23 Single-HPC Feature Classification Accuracy for top 20 Features using Bagged Tree Ensemble Classification Algorithm for multi-class and binary versions of Studies 1 and 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 7.24 Single-HPC Feature Classification Accuracy for top 20 Features using Subspace Discriminant Ensemble Classification Algorithm for multi-class and binary versions of Studies 1 and 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . 105 7.25 Cross Validation Accuracy for incremental performance counters in order from highest to lowest individual accuracy values for Bagged Tree Ensemble Classifier. 23-class data sets are shown in the top two plots, with 31-class data sets in the bottom two plots. The multi-class version of each data set is on the left, while the binary version of each data set is on the right. . . . . . . . . . . . . . . . . . . . 106 7.26 Cross Validation Accuracy for incremental performance counters in order from highest to lowest individual accuracy values for Subspace Discriminant Ensemble Classifier. 23-class data sets are shown in the top two plots, with 31-class data sets in the bottom two plots. The multi-class version of each data set is on the left, while the binary version of each data set is on the right. . . . . . . . . . . . . 107 7.27 Comparison of Bagged Tree Classifier Cross Validation Accuracy for incremental performance counters in order from highest to lowest individual accuracy values for windows of 0.1 s, 0.5 s, 1 s, and 2 s before, at, and after operation start time. 23-class multi-class data sets are shown across the top, with 31-class multi-class data sets across the bottom. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 7.28 Comparison of Bagged Tree Classifier Cross Validation Accuracy for incremental performance counters in order from highest to lowest individual accuracy values for windows of 0.1 s, 0.5 s, 1 s, and 2 s before, at, and after operation start time. 23-class BINARY data sets are shown across the top, with 31-class BINARY data sets across the bottom. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 7.29 A Confusion Matrix is a standard way to visualize supervised machine learning outcomes. False Negatives (FN) are placed in Quadrant I, True Positives (TP) are placed in Quadrant II, False Positives (FP) are placed in Quadrant III, and True Negatives (TN) are in Quadrant IV . . . . . . . . . . . 
. . . . . . . . . . . . . . 111 7.30 Example “worst case” Confusion Matrix for single-HPC classification tests. . . . 112 7.31 Best single-HPC Confusion Matrix resulting from training 5 classification algorithms with a single HPC feature at a time. This Confusion Matrix shows the cross validation results of a Fine KNN classifier trained with 2 seconds of data and the single HPC feature BR INST RETIRED.NEAR RETURN PS. . . . . . . . . . . 113 x 7.32 F1 scores for each HPC feature and start/stop time window were calculated and then averaged across all start/stop time windows to obtain an average F1 score for HPC feature ranking purposes. . . . . . . . . . . . . . . . . . . . . . . . . . 114 7.33 Cross validation accuracy, precision, recall, and F1 plots for binary 23-class (Study 1) and 31-class (Study 2) data sets trained with Bagged Tree Ensemble classifier and incrementing the number of HPCs used for training. HPCs were ranked from high to low by F1 score averaged over start/stop time window sizes. . 115 7.34 Cross validation accuracy, precision, recall, and F1 plots for binary 23-class (Study 1) and 31-class (Study 2) data sets trained with Subspace Discriminant Ensemble classifier and incrementing the number of HPCs used for training. HPCs were ranked from high to low by F1 score averaged over start/stop time window sizes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 7.35 Cross validation accuracy, precision, recall, and F1 plots for binary 23-class (Study 1) and 31-class (Study 2) data sets trained with Linear Support Vector Machine classifier and incrementing the number of HPCs used for training. HPCs were ranked from high to low by F1 score averaged over start/stop time window sizes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 116
7.36 Cross validation accuracy, precision, recall, and F1 plots for binary 23-class (Study 1) and 31-class (Study 2) data sets trained with Fine K-Nearest Neighbor classifier and incrementing the number of HPCs used for training. HPCs were ranked from high to low by F1 score averaged over start/stop time window sizes. . . . 116
7.37 Cross validation accuracy, precision, recall, and F1 plots for binary 23-class (Study 1) and 31-class (Study 2) data sets trained with Narrow Neural Network classifier and incrementing the number of HPCs used for training. HPCs were ranked from high to low by F1 score averaged over start/stop time window sizes. . . . 117
7.38 Top 20 F1-ranked HPCs for each classification algorithm. Study 1 (23-class) HPC features are on the left column and Study 2 (31-class) HPC features are on the right. Corresponding feature numbers from the opposing study are listed in gray next to the HPC number for each study. . . . 117
7.39 Top F1-ranked HPC for each classification algorithm for Study 2 (31-class data set) only. These were used as a starting point to identify the fewest number of HPC features to generate the highest possible F1 score. . . . 119
7.40 HPC Features selected for each classifier to achieve the highest possible F1 score with the fewest number of features, ranked by F1 score. . . . 119
7.41 Cross Validation F1 for each classifier, with HPC features selected to achieve the highest possible F1 score with the fewest number of features. The HPC Features selected for training each classifier are shown in Figure 7.40. . . . 120
7.42 Top 3 Cross Validation F1 scores for each classifier and F1 ranks, with HPC features selected to achieve the highest possible F1 score with the fewest number of features.
. . . 121
7.43 Comparison of HPC features with three highest F1 cross validation scores for 1st 4 ranks, with HPC features selected to achieve the highest possible F1 score with the fewest number of features using Bagged Tree Ensemble Classifier. . . . 122
7.44 Comparison of HPC features with three highest F1 cross validation scores for 1st 4 ranks, with HPC features selected to achieve the highest possible F1 score with the fewest number of features using Subspace Discriminant Ensemble Classifier. . . . 122
7.45 Comparison of HPC features with three highest F1 cross validation scores for 1st 4 ranks, with HPC features selected to achieve the highest possible F1 score with the fewest number of features using Linear Support Vector Machine Classifier. . . . 123
7.46 Comparison of HPC features with three highest F1 cross validation scores for 1st 4 ranks, with HPC features selected to achieve the highest possible F1 score with the fewest number of features using Fine K-Nearest Neighbor Classifier. . . . 123
7.47 Comparison of HPC features with three highest F1 cross validation scores for 1st 4 ranks, with HPC features selected to achieve the highest possible F1 score with the fewest number of features using Narrow Neural Network Classifier. . . . 124
8.1 Study 3 included all ransomware samples used in Studies 1 and 2, plus 4 new ransomware versions. . . . 127
8.2 Data Collection Summary for Study 3. . . . 128
8.3 Data Collection, System Restore, and Networked Report Transfer/Generation Procedure for 35-class data set (Study 3). . . . 128
8.4 Summary of 20 Training Data Trials for each of 35 classes in Study 3, HPC 48 (INST RETIRED.ANY).
. . . 129
8.5 Example Plot of HPC 48 (INST RETIRED.ANY) Individual Sodinokibi Trials for Training Data Verification. Trial 151 was identified as anomalous and repeated. . . . 130
8.6 Start Time - Window Size - Stop Time Combinations for Study 3. Each column indicates a time interval for which HPC feature sets were generated (12 total). . . . 131
8.7 Example: Confusion Matrix for Matthews’ Correlation Coefficient (MCC) vs. F1 Score for imbalanced data. . . . 133
8.8 Top 10 ranked HPC features for each classifier for 5 rounds of cross validation and training. . . . 135
8.9 MCC scores averaged across each start/stop window for each classifier plotted against the number of optimized HPCs used to train the classifier. . . . 136
8.10 Cross validation confusion matrices for Bagged Tree Ensemble Classifier trained with 6 optimized HPCs. Window size increases from left to right. . . . 137
8.11 Cross validation confusion matrices for Subspace Discriminant Ensemble Classifier trained with 6 optimized HPCs. Window size increases from left to right. . . . 137
8.12 Cross validation confusion matrices for Linear SVM Classifier trained with 6 optimized HPCs. Window size increases from left to right. . . . 138
8.13 Cross validation confusion matrices for Fine KNN Classifier trained with 6 optimized HPCs. Window size increases from left to right. . . . 138
8.14 Cross validation confusion matrices for Narrow Neural Network Classifier trained with 6 optimized HPCs. Window size increases from left to right. . . . 138
8.15 Average MCC by window size for classifiers trained with 6 optimized HPCs. . . . 139
8.16 Percentage of Correct Predictions by Classifier and Start/Stop Window for all ransomware trials.
. . . 140
8.17 Percentage of Correct Predictions by Classifier and Start/Stop Window for all benign trials. . . . 141
8.18 Classification Accuracy for classifiers evaluated with separate test data, collected in randomized order on two different systems with identical hardware. . . . 144
8.19 Classification MCC for classifiers evaluated with separate test data, collected in randomized order on two different systems with identical hardware. . . . 144
8.20 Average MCC by window size for classifiers tested with 6 optimized HPC features. . . . 145
8.21 Confusion Matrix for Bagged Tree Ensemble classifier trained with 6 optimized HPC features for window sizes of 0.1s, 0.5s, 1s, and 2s at operation trigger time. . . . 145
8.22 Confusion Matrix for Subspace Discriminant Ensemble classifier trained with 6 optimized HPC features for window sizes of 0.1s, 0.5s, 1s, and 2s at operation trigger time. . . . 146
8.23 Confusion Matrix for Linear SVM classifier trained with 6 optimized HPC features for window sizes of 0.1s, 0.5s, 1s, and 2s at operation trigger time. . . . 146
8.24 Confusion Matrix for Fine KNN classifier trained with 6 optimized HPC features for window sizes of 0.1s, 0.5s, 1s, and 2s at operation trigger time. . . . 147
8.25 Confusion Matrix for Narrow Neural Network classifier trained with 6 optimized HPC features for window sizes of 0.1s, 0.5s, 1s, and 2s at operation trigger time. . . . 147
8.26 Percentage of Correct Predictions for 6 HPC feature Classifiers for all (12) sets of Start/Stop Time Windows for all ransomware trials. . . . 149
8.27 Percentage of Correct Predictions for 6 HPC feature Classifiers for all (12) sets of Start/Stop Time Windows for all benign trials.
. . . 149
8.28 Top 10 ranked HPC features for each classifier for 5 rounds of cross validation and training, with features generated for durations of 0.1, 0.5, 1, and 2s from visually identified operation start time. . . . 151
8.29 Cross Validation Accuracy (top) and MCC (bottom) for classifiers trained with the top 1-9 HPC features by MCC rank generated for durations of 0.1, 0.5, 1, and 2s from visually identified operation start time. . . . 152
8.30 Test Data Accuracy (top) and MCC (bottom) for classifiers trained with the top 1-9 HPC features by MCC rank generated for durations of 0.1, 0.5, 1, and 2s from visually identified operation start time. . . . 152
8.31 Summary of Test Data Accuracy (top) and MCC (bottom) for classifiers trained with the top 1-9 HPC features by MCC rank generated for windows of 0.1, 0.5, 1, and 2s beginning at 15s, adapted from Section 8.3.1 for comparison purposes. . . . 153
8.32 % of Correct Predictions for benign operations, using 0.1, 0.5, 1, and 2s windows at actual operation start time (identified visually) and 9 HPC features. Operations for which 100% of trials were predicted correctly across all feature sets are in bold. Operation/classifier combinations which predicted less than 80% of trials correctly are in orange, while operation/classifier combinations that had fewer than 50% of trials predicted correctly are in red. . . . 155
8.33 % of Correct Predictions for ransomware operations, using 0.1, 0.5, 1, and 2s windows at actual operation start time (identified visually) and 9 HPC features. Operations for which 100% of trials were predicted correctly across all feature sets are in bold. Operation/classifier combinations which predicted less than 80% of trials correctly are in orange, while operation/classifier combinations that had fewer than 50% of trials predicted correctly are in red.
. . . 155
8.34 Comparison of % of Correct Predictions for benign operations, with 15s Start Time Results (from Figure 8.27) on the left, and visually identified actual operation start time shown on the right. Just 6 HPC features were required for the results on the left, while 9 HPC features were required to obtain the results on the right. . . . 156
8.35 Comparison of % of Correct Predictions for ransomware operations, with 15s Start Time Results (from Figure 8.27) on the left, and visually identified actual operation start time shown on the right. Just 6 HPC features were required for the results on the left, while 9 HPC features were required to obtain the results on the right. . . . 157
8.36 Titles and descriptions of top 3 ranked performance counters for all combinations of classifiers at 15 seconds, 25 seconds, and actual operation start time. . . . 160
8.37 Summary of all training trials and all 35 classes of benign and ransomware operations for performance counter 2: BR INST RETIRED.FAR BRANCH PS. The 20 benign operations are displayed in the top 5 rows, while the 15 ransomware operations are displayed in the bottom 4 rows. . . . 161
8.38 Summary of all training trials and all 35 classes of benign and ransomware operations for performance counter 64: L2 RQSTS.CODE RD MISS. The 20 benign operations are displayed in the top 5 rows, while the 15 ransomware operations are displayed in the bottom 4 rows. . . . 162
8.39 Summary of all training trials and all 35 classes of benign and ransomware operations for performance counter 67: L2 RQSTS.PF HIT. The 20 benign operations are displayed in the top 5 rows, while the 15 ransomware operations are displayed in the bottom 4 rows.
. . . 163
9.1 Experimental setup for monitoring power supplied to the test SSD from the host computer. Current is measured with a current probe, amplified, and acquired and stored with the data recorder. . . . 168
9.2 Seven SSDs were tested. Drives were selected to cover multiple form factors (2.5” SSD and M.2 module), interface types (SATA III and PCIe 3.0 x4), and technologies (3D NAND and 3DXP), as indicated above. . . . 170
9.3 Data Collection Flow Chart . . . 171
9.4 Representative Reads of 500 MB files for all Crucial, Optane, and Samsung SSDs tested. Fifteen trials were conducted at each of 5 file sizes. Power is provided through the 12V line to all PCIe drives (all Optane drives, the Crucial P5, and the Samsung 970 EVO Plus), and through the 5V line for the SATA III SSDs (Crucial MX500 and Samsung 850 EVO). Note: Y-axis for each drive is scaled to show detailed signature and should not be used for direct power comparison. . . . 172
9.5 Representative Writes of 500 MB files for all Crucial, Optane, and Samsung SSDs tested. Fifteen trials were conducted at each of 5 file sizes. Power is provided through the 12V line to all PCIe drives (all Optane drives, the Crucial P5, and the Samsung 970 EVO Plus), and through the 5V line for the SATA III SSDs (Crucial MX500 and Samsung 850 EVO). Note: Y-axis for each drive is scaled to show detailed signature and should not be used for direct power comparison. . . . 174
9.6 Longest pulse average power for (a) writes and (b) reads as a function of file size. Values plotted are averages across trials (n = 15). Error bars are +/- 2 standard errors.
. . . 175
9.7 Data Flow for collection of read, write, and idle operations on four SSDs under test. . . . 179
9.8 Representative time domain (left) and spectrogram (right) plots for 1 GB read operations for the Samsung, Western Digital, Crucial, and Optane drives, respectively (top to bottom). (Values smaller than -115 dB in the spectrogram plots are thresholded to -115 dB to facilitate visualization by improving image contrast.) . . . 180
9.9 Representative time domain (left) and spectrogram (right) plots for 1 GB write operations for the Samsung, Western Digital, Crucial, and Optane drives, respectively (top to bottom). (Values smaller than -115 dB in the spectrogram plots are thresholded to -115 dB to facilitate visualization by improving image contrast.) . . . 181
9.10 Representative time domain (left) and spectrogram (right) plots for the idle state for the Samsung, Western Digital, Crucial, and Optane drives, respectively (top to bottom). (Values smaller than -115 dB in the spectrogram plots are thresholded to -115 dB to facilitate visualization by improving image contrast.) . . . 182
9.11 Classification results: Drive/Operation/File size . . . 186
9.12 Full classification results for the individual test data sets: 2022 Drive 1 data and 2022 Drive 2 data. . . . 187
9.13 Classification by operation. . . . 188
9.14 Classification results by operation for the individual test data sets: 2022 Drive 1 data and 2022 Drive 2 data. . . . 189
9.15 Classification by drive. . . . 190
9.16 Classification results by drive model for the individual test data sets: 2022 Drive 1 data and 2022 Drive 2 data.
. . . 191
9.17 Classification results by drive model and operation for both 2022 Drive 1 data and 2022 Drive 2 data sets. . . . 192
9.18 Classification results by drive model and operation for the individual test data sets: 2022 Drive 1 data and 2022 Drive 2 data. . . . 193
9.19 Collection apparatus. . . . 194
9.20 Current Signatures of idle, 7zip encryption, and live ransomware samples (Darkside, Sodinokibi, WannaCry). Data is recorded for 150 seconds, with encryption and Ransomware triggered manually at the 30 second mark. . . . 194
9.21 1 second section of Current Signatures of idle, 7zip encryption, and live ransomware samples (Darkside, Sodinokibi, WannaCry). . . . 195
9.22 Frequency Spectrum of idle, 7zip encryption, and live ransomware (Darkside, Sodinokibi, WannaCry) at sampling rate of 200kHz with amplitude truncated after 1 mA. . . . 195
9.23 1kHz of Frequency Spectrum of idle, 7zip encryption, and live ransomware (Darkside, Sodinokibi, WannaCry) at sampling rate of 200kHz. . . . 196
9.24 Confusion matrix for 5-way classification of idle, 7zip encryption, and live ransomware (Darkside, Sodinokibi, WannaCry). . . . 197
9.25 Current for a representative trial for each operation type from trial initiation for 175 seconds. . . . 198
9.26 Current for a representative trial for each operation type from operation initiation at 60 second mark for 5 seconds.
. . . 198
9.27 Data Collection Summary for simultaneous current and Vtune HPC collection for 36 classes, used to fine-tune the feature generation and select well-performing classifiers for further exploration. One ransomware class (WannaCry 1022) was eventually dropped from the final study because it did not perform encryption operations. . . . 199
9.28 Cross Validation Accuracy for non-exhaustive combinations of trial duration, Welch Window size, maximum feature frequency, and dB units, recorded from MATLAB Classification Learning Application. . . . 202
9.29 Top 20 Cross Validation Accuracy results (based on Coarse Tree validation ranking) for non-exhaustive combinations of trial duration, Welch Window size, maximum feature frequency, and dB units, recorded from MATLAB Classification Learning Application. . . . 203
9.30 Current, Full 100 kHz Frequency Spectrogram, and Cutoff 300 Hz Frequency Spectrogram for benign operations: OS only, 7zip, SPEC01 7zip, SPEC02 octave, SPEC03 Blender. . . . 205
9.31 Current, Full 100 kHz Frequency Spectrogram, and Cutoff 300 Hz Frequency Spectrogram for benign operations: SPEC04 CalculiX, SPEC05 Convolution, SPEC06 FFTW, SPEC07 fsi, SPEC08 handbrake. . . . 206
9.32 Current, Full 100 kHz Frequency Spectrogram, and Cutoff 300 Hz Frequency Spectrogram for benign operations: SPEC09 Kirchhoff, SPEC10 lammps, SPEC11 luxrender, SPEC12 namd, SPEC13 WPCcfd. . . . 207
9.33 Current, Full 100 kHz Frequency Spectrogram, and Cutoff 300 Hz Frequency Spectrogram for benign operations: SPEC14 poisson, SPEC15 python36, SPEC16 rodinaLifeSci, SPEC17 rodiniaCFD, SPEC18 srmp.
. . . 208
9.34 Current, Full 100 kHz Frequency Spectrogram, and Cutoff 300 Hz Frequency Spectrogram for ransomware operations: Babuk 1022, DarkSide Spr21, DarkSide 0521, Gibberish 0422, HiddenTear 0422. . . . 209
9.35 Current, Full 100 kHz Frequency Spectrogram, and Cutoff 300 Hz Frequency Spectrogram for ransomware operations: Phobos 0122, Phobos 0522, Phobos 0922, Snatch 0422, Snatch 0521. . . . 210
9.36 Current, Full 100 kHz Frequency Spectrogram, and Cutoff 300 Hz Frequency Spectrogram for ransomware operations: REvilSodinokibi Spr21, Sodinokibi 0722, Sodinokibi 0222, WannaCry Spr22, WannaCry 0622. . . . 211
9.37 Cross Validation Accuracy for all combinations of trial duration, Welch Window Size, and Cutoff Frequency Power Spectral Density features. . . . 213
9.38 Cross Validation Accuracy for all combinations of trial duration, Welch Window Size, and Cutoff Frequency Power Spectral Density features in dB. . . . 214
9.39 Cross Validation MCC for all combinations of trial duration, Welch Window Size, and Cutoff Frequency Power Spectral Density features. . . . 215
9.40 Cross Validation MCC for all combinations of trial duration, Welch Window Size, and Cutoff Frequency Power Spectral Density features in dB. . . . 216
9.41 Percent of Total Correct Predictions for all combinations of Welch Window and Cutoff Frequency for 1-3s, 5s, 7s, 10s, and 15s for benign operations, with PSD conversion in dB. . . . 217
9.42 Test Data Classification Accuracy for all combinations of trial duration, Welch Window Size, and Cutoff Frequency Power Spectral Density features. . . . 219
9.43 Test Data Classification Accuracy for all combinations of trial duration, Welch Window Size, and Cutoff Frequency Power Spectral Density features in dB.
. . . 220
9.44 Test Data Classification MCC for all combinations of trial duration, Welch Window Size, and Cutoff Frequency Power Spectral Density features. . . . 221
9.45 Test Data Classification MCC for all combinations of trial duration, Welch Window Size, and Cutoff Frequency Power Spectral Density features in dB. . . . 222
9.46 Top 3 results for each trial segment length for eleven different durations (ranging from 1-15 seconds) tested. Results are ranked from low-to-high segment length and then by high-to-low MCC. . . . 224

List of Abbreviations

3DXP  3D Cross Point
AES  Advanced Encryption Standard
BIOS  Basic Input/Output System
BPI  Blind Power Identification
CER  Cross Entropy Ratio
CLI  Command Line Interface
CPU  Central Processing Unit
CSV  Cache Side-channel Vulnerability
DSS  Digital Signature Standard
DVS  Dynamic Voltage & Frequency Scaling
EA  Electronic Attack
EDR  Endpoint Detection & Response
EM  Electromagnetic
eMMC  embedded Multi-Media Card
EMS  Electromagnetic Spectrum
EMSC  Electromagnetic Side-Channel
EP  Electronic Protection
ES  Electronic Warfare Support
EW  Electronic Warfare
FFT  Fast Fourier Transform
FLIR  Forward Looking Infrared
FPGA  Field Programmable Gate Array
GE  Guessing Entropy
GPU  Graphics Processing Unit
HAC  Holistic Assessment Criterion
HMD  Hardware-level Malware Detection
HPC  Hardware Performance Counter
IC  Integrated Circuit
IDQ  Instruction Decode Queue
IFL  Image for Linux
ILR  Information Leakage Rate
IO  Input Output
IP  Intellectual Property
IR  Infrared
KNN  K-Nearest Neighbor
LED  Light-Emitting Diode
LPF  Low-Pass Filter
MASC  Microarchitectural Side-Channel
MCC  Matthews’ Correlation Coefficient
MI  Mutual Information
ML  Machine Learning
MOE  Measure of Effectiveness
NETD  Noise Equivalent Temperature Difference
NVMe  Non-Volatile Memory express
OHM  Open Hardware Monitor
OS  Operating System
PCA  Principal Component Analysis
PCIe  Peripheral Component Interconnect express
PCTA  Progressive Correlation Thermal Analysis
PI  Perceived Information
PSC  Power Side-Channel
PSCA  Power Side-Channel Attack
QLC  Quad Level Cell
RaaS  Ransomware-as-a-Service
RAT  Resource Allocation Table
RMS  Root Mean Square
RO  Ring Oscillator
RSA  Rivest Shamir Adleman
SATA  Serial AT (or Advanced Technology) Attachment
SAVAT  Signal AVailable to the ATtacker
SBC  Single-Board Computer
SCA  Side-Channel Attack
SCP  Secure Copy
SCAN  Side-Channel ANalysis
SCD  Side-Channel Defense
SQ  Super Queue
SR  Success Rate
SSD  Solid State Drive
SSH  Secure SHell
STSF  Spatial Thermal Side-channel Factor
SVF  Side-channel Vulnerability Factor
SVM  Support Vector Machine
TSC  Temperature Side-Channel
TSCA  Temperature Side-Channel Attack
TSMP  Thermal-Security-in-Multi-Processors
TVLA  Test Vector Leakage Assessment
UOPS  Micro-operations
USB  Universal Serial Bus

Chapter 1: Introduction

Computers exude information, by design. They have become fully integrated into our lives - from word processing to gaming, streaming music and movies, big data crunching, and accessing unlimited information through the internet with the click of a button, often in the palm of your hand - because we have come to depend on the myriad services computer systems provide. Much of the information that is obtained from computers is intentional - documents, movies, and games are displayed to monitors, music streams through headphones and speakers, and network packets are transmitted wirelessly to and from routers, all originating from the device and sent through the proper channel to an intended receiver. The physical implementation of computer hardware that provides these services leads necessarily to physical behavior on the part of an operating computer. This physical behavior has physical characteristics, many of which become channels of information leakage that can be observed by an unintended receiver. This can pose a serious threat to computer security.
1.1 A Brief History of Side-channels

These “side-channels” of computer operations, such as current usage and power consumption, generation of heat and electromagnetic radiation, and events at the micro-architectural level, can be exploited to compromise the confidentiality of a system. The identification of side-channels as avenues to gather and exploit valuable information is relatively new - it was 1996 when Paul Kocher used differences in the timing of computer performance optimizations (a function of the chosen micro-architectural implementation) to find the entire secret key of asymmetric encryption algorithms such as Diffie-Hellman, Rivest Shamir Adleman (RSA), and Digital Signature Standard (DSS) [5], followed closely by analysis of the power consumed during cryptographic operations to extract keys from dozens of products [6]. In the 2000s the National Security Agency declassified the TEMPEST program [7], and electromagnetic signal leakage was identified as another viable side-channel [8]. In the mere 20 years that followed, a number of additional side-channels have become commonly accepted - acoustic, optical, temperature, memory/cache, and micro-architectural - and the advantages, disadvantages, and methods of exploiting and securing each are active research areas. The related nomenclature, however, is highly inconsistent. Side-channel attacks are classified in a number of ways: Active vs. Passive, Invasive vs. Non-invasive [9], Simple vs. Differential [10], Profiled vs. Non-Profiled [11], but the terms are often used in overlapping contexts. Some works refer to Side-channel Analysis and Constructive Side-channel Analysis while others refer to the same techniques as passive side-channel attacks and side-channel defense, respectively. Meanwhile, what actually constitutes a side-channel is also ambiguous - with multiple names used interchangeably for the same leakage vector, sometimes compounded by conflating the name of the side-channel with an attack method.
“Mutual information,” generally considered a statistical measure of the amount of information shared between two random variables, is at times considered a metric [12, 13], a side-channel [14], an analysis method [15, 16], and an attack method [17]. There are as many as six different terms used for the similar side-channel methods related to micro-architecture/memory/cache/access/timing/transient execution, with no clear guideline or definition to distinguish between them. The highly publicized Spectre [18] and Meltdown [19] attacks, released in 2018, exemplify this issue. In a fairly new, rapidly evolving field, every novel attack that is identified sends researchers scrambling for a solution. It’s not at all surprising that the field is disorganized and terminology is inconsistent; the focus is always reacting to the next threat. For the purposes of this work, side-channels will be organized and discussed in the framework of Electronic Warfare, as discussed in Section 1.1.1, below.

1.1.1 Considering Side-channels in the framework of Electronic Warfare

Electronic Warfare (EW) is the term given to military operations to control the electromagnetic spectrum (EMS), or the range of frequencies of electromagnetic radiation. Electromagnetic waves consist of perpendicular electric and magnetic fields, as shown in Figure 1.1, and the EMS is organized by frequency (and corresponding wavelength), as seen in Figure 1.2.

Figure 1.1: Electromagnetic wave, reprinted from [2].

EW focuses on maintaining friendly control of the EMS while denying its use to adversaries, and is further divided into 3 categories: Electronic Attack (EA), Electronic Warfare Support (ES), and Electronic Protection (EP) [20]. EW has been used in a military capacity for over a hundred years.
The earliest documented attempt to use EW in a military capacity was during the Russo-Japanese War in 1905, when a Russian captain requested (but was denied) permission to transmit a signal to interfere with the wireless reports of a Japanese ship that had spotted the Russian Fleet in the Tsushima Strait; the ensuing battle ended in a crushing Russian defeat that ultimately decided the war in Japan’s favor [21].

Figure 1.2: Electromagnetic Spectrum, adapted from [3].

EW has been widely used to gain an operational advantage since World War II, when both the Allies and Axis powers employed it against the navigational systems of bomber aircraft, among other uses [22]. In the century since EW was first introduced, a time-tested framework has been developed to define and organize the EW field. As noted above, EW is divided into three categories:

• Electronic Attack (EA) - EA is the use of the EMS or directed energy to attack an adversary, either by preventing their access to the EMS or by preventing the adversary from denying friendly access to the spectrum [23].

• Electronic Warfare Support (ES) - ES is the use of the EMS for information gathering, specifically for the purpose of threat recognition, planning, or supporting future operations [23].

• Electronic Protection (EP) - EP refers to actions taken to protect personnel, facilities, and equipment from any effects of the use of the EMS for EA or ES [23].

Although side-channels are quite recent in comparison, similarities exist. EW is focused on actions involving the range of electromagnetic radiation in the EMS, with EA encompassing actions such as “jamming.” Side-channels, in comparison, center on information leakage from the physical behavior of computer hardware, where attacks primarily constitute obtaining secret encryption keys or privileged system access.
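As a concrete illustration of the key leakage just described (and of the Kocher-style timing channel noted in Section 1.1), consider textbook square-and-multiply modular exponentiation: the extra multiply performed only for 1-bits of a secret exponent makes the operation count, and hence execution time, depend on the key. The following Python sketch is purely illustrative (not an artifact of this work; the helper name `modexp_count_ops` is invented for the example):

```python
def modexp_count_ops(base: int, exponent: int, modulus: int):
    """Left-to-right square-and-multiply; returns (result, multiply_count).

    The extra multiply occurs only for 1-bits of the exponent, so the
    operation count (a proxy for execution time) leaks the exponent's
    Hamming weight -- the data-dependent timing Kocher exploited.
    """
    result = 1
    multiplies = 0
    for bit in bin(exponent)[2:]:               # most-significant bit first
        result = (result * result) % modulus    # square every iteration
        if bit == "1":
            result = (result * base) % modulus  # multiply only on 1-bits
            multiplies += 1
    return result, multiplies

# Two secret exponents of equal bit-length but different Hamming weight
# produce different operation counts, observable as a timing difference.
r1, m1 = modexp_count_ops(7, 0b10000001, 1009)  # Hamming weight 2
r2, m2 = modexp_count_ops(7, 0b11111111, 1009)  # Hamming weight 8
print(m1, m2)  # 2 8
```

Repeated measurements of this timing difference across chosen inputs are what allow an attacker to recover the exponent bit by bit, even though the algorithm's output itself reveals nothing about the key.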
This work will consider side-channels in the framework of EW, where the distinct characteristics of the power, acoustic, electromagnetic, temperature, optical, and micro-architectural side-channels are analogous to the divergent properties and propagation methods exhibited by the different frequency bands in the EMS. To organize inconsistent terminology for side-channel methods, actions to (actively) attack side-channels with the goal of cryptanalysis or privileged system access will still be called Side-Channel Attacks (SCA). Use of side-channels for the purposes of information gathering and threat recognition will be termed Side-Channel ANalysis (SCAN), and actions taken to defend side-channels - either through minimizing side-channel leakage or by reducing the relationship between the leakage and sensitive information - will be categorized as Side-Channel Defense (SCD). For reference purposes, relationships between EW and SC methods are shown in Figure 1.3.

Figure 1.3: Relationship between Electronic Warfare terminology and Side-Channel terminology.

1.2 Side-channel Analysis to detect Ransomware

Side-channel analysis is valuable in determining what kind of unintentional information is leaked from a system, and can be used to detect modifications to a system or validate the expected operation [24–26]. In order to provide scope and a pertinent, real-world application with which to evaluate the effectiveness of disparate side-channel methods, this research focuses on side-channel analysis of selected side-channels (power, temperature, and micro-architectural), with specific applications to the detection of state-of-the-art, real ransomware running on a live, non-virtualized system.
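Because the evaluations in this work summarize classifier performance with Matthews’ Correlation Coefficient (MCC), a brief aside on the metric may be useful: MCC is computed from the four cells of a binary confusion matrix and, unlike raw accuracy, remains informative when classes are imbalanced. The following is a minimal illustrative sketch (my own, not the MATLAB tooling used in this work; the function name `mcc` is arbitrary):

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews' Correlation Coefficient from a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (random guessing)
    to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# A toy, balanced 100-trial example: 96 correct predictions (96% accuracy)
# yields an MCC of 0.92, the same accuracy/MCC pairing reported for the
# best-case power side-channel results in the abstract.
print(mcc(48, 48, 2, 2))  # 0.92
```

Note how quickly MCC drops relative to accuracy as errors concentrate in one class; this is why it is used alongside accuracy throughout the results chapters.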
The impact of even the perceived threat of ransomware attacks was brought to the front of national consciousness in May 2021, when the largest US refined fuel pipeline operator, Colonial Pipeline, shut down its fuel distribution system for 5 days in response to a ransomware attack on its administrative system by the DarkSide ransomware group, causing panic buying, price surges, and fuel outages [27]. Myriad reports of attacks on critical infrastructure, school systems, hospitals, government systems, and private companies occur almost weekly; the first human death attributed to a ransomware attack occurred when a German woman’s ambulance was re-routed from a nearby Dusseldorf hospital after the hospital was attacked by ransomware in September 2020 [28]. In November 2022, the U.S. Treasury reported that the total cost of ransomware attacks on U.S. financial institutions alone in 2021 increased 200% over the prior year, to $1.2 billion [29], and these numbers are estimated to be well below the true figure, as they include only data that U.S. banks were required to report [30]. The global impact of ransomware is so significant that the White House has hosted two International Counter Ransomware Initiative Summits in the past two years [31].

Typical ransomware detection methods depend on up-to-date anti-virus software signatures created from previously discovered versions of ransomware, and are unlikely to prevent brand-new “zero-day” ransomware attacks. This work considers side-channel analysis techniques for the temperature, power, and micro-architectural side-channels for the purpose of classifying state-of-the-art ransomware on real-world, non-virtualized Windows systems. Over three thousand ransomware and benign trials were collected to generate training and testing data sets, which required development of a process to synchronize collection of on-system (e.g. performance counters) and off-system (e.g.
power) measurements, safely transfer trial data from the encrypted system, and restore the system to a “clean” state without the use of virtualization techniques, which negatively impact the validity of side-channel measurements. Side-channels were evaluated on their effectiveness in accurately differentiating between ransomware and benign operations, such as background operating system activity, 7zip encryption, and SPEC benchmarks, in a given time duration, with Matthews’ Correlation Coefficient (MCC) used to measure overall classifier performance of five machine learning classification algorithms. With the financial impact of ransomware estimated to cost more than $30 billion globally this year [32], the usefulness of side-channel analysis to detect ransomware in a non-virtualized computer system has significant real-world implications.

1.3 Contributions

This work is the first hardware-based ransomware classification and detection exploration to leverage side-channel analysis of multiple (power, temperature, micro-architectural) side-channels on hardware without use of virtualization techniques. Each side-channel is evaluated for its usefulness in classifying ransomware based on the detection time required to achieve at least 90% classification accuracy. Specifically, the following contributions are insights that came from this work:

1. This work developed a process to collect system-wide data from the power and micro-architectural side-channels simultaneously during ransomware execution without the use of virtualization.

2. The micro-architectural side-channel, accessed by collecting hardware performance counters through Intel’s VTune Profiler, produced test accuracy of 99% in ≤ 1 second. Over 200 hardware events were collected and systematically tested for their ability to differentiate ransomware from benign operations. The top 10 events were ranked by classifier, and the best events were found to change based on classifier and time window size.
Accuracy results for a 0.1s window were within 10% of best-case results for each classifier.

3. The power side-channel, accessed by collecting current supplied to a solid state drive, produced test accuracy of 96% in 15 seconds, and required at least 4 seconds to pass 90%.

4. Temperature side-channel-related research was surveyed and organized, but proof-of-concept efforts to utilize the temperature side-channel for ransomware detection indicated it is not a viable path for time-sensitive applications without the use of pre-processing, which is outside the scope of this work.

5. Since the terminology used to refer to side-channel topics is inconsistent at best, a preliminary attempt to organize side-channel concepts in the framework of Electronic Warfare is described and utilized in this work.

1.4 Organization

The remainder of this work is organized as follows: Chapter 2 provides a literature review and background on side-channel types, methods, and metrics. Chapter 3 considers the less-utilized temperature side-channel in depth, and Chapter 4 provides additional background on the micro-architectural side-channel and the use of Hardware Performance Counters (HPCs) to obtain micro-architectural side-channel information. Chapter 5 includes background and related literature on side-channel analysis-based methods to detect malware and ransomware, along with additional analysis of all previous works that have investigated the use of HPCs for ransomware detection. Experiments accessing the temperature side-channel through thermal imaging are included in Chapter 6. Chapters 7 and 8 describe the experiments and results leveraging micro-architectural side-channel hardware performance counter event data to classify ransomware operations. Evaluation of power side-channel-based current-draw analysis and its feasibility for ransomware classification and detection are provided in Chapter 9.
This work concludes with Chapter 10, which summarizes results and discusses future research directions to employ side-channel analysis techniques for detection of ransomware on a non-virtualized system.

Chapter 2: Side-channels

2.1 Side-channels

The physical implementation of computer hardware leads necessarily to physical behavior on the part of an operating computer. This physical behavior has physical characteristics, many of which become channels of information leakage that can be observed by an unintended receiver. Digital computations result in physical effects such as current usage and power consumption, generation of heat and electromagnetic radiation, and events at the micro-architectural level. Although these effects are not intended to convey information about the operations being performed, they create the opportunity to do so through side-channel analysis.

A side-channel is a source of observable information leakage through methods other than the intended communications channel. Whereas the communications channel itself is the medium through which information is passed between a transmitter and receiver, a side-channel is a medium through which information may be leaked or observed due to the physical effects caused by the operation or implementation of a digital process, rather than through direct access to the device hardware or software itself. Typical side-channels include power consumption, electromagnetic emissions, acoustic emissions, optical emissions, heat generated, and micro-architectural effects: execution time required and memory resources consumed. These commonly used side-channels are considered below.

2.1.1 Power Side-channel

The power side-channel is based on the current draw of CMOS devices in a transitory state. As data is processed, millions of these transitions take place, with a direct impact on a device’s power consumption.
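As a point of reference, the data-dependent component of this consumption is conventionally described by the first-order dynamic power model (a standard textbook approximation, not a result of this work):

```latex
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
```

where $\alpha$ is the switching activity factor, $C_L$ the switched load capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency. Because $\alpha$ varies with the data being processed, the power trace carries information about the computation being performed.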
Power side-channel analysis infers activity or operations based on measuring the power consumed by a device. The power side-channel has been widely studied for over 20 years and was recently summarized in comprehensive survey papers [33, 34]. Multiple types of power and current collection methods allow the ground truth of a device’s power consumption to be determined with high fidelity; however, measurement is typically invasive and limited to a single device or chip.

2.1.2 Electromagnetic Side-channel

The electromagnetic (EM) side-channel is a result of the electromagnetic field generated by the flow of current. It has also been widely studied and recently surveyed [35, 36], is considered the most useful side-channel when power measurements are unavailable, and is particularly helpful when an implementation is resistant to side-channel power analysis [8]. The EM side-channel does not require direct access to a device, and is usually measured with a near-field probe placed in close proximity to the device under test. A newer, related application of the EM side-channel is called “Backscattering,” which is a result of EM signals being reflected and simultaneously combined with effects of the circuit activity at that moment [37, 38].

2.1.3 Acoustic Side-channel

A classic form of eavesdropping, the acoustic side-channel is a result of the sound that is created in multiple computation scenarios [39], including determining keystrokes on a keyboard [40–42], finger taps on the touch-screen of a smartphone [43], inferring what is printed on a traditional [44] or 3D [45, 46] printer, and even cryptanalysis resulting in the extraction of encryption keys [47–49]. The most recent application of the acoustic side-channel is the first active acoustic SCA by Cheng et al. [50], which used inaudible audio signals to enable a smartphone to use SONAR to track human movements and disclose unlock patterns.
2.1.4 Optical Side-channel

Most examples of the optical side-channel consider the analysis of the photons emitted when a transistor changes state, which has been applied to AES [51–53] and RSA [54] encryption. The optical side-channel has also been applied to supplement power analysis when using a charge-coupled device light-sensing camera [55], and by using the status of router LEDs to covertly communicate and exfiltrate data [56].

2.1.5 Temperature Side-channel

The speed at which a microprocessor operates, combined with the movement of charge required to change the state of transistors, gives rise to heat, or a temperature-based side-channel. An in-depth discussion of the temperature side-channel is provided in Chapter 3.

2.1.6 Micro-architectural Side-channel

Microarchitecture is the specific design of a microprocessor, and consists of all the digital logic, arithmetic, and data path and control circuits required to implement an instruction set in a given processor. The micro-architectural side-channel leverages the distinct hardware features inherent in the particular microprocessor design, particularly with regard to the time it takes to execute instructions, the contention inherent in the sharing of hardware resources, and the functionality of the memory subsystem. In contrast to the previous side-channels, this side-channel is software-based and does not necessarily require physical proximity to the target device. The micro-architectural side-channel consists of both timing- and memory-access-related side-channel information and is considered in detail in Chapter 4.

2.2 Side-channel Attack, Analysis, and Defense

Side-channel attacks take advantage of the physically observable characteristics of computer tasks on specific hardware implementations, and were originally considered to be sortable into two orthogonal classifications according to [9]:

1. Active vs. Passive. Active attacks deliberately tamper with the proper functioning of the device in question.
Passive attacks observe the behavior of the device without causing any disruption to the device itself.

2. Invasive vs. Non-Invasive. Invasive attacks require gaining access to the inside components of a chip, usually by depackaging. Non-invasive attacks exploit information which is externally observable.

More recent classification methods consider additional orthogonal axes:

1. Active vs. Passive.

2. Invasive vs. Semi-Invasive vs. Non-Invasive.

3. Simple vs. Differential. This classification is based on the method used to analyze the side-channel data. A simple analysis generally utilizes a single side-channel trace, where information can be extracted directly from the side-channel observations. A differential analysis, on the other hand, uses many side-channel traces and statistical (or ML) approaches to find the correlation between the side-channel information and the secret data [10].

4. Non-Profiled vs. Profiled. A non-profiled attack uses traces measured directly from the target device (e.g. Simple or Differential Power Analysis), while a profiled attack depends on the use of a separate device to construct a profile of the target device, which is then used to attack the target. Template attacks and deep-learning-based attacks are powerful examples of profiled side-channel attacks [11].

Most frequently, active side-channel attacks are used to ascertain the secret key of an encryption algorithm. This is so frequent that the term “side-channel attack” has become synonymous with cryptanalysis. The passive, non-invasive attack described above essentially constitutes information gathering, and is better described as side-channel analysis, allowing for timely insights into the effectiveness of offensive or defensive measures. Side-channel defense is the use of side-channel insights to defend against side-channel attacks.
Defensive measures aim either to eliminate the side-channel leakage itself or to eliminate the relationship between the side-channel leakage and sensitive information.

2.3 Side-channel Metrics

Side-channel attacks pose a considerable threat to the security of a system, so the ability to understand, quantify, and compare the threat vectors posed by side-channel leakage is crucial to understanding a system’s vulnerabilities.

A thorough survey of over 80 technical privacy metrics was released in 2018 [57], which considered all measures that described any level of privacy across six privacy domains: communications systems, databases, location-based services, smart metering, social networks, and genome privacy, and grouped metrics by the output measures of uncertainty, information gain/loss, data similarity, indistinguishability, error, time, accuracy/precision, and the adversary’s probability of success. This survey captured many of the metrics that have been used to quantify privacy, although many did not specifically focus on side-channel leakage. The pertinent metrics from [57], along with a wide variety of other statistical and novel side-channel leakage-based metrics, are summarized in this section.

2.3.1 Statistical Tests

The earliest work on side-channel metrics looked at the ability of statistics to quantify the immunity of a device to leakage, as minimal leakage provides limited opportunity for the creation of side-channels. Leakage tests were designed to detect general vulnerability to timing and power consumption attacks, as well as correlation tests (Hamming weight, external parameters). These tests were based on existing statistical tests for randomness (F-test, R-test) and significance (distance of means, goodness of fit, sum of ranks). Failure of any of the designed leakage tests indicated probable emission of secret information; however, passing any of these tests does NOT imply a device does not leak information [58].
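As an illustration, a distance-of-means leakage test of the kind described above can be sketched in a few lines. The traces below are synthetic placeholders (invented for illustration), with the “fixed-input” set given a small data-dependent offset to stand in for genuine leakage; the statistic computed is the same Welch’s t that underlies the TVLA methodology discussed later in this chapter.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t-statistic: distance of means scaled by the combined
    standard error, without assuming equal variances."""
    va, vb = a.var(ddof=1), b.var(ddof=1)
    return (a.mean() - b.mean()) / np.sqrt(va / len(a) + vb / len(b))

rng = np.random.default_rng(0)

# Synthetic "power traces": one set captured with a fixed input, one
# with random inputs; the fixed set carries a small offset (leakage).
fixed = rng.normal(loc=1.00, scale=0.05, size=5000)
random_in = rng.normal(loc=0.98, scale=0.05, size=5000)

t = welch_t(fixed, random_in)

# TVLA convention: |t| > 4.5 is treated as evidence of leakage.
print(f"t = {t:.1f}, leakage detected: {abs(t) > 4.5}")
```

Note the asymmetry the section describes: a large |t| flags probable leakage, but a small |t| does not prove the device is leak-free.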
2.3.2 Entropy, Conditional Entropy, & Guessing Entropy

Entropy is the average amount of information provided by the outcome of a random event, or the measure of uncertainty of a random variable [59]. A decrease in uncertainty corresponds to an increase in information. Entropy as a side-channel metric originated from Shannon’s description of a communications channel, and was explored as a desirable measure of predictability in the context of atmospheric science and weather forecasts by [60] and for early side-channel attack modeling by [61]. It is used as the basis of many metrics, including Gu’s Spatial Thermal Side-channel Factor (STSF) in Section 2.3.7 [62]. [57] summarized arguments against its use as a privacy metric due to the heavy influence of outlier values, which can make it misleading and difficult to use in comparison, as it gives only a general indication of uncertainty with no insight into how accurate the estimate is.

2.3.2.1 Conditional Entropy

Conditional Entropy is a measure of how much information is needed to describe the outcome of an event X given the known outcome of another event Y. In the side-channel context, Y can be considered the attacker’s observations of the given side-channel of interest. Conditional Entropy directly yields Mutual Information (Section 2.3.3) [12, 57, 61].

2.3.2.2 Guessing Entropy

Guessing entropy is a security metric that gives the average number of questions that must be asked to correctly guess a value. In the context of side-channel attacks, it estimates the average number of key candidates that will need to be tested after the attack is complete, thus quantifying the effectiveness of the attack [12, 61, 63].

2.3.3 Mutual Information

Mutual Information is the measure of the amount of information shared between two random variables.
In the side-channel context, the variables are usually the true distribution of information and the attacker’s observations of that information, hence measuring the amount of information leakage [12, 57]. [14–17, 64–66] all explore Mutual Information Analysis for side-channel analysis in varying contexts. Mutual Information is also used as the basis of the Side-channel Leakage Evaluator and Analysis Kit in [67] and is related to Success Rate (Section 2.3.4) [13].

2.3.4 Success Rate

Generally speaking, Success Rate measures the probability that the adversary is successful by determining the percentage of successes over a large number of attempts [57]. In the side-channel application, Success Rate indicates the efficiency with which the adversary can recover the secret key [12, 13]. [68] explores the application of this metric to deep-learning-based side-channel analysis of imbalanced data and finds that it is difficult to embed in DL algorithms, but closely related to Cross Entropy Ratio (Section 2.3.16.2).

2.3.5 Welch’s T-Test

Welch’s T-test uses hypothesis testing to determine if two separate distributions with unequal variances have equal means. It is the underlying metric used by the Test Vector Leakage Assessment (TVLA) testing methodology initially developed by [69], which focuses on determining the resistance of a cryptographic module to leakage of information through the power side-channel. [70] thoroughly examines and extends the work of [69], with detailed applications of the T-test in higher-order settings. It is identified by [57] as a privacy metric to measure data similarity, and is used by [71] with gate-level power measurements in conjunction with electronic design automation tools. [72], however, cautions against the use of Welch’s T-Test as a standalone pass/fail metric in TVLA to assess whether a cryptographic implementation is safe.
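For small discrete distributions, the entropy-based metrics of Sections 2.3.2–2.3.3 can be computed directly. The following toy sketch uses invented distributions purely for illustration: a uniform 2-bit secret, and an attacker observation perfectly correlated with it.

```python
import math
from collections import Counter

def shannon_entropy(probs):
    """H(X) = -sum p*log2(p): average information of an outcome."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def guessing_entropy(probs):
    """Average number of guesses when candidates are tried in
    decreasing order of probability: G = sum_i i * p_(i)."""
    ordered = sorted(probs, reverse=True)
    return sum(i * p for i, p in enumerate(ordered, start=1))

def mutual_information(joint):
    """I(X;Y) from a joint distribution given as {(x, y): p}."""
    px, py = Counter(), Counter()
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# A uniform 2-bit secret: 2 bits of entropy, 2.5 expected guesses.
uniform = [0.25] * 4
print(shannon_entropy(uniform))     # 2.0
print(guessing_entropy(uniform))    # 2.5

# A perfectly correlated observation leaks the full 2 bits.
perfect = {(x, x): 0.25 for x in range(4)}
print(mutual_information(perfect))  # 2.0
```

An independent observation would yield zero mutual information, matching the intuition that it carries no leakage about the secret.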
2.3.6 Side-channel Vulnerability Factor (SVF)

SVF is a novel metric developed in [73–75] that measures the information leakage through a side-channel by determining the correlation between the actual cache activity trace (“oracle”) and the side-channel observations for memory side-channels.

2.3.7 Spatial Thermal Side-channel Factor (STSF)

The entropy-based (Section 2.3.2) STSF was developed in [62] as a two-dimensional metric to complement SVF (Section 2.3.6). STSF accounts for the temperature of function blocks as they correlate to secret information.

2.3.8 Cache Side-channel Vulnerability (CSV)

CSV constrains SVF (Section 2.3.6) by limiting its application to caches only, and assumes the strongest possible attacker in order to remove the ambiguity of differences between system vulnerabilities and attacker capabilities [76].

2.3.9 Signal Available to the Attacker (SAVAT)

SAVAT is an instruction-level metric that measures the signal made available through the side-channel as a result of a single instruction variation, and is determined by directly analyzing the variation between individual processor instructions [77].

2.3.10 Thermal-Security-in-Multi-Processors (TSMP)

TSMP was introduced by [78] as a metric to quantify the security of multiprocessors against a temperature side-channel attack. TSMP ranges from 0 (not secure) to 1 (more secure).

2.3.11 Maximal Leakage

Maximal Leakage is defined in [79] as the multiplicative increase in the likelihood of correctly guessing a randomized function of X after observing Y, maximized over all such functions. It quantifies the leakage of information from X to Y by measuring the difference between an informed guess (after observing Y) and a blind guess. Applications of maximal leakage are further explored in [80–82], and as a way to measure information gained from databases or social networks in [57].
[83] argues for the use of maximal leakage over mutual information or channel capacity in evaluating (timing) side-channels.

2.3.11.1 Maximal α-Leakage

When refining guesses, maximal α-leakage gives an adversary the ability to fine-tune the level of confidence of additional guesses, allowing for continuous adjustment between mutual information (α = 1, Section 2.3.3) and maximal leakage (α = ∞, Section 2.3.11) [84].

2.3.12 Information Leakage Rate

Information leakage rate is a novel metric introduced in [85] to evaluate the amount of information leaked through the electromagnetic side-channel and compare the quality of EM side-channel measurement systems.

2.3.13 Local Differential Privacy

The local differential privacy metric measures how indistinguishable two items of interest are from each other [57], and is considered a useful leakage metric when a system designer is extremely risk averse [81, 82].

2.3.14 Trust Coverage

Trust coverage is a framework that calls for the application of a variable weighted sum of three different coverage metrics (Functional Coverage, Structural Coverage, Asset Coverage) to quantify the trustworthiness of hardware at the gate level, as a response to the popularity of hardware trojans [86].

2.3.15 Holistic Assessment Criterion

The Holistic Assessment Criterion focuses on assessing the leakage of power side-channels, and is intended to improve on the TVLA’s T-test (Section 2.3.5) by focusing on a null hypothesis built on a well-founded definition of exploitable leakage [87].

2.3.16 Machine Learning Metrics

Recently, several metrics specific to the use of machine learning (ML) in side-channel analysis have been introduced, due to discrepancies between the traditional ML metrics of accuracy, precision, and recall and side-channel metrics. These efforts aim to design a metric that better suits ML, rather than increasing complexity by attempting to embed side-channel metrics in ML algorithms [88].
2.3.16.1 Perceived Information

[89] found that, for balanced data, using the Negative Log Likelihood (a.k.a. Cross Entropy) loss function during the training of deep neural nets is the equivalent of maximizing perceived information, which is the lower bound of mutual information (Section 2.3.3) between the observation and the leakage.

2.3.16.2 Cross Entropy Ratio

Cross entropy ratio, which is closely related to both Guessing Entropy (Section 2.3.2.2) and Success Rate (Section 2.3.4), is a useful metric to evaluate the performance of deep learning models for side-channel analysis [68].

Chapter 3: The Temperature Side-channel

3.1 Overview

There exists an array of literature related to the use of both offensive and defensive temperature methods to address the security of computer systems via the temperature side-channel.

3.2 Characteristics of the Temperature Side-channel

Temperature side-channel (TSC) leakage has a linear relationship to the leakage of the power side-channel (PSC), but with limitations due to the thermal properties of materials. Modeling dynamic temperature changes is accomplished with an RC-equivalent circuit with large (thermal) capacitance [90], which therefore behaves like a low-pass filter (LPF) with a cutoff frequency f_c = 1/(2πRC) in the low kHz range. Since modern processors operate in the GHz range, this low-pass filtering effect attenuates higher-frequency components of system activity, resulting in a delay in temperature relative to power (which changes virtually instantaneously). Temperature changes are slower both to increase and to decrease, which results in previous and current operations being superimposed at the sensor, so the leakage has an integrative effect. Advantages and disadvantages of the TSC are listed below [91–94].

Advantages of the TSC include:

1. Linear relationship to power.

2. Ease of access through the use of on-chip sensors, with some applications for infrared imaging.

3. Leakage can be measured internally or externally through temperature variations.
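The low-pass behavior described above can be illustrated numerically. In this sketch, the thermal resistance and capacitance values are arbitrary placeholders chosen only to put the cutoff in the low kHz range the text describes; they do not model any particular chip.

```python
import math

def cutoff_hz(r_thermal, c_thermal):
    """First-order RC low-pass cutoff: f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_thermal * c_thermal)

def gain(f_signal, f_c):
    """Magnitude response of a first-order LPF at frequency f_signal."""
    return 1.0 / math.sqrt(1.0 + (f_signal / f_c) ** 2)

# Placeholder thermal resistance (K/W) and capacitance (J/K).
f_c = cutoff_hz(r_thermal=1.0, c_thermal=1e-4)
print(f"cutoff ~ {f_c:.0f} Hz")            # roughly 1.6 kHz

# A 1 GHz component of processor activity is attenuated by about six
# orders of magnitude, which is why fast events are effectively
# invisible in the temperature trace.
print(f"gain at 1 GHz ~ {gain(1e9, f_c):.2e}")
```

This is exactly the integrative effect noted above: slow trends survive the filter while individual high-frequency operations are smeared together.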
The disadvantages of the TSC are not insignificant:

1. Low bandwidth due to LPF behavior, which attenuates the leakage of high-frequency computations, resulting in slow temperature changes.

2. Noise, since generated heat is superimposed both spatially and temporally.

3. Data collection is limited by the response time and resolution of the thermal sensor.

4. Any temperature offset varies over time, as no mechanism exists to control it directly (power, conversely, is regulated by Dynamic Voltage and Frequency Scaling).

3.3 Sensing the Thermal Channel

Temperature side-channel readings can be obtained through both internal and external means. This section provides an overview of commonly used internal and external sensing options, as well as the context in which these have been used in the literature. The majority of works have used internal on-chip sensors to monitor the temperature side-channel, with the ring oscillator being a popular internal implementation on FPGAs. External sensors were much less common, with only a small number of works using an external temperature sensor or infrared (IR) camera to capture temperature traces. Across these categories, limited success was found with direct IR image captures, with pre-processing required to adapt temperature readings for analysis through traditional power methods.

3.3.1 Internal Temperature Sensing Methods

Internal temperature readings are acquired either through the internal temperature sensor of each processor core or, when using re-programmable devices, through the use of a specially designed circuit (a ring oscillator).

3.3.1.1 Internal Core Sensor

Since maintaining the proper operating temperature range is vital to the performance and lifetime of a chip, cores contain software-readable temperature sensors to monitor temperature and adjust the speed of operations if necessary.
These sensors provide an easy source of temperature information if one has access to them, either through physical access to the device or through the use of a malicious program to report sensor readings. Resolution of this side-channel source is limited by the number of sensors on the device and their placement, as well as by the frequency of temperature readings [95]. The vast majority of existing literature utilizes on-chip sensors, much of it leveraging the “HotSpot” modeling software [96] to simulate sensor temperatures for a variety of applications. These applications include covert communications [97, 98], physical temperature attacks [99], thermal modeling [62, 93, 100, 101], defense against TSCAs [102], and detection of hardware trojans [103, 104]. Experimental results using on-chip sensors are limited to applications in covert communications [105, 106] and TSCA defense [78, 107, 108].

3.3.1.2 Ring Oscillator

A ring oscillator is a circuit that consists of an odd number of inverters in a feedback loop. When the device is enabled, the signal through the inverters oscillates and generates heat. Each inverter contributes to the delay of the signal, which decreases the frequency of the oscillating signal. Ring oscillators can also be used to detect temperature changes by correlating the output frequency drift to degrees Celsius or Fahrenheit. A number of works have explored the ring oscillator as a temperature sensor on FPGAs [109–117]. Ring oscillators are useful in that context due to their ease of implementation, their dynamic nature (they can be added, moved, and removed as desired), their ability to measure junction temperature (in lieu of package temperature, like other on-chip sensors), and the fact that they can be placed as desired (e.g. in an array in order to obtain a thermal map of the die). In the context of temperature side-channels they are used almost exclusively for covert communications applications [112, 118–120].
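The frequency and drift relationships described above can be sketched as a back-of-the-envelope calculation. The per-stage delay, temperature coefficient, and reference temperature below are invented placeholder values used only to illustrate the arithmetic.

```python
def ro_frequency(stages, stage_delay_s):
    """Oscillation frequency of an N-stage ring oscillator: the signal
    must traverse the inverter loop twice per period, so
    f = 1 / (2 * N * t_d)."""
    return 1.0 / (2.0 * stages * stage_delay_s)

def temperature_from_drift(f_measured, f_ref, k_hz_per_c, t_ref_c):
    """Invert a simple linear drift model, f = f_ref - k*(T - T_ref),
    to read temperature back from a measured frequency."""
    return t_ref_c + (f_ref - f_measured) / k_hz_per_c

# 5 inverters at a hypothetical 100 ps per stage -> 1 GHz oscillation.
f_ref = ro_frequency(stages=5, stage_delay_s=100e-12)
print(f"{f_ref / 1e9:.1f} GHz")

# If frequency drops 1 MHz per degree C above a 25 C reference, a
# reading 10 MHz low implies the die is at 35 C.
print(temperature_from_drift(f_ref - 10e6, f_ref, 1e6, 25.0))
```

The linear drift model is the simplest possible calibration; real ring-oscillator sensors are characterized empirically against a reference thermometer.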
3.3.2 External Temperature Sensing Methods

External methods to sense temperature include infrared imaging, temperature sensors, and, occasionally, fan speeds. Each method is described below.

3.3.2.1 Infrared Camera

Hot objects radiate electromagnetic infrared (IR) frequencies, and the intensity of the radiation depends on the object’s temperature. A thermal camera uses an array of infrared sensors to measure this electromagnetic radiation and then converts the measurement to an image showing the temperature of different objects as colors. Several factors determine the quality of a thermal image, but the most pertinent to this work are resolution, temperature range, and thermal sensitivity.

Resolution. The resolution of the sensor is the number of sensitive elements (“pixels”) that make up the sensor. Since infrared radiation has longer wavelengths than visible light, the sensitive elements of infrared cameras are larger than those of traditional cameras, so thermal cameras have fewer pixels and lower resolution overall. Thermal cameras with more pixels produce higher-quality images.

Temperature Range. The temperature range of a given camera is the range of temperatures, from lowest to highest, that the camera’s thermal sensor (microbolometer) is capable of measuring. Some cameras have multiple temperature range options, which need to be chosen based on the temperature of the object being measured.

Thermal Sensitivity. Thermal sensitivity, also called Noise Equivalent Temperature Difference (NETD), is the smallest temperature difference a microbolometer can detect in the presence of electronic circuit noise. Lower numbers indicate greater ease in distinguishing subtle temperature variations.

Several works propose that this would be an “ideal” way to sense temperature because it creates an air gap between the monitoring equipment and the system being monitored (thus more secure), and does not require access to the on-chip sensors.
Additionally, IR imaging has the potential to provide better resolution than a single individual on-chip sensor, as well as two-dimensional thermal/spatial correlation information. This method also eliminates monitoring overhead on the system, which is highly advantageous in resource- and power-constrained devices [101, 104, 107].

To date, there has only been limited success in applying infrared imaging to the temperature side-channel, and primarily in processing data for analysis using power methods. Cochran et al. presented a “thermal-to-power-inversion” method to estimate spatial power using a thermal camera collecting emissions from the back of a silicon die, compensating for challenges associated with the spatial LPF effect of heat diffusion [91]. Meanwhile, Reda et al. solved the “Blind Power Identification” (BPI) problem, which uses only the chip’s total power measurements and (internal or external) thermal sensor measurements to find the thermal model and fine-grain power consumption of the chip, without requiring knowledge of the thermal power model to identify sources [100]. Reda’s BPI technique was evaluated using simulation software, real multi-core embedded sensors, and an IR camera, but the camera application was limited to a test chip with a 10x10 grid of microheaters for proof-of-concept validation. Werner et al. created a thermal modeling framework for accelerator-rich architectures, which estimates the power consumption profile of an IC by solving the inverse of the heat transfer equation using information gathered from the thermal side-channel; this required images to be pre-processed to account for the spatial LPF effect due to thermal diffusion [101].

3.3.2.2 External Sensor

Although less frequent, some works placed an external temperature sensor on a depackaged chip [92, 121], while others leveraged sensors embedded in a development board [94, 122].
3.3.2.3 Fan Speeds

Fan speeds, which have a strong correlation to board temperature, were occasionally considered as an acoustic and thermal side-channel. [107] leveraged the CPU fan’s acoustic emissions in combination with CPU core temperature for a more robust embedded system monitoring capability. From a covert communications standpoint, [118] posited that since the fan’s angular speed is software readable, it could be suitable to indicate temperature, and [105] considered the fan speed and its impact on implementing a bi-directional communications channel.

3.4 Temperature Attacks

The temperature side-channel is an ongoing area of investigation and is increasingly being considered as a vector for many types of temperature attacks; however, not all temperature attacks are side-channel attacks. “Temperature