ABSTRACT

Title of Dissertation: MEASUREMENT, SIMULATION, AND COMPACT MODELING OF COMPLEX ELECTRON DYNAMICS

Liam Alexander Pocher, Doctor of Philosophy, 2025

Dissertation Directed by: Professor Daniel P. Lathrop, Department of Physics, and Professor Patrick G. O’Shea, Department of Electrical and Computer Engineering

Systems containing large numbers of electrons can exhibit surprisingly complex and rich dynamics. In this dissertation, we ask: what is the minimum detail necessary in measurement or in data-driven modeling and simulation to capture the complex dynamics manifesting from these systems? To answer this, we integrate experiment, simulation, and theory. We examine two such systems: (i) a superparamagnetic tunnel junction (SMTJ) and (ii) a charged-particle beam.

We first consider the deterministic-stochastic behavior of a constant-current-driven SMTJ, for which we create a measurement-driven, overdamped Langevin model capturing the statistical properties of the device. We show both how this model captures device statistics across time scales and how it can be refined to capture higher-order behavior.

We next examine the centroid motion of a charged-particle beam and propose a method for understanding and predicting it using an interpretable, data-driven approach whose output is directly identifiable with terms in the underlying low-dimensional evolution equations. We derive the evolution equations solely on the basis of data, with no recourse to an underlying first-principles model. We compare and contrast our methodology with both a machine-learning technique and a first-principles model, and we show that we can learn interpretable equations for nonlinear beam dynamics at lower computational cost while achieving comparable accuracy.

Lastly, we investigate the phase space evolution of a charged-particle beam.
Accurate knowledge of the phase space at beam creation is crucial for understanding and predicting beam dynamics. We measure a velocity space modulation to initialize the phase space of first-principles simulations and capture beam statistics and internal beam structure—resembling a cruciform—with high fidelity. This contrasts with both of the first-principles models employed—which do not account for beam structure, as they assume a uniform beam cross section—and with simulations using ideal phase space distributions. Finally, we demonstrate sensitivity to beam and lattice parameters varied within experimental measurement error.

MEASUREMENT, SIMULATION, AND COMPACT MODELING OF COMPLEX ELECTRON DYNAMICS

by

Liam Alexander Pocher

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2025

Advisory Committee:
Professor Daniel P. Lathrop (Chair/Co-Advisor)
Professor Patrick G. O’Shea (Co-Advisor)
Professor Thomas M. Antonsen Jr.
Doctor Irving Haber
Professor Christopher Jarzynski (Dean’s Representative)
Doctor Advait Madhavan

© Copyright by Liam Alexander Pocher 2025

Acknowledgments

I owe gratitude to all those who made this dissertation possible and because of whom I persevered through my graduate studies. It is impossible to thank everyone, and I apologize to those I’ve inadvertently excluded. The first person I would like to thank is the chair of my committee and co-advisor, Daniel Lathrop, who always supported me through thick and thin on many research projects, through the ups and downs of funding situations, and through challenges both professional and personal, and who introduced me to my other co-advisor, Patrick O’Shea. Just like Dan, Pat has been steadfast in his support: he enabled me to collaborate with the broader accelerator physics community, provided guidance on technical writing and presentation, and always offered excellent professional advice.
In addition to my two co-advisors, I would also like to thank the rest of my committee members for agreeing to serve, each of whom contributed to this dissertation. Advait Madhavan and Irving Haber in particular have both dedicated significant time and effort to mentoring me during my graduate school tenure. Advait provided the amazing opportunity to work on an international project at NIST, a project which became my textbook example of an excellent research collaboration. Irving always pushed me to make models and simulation better agree with experiment, and to use one or the other to sniff out errors in both! Thomas Antonsen encouraged me to publish, provided insight and concrete suggestions on how to make it so, and persevered through the referee process. Chris Jarzynski provided feedback on my work involving stochastic processes, and helped steer that work in a fruitful research direction. Other staff and scientists at the University of Maryland (UMD) and the National Institute of Standards and Technology (NIST) who were indispensable to my research include: Matthew Daniels, for excellent research discussion, writing feedback, and for being an exemplary physicist; Mark Stiles, for clear, insightful explanations of complex technical concepts, amazing presentation critique, and for being a role-model theorist; Brian Beaudoin, for his skill as a technologist and for freely spending his hard-pressed-for time in the laboratory to push me over the finish line; Dan Abell, for precise editing; Dorothea Brosius, for this dissertation’s LaTeX template; the rest of the administrative staff of the Institute for Research in Electronics and Applied Physics (IREAP); and both Santiago Bernal and David Sutter for their setup of my final experiment. A boon to any student is the environment created by fellow students who are pursuing the same goal of earning a PhD. I would like to say thanks to all the students I have had the privilege to work with and be supported by.
In no particular order they are: John Rose, Eduardo Lozano, Heidi Komkov, Ian DesJardin, Ella Vicente, Sidra Gibeault, Elaine Jaross, Alex Pick-Alaus, and Shiyi (Shelley) Wang. A significant reason I pursued graduate school and came to the University of Maryland is the mentors I had before graduate school at the Colorado School of Mines (CSM) and Los Alamos National Laboratory (LANL). Each, in their own way, taught me something more about how to become a better scientist and person. Special thanks is due to Jonathan Mace, who has supported me as a young physicist since my undergraduate studies. Travis Peery is another individual I would especially like to thank: he, along with Jonathan, enabled me to successfully apply to graduate school and be admitted to my desired program, provided advice for years, and was always ready to talk physics over coffee (and long emails). Other mentors of mine from LANL whom I feel it necessary to acknowledge (and thank) are Dan Borovina, Michael Murphy, and Nathaniel Morgan. They each contributed in some way to shaping my view on how to be a good leader and scientist. My mentors at the Colorado School of Mines, Nils Tilton and Charles (Chip) Durfee, introduced me to research, sparked my curiosity to be a scientist, and enabled me to establish connections with scientists at LANL. They also each introduced me to the University of Maryland, and specifically the Chaos program, which set me on the path to UMD years before I applied. Three hobbies that helped me through graduate school were coffee, yoga, and rock climbing. Having ready, and often too tempting, access to amazing coffee is all a graduate student can ask for. The pastime that helped me focus was practicing yoga. The last hobby, rock climbing, regularly provided physical and emotional focus and release. Another round of special thanks is due to my parents. They have always been supportive, and always gave me more good reasons to keep pursuing this degree.
This work would not have been possible without funding from the US DOE-HEP grant DE-SC0022009, the National Science Foundation via NSF grant number CCF2121957, and the Agence Nationale de la Recherche StochNet Project ANR-21-CE94-0002-01.

Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations

Chapter 1: Introduction
1.1 Examples of Complex Electron Dynamics
  1.1.1 Dissertation Outline
1.2 An Overview of Deterministic Dynamics
  1.2.1 The Lorenz Attractor
1.3 An Overview of Stochastic Dynamics
  1.3.1 Mathematical Modeling
  1.3.2 Pseudorandom number generators
1.4 Associated Publications
  1.4.1 Publications
  1.4.2 Conference Proceedings and Talks

Chapter 2: Measurement-driven Langevin modeling of superparamagnetic tunnel junctions
2.1 Motivation
2.2 Model Construction Roadmap
2.3 Time Trace Statistics
2.4 One-dimensional Langevin model
2.5 Results
2.6 Kramers-Moyal Coefficient Visualization
2.7 Memory
2.8 Other parameterizations of D̃2
2.9 Characteristic dwell times
2.10 Discussion

Chapter 3: Data-Driven Discovery of Beam Centroid Dynamics
3.1 Introduction
3.2 Methods
  3.2.1 Accelerator Lattice and First Principles Model
  3.2.2 Proposing Data-Driven, Interpretable Models
3.3 SINDy Results
  3.3.1 Fit 1 - Periodic Forcing
  3.3.2 Fit 2 - SHO and Periodic Forcing
  3.3.3 Fit 3 - SHO, Periodic Forcing, and Nonlinear Interaction
  3.3.4 Prediction Ability Case Study
3.4 Comparisons Between Centroid Model, SINDy Fit, and Simulation Data
3.5 Beam Moment Evolution Equations
3.6 Testing for a Necessary but not Sufficient Chaos Condition
3.7 Fourier Transform Analysis
3.8 Discussion

Chapter 4: Investigating Beam Dynamics with Measurement-Driven Initialization
4.1 Motivation
4.2 Experimental Setup
  4.2.1 Experimental Results
4.3 Beam Modeling
  4.3.1 Axisymmetric Envelope Model
  4.3.2 Single-Particle Model
4.4 First Principles Simulation
  4.4.1 Accept-Reject Sampling
  4.4.2 Comparing Experiment to Simulations
4.5 Discussion

Chapter 5: Conclusion

Appendix A: Tables of SINDy Model Coefficients
Appendix B: Dipole Parameters for UMER Lattice Simulation
Appendix C: Solenoid Magnetic Fields
C.1 Magnetic Field Expansion
C.2 Solenoid Magnetic Field Theory
C.3 Remotely Measuring the Solenoid Field with Larmor Angle Rotation
Appendix D: Configuration Space at the Slit Plane
D.1 Axisymmetric Envelope Equation Solution to the Slit Plane
D.2 Comparison to Experiment
D.3 Relating Between Statistical Beam Quantities
Appendix E: Direct Velocity Space Measurement
E.1 Pinhole Scan Data
E.2 Pinhole Centroid Motion
E.3 Measuring Velocity Space Sizes

Bibliography

List of Tables

3.1 UMER lattice parameters
3.2 Power spectra peaks from Warp data
4.1 Experimental Parameters
4.2 Simulation Parameters
A.1 Coefficients for the various dxc/dz models
A.2 Coefficients for the various dyc/dz models
B.1 UMER dipole field strengths and locations
D.1 Location and descriptions of items in the experimental lattice as shown in Fig. 4.1. Appendix C details the transfer functions of the solenoids relating applied current to maximum axial magnetic field along the beam line Bz(r = 0, z).
E.1 Maximum measured radial divergence r′max and statistical beam size in velocity space from Fig. E.3.

List of Figures

1.1 Integration of Experiment, Simulation, and Theory Cartoon
1.2 Lorenz 1963 Equation Solution
1.3 Matlab PRNG comparison
2.1 Visual outline of the 1D Langevin model generation process
2.2 Typical bi-stable SMTJ voltage fluctuations
2.3 Histogram of SMTJ state
2.4 Fit to the effective energy
2.5 Time-delayed drift coefficient and diffusion coefficients in experiment and simulation
2.6 Dwell time distributions extracted from experiment and simulation
2.7 Kramers-Moyal Coefficients Distribution and Voltage Cuts
2.8 Voltage transition probability Pij(Φ) as a function of voltage
2.9 Voltage transition probabilities for simulation and experiment measured at different subsampling rates in (P↓↑, P↑↑) space
2.10 Experimental and Langevin simulation results for the time-delayed, drift and diffusion coefficients
2.11 Dwell time distributions between experiment and simulation
2.12 Mean dwell times of experiment and simulation as a function of sampling time
3.1 UMER diagram and parametric centroid trajectory over 1 lattice turn
3.2 Hard-edge quadrupole and dipole profile of the UMER lattice
3.3 Regularized power spectra for centroid trajectory over one turn
3.4 SINDy centroid trajectory training results
3.5 SINDy centroid power spectra training results
3.6 Different SINDy optimization hyperparameter results for the centroid trajectory over three lattice turns
3.7 Different SINDy optimization hyperparameter error fraction results
3.8 Horizontal centroid power spectra comparison between Warp simulation data and first-principles centroid model
3.9 Parametric plot comparing Warp data, first principles modeling, and SINDy’s result
3.10 Two dimensional histogram for centroid trajectory over 100 turns for Warp data, first principles model, and SINDy fit
3.11 Power spectra for the horizontal centroid for Warp data, first principles model, and SINDy fit
3.12 Smoothed horizontal centroid trajectory difference
3.13 Exponential slope of horizontal centroid trajectory difference
3.14 Matrix plot of two dimensional histograms for centroid trajectory over various length scales
3.15 Unregularized power spectra for the solution to sample differential equations
3.16 Labeled, regularized power spectrum of horizontal centroid in UMER
4.1 Beamline Schematic
4.2 Experimental Beamline
4.3 Blackbody Cathode Light with and without Filter
4.4 Experimental Moment Data with Raw Images
4.5 Beam Envelope Comparison between Experiment, Simulation, and Modeling
4.6 Envelope Comparison between Experiment, Model, and Simulation
4.7 Dipole Comparison
4.8 Single-Particle Current Fractions and Trajectories Comparison
4.9 Single-Particle Final Position as a Function of Initial Conditions
4.10 Smoothed, Experimental Pinhole Data Compared to Accept-Reject Sampling Result
4.11 Accept-Reject Sampling Histograms Using Pinhole Scan Data
4.11 Experiment to Simulation Comparison for PH Results
4.11 Experiment to Simulation Comparison for Varying Simulation Hyperparameters
4.12 Phase Space from Simulation Compared to Experiment
C.1 Axial Profiles of Solenoid M1
C.2 Br(r, z) and Bz(r, z) Profiles for Solenoid M1
C.3 Pinhole Beam Rotation
C.4 Measured versus Theoretical Larmor Angle Rotations for Short Solenoid M1
C.5 Measured versus Theoretical Larmor Angle Rotations for Short Solenoid M2
D.1 Beamline Schematic
D.2 Beam size at the slit plane z = 43.54 cm as a function of solenoid current I1 and I2
D.3 Beam-edge slope sensitivity to changes in applied solenoid current along the waist-line
D.4 Dimensionless change in beam edge radius R for different envelope model parameters
D.5 Change in beam edge radius slope R′ for different envelope model parameters
D.6 Experimental images shown in (I1, I2) parameter space
D.7 Model Beam Size Validation to Experiment
E.1 Pinhole Scan Schematic
E.2 Pinhole Scan Data at the Gatevalve and Boxscreen
E.3 Pinhole Scan Data at the Box Screen and Gate Valve
E.4 Pinhole Scan Data Absolute Centroid Motion and Relative Divergence
List of Abbreviations

CERN Conseil Européen pour la Recherche Nucléaire
CSM Colorado School of Mines
IREAP Institute for Research in Electronics and Applied Physics
KM Kramers-Moyal
KV Kapchinskij-Vladimirskij
LANL Los Alamos National Laboratory
LHC Large Hadron Collider
ND N-dimensional
NIST National Institute of Standards and Technology
NP non-deterministic polynomial-time (computational problem)
ODE Ordinary Differential Equation
P polynomial-time (computational problem)
PCB Printed Circuit Board
PH Pinhole
PRNG Pseudo-Random Number Generator
SG Semi-Gaussian
SDE Stochastic Differential Equation
SINDy Sparse Identification of Nonlinear Dynamics
SMTJ Superparamagnetic Tunnel Junction
UMD University of Maryland
rms root mean square

Chapter 1: Introduction

This chapter provides a brief overview of the history of the study of deterministic and stochastic processes, and ties together the—at first glance—disparate chapters of this dissertation. In summary, this dissertation integrates experimental, numerical, and theoretical physics to answer the question: What is the minimum necessary detail in measurement or data-driven modeling and simulation to capture complex electron dynamics?

1.1 Examples of Complex Electron Dynamics

Science is the study of the natural world. Many phenomena across great scales in space and time exhibit behavior manifesting from complex processes: stars gravitationally interacting in the galactic morass, electrons pinging off molecules in semiconductors, hurricanes brewed in the Gulf of Mexico, etc. These are examples of deterministic-stochastic processes whose complex dynamics are best understood through the interdisciplinary interweaving of theoretical, numerical, and experimental physics. Examples of deterministic and stochastic processes are chaos and noise, respectively.
In this dissertation, we take chaos as the result of deterministic, rule-based behavior whose dynamics are exponentially sensitive—on the length scale of the underlying attractor—to naturally occurring perturbations or differences in initial conditions (e.g. measurement error in experiment or numerical round-off in computation). Noise, on the other hand, while potentially arising from high-dimensional dynamics (e.g. Brownian motion of particles in a background fluid), cannot be described with deterministic rules, but rather with the underlying statistical moments of the distribution producing said noise. The complex electron dynamics studied in this dissertation come from the seemingly disparate fields of probabilistic computing and accelerator physics and exhibit dynamics including chaos and noise. While these fields are separated by a gulf of application, they share the common thread of being systems containing large numbers of electrons. The systems of electrons that we examine are (i) a superparamagnetic tunnel junction (SMTJ) that uses not just electron charge but also electron spin, combined with material properties, to thermally induce bi-stable fluctuations constituting random bit streams for Ising machines, and (ii) an electron beam in either an electron ring or a linear accelerator. We ask: What is the minimum detail necessary in either measurement or data-driven modeling and simulation to capture complex dynamics? Auxiliary questions include: What are the limits of these models? Why include (or exclude) certain terms? How can they be integrated with the other disciplines of experiment, simulation, or theory, and can we attribute correct reasons for successes or limitations? In this dissertation, we integrate experiment, simulation, and theory to understand the complex dynamics manifesting from these systems.
Figure 1.1 shows a cartoon of how each discipline of experimental, computational, and theoretical physics is necessary to create a foundation for understanding complex dynamics.

Figure 1.1: A cartoon of a sitting-stool (a sample body of scientific work) fully supported by the interweaving of experimental, numerical, and theoretical physics. [The stool’s three legs are labeled Experiment, Simulation, and Theory.]

1.1.1 Dissertation Outline

In Chapter 2, we pose these questions in creating a measurement-driven model for the mixed deterministic-stochastic behavior of the constant-current-driven SMTJ. We use an empirical measurement to seed a 1D Langevin model—a simple stochastic differential equation—with both short-time and long-time statistics, and we show where that model breaks down while still capturing macroscopic statistics, outperforming first-principles modeling in accuracy and first-principles simulation in computational expense. We assume a 1D Langevin model, and not a higher-order one, as the simplest equation for modeling a system with random forcing.

We then progress in Chapter 3 to using virtual data from a well-validated and well-verified simulation to motivate an interpretable, data-driven model using an algorithm from the nonlinear dynamics community called Sparse Identification of Nonlinear Dynamics (SINDy). This model is a coupled system of first-order ordinary differential equations which uses the underlying data to populate physically motivated terms in the model, enabling simulation of environments where first-principles modeling is inadequate. After generating the data-driven model, we compare it to an accurate first-principles model and to the simulation, and we note the various discrepancies and the reasons for them.

We evolve our approach in Chapter 4 to use empirical data to motivate not just terms in stochastic/ordinary differential equations, but an entire phase space distribution.
We integrate all three disciplines of experimental, numerical, and theoretical physics to ascertain model, experimental, and simulation limitations. We initialize the phase space of an electron beam using experimental measurements and compare data for the beam’s configuration space dynamics¹ to both simulation and theory. The complex structural evolution in the experiment—resembling a cruciform—is reconstructed in the simulation because of the accurate initialization of the beam’s phase space. This contrasts with both first-principles models—which do not account for beam structure, as they assume a uniform beam cross section—and simulations using ideal velocity distributions. The initial velocity space of a space-charge-dominated beam has a large impact on its internal structural evolution, which this study has shown to be highly sensitive to small variations in beam parameters within measurement error. Finally, in Chapter 5 we summarize the conclusions of each chapter and propose steps for further research.

1.2 An Overview of Deterministic Dynamics

Humanity has studied the heavens since time immemorial. This study has produced: creation myths; ancient timekeeping across the world in the form of the Egyptian, Mayan, Chinese, etc. calendars; and, not least, celestial mechanics. The predominant Ptolemaic (geocentric) view of the world was challenged in the Renaissance by scientists such as Galileo, who championed the Copernican (heliocentric) model of the solar system, predicated upon observational evidence from the solar system.

¹Configuration space (x, y) is but one of the 6 unique 2D projections of the total 4D transverse phase space.

Fifty years after Galileo, Newton applied rigorous science to the study of two gravitating bodies in the mid-17th century.
In the late 19th century, Poincaré expanded this analysis to the three-body problem (the sun, earth, and moon), but could not find analytic solutions to the underlying equations of motion [1]. Instead, he resorted to a geometric approach. Thus the study of chaos was born, if it was not yet named as such. Poincaré’s approach to the three-body problem produced the eponymous Poincaré recurrence theorem, which states that such a system returns arbitrarily close to its starting state after a sufficiently long time.

Further work in the burgeoning field of chaos and nonlinear dynamics included a technique developed at the turn of the 20th century: the stability analysis of dynamical systems, exemplified by Lyapunov’s dissertation published in 1892 [2]. More foundational work came from Birkhoff, who proved the ergodic theorem in 1931 [3]; it became a foundation of modern statistical mechanics.

Other work in the field of nonlinear dynamics and chaos included fundamental contributions by Russian scientists such as Kolmogorov [4], Arnold [5], and Moser [6], which culminated in the Kolmogorov-Arnold-Moser (KAM) theorem. The KAM theorem describes how invariant tori corresponding to integrals (constants) of motion are robust to small perturbations.

Advances in computing also accelerated the study of nonlinear problems, prime examples being the Fermi-Pasta-Ulam-Tsingou (FPUT) problem [7] and Lorenz’s study of convection in the atmosphere [8]. The FPUT problem was motivated by the study of thermalization in nonlinear dynamical systems, specifically a chain of nonlinearly coupled oscillators. The paper’s authors expected a rapid thermalization of the system energy, yet they observed a Poincaré recurrence of the initial condition’s energy distribution among the system’s normal modes. This sparked a revolution of interest in nonlinear oscillators continuing to this day.
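The FPUT numerical experiment is small enough to reproduce directly. The following is a minimal, illustrative sketch rather than a reproduction of the original code: it integrates the FPUT α-chain (a quadratic correction to the linear spring force) with a velocity-Verlet scheme, starting from the lowest normal mode with fixed ends; N = 32 and α = 0.25 are assumed parameters in the spirit of the 1955 report.

```python
import numpy as np

def fput_alpha(N=32, alpha=0.25, dt=0.01, steps=10000):
    """Integrate the FPUT alpha-chain with a velocity-Verlet scheme.

    N interior masses with fixed walls, initialized in the lowest normal mode.
    """
    x = np.sin(np.pi * np.arange(1, N + 1) / (N + 1))  # mode-1 displacement
    v = np.zeros(N)

    def force(x):
        y = np.concatenate(([0.0], x, [0.0]))  # pad with the fixed walls
        d_right = y[2:] - y[1:-1]              # stretch of the spring to the right
        d_left = y[1:-1] - y[:-2]              # stretch of the spring to the left
        return (d_right - d_left) + alpha * (d_right**2 - d_left**2)

    a = force(x)
    for _ in range(steps):
        v += 0.5 * dt * a
        x += dt * v
        a = force(x)
        v += 0.5 * dt * a
    return x, v

def total_energy(x, v, alpha=0.25):
    """Kinetic energy plus the quadratic and cubic spring potential terms."""
    y = np.concatenate(([0.0], x, [0.0]))
    d = np.diff(y)
    return 0.5 * np.sum(v**2) + np.sum(0.5 * d**2 + (alpha / 3.0) * d**3)
```

Projecting the evolving state onto the chain's normal modes, and watching the mode energies over much longer runs, reproduces the recurrence the authors observed in place of the expected thermalization.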
1.2.1 The Lorenz Attractor

The Lorenz system is the quintessential example of chaos; it was the motivation for the phrase “the butterfly effect” [9]. The differential system Lorenz derived is

\begin{align}
\frac{dx}{dt} &= \sigma\,(y - x) \,, \tag{1.1a}\\
\frac{dy}{dt} &= x\,(\rho - z) - y \,, \tag{1.1b}\\
\frac{dz}{dt} &= xy - \beta z \,, \tag{1.1c}
\end{align}

with x, y, and z being coefficients of low-order modes of the stream function and the temperature profile, σ being the Prandtl number, ρ being a modified Rayleigh number, and β being a geometric factor associated with the characteristic period of the assumed convection cells. Equations (1.1a) to (1.1c) describe the colloquially named “butterfly” attractor [9], as seen in Fig. 1.2.

Figure 1.2: Lorenz equations solution. Panel (a) is the x−y projection of the total [x(t), y(t), z(t)] solution, panel (b) is the x−z projection, and panel (c) is the y−z projection. The butterfly shape is readily visible in panel (b), while the characteristic double lobes of the attractor are visible in each panel.

Deterministic dynamics may be described for point particles with the (non-relativistic) Newton’s second law

\[
\vec{F} = m\,\ddot{\vec{x}} = m\,\vec{a} \,, \tag{1.2}
\]

or for an entire ensemble through Liouville’s equation (for conservative or non-conservative systems)

\[
\frac{\partial \rho}{\partial t} + \sum_{i}^{n} \left( \frac{\partial \rho}{\partial q_i}\frac{dq_i}{dt} + \frac{\partial \rho}{\partial p_i}\frac{dp_i}{dt} \right) = 0 \,, \tag{1.3}
\]

where each qi is the coordinate of the i’th particle and each pi is the i’th particle’s momentum associated with coordinate qi. Equations (1.2) and (1.3) thus describe, respectively, the deterministic evolution of a single particle and of an ensemble of particles.

1.3 An Overview of Stochastic Dynamics

Random, or seemingly so, motion has captured the imagination for millennia. The poem “De Rerum Natura,” written in the first century BC by the Roman poet Lucretius [10], discusses the apparent random motion of dust in sunlight. Lines 114 to 140 of Book II from the Rouse translation of “De Rerum Natura”:

...
apply your scrutiny whenever the sun's rays are let in and pour their light through a dark room: you will see many minute particles mingling in many ways throughout the void in the light itself of the rays ... give attention to these bodies which are seen to be in turmoil within the sun's rays, because such turmoil indicates that there are secret and unseen motions also hidden in matter... then the bodies that form a small combination and, as one may say, are atoms, nearest to the powers of the first-beginnings, are set moving, driven by the unseen blows of these, while they in their turn attack those that are a little larger. Thus the movement ascends from the first-beginnings and by successive degrees emerges upon our senses, so that those bodies also are moved which we are able to perceive in the sun's light, yet it does not openly appear by what blows they are made to do so.

While the motion Lucretius describes is not random, but rather arises from the turbulent motion of air, it does bear resemblance to the first recorded stochastic dynamics, observed by Robert Brown [11] and quantified by Louis Bachelier [12]: Brownian motion (i.e. a Wiener process). The early field of stochastic dynamics was developed theoretically by Einstein [13], Smoluchowski [14], and Ornstein and Uhlenbeck [15], and experimentally by Perrin [16] (winning Perrin the 1926 Nobel Prize in Physics "for his work on the discontinuous structure of matter, and especially for his discovery of sedimentation equilibrium").

Later developments included the work of Itô, who developed the calculus of stochastic dynamics, which may be applied to the general equations governing stochastic dynamics called stochastic differential equations (SDEs). A generic form for a first-order SDE is

$$dx = \mu(x, t)\,dt + \sigma(x, t)\,dW \, , \qquad (1.4)$$

where x is the dependent (state) variable (e.g.
the position of a Brownian particle), µ(x, t) is a drift term which can depend on both the state x and time t, and dW represents a diffusion term with zero expectation value and unit standard deviation. Typically, this diffusion term is taken to be white noise η = dW/dt with ⟨η(t)η(t′)⟩_t = 2δ(t′ − t) and ⟨η(t)⟩_t = 0. The prefactor σ(x, t) scales the amplitude of the noise. It is typically assumed that the process is Markovian: its evolution at time t depends only on the current state, not on states at earlier times. In the limit of no drift, µ(x, t) = 0, Eq. (1.4) becomes a Wiener process reproducing Brownian motion. If the drift term is linear in x, then Eq. (1.4) becomes an Ornstein-Uhlenbeck process, commonly used in mathematical finance or as the model for a particle in a harmonic potential U(x) ∼ x² subject to a random driving force.

Treating the motion of all microscopic variables becomes intractable as the number of particles increases; typically, macroscopic variables are the desired quantities. The process through which microscopic detail is washed out has been extensively studied for decades [17–20]. Randomness is required to obtain the important macroscopic variables (e.g. a voltage, or a particle's position in a fluid) from first-principles models of short-time-scale fluctuating processes (e.g. magnetic spins in semiconductors, thermal particle velocities). This assumption is usually referred to as the "Stosszahlansatz," or molecular chaos, and is the fundamental reason for irreversible processes [21]. It is intimately linked with Loschmidt's paradox, also referred to as the "Arrow of Time."

1.3.1 Mathematical Modeling

Two equations commonly used to describe either a trajectory or an ensemble of trajectories (i.e. a probability distribution) are the Langevin equation and the Fokker-Planck equation, respectively [21–24].
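Before turning to those two equations, the Ornstein-Uhlenbeck process mentioned above can be integrated numerically with the Euler-Maruyama scheme. The following is a minimal sketch with illustrative parameter values (theta, sigma, dt are arbitrary choices, not quantities from this dissertation); the known stationary variance provides a check on the integration.

```python
import numpy as np

# Euler-Maruyama integration of an Ornstein-Uhlenbeck process,
#   dx = -theta * x * dt + sigma * dW,
# a linear-drift instance of Eq. (1.4). All parameter values here are
# illustrative choices. Standard Wiener increments (variance dt) are used,
# so the stationary variance is sigma**2 / (2 * theta).
rng = np.random.default_rng(1)
theta, sigma = 1.0, 0.5
dt, n_steps = 1e-3, 1_000_000

x = np.empty(n_steps)
x[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps - 1)  # Wiener increments
for i in range(n_steps - 1):
    x[i + 1] = x[i] - theta * x[i] * dt + sigma * dW[i]

# Sample variance after discarding a transient; close to 0.125 here.
print(np.var(x[n_steps // 10:]))
```

Replacing the linear drift −θx with a nonlinear µ(x, t) and a state-dependent σ(x, t) gives the general Eq. (1.4); the Euler-Maruyama update step is unchanged.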
The equation of motion for a single particle undergoing some random force is the Langevin equation [25], analogous to Newton's second law with a stochastic force [21],

$$\frac{dx}{dt} = \underbrace{f(x, t)}_{\text{drift}} + \underbrace{g(x, t)\,\eta_x}_{\text{diffusion}} \, , \qquad (1.5)$$

where, similar to the generic equation for an SDE, Eq. (1.4), we have expressions for the drift term (now attached to a deterministic term) and the diffusion (stochastic/random/noise) term. The Langevin equation is commonly used to describe the motion of a particle in a potential well with thermal forces, or a noisy circuit equation [22, 23].

To describe not just the motion of a single particle interacting with a noisy (or high-dimensional) environment, one can use the Fokker-Planck equation to describe an ensemble of particles (i.e. a probability distribution) [22]. The Fokker-Planck equation is derived from the master equation via the Kramers-Moyal expansion [26, 27]². In one spatial dimension, the Kramers-Moyal expansion is

$$\frac{\partial \rho}{\partial t} = \sum_{n=1}^{\infty} \left( -\frac{\partial}{\partial x} \right)^n \left[ D_n(x, t)\,\rho(x, t) \right] \, , \qquad (1.6)$$

where x is the usual spatial (state) variable, t the time, ρ(x, t) the probability density, and D_n the n'th-order Kramers-Moyal coefficient. The Fokker-Planck equation in one spatial dimension is

$$\frac{\partial \rho}{\partial t} = \frac{\partial}{\partial x} \left[ -D_1(x)\,\rho(x, t) + \frac{\partial}{\partial x} D_2(x)\,\rho(x, t) \right] \, , \qquad (1.7)$$

with D_1(x, t) the drift and D_2(x, t) the diffusion, which are the first- and second-order Kramers-Moyal coefficients. There are no third- or higher-order derivatives, as truncating at any such finite order results in unphysical negative probability densities; only in the infinite limit of the derivative expansion does one recover nonnegative probability densities³. If the quantity within the square brackets is interpreted as a probability current j_ρ, then Eq. (1.7) can be rewritten as a continuity equation for the probability

²The Kramers-Moyal expansion is similar to a Taylor series expansion for a function, though here the expansion is applied to the evolution of a probability distribution.
³This infinite limit of derivatives is reminiscent of the BBGKY hierarchy [28], but the limit of taking infinite approximations is where the similarities end. The Kramers-Moyal expansion (which is what this process is called for the Fokker-Planck equation) is an expression for successive approximations to the probability density ρ, whereas the BBGKY hierarchy is for successively higher orders of approximation for N-body interactions predicated upon the Boltzmann equation.

density ρ,

$$\frac{\partial \rho}{\partial t} + \frac{\partial j_\rho}{\partial x} = 0 \, .$$

The Fokker-Planck and Langevin equations are related via their drift and diffusion terms. Within this dissertation, we interpret the Langevin equation, Eq. (1.5), in the Stratonovich sense [29]. In the Stratonovich interpretation, the Langevin equation is related to the Kramers-Moyal coefficients of the Fokker-Planck equation, Eq. (1.7), by f(x) = D_1(x) − D_2′(x)/2 and g(x) = √(D_2(x)). The second term on the right-hand side of f(x) comes from gradients in the diffusion, which give rise to an effective drift term independent of the usual drift coefficient D_1(x). A system with multiplicative noise—where D_2′(x) ≠ 0—exhibits mixing between the stochastic terms and the deterministic part; a nonzero diffusion gradient introduces a stochastic drift term. A system with additive noise—D_2′(x) = 0—is simpler to treat and models systems whose noise terms do not depend on the system state.

The stationary distribution ρ_0(x) of the Fokker-Planck equation can be found by setting ∂ρ_0/∂t = 0. We then may write

$$D_1 \rho_0 = \frac{d}{dx} (D_2 \rho_0) \, ,$$

with ′ = d/dx and all expressions assumed to be functions of x. Expanding the right-hand side we have

$$D_1 \rho_0 = D_2' \rho_0 + D_2 \rho_0' \, ,$$

which may be rearranged to

$$\frac{D_1 - D_2'}{D_2} = \frac{\rho_0'}{\rho_0} \, .$$

The above expression has two quantities of the form y′/y that can be written as d/dx ln[y(x)]. Substituting these into the above equation we have

$$\frac{D_1}{D_2} - \frac{d}{dx} \ln(D_2) = \frac{d}{dx} \ln(\rho_0) \, ,$$

which may be integrated with a dummy variable x′ from −∞ to x.
After integrating and solving for ρ_0, we obtain the stationary distribution solely in terms of the drift and diffusion coefficients,

$$\rho_0(x) = \frac{C}{D_2(x)} \exp \left[ \int_{-\infty}^{x} dy\, \frac{D_1(y)}{D_2(y)} \right] \, ,$$

with C a normalization constant.

1.3.2 Pseudorandom number generators

Physical stochastic processes are not readily accessible for simulation. To overcome this, pseudo-random number generators (PRNGs) are used to produce numbers that behave as if they were stochastic. These PRNGs are not truly random but rather are periodic with a very long period, as they follow a deterministic set of rules. PRNGs can be treated as random on time scales short compared to their period. A variety of pseudo-random number generators are readily available in common programming languages. Matlab, for example, offers the following algorithms: the Mersenne twister; the single instruction, multiple data (SIMD) Mersenne twister; combined multiple recursive; multiplicative lagged Fibonacci; the Philox 4x32 generator with 10 rounds; and the Threefry 4x64 generator with 20 rounds [30]. Figure 1.3 shows sample trajectories for several selected pseudo-random number generators available in Matlab with a random seed value of 1, each algorithm producing a different trajectory.

Figure 1.3: A comparison between selected PRNGs in Matlab.

The commonly used Mersenne twister algorithm—the default used by Python's NumPy module—has a period of 2¹⁹⁹³⁷ − 1 iterations [31]. The algorithm we use in Chapter 2 is Mathematica's default extended cellular automaton generator, "ExtendedCA". Typical pseudo-random number generator issues are detailed in Press et al. [32].

1.4 Associated Publications

1.4.1 Publications

Below is a list of papers linked to the work presented in this dissertation; the majority of the chapter material is drawn from the published papers, with the remainder from papers currently in preparation.
Papers 1 and 2 are related to Chapter 2, paper 3 is related to Chapter 3, and paper 4 is related to Chapter 4.

1. Sidra Gibeault, Temitayo N Adeyeye, Liam A Pocher, Daniel P Lathrop, Matthew W Daniels, Mark D Stiles, Jabez J McClelland, William A Borders, Jason T Ryan, Philippe Talatchian, et al. Programmable electrical coupling between stochastic magnetic tunnel junctions. Physical Review Applied, 21(3):034064, 2024.

2. Liam A Pocher, Temitayo N Adeyeye, Sidra Gibeault, Philippe Talatchian, Ursula Ebels, Daniel P Lathrop, Jabez J McClelland, Mark D Stiles, Advait Madhavan, and Matthew W Daniels. Measurement-driven Langevin modeling of superparamagnetic tunnel junctions. Physical Review Applied, 22(1):014057, 2024.

3. Liam A Pocher, Irving Haber, Thomas M Antonsen Jr, and Patrick G O'Shea. Data-Driven Discovery of Beam Centroid Dynamics. In preparation.

4. Liam A Pocher, Shiyi Wang, Kevin Hermstein, Brian L. Beaudoin, Dan T. Abell, Thomas M. Antonsen, Irving Haber, and Patrick G O'Shea. Controlling Beam Dynamics with Measurement-Driven Morphology Modulation. In preparation.

1.4.2 Conference Proceedings and Talks

Below is a list of conference proceedings and talks that are linked to the work presented in this dissertation. Items 1, 3, and 6 are related to the work presented in Chapter 2. Items 2, 4, and 5 are related to the work presented in Chapter 3.

1. Matthew W Daniels, William A Borders, Nitin Prasad, Advait Madhavan, Sidra Gibeault, Temitayo Adeyeye, Liam Pocher, Lei Wan, Michael Tran, Jordan A Katine, et al. Neural networks three ways: unlocking novel computing schemes using magnetic tunnel junction stochasticity. In Spintronics XVI, volume 12656, pages 84–94. SPIE, 2023.

2. L. A. Pocher, T. M. Antonsen, L. Dovlatyan, I. Haber, and P. G. O'Shea. Optimizing the Discovery of Underlying Nonlinear Beam Dynamics. In Proc. NAPAC'22, North American Particle Accelerator Conference, pages 335–338.
JACoW Publishing, Geneva, Switzerland, 11 2022.

3. Liam Pocher, Sidra Gibeault, Advait Madhavan, Mark Stiles, Matthew Daniels, Nhat-Tan Phan, Philippe Talatchian, Ursula Ebels, Daniel Lathrop, et al. Empirical modeling of superparamagnetic magnetic tunnel junctions with application to probabilistic computing. In APS March Meeting Abstracts, volume 2023, pages T02–005, 2023.

4. L. Pocher, I. Haber, T. Antonsen, and P. O'Shea. Optimizing the discovery of underlying nonlinear beam dynamics and moment evolution. In Proc. IPAC'23, number 14 in International Particle Accelerator Conference, pages 4503–4506. JACoW Publishing, Geneva, Switzerland, 9 2023.

5. L. Pocher et al. Discovering transient models of emittance growth via mode interaction of phase space nonuniformities. In Proc. IPAC'24, number 15 in IPAC'24 - 15th International Particle Accelerator Conference, pages 921–924. JACoW Publishing, Geneva, Switzerland, 05 2024.

6. Sidra Gibeault, Temitayo N Adeyeye, Liam A Pocher, Daniel P Lathrop, Matthew W Daniels, Mark D Stiles, Jabez J McClelland, William A Borders, Jason T Ryan, Philippe Talatchian, et al. Hardware implementation of a scalable two-spin Ising computer. In Spintronics XVII, page PC1311924. SPIE, 2024.

Chapter 2: Measurement-driven Langevin modeling of superparamagnetic tunnel junctions

This chapter is based on work in two published papers [33, 34] and conference proceedings [35–37]. In Refs. [34, 35], I was the first author and was the prime conceptualizer, investigator, and writer. In Refs. [33, 36, 37], I was a contributing author and played the role of a collaborating scientist in the group.

2.1 Motivation

Since the widespread adoption of the computer, algorithms have been used to solve computational problems. Developing efficient algorithms to solve these problems is of considerable interest.
In the broadest sense, there are two classes of computational problems: those solvable in polynomial time (P) and those solvable in non-deterministic polynomial time (NP). P-class computational problems remain readily tractable as the size of the problem increases (e.g. sorting numbers or determining if a number is prime [38]). Conversely, NP-class problems can quickly become intractable as problem size increases, as their solution time is often exponential or greater. Examples of NP computational problems include (prime) factorization, the traveling salesman problem, the max-cut problem, circuit layout design, etc. [39–42]. Many NP problems are mathematically equivalent and can be recast as the Ising model [43, 44]. For example, the max-cut (i.e. graph partitioning) problem can be reformulated into the Ising problem by encoding graph edges as couplings between spins, which then define the energy landscape of the Ising Hamiltonian [45, 46].

To illustrate this, consider a graph G with vertices V and edges E, with N vertices in total. The graph partitioning problem is solved by separating G into two subsets while minimizing the number of connecting edges [40]. One way to implement this solution is to transform the graph G into the Ising model formulation, with the N vertices becoming N spins σ_i and each edge in E between vertices i and j becoming an interaction term J_ij. The Ising Hamiltonian is

$$H(\sigma) = -\sum_{i,j} J_{ij} \sigma_i \sigma_j - \mu \sum_j h_j \sigma_j \, , \qquad (2.1)$$

with H the Hamiltonian (i.e. total system energy), σ_i the spin (±1), J_ij the coupling between spin i and spin j (in the traditional Ising problem, only nearest neighbors interact; this is atypical in graph partitioning problems), µ the magnetic moment of the material, and h_j the external magnetic field applied at spin σ_j. In the case of graph partitioning, the external field h_j at every node is set to zero and the sum is taken over the edges E. Equation (2.1) is then

$$H(\sigma) = -\sum_{i,j \in E(G)} J_{ij} \sigma_i \sigma_j \, .$$
(2.2)

The sum over all edges weighted by the mutual interaction of spins i and j in Eq. (2.2) can be decomposed into sums over pairs of spins of the same sign and of different signs,

$$H(\sigma) = -\sum_{\text{same group}} J_{ij} + \sum_{\text{diff. group}} J_{ij} \, ,$$

and, using the relation that the sum over all interactions is the sum of the interactions within the same group and between different groups, we write

$$H(\sigma) = -\sum_{ij} J_{ij} + 2 \sum_{\text{diff. group}} J_{ij} \, ,$$

which is minus the sum of all the weights plus twice the sum of the edge weights between different groups. Depending on the couplings J_ij, the second term on the right-hand side can decrease or increase the system energy. Finding analytic solutions to this problem quickly becomes intractable for increasing N.

An appealing technique to solve this problem, even as it grows intractable with increasing N, is simulated annealing [47, 48]. Simulated annealing is an optimization technique that stochastically searches between spin configurations σ = (σ_1, ..., σ_N) to minimize the system energy H(σ). The spin configuration can evolve in time if each spin state is probabilistic. A new spin configuration is accepted if it lowers the overall system energy; if the proposed spin configuration raises the system energy, it is accepted with a probability that grows with the system temperature. For systems with high (low) temperature, this acceptance probability is high (low). In the context of the graph partitioning model, the effective system temperature is the relative strength of the interaction weights (deterministic) to the probabilistic evolution (stochastic) of each spin. As interaction strengths increase, spins are coupled more strongly, corresponding to a lower system temperature.

Pseudo-random number generators (PRNGs) can be used to search for these optimized energy configurations; however, PRNGs produce periodic and cross-correlated bitstreams, an undesirable feature for computation [49, 50].
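The PRNG-driven annealing search described above can be sketched in a few lines. The random graph, unit couplings, single-spin-flip move set, and geometric cooling schedule below are all illustrative choices, not taken from this dissertation.

```python
import numpy as np

# Simulated-annealing sketch for the coupling-only Ising energy of Eq. (2.2),
# driven by a PRNG. Graph, couplings, and schedule are illustrative choices.
rng = np.random.default_rng(0)

n = 20                                     # number of vertices / spins
J = np.triu(rng.random((n, n)) < 0.3, 1)   # random edge set (upper triangle)
J = (J | J.T).astype(float)                # symmetric coupling matrix, J_ii = 0

def energy(sigma):
    # H(sigma) = -sum_{i<j} J_ij sigma_i sigma_j (external field h = 0)
    return -0.5 * sigma @ J @ sigma

sigma = rng.choice([-1, 1], size=n)        # random initial spin configuration
T, cooling = 2.0, 0.999
for _ in range(20_000):
    i = rng.integers(n)                    # propose flipping one spin
    dE = 2.0 * sigma[i] * (J[i] @ sigma)   # energy change of that flip
    # Metropolis rule: always accept downhill moves; accept uphill moves
    # with probability exp(-dE / T), which grows with temperature T.
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        sigma[i] = -sigma[i]
    T *= cooling                           # anneal the temperature downward

print(energy(sigma))                       # a low (negative) energy state
```

Note that with all couplings positive the minimum here is simply the aligned configuration, and a real graph-partitioning run would also constrain the two groups to equal sizes; the sketch only illustrates the stochastic search itself.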
Practical random bitstreams can be generated by devices called superparamagnetic tunnel junctions (SMTJs) in a process called stochastic computing [50–52]. Individual SMTJs used to solve NP problems such as graph partitioning with algorithms like simulated annealing are probabilistic bits, or p-bits. A collection of interacting p-bits forms a probabilistic computer. Unlike a conventional Boolean bit, which takes only deterministic 1 and 0 values, a p-bit uses its room-temperature instability to encode a probability value. Probabilistic computing with SMTJs uses the instantaneous states of a number of random bitstreams to sample a distribution of states encoded to match problems like the Ising model, and it has been applied to a variety of problems including the traveling salesman problem [53], combinatorial optimization [54], and integer factorization [55]. The technological relevance of SMTJs, and the attendant need to design scaled-up circuits that include these devices, has made the task of finding appropriate dynamical models an engineering priority.

Currently, three classes of models are used for different levels of granularity in representing SMTJ physics. The simplest is the Néel-Brown model [56–58], a two-state Markov model in which the transition rates are exponential functions of the current and field applied to a device. These models fit the transition rates of the Markov model to experiment [59] and can sometimes be aligned with the measurable parameters of the system in a physics-driven way [60, 61]. Their discrete state spaces, however, hide any intermediate analog dynamics, and experiments to date indicate that this analog magnetic behavior is visibly non-negligible [59, 62–66]. Néel-Brown models implicitly assume an exponential distribution of the dwell times.

The most detailed form of modeling is achieved through micromagnetic simulations.
Some micromagnetics packages can explicitly solve the stochastic Landau-Lifshitz-Gilbert (sLLG) equation over large magnetic domains using finite-element methods [67]. Simulating devices in this manner can lead to physically realistic results, but the required computational time renders such models inappropriate for single elements in large circuit simulations. The authors of Ref. [68] show that micromagnetic simulations qualitatively reproduce measured two-level fluctuations but are not able to collect enough statistics with the full simulations to make a robust statistical comparison.

The intermediate level between these approaches is that of analog compact modeling. Compact models – relatively low-dimensional differential equations that capture the essential analog behaviors of a device without full physical realism – have been developed for many nanodevices [69–72], including magnetic tunnel junctions [73–75]. When SMTJs are incorporated into integrated circuits, modeling the interaction between the SMTJs and the transistors will require a statistically accurate description of the SMTJ dynamics on the time scale of switching events in both, but that description cannot be significantly slower to compute than the rest of the integrated-circuit model. These restrictions force the development and use of a compact model that accurately describes the transition dynamics. A Markov model, while fast, does not accurately capture the dynamics of the transition, whereas a micromagnetic model would capture those dynamics but is far too slow.

Facing these limitations of simple macrospin models, we propose a data-driven method to capture the dynamics of interest in circuit simulations. We turn to generic overdamped 1D Langevin models for the voltage across the device and fit such models to experimental measurements.
Similar data-driven approaches have been applied in other fields of physics to learn the underlying physics of stochastic processes through a variety of methods [76–78]. We show that our approach leads to high-fidelity matching of the voltage histograms and the drift and diffusion coefficients of the device between model and experiment while maintaining the analog nature of macrospin approaches. We also show that our model correctly predicts the drift dynamics and dwell-time distributions of the experiment without explicitly encoding these in the simulation, confirming a level of self-consistency between the model and the underlying physics.

We organize the remainder of this chapter as follows. In Section 2.2 we give an overview of the approach we use to fit a time trace measured for an SMTJ so that we can reproduce a statistically identical time trace in simulation. In Section 2.3 we describe the different statistical measures of such voltage-time traces that provide reductions of the data to relevant dynamical quantities. Section 2.4 introduces an overdamped, 1D Langevin model and describes the method that we use to fit the experimental data. Section 2.5 compares the Langevin modeling to experiment and shows that the model reproduces the experimental data used to determine it and exhibits self-consistency with the assumed Fokker-Planck dynamics. Section 2.6 shows the short-time experimental data used in our generation of the data-driven Langevin model. Section 2.7 compares a metric for inertia/memory between the experimental and simulation data, and shows how the experimental data's memory changes as a function of the subsampling rate. Section 2.8 describes alternate fitting schemes for our ansatz of D2, the measured noise term. Section 2.9 describes the extraction of mean dwell times from time traces. Finally, in Sec.
2.10, we discuss possible extensions to our compact modeling approach, orienting future research in the directions we view as most useful for supporting the device-circuit-system codesign practices needed to realize large-scale computational systems based on SMTJs.

2.2 Model Construction Roadmap

Our goal is to develop a data-driven, compact, semi-analytic model that produces dynamics statistically identical to those of the measured voltage-time trace. We refer to the model as data-driven because we use some of the statistical properties of the data to produce the model. In contrast to other, physically motivated models, such as a macrospin approximation, our models are not derived from approximations based on device physics. Not being tied to approximations of the physics can be an advantage, as it allows the model to capture behavior that is neglected by physically motivated models that may be oversimplified compared to experimental reality.

Figure 2.1 shows the modeling flowchart for this chapter. From the experiment with the SMTJ device we obtain a voltage-time trace. The statistics of those data are used to determine our model, a one-dimensional equation of motion (directed, overdamped Brownian motion). This equation of motion describes the state of the system with drift (deterministic) and diffusion (stochastic) terms, with parameterizations that can be calculated directly from the experimental voltage-time trace. We integrate the model forward in time to produce a simulated time trace, perform the same statistical analysis that we had performed for the experiment, and compare the experimental and simulated statistics.

Figure 2.2 shows a representative segment of a voltage-time trace of our experimental device. The models we use to reproduce this behavior are one-dimensional stochastic differential equations whose drift and diffusion parts are determined by both short-time statistics and long-time statistics.

Figure 2.1: Visual outline of the model generation process developed in this chapter. Top row: we start with an experimental device from which a voltage-time trace is extracted under fixed experimental conditions. The histogram and the voltage-dependent diffusion coefficient (second Kramers-Moyal coefficient D2) are extracted numerically from the time trace. The drift coefficient and dwell-time distributions are also extracted for later use. Middle row: we fit drift and diffusion characteristics for our model to the histogram and diffusion characteristics found in experiment, in a way that compensates for the high-frequency cutoff of the experimental data. Bottom row: we then use these fitted drift and diffusion characteristics to simulate a Langevin equation from which we extract a voltage-time trace and its attendant statistics. Far right: the suite of statistics from the model is validated against the experiment. In future work with more complex fits, one may need to introduce a self-consistent fitting procedure that uses observed error between theory and experiment to inform iterative refinements of the model (dashed line).
The resulting equation is the Langevin equation

$$\dot{\Phi} = \underbrace{f(\Phi)}_{\text{drift}} + \underbrace{g(\Phi)\,\eta_\Phi(t)}_{\text{diffusion}} \, , \qquad (2.3)$$

where Φ is the dependent variable (voltage), t the independent variable (time), f(Φ) the function characterizing the drift, g(Φ) the function characterizing the diffusion, and η_Φ(t) a Gaussian white noise term¹ with the properties ⟨η_Φ(t)η_Φ(t′)⟩_t = 2δ(t′ − t) and ⟨η_Φ(t)⟩_t = 0. The drift and diffusion terms of the Langevin equation are related to each other through the voltage-time-trace histogram, which gives the long-time average of the stationary state of the system for a fixed current flowing through the SMTJ; see Section 2.4 for further details.

The one-dimensional Langevin models are completely determined by fitting the histogram and either the first, n = 1, or second, n = 2, Kramers-Moyal coefficient

$$D_n(\Phi) = \frac{1}{n!} \lim_{\tau \to 0} \frac{\langle [\Phi(t+\tau) - \Phi(t)]^n \rangle_t}{\tau} \, , \qquad (2.4)$$

with τ the sampling time, the angle brackets denoting a time average over the voltage-time trace, and [Φ(t + τ) − Φ(t)] a τ-delayed difference in voltage. The first and second Kramers-Moyal coefficients encode short-time dynamics into our model. The D1 coefficient captures the drift behavior of the system while D2 captures the diffusive behavior. Fitting the histogram and one of the Kramers-Moyal coefficients then determines the functional coefficients in the Langevin model, Eq. (2.3), fully characterizing it and allowing us to perform simulations that mimic the

¹We note that in reality the driving noise terms likely possess additional structure beyond mere white noise. However, as we will show in Sec. 2.5, the white noise assumption does very well in reproducing the statistics observed in experiment. Physically, the choice of white noise is the unique choice that ensures the dissipative mechanisms of the system are local in time [79].
Figure 2.2: Typical bi-stable voltage fluctuations of an SMTJ in our experimental setup, measured at a sampling rate of τ_exp = 50 ps. Shown is a small subset – a 100 ns window – of the full 1 ms trace captured in the experiment. The total number of transitions between the two states in the full voltage-time trace is on the order of 3 × 10⁵. The dashed line indicates the boundary between the parallel and antiparallel wells. That value is chosen from the local minimum in the distribution of measured voltages given in Fig. 2.3. The mean of the raw voltage measurement has been removed.

experimental measurement. As will be shown in Sec. 2.5 and discussed in Section 2.10, this data-driven approach captures multiple statistical and dynamical metrics with high fidelity. Specifically, it captures the Kramers-Moyal coefficients, the histogram, the dwell-time distributions, and the power spectral density. The agreement for the dwell-time distributions is perhaps the most significant validation of the model because the experimental input to the model does not directly encode these time scales. This ability to recover higher-order statistics is necessary so that we can use the model to design scaled-up circuits in engineering applications.

2.3 Time Trace Statistics

In this section, we describe in more detail the statistical properties that we use to set up the Langevin model. These properties and others are compared to those of simulations of the resulting model in Section 2.5. The first statistical reduction of the voltage-time trace we use is the histogram of the device state over the entire 1 ms measurement window; see Fig. 2.3. We assume that the system giving rise to the voltage-time trace is in quasiequilibrium, so the histogram primarily depends on the effective energy and entropy of the system at each voltage and less on the short-time dynamics of the system.
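To make the connection between a Langevin equation and such a histogram concrete, the following minimal sketch integrates an equation of the form of Eq. (2.3) with an assumed double-well drift. The drift f(Φ) = Φ − Φ³, the constant diffusion amplitude g, and all other parameters are illustrative choices, not the fitted model of this chapter.

```python
import numpy as np

# Sketch: a bistable Langevin equation of the form of Eq. (2.3) with an
# assumed double-well drift f(phi) = phi - phi**3 and constant diffusion
# amplitude g. These are illustrative choices; the chapter instead fits
# f and g to experimental data.
rng = np.random.default_rng(7)
g, dt, n_steps = 0.45, 1e-2, 400_000

phi = np.empty(n_steps)
phi[0] = 1.0
# Noise convention <eta(t) eta(t')> = 2 delta(t - t'): kicks of variance 2*dt.
kicks = rng.normal(0.0, np.sqrt(2.0 * dt), size=n_steps - 1)
for i in range(n_steps - 1):
    phi[i + 1] = phi[i] + (phi[i] - phi[i] ** 3) * dt + g * kicks[i]

# The histogram of the trace is bimodal, with wells near phi = -1 and +1
# separated by a sparsely occupied barrier region near phi = 0.
counts, edges = np.histogram(phi, bins=60, range=(-2, 2), density=True)
```

Under this convention the stationary density is ρ0(Φ) ∝ exp(−U(Φ)/g²) with U(Φ) = Φ⁴/4 − Φ²/2, so for these parameters the wells at Φ = ±1 are favored over the barrier at Φ = 0 by roughly a factor of three, a toy analog of the bimodal SMTJ histogram.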
The binning resolution along the voltage axis, used for the histogram and the other statistical properties described below, impacts the fit quality and the observed characteristics. If the bins are too small, statistical uncertainty prevents good fits; if they are too large, the details of the dynamics are obscured. The bin size we use for the histogram is a compromise between these factors at approximately 200 µV. Each bin captures approximately 50 quantized voltage levels out of the approximately 2¹² unique levels reported by the oscilloscope.

Figure 2.3: Histogram of SMTJ state. The black circles give the experimental results for the entire voltage-time trace of 160 ms, part of which is shown in Fig. 2.2, using bins 200 µV wide. The blue rectangles give the results of a simulation, discussed in Section 2.5, using a data-driven Langevin model. Note the excellent recovery of the experimental histogram's values in the simulation, demonstrating the capacity of the first-order Langevin model to capture coarse-grained statistics. Statistical error bars are smaller than the symbols. The inset plot shows the ratio between the simulation and experiment histogram values. Note the near-unity agreement over the entire domain save where both histograms' values approach zero. The local minimum near −1 mV is taken as the boundary between the parallel and antiparallel wells. The mean of the raw voltage measurement has been removed.

Though the histograms observed above can (together with an assumption of Boltzmann statistics) tell us about the effective energy landscape of the system, the detailed short-time dynamics generated by thermally induced magnetic fluctuations – which are by construction separate from the conservative forces on the system – remain hidden.
To determine their behaviors, we compute conditional moments² of the experimental voltage-time trace. The nth-order conditional moment M_n is

$$M_n(\Phi, \tau) = \langle [\Phi(t+\tau) - \Phi(t)]^n \rangle_t \, , \qquad (2.5)$$

with the angle brackets denoting a time average over an entire trajectory, and Φ(t + τ) − Φ(t) a τ-delayed difference in the system's state. We capture these conditional moments with the same bins as we use for the histogram.

At very small τ, Eq. (2.5) can be used to approximate the nth-order Kramers-Moyal coefficient [22, 80, 81] of the system. Formally, these Kramers-Moyal coefficients are connected to the conditional moments as

$$D_n(\Phi) = \frac{1}{n!} \lim_{\tau \to 0} \frac{M_n(\Phi, \tau)}{\tau} \, . \qquad (2.6)$$

The D1 and D2 terms, which we refer to as the drift and diffusion terms, are the specific Kramers-Moyal coefficients used in our first-order Langevin model and are the only nonzero Kramers-Moyal coefficients required to describe systems obeying the Fokker-Planck equation [82]. Due to the finite resolution inherent in experimental data, we cannot immediately take the limit as τ → 0 in our calculation of the Kramers-Moyal coefficients. We also find in practice that our experimental timestep was not quite small enough to approximate this limit; subsampling our data to compute the M_n at slightly longer timesteps indicates that we are not in the converged regime, and strongly so for M2.

²These conditional moments are reminiscent of Kolmogorov's structure functions in turbulence. However, structure functions are conditional moments that use a spatial separation and spatial ensemble average to obtain a scalar value for fluid velocity, whereas the Kramers-Moyal coefficients here are evaluated from a voltage-time trace (which by the ergodic hypothesis is equivalent to an ensemble average) that integrates solely over the probability distribution.
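The binned conditional-moment estimate of Eqs. (2.5) and (2.6) can be sketched on a synthetic trace whose drift is known in advance, so the estimate can be checked. Here the trace is an Ornstein-Uhlenbeck process with D1(x) = −x; all parameters are illustrative, not experimental values.

```python
import numpy as np

# Estimate the drift coefficient D1 from a time trace via the binned
# conditional moment M1 of Eq. (2.5) and the small-tau quotient of Eq. (2.6).
# Synthetic OU trace with known drift D1(x) = -x; parameters illustrative.
rng = np.random.default_rng(3)
dt, n_steps = 1e-3, 1_000_000
x = np.empty(n_steps)
x[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps - 1)
for i in range(n_steps - 1):
    x[i + 1] = x[i] - x[i] * dt + 0.7 * dW[i]   # D1(x) = -x

deltas = np.diff(x)                     # tau-delayed differences with tau = dt
bins = np.linspace(-1.0, 1.0, 21)       # state-axis bins, as for the histogram
which = np.digitize(x[:-1], bins)

centers, d1_est = [], []
for b in range(1, len(bins)):
    mask = which == b
    if mask.sum() > 1000:               # skip sparsely populated bins
        centers.append(0.5 * (bins[b - 1] + bins[b]))
        d1_est.append(deltas[mask].mean() / dt)   # M1(x, tau)/tau -> D1(x)

slope = np.polyfit(np.array(centers), np.array(d1_est), 1)[0]
print(slope)                            # close to the true drift slope of -1
```

The same binned average applied to the squared differences estimates M2 and hence D2; when the sampling time is not small compared to the dynamics, the finite-τ corrections become important.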
As τ increases from zero, however, finite-time corrections can be used to relate the conditional, time-delayed moments M_n to the underlying Kramers-Moyal coefficients that are needed to construct an analytic Langevin model [83, 84]. These corrections up to second order in the time delay τ are

M_1 = τ D_1 + (τ²/2)(D_1 D_1′ + D_2 D_1″) + O(τ³)  (2.7)

and

M_2 = 2τ D_2 + τ²(D_1² + D_1 D_2′ + D_2 D_2″ + 2 D_2 D_1′) + O(τ³),  (2.8)

with a prime denoting differentiation with respect to Φ, and each Kramers-Moyal coefficient and time-delayed moment understood to be a function of Φ. Equations (2.7) and (2.8) describe a second-order differential system for the Kramers-Moyal coefficients in terms of the calculated M_1 and M_2; these equations cannot be solved analytically in general. Our approach to extracting the Kramers-Moyal coefficients, which we require to parameterize our Langevin model, is to choose a parameterized ansatz for D_2; that is, we choose some functional form with a number of free parameters. Combining this ansatz with an analytic fit to the histogram induces an ansatz on D_1 via the stationary solution to the Fokker-Planck equation. We then choose the free parameters (see Eq. (2.12) below) by fitting the right-hand side of Eq. (2.8), evaluated with our ansatz, to the M_2 extracted from our experimentally measured voltage-time trace. We could fit both M_1 and M_2 simultaneously, but finding agreement between M_1 and the predictions of the model when only M_2 is fit argues for the appropriateness of using the one-dimensional Fokker-Planck equation.

2.4 One-dimensional Langevin model

As described in our published work [34], we note that macrospin models fail to capture qualitative metrics – let alone statistical metrics – associated with the experimental data.
Yet models that capture the statistics of experimental devices will be required for high-fidelity circuit simulations of SMTJ devices as their attendant technological applications scale. To address this discrepancy, we introduce a first-order Langevin model inspired by existing works on the modeling of fluctuating bistable processes [85, 86]. This approach may not be capable of predicting the behavior of uncharacterized devices, but we anticipate it could be used to characterize the devices from a particular manufacturing process, with those fit results then used in circuit simulations of those devices.

We take a data-driven approach to determining the details of the one-dimensional Langevin model. The first step is to find an analytic expression for the experimentally determined probability density ρ_0. We require an analytic expression because we will need to take its derivative to infer the deterministic forces in the system. To guarantee a positive-definite fit, we assume a Boltzmann distribution and then fit not the histogram itself but a dimensionless effective energy U_eff(Φ) defined so that ρ_0(Φ) = (1/Z) exp(−U_eff(Φ)), where Z ensures that ∫ dΦ ρ_0(Φ) = 1. A judicious choice of basis function is required to capture the distribution with high fidelity; we scale our data appropriately and then use the Chebyshev polynomials of the first kind T_n(Φ), which form an orthogonal, complete basis on [−1, 1] and are each bounded between −1 and 1 over this range³. Figure 2.4 shows progressively higher-order fits to U_eff; for the rest of the work presented in this chapter, we use the n = 20 fit for the effective energy and the stationary distribution. For practical applications, the choice of fit order would be a balance between fidelity to the data and speed of calculation.

The effective energy in Fig.
2.4, derived from the stationary distribution for the experimental data, exhibits several features that are inconsistent with a macrospin model as shown in Ref. [34].

³Legendre polynomials were also considered because of their similar properties, but we chose Chebyshev polynomials because they obtained more accurate results with fewer total coefficients.

Figure 2.4: Fit to the effective energy. (a) Residues (normalized l2 norm) between the Chebyshev polynomial fits and the data as a function of even order n, and (b) sample fits—n = 4, 10, 20—to the experimental effective energy U_eff. Error bars on experimental data indicate statistical uncertainties due to the number of counts per histogram bin; however, these uncertainties are smaller than the symbols. The mean of the raw voltage measurement has been removed.

These features include asymmetry between the interior and exterior sides of the P/AP states, disparate P/AP well widths, and exterior boundaries that are significantly rounded compared to the macrospin results. We believe that these discrepancies may share a common cause: the macrospin model neglects many degrees of freedom that may play an important role in the dynamics. The asymmetry between the parallel and antiparallel alignments could arise from the non-uniform fringing fields of the fixed layer and synthetic antiferromagnet. The tails in the distribution could have several explanations. At a basic level, it could be that the states with the minimum and maximum resistances are not the lowest-energy states. One may also note that for an open system (the SMTJ) in contact with a thermal reservoir, it is the free energy, rather than the internal energy, that is the relevant quantity [87] for making thermodynamic predictions like the probability distributions.
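The Chebyshev expansion of the effective energy described above can be sketched with numpy's polynomial module, which maps the data domain onto [−1, 1] internally. The function name and the synthetic asymmetric double-well test function below are ours; the dissertation fits the experimental U_eff.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def fit_effective_energy(phi, u_eff, order=20):
    """Least-squares fit of U_eff(phi) in Chebyshev polynomials of the
    first kind, returning the fit and the normalized l2 residue used to
    compare fit orders (cf. Fig. 2.4(a))."""
    fit = Chebyshev.fit(phi, u_eff, deg=order)
    resid = np.linalg.norm(u_eff - fit(phi)) / np.linalg.norm(u_eff)
    return fit, resid
```

On a smooth but non-polynomial double well, raising the order from n = 4 to n = 20 drops the residue sharply, mirroring the trend in Fig. 2.4(a).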
The measured distribution integrates over the unmeasured degrees of freedom (those that do not lie along the fixed-layer magnetization), so, even if the strictly parallel and strictly antiparallel states were energy minima, slightly higher-energy states may be significantly more prolific, shifting the peaks of the marginal distribution along the voltage, Φ, away from the energy minima. The desire to capture these features of the histograms and other statistical measures is what motivates us to develop a compact model beyond the macrospin approach. As we have elaborated above, our approach is to assume a 1D overdamped Langevin equation driven by white noise. The simplicity of this model makes our task tractable, and in fact we will show in Section 2.5 that the model gives very good agreement with experiment.

To determine the parameters for the Langevin model, we start with the Fokker-Planck equation, which describes the evolution of the probability density ρ(Φ, t) as

∂ρ/∂t = (∂/∂Φ)[ −D_1(Φ) ρ(Φ, t) + (∂/∂Φ)(D_2(Φ) ρ(Φ, t)) ],  (2.9)

with Φ a generic state variable (voltage, in the present work), t the time, D_1(Φ, t) the drift, and D_2(Φ, t) the diffusion. The terms D_1(Φ, t) and D_2(Φ, t) are precisely the first- and second-order Kramers-Moyal coefficients from Eq. (2.6). The Kramers-Moyal coefficients may be functions of time in general, but the measured probability distribution ρ_0 is stationary on the timescale of the experiment, and thus we assume the D_n(Φ, t) are not functions of time. In the steady-state limit ∂ρ/∂t = 0 we have

D_1(Φ) = D_2′(Φ) − D_2(Φ) dU_eff(Φ)/dΦ,  (2.10)

so that D_1 is uniquely determined given the diffusion coefficient and the steady-state distribution of the system. In passing from Eq. (2.9) to Eq. (2.10), we used the fact that the right-hand side of Eq. (2.9) is the divergence of the probability current J. In the steady-state limit, ∂_Φ J = 0, and thus the stationary current J_0 is constant.
For the system to be physically bounded, the constant value of J_0 must in fact be zero, which allows us to uniquely solve for D_1(Φ) as a function of D_2(Φ) and ρ_0(Φ).

In a system with additive noise – that is, D_2′(Φ) = 0 – the drift term would be determined through the derivative of the effective energy, in which case D_1 coincides with a typical conservative force, up to prefactors. A system with multiplicative noise – where D_2′(Φ) ≠ 0 – exhibits mixing between the stochastic terms and the deterministic part; here the nonzero diffusion gradient introduces a stochastic drift term.

The Langevin equation derived from the Fokker-Planck equation is

Φ̇ = f(Φ) + g(Φ) η_Φ(t),  (2.11)

with Φ the voltage state, f(Φ) the deterministic drift term, g(Φ) the stochastic diffusion term, and η_Φ white noise with ⟨η_Φ(t) η_Φ(t′)⟩_t = 2δ(t′ − t) and ⟨η_Φ(t)⟩_t = 0. We interpret this equation in the Stratonovich sense [29], as one would for the typical construction of the sLLG equation. In the Stratonovich interpretation, the Langevin equation is related to the Fokker-Planck equation's Kramers-Moyal coefficients by f(Φ) = D_1(Φ) − D_2′(Φ)/2 and g(Φ) = √(D_2(Φ)). The second term on the right-hand side of f(Φ) comes from gradients in the diffusion, which give rise to an effective drift term independent of the usual drift coefficient D_1(Φ).

The first step in determining a Langevin model that can reproduce aspects of the experimental voltage-time trace in Fig. 2.2 is to compute the conditional moments in Eq. (2.5). These moments are shown in Fig. 2.5. With these experimental conditional moments, we determine the Kramers-Moyal coefficients by making an ansatz D̃_2 for the diffusion part of the Fokker-Planck equation, which, when combined with the stationary distribution, yields an induced form for D_1 via Eq. (2.10). Measurements of M_2, shown in Fig. 2.5, inform our choice for D̃_2.
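The two relations just stated—the stationarity constraint of Eq. (2.10) and the Stratonovich conversion f = D_1 − D_2′/2, g = √D_2—can be composed into a small helper. This is a sketch with our own names; the callables stand in for the fitted diffusion ansatz, its derivative, and the derivative of the Chebyshev fit to U_eff.

```python
import numpy as np

def langevin_terms(phi, d2, d2_prime, ueff_prime):
    """Build the Langevin terms of Eq. (2.11) from a diffusion ansatz and
    the effective-energy fit.  Eq. (2.10): D1 = D2' - D2 * U_eff';
    Stratonovich convention: f = D1 - D2'/2, g = sqrt(D2)."""
    d1 = d2_prime(phi) - d2(phi) * ueff_prime(phi)  # stationary-state drift
    f = d1 - 0.5 * d2_prime(phi)                    # deterministic term
    g = np.sqrt(d2(phi))                            # noise amplitude
    return f, g
```

For additive noise (constant D_2) the correction term vanishes and f reduces to the conservative force −D_2 U_eff′, as the text notes.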
The observation that the data show larger diffusion in the AP well than in the P well leads us to propose

D̃_2(Φ; µ) = mΦ + b,  (2.12)

with µ = (m, b) the fit parameters. In Section 2.8 we show that fitting with a constant D̃_2 also provides an adequate fit, and that introducing more fitting parameters can give an even better fit. To ensure that D_2 is positive definite, we take D_2 = λ log(1 + exp(D̃_2/λ)) with λ = 1 V²/s; this softplus form leaves D_2 essentially unchanged wherever D̃_2 > 0. Combining this ansatz for D_2 with the analytical expressions for D_1 and the stationary distribution ρ_0 completely specifies the model. In the next section, we use this model to run stochastic Langevin equation simulations, that is, numerical integrations of Eq. (2.11).

We fit the parameters of the ansatz in Eq. (2.12) following the procedure outlined in Section 2.4. The fit parameters in Eq. (2.12) are determined by first computing an analytic form for D_1 from the ansatz for D_2, the fit to the histogram, and Eq. (2.10). Then an analytic form for M_2 is determined from Eq. (2.8). The parameters of this last expression are adjusted to fit the experimental data. The agreement between the model and the experimental data is shown in Fig. 2.5. The errors in the drift and diffusion coefficients were calculated from the standard error σ_i/√(N_i) of the underlying δΦ_n(Φ) distribution as a function of the binned Φ_i levels shown in the figure, where σ_i is the standard deviation and N_i is the number of counts within the ith Φ_i level. The binning level used for Φ_i here was approximately 295 µV, giving exactly 64 total bins over the sample. The maximum error shown in the plots occurs at the exterior well boundaries, where the number of counts N_i is on the order of hundreds of data points, instead of millions nearer the wells' interiors.
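As a sanity check on the finite-time corrections used in this fit, the second-order expression of Eq. (2.7) can be compared against an exactly solvable case. For an Ornstein-Uhlenbeck process (D_1 = −θΦ, D_2 constant) the exact finite-τ moment is M_1 = Φ(e^{−θτ} − 1), whose Taylor series the correction reproduces. All numbers below are illustrative, not device values.

```python
import numpy as np

# Ornstein-Uhlenbeck test case: D1 = -theta*phi, D2 = const (so D1'' = 0).
theta, phi, tau = 2.0, 0.3, 0.05

m1_exact = phi * (np.exp(-theta * tau) - 1.0)     # exact finite-tau moment
m1_first = tau * (-theta * phi)                   # naive estimate M1 ~ tau*D1
# Eq. (2.7): M1 = tau*D1 + (tau^2/2)*(D1*D1' + D2*D1''); here D1' = -theta.
m1_second = m1_first + 0.5 * tau**2 * ((-theta * phi) * (-theta))
```

The second-order correction reduces the error from O(τ²) to O(τ³), which is exactly what licenses fitting M_2 at the experimentally accessible τ rather than in the unreachable τ → 0 limit.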
Both drift and diffusion coefficients (that is, the induced ansatzes for M_n) agree well with the conditional moments extracted from experiment except at the extreme voltage values, where the statistical certainty is poor. Figure 2.5 shows the functional forms determined for the model. The model is constructed so that M_2/2τ is fit to the experimental data; the consequent agreement between M_1/τ and the experimental data speaks to the appropriateness of the underlying Fokker-Planck model as a description of this system.

Figure 2.5: Time-delayed drift coefficient (a) and diffusion coefficient (b), showing experiment (black circles) and Langevin simulation (blue squares), computed at a sampling time τ_samp = 50 ps equivalent to the experimental measurement time, and the analytic, fitted results that were inserted into the model, D_n (solid green line) and M_n/nτ (dashed red line), where n = 1 for the drift term and n = 2 for the diffusion term. Error bars on the experimental data indicate single-standard-deviation uncertainties in the mean. They are smaller than the plot symbols except at the extreme voltages.

2.5 Results

Our ultimate goal is to develop a stochastic differential equation of motion for SMTJs for use in circuit simulators. Though our current model is restricted to a particular biasing condition, we can still explore how well our model matches the device. We simulate the Langevin equation (Eq. (2.11)) and compare simulation results both to the experimental data used to fit the model and to selected statistical metrics of the experimental data.

After obtaining the Kramers-Moyal coefficients D_1 and D_2, we insert them into Eq. (2.11). We set the integration timestep to a fifth of the experimental sampling time, dt = τ_exp/5 = 10 ps. Integration is performed using the Euler-Maruyama method. Simulation histogram results are compared to the experimental histogram in Fig.
2.3, confirming that the first-order, data-driven Langevin model captures this aspect of the experiment well. The histogram displays asymmetric energy-well probabilities and widths, greater-than-exponential decay of the probability density at the exterior data boundaries, and a large energy barrier between the two well peaks. We assert that the reproduction of statistical features such as these is key to developing an engineering-appropriate model for the device. For purely analog circuits in particular—such as those proposed in Refs. [37] or [88]—correct modeling of these distributions may be crucial for capturing emergent statistics of scaled-up circuitry based on superparamagnetic tunnel junctions.

Comparing the conditional moments extracted from experiment and simulation demonstrates that the first-order Langevin model gives dynamics similar to those seen in experiment. Figure 2.5 confirms excellent agreement for M_1(Φ, τ) and M_2(Φ, τ) calculated for a time delay equal to the experimental sampling time (τ_samp = τ_exp = 50 ps).

The statistics that have received perhaps the most attention in the literature are the mean dwell times of the parallel and antiparallel states. These are captured directly by construction in Néel-Brown models, and macrospin models are generally fit to ensure these dwell times are accurate [75]. In circuits and systems where the SMTJ state is binarized into a two-level telegraph signal, the mean dwell times, together with the assumption that the switching events are independent of past history – the Markov property – completely determine the circuit output. With a voltage threshold chosen to demarcate the boundary between the P and AP states, the dwell times imbue the data we have already captured in the histogram with a characteristic time scale. Crucially, this timescale information is carried into the model through our fitted ansatz on D_2.
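The Euler-Maruyama integration described above can be sketched as follows. This is an illustrative sketch with our own helper names; it integrates the Itô form equivalent to the Fokker-Planck Eq. (2.9), dΦ = D_1 dt + √(2D_2) dW, which describes the same process as the Stratonovich Eq. (2.11). The drift, diffusion, timestep, and units below are placeholders, not the fitted device values.

```python
import numpy as np

def euler_maruyama(d1, d2, phi0, dt, nsteps, seed=0):
    """Euler-Maruyama integration of dPhi = D1*dt + sqrt(2*D2)*dW,
    the Ito process whose Fokker-Planck equation is Eq. (2.9)."""
    rng = np.random.default_rng(seed)
    phi = np.empty(nsteps)
    phi[0] = phi0
    for k in range(1, nsteps):
        dw = rng.normal(0.0, np.sqrt(dt))        # Wiener increment
        phi[k] = (phi[k - 1] + d1(phi[k - 1]) * dt
                  + np.sqrt(2.0 * d2(phi[k - 1])) * dw)
    return phi

# Usage with a single-well (OU-like) drift and constant diffusion, in
# arbitrary units; the chapter's simulations use dt = 10 ps instead:
trace = euler_maruyama(lambda p: -2.0 * p, lambda p: 0.5, 0.0, 1e-2, 200_000)
```

For this test case the stationary variance should approach D_2/θ = 0.25, a quick check that the noise normalization ⟨ηη⟩ = 2δ is implemented consistently.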
For the purposes of this work, we consider a transition between the P and AP states to occur when the voltage crosses the barrier location determined by the local maximum of the energy landscape in Fig. 2.4. We define a dwell time as the amount of time the system spends in the positive or negative half of state space before crossing that ∂_Φ U_eff = 0 threshold into the other half of state space. The main panels in Fig. 2.6 show simulation and experimental dwell times for a sampling rate of τ_samp = τ_exp on log-log and semi-log plots for the two distinct P and AP states. These figures show that the simulation captures the full distribution of dwell times for both states. This agreement for the dwell time distributions is perhaps the most significant validation of the model, because the experimental input to the model does not directly encode these time scales. The model uses just the short-time dynamics captured by the Kramers-Moyal coefficients and the stationary limit of the time-averaged time series in the form of the histogram.

One unusual feature of both the experimental and simulated distributions is the crossover from the familiar exponential behavior to ∼ t^(-3/2) power-law behavior at short times. We attribute this behavior to fluctuations around the threshold we defined between the states. Recall that we chose the threshold between states to correspond to the local maximum of the effective energy landscape; linearizing about this threshold therefore reveals a locally flat energy landscape where the dynamics are driven entirely by stochastic fluctuations. In other words, the dynamics in this small neighborhood around the threshold are given by a random walk. It is well known that the dwell-time (return-time) distribution of a random walk is proportional to √(τ/t³), where τ is the timestep of the walk [89]. As τ decreases, more and more probability mass accumulates at smaller and smaller t, dominating the dwell time distribution.
This suggests that the transition between the power-law and exponentially distributed dwell times is controlled by the curvature of the energy barrier: the flatter the energy near the threshold, the longer the system can behave as an unbiased random walk. In the physical SMTJ system, we expect some physical mechanism to impose an ultraviolet cutoff—but whatever this cutoff may be, it clearly resides at a higher frequency than our experiment can access. Since this power-law behavior is simply a property of stochastic dynamics around a threshold, we expect it to be found in other stochastic magnetic systems as well. In our published work [34], we found that the behavior is also reproduced in the macrospin model, but note that it is impossible to capture in a simple Néel-Brown model, which is based on a two-state Markov model that assumes exponential distributions of dwell times; such a system has no notion of dynamics in the barrier, which is the essential cause of the power-law effect.

Traditionally, mean dwell times are extracted from experimental dwell-time histograms by examining the slope of the exponential dwell-time distributions on a semi-log scale. Because of the power-law behavior at small times in our data, these inverse slopes—which we will call characteristic dwell times—are clearly very different from the literal mean dwell times. Yet merely isolating the slope of the curve where it does appear exponential in Fig. 2.6 does not lead to a physically invariant characteristic dwell time; this slope can change significantly under minor subsampling of the signal, because subsampling changes the probability-dominating power-law part. We elaborate more on the topic of characteristic dwell times in Section 2.9.

Finally, we compare the power spectral densities of the experimental signal to those produced by the model. Accurate power spectra are crucial in an electrical engineering context.
Unlike the diffusion characteristics and histograms, power spectra are known to exhibit the correct qualitative (Lorentzian) form in macrospin models [67] and in Néel-Brown models [90].

Figure 2.6: Dwell-time distributions as extracted from experiment (black) and simulation (blue) in (a) a log-log plot and (b) a semi-log plot. The log-log plot uses a sampling time bin size of 100 ps, while the semi-log plot uses a sampling time bin size of 1 ns. The experimental data and simulation both have on the order of 300,000 transitions. The straight-line behavior of both the simulation and experiment at short times in the upper panel indicates a power-law distribution, and at long times in the lower panel an exponential distribution.

2.6 Kramers-Moyal Coefficient Visualization

The Kramers-Moyal coefficients introduced in Section 2.3 are instrumental to our data-driven modeling. These coefficients encode short-time information into the model, allowing it to capture both the background thermal noise driving SMTJ fluctuations and the deterministic restoring force of the double-well potential. Figure 2.7 shows the probability density distribution of the explicit δΦ^n/n!τ calculated from the experimental data, together with cuts across the distributions at specified voltage levels where the raw count is the vertical axis.

Figure 2.7: Drift and diffusion Kramers-Moyal coefficient probability density distributions and cuts across the distribution at specified voltage levels. Panel (a) is the drift term's probability per bin size 0.916 V²/ms (299 µV bin size for voltage and 3.06 mV/ns bin size for δΦ/τ). The colorbar shows a logarithmic scale for the probability density.
Panel (b) is a semi-log plot showing six voltage cuts across the probability distribution for δΦ/τ. The width of the distribution increases as the voltage sweeps from positive to negative, and the distribution also drifts from positive δΦ/τ to negative, corresponding to the double-well potential. Panel (c) is the diffusion term's probability per bin size 1.14 V³/s (299 µV bin size for voltage and 3.81 mV²/ns bin size for δΦ²/2τ). Panel (d) is a semi-log plot showing six voltage cuts across the probability distribution for δΦ²/τ.

The skewness of the distribution in Fig. 2.7(a) gives rise to the deterministic drift term D_1, while the width of that distribution (evaluated explicitly as the average of the square of δΦ) at every voltage Φ gives rise to the stochastic/noise term D_2. In Fig. 2.7(b) and Fig. 2.7(d), values that fall below 1 correspond to bins that were not measured in the experimental data.

2.7 Memory

The overdamped, first-order Langevin model used throughout this chapter assumes the underlying dynamics are Markovian. To test the validity of this assumption in describing the experimental data, we construct a metric capturing the inertia of observed trajectories. For each time step t collected in experiment, we compute the probability of the system to increase (go up in voltage) or decrease (go down in voltage) at time t + τ, conditioned on whether it increased or decreased from time t − τ to t. From these observed quantities we compute four distinct probabilities: up after up, P↑↑; down after down, P↓↓; up after down, P↑↓; and down after up, P↓↑; in these expressions, the left arrow denotes the second change and the right arrow the first change. The sum of P↑↑ and P↓↑ equals one, since the system must either increase or decrease in value. Likewise, the sum of P↓↓ and P↑↓ also equals one. If the system is Markovian, we expect that probabilities with the same second-step direction will be independent of the direction of their previous steps, and depend only on the current system state (i.e.
P↑↓(Φ) = P↑↑(Φ) and P↓↓(Φ) = P↓↑(Φ)). Figure 2.8 shows each possible P_ij plotted as a function of the measured voltage Φ. As expected, the simulation transition probabilities satisfy P↑↓ = P↓↓ and P↑↑ = P↓↑ in Fig. 2.8(b), verifying that the metric correctly identifies Markovian behavior. On the other hand, the experimental data shown in Fig. 2.8(a) do not have equal transition probabilities, indicating non-Markovian behavior.

Figure 2.8: Voltage transition probability P_ij(Φ) shown for experiment in panel (a) and simulation in panel (b). The sum at a particular voltage Φ of all the transition probabilities is always 2 because the voltage moves twice. The time step τ was taken as the experimental measurement time of 50 ps.

To assess the effects of memory and inertia in the experiment and simulation, we parametrically plot (P↑↓, P↑↑) through a series of voltage bins for the experimental and simulated voltage-time traces in Fig. 2.9. The experimental data exhibit non-Markovian behavior—most strongly in the barrier-crossing region between −5 mV and 5 mV—that cannot be captured with our first-order (memoryless) Langevin model. Inclusion of memory and higher-order differential terms would seem a necessary next step toward higher-fidelity models. We discuss this further in Section 2.10.

2.8 Other parameterizations of D̃_2

The ansatz chosen in Section 2.4 has only two fit parameters that can be optimized to fit the experiment's calculated M_2/2τ Kramers-Moyal coefficient. This section describes the results of using a higher and a lower number of fitting parameters for the D̃_2 ansatz.

The higher-order fit differs from the linear fit by including an additional Gaussian term.

Figure 2.9: Voltage transition probability from Fig. 2.8 plotted parametrically as a function of voltage Φ.
Panel (a) shows the experiment as black circles vs. the Langevin simulation as blue hollow squares, measured at the experimental acquisition frequency of 20 GHz (τ_samp = 50 ps), with an equiprobability trendline shown in dashed grey. Panel (b) shows the experimental data subsampled at τ_samp ∈ (50, 250, 500) ps. The black circles are the reproduced experimental data from panel (a). Note how, for longer subsampling times, the parametric probability line approaches the dashed grey equiprobability line.

This Gaussian term seeks to capture the difference between the Langevin model's calculated M_2/2τ and the experiment's calculated M_2/2τ seen in Fig. 2.5. The analytic form for this fit is

D̃_2(Φ; µ) = mΦ + b + a exp(−(Φ − µ_0)²/2σ²).

Note we ensure that D_2 remains positive, as discussed in the main text. The lower-order fit assumes a constant diffusion, D̃_2(Φ; µ) = b. Figure 2.10 compares the calculated M_2/2τ coefficients from experiment and from simulations using the Gaussian-plus-linear fit, the linear fit, and the constant fit for D̃_2. With progressively higher-order fits, the simulation fidelity increases, indicating that the actual noise term in the SMTJ device has a more complicated structure than these simple analytic models. Even so, the models' fidelity to the empirical data is remarkable and demonstrates the flexibility and robustness of our data-driven method in capturing the gross statistics of the SMTJ device.

In addition to comparing calculated Kramers-Moyal coefficients, we also compare the dwell time distributions of each model. Figure 2.11 shows the dwell time distributions in the

Figure 2.10: Experimental and Langevin simulation results for the time-delayed drift (panel (a)) and diffusion (panel (b)) coefficients computed at a sampling time τ_samp = 50 ps equivalent to the experimental measurement time.
The various lines in the figures are as follows: black circles show experimental data; blue squares show first-order, data-driven Langevin simulations with a Gaussian-plus-linear D̃_2; yellow circles show first-order, data-driven Langevin simulations with a linear D̃_2, as reproduced from the main chapter body; green triangles show first-order, data-driven Langevin simulations with a constant D̃_2.

Figure 2.11: Log-log (panel (a)) and semi-log (panel (b)) dwell-time distributions for the P state τ_P as calculated in experiment and produced by simulation: black circles show experimental data; the blue dashed line shows the first-order, data-driven Langevin simulation with a Gaussian-plus-linear D̃_2; the yellow dotted line shows the simulation with a linear D̃_2, as reproduced from the main chapter body; the green dot-dashed line shows the simulation with a constant D̃_2. The line-like behavior of the simulation and experiment in the upper plot at short times indicates a power-law dwell-time distribution in that regime. In the lower plot, the line-like behavior of the simulation and experiment at longer times indicates an exponential dwell-time distribution in that regime. The same sampling time τ_samp is used as in Fig. 2.6, but the overall dwell-time distribution is measured over only the first 2,000,000 measurement points of the experiment.

Figure 2.12: Mean dwell times of experiment and simulation as a function of sampling time.

P state for the experiment and simulations measured using 2 × 10⁶ points. This number of points is fewer than used in Fig. 2.6, which is why there is a difference between the experimental curves in the two figures. This was done to ensure magnitude agreement in the dwell-time distribution between the experiment and simulations, which were limited in total length because of computational expense.
The dwell-time distribution is unchanged for these different assumed fits for D_2.

2.9 Characteristic dwell times

In the discussion of Section 2.5, we described how the short-time power-law behavior of the dwell-time histograms, in experiment and in its fitted simulations [Fig. 2.6], can be attributed to the scale-free, fractal-like nature of random walks around the switching threshold at high sampling frequencies. These features are not mere artifacts of the experiment, but additional structure that is usually neglected in dwell-time analysis of magnetic devices, and structure that we expect to arise similarly in any magnetic nanodevice.

Calculating the mean dwell times from the entire distributions gives very small dwell times dominated by these short-time power-law behaviors. Figure 2.12 shows good agreement between the mean dwell times ⟨τ⟩ extracted from the simulation and from the experiment. Even when subsampled, the simulation and experimental average dwell times agree moderately well. Recall that D_2, which is fit to M_2 via finite-time corrections, holds all of this timescale information in the model. The mean dwell times increase as the subsampling time increases because more and more short-time events are eliminated from the distribution.

Generally speaking, one expects that if a dwell time is relevant for a particular application, it is likely the characteristic dwell time of the exponential distribution that is most meaningful. If we ignore the power-law behavior of the very short dwell times, we can extract these from the slope of the exponential in the semi-log plot, panel (b) of Fig. 2.6. The resulting fits give τ_P,exp = 7.10 ns; τ_AP,exp = 11.5 ns; τ_P,sim = 5.95 ns; τ_AP,sim = 9.51 ns. The one-standard-deviation ranges for these slopes are τ_P,exp ± σ = [6.82, 7.39] ns; τ_AP,exp ± σ = [11.23, 11.85] ns; τ_P,sim ± σ = [5.79, 6.12] ns; and τ_AP,sim ± σ = [9.33, 9.69] ns.
The agreement is good even though both simulation dwell-time slopes slightly underestimate the corresponding experimental values.

It is important to note that different fitting schemes will give different results that lie outside these ranges, which are strictly determined by the statistics of the points that have been fit. Indeed, it is not clear that these extracted times represent anything of experimental relevance. By fine-tuned subsampling or RC filtering of the data, we can artificially flatten the power-law behavior so that the resulting distribution is exponential. Using this procedure to obtain a plausibly exponential distribution gives τ_P,sim ≈ 25 ns and τ_AP,sim ≈ 30 ns. Going back to Fig. 2.2 qualitatively illustrates this effect. Keeping all short-time events gives on the order of 35 crossings of the zero line, which would correspond to mean dwell times of approximately 3 ns. On the other hand, if the rapid fluctuations were filtered out, there would be approximately six transitions, for a mean dwell time of about 17 ns.

2.10 Discussion

While the sLLG equation for a macrospin is physically motivated and contains terms directly identifiable with distinct physics, it cannot capture certain statistical metrics associated with our experimental device. This is partly due to the sLLG model's drift and diffusion coefficients; fitting the former to the observed drift would require including sufficiently "data-driven" terms as to lose the simple plausibility of the model, while directly fitting the diffusion is simply impossible in general. Indeed, even single-domain MTJs may show diffusion properties that do not generally agree with the macrospin.
Theory indicates that the Gilbert damping (an overall prefactor for D_2) in common magnetic materials has strong directional dependence [91], while recent experiments on SMTJ switching suggest that the Suhl instability may create dynamically variable damping mechanisms during single-domain reversal [92]. These considerations suggest that we must move beyond the simple macrospin model to accurately capture SMTJ physics in compact models in general, and the data-driven approach explored in this work is our initial attempt at this enterprise. We show that our data-driven approach captures multiple statistical and dynamical metrics with high fidelity in an experimentall