ABSTRACT

Title of Dissertation: MEASUREMENT, SIMULATION, AND COMPACT MODELING OF COMPLEX ELECTRON DYNAMICS

Liam Alexander Pocher, Doctor of Philosophy, 2025

Dissertation Directed by: Professor Daniel P. Lathrop, Department of Physics, and Professor Patrick G. O’Shea, Department of Electrical and Computer Engineering

Systems containing large numbers of electrons can exhibit surprisingly complex and rich dynamics. In this dissertation, we ask: what is the minimum detail necessary in measurement or in data-driven modeling and simulation to capture the complex dynamics manifesting from these systems? To answer this, we integrate experiment, simulation, and theory. We examine two such systems: (i) a superparamagnetic tunnel junction (SMTJ) and (ii) a charged-particle beam.

We first consider the deterministic-stochastic behavior of a constant-current-driven SMTJ, for which we create a measurement-driven, overdamped Langevin model capturing the statistical properties of the device. We show both how this model captures device statistics across time scales and how it can be refined to capture higher-order behavior.

We next examine the centroid motion of a charged-particle beam and propose a method for understanding and predicting it using an interpretable, data-driven approach whose output is directly identifiable with terms in the underlying low-dimensional evolution equations. We derive the evolution equations solely on the basis of data, with no recourse to an underlying first-principles model. We compare and contrast our methodology with both a machine-learning technique and a first-principles model, and we show that we can learn interpretable equations for nonlinear beam dynamics at lower computational cost while achieving comparable accuracy.

Lastly, we investigate the phase space evolution of a charged-particle beam.
Accurate knowledge of the phase space at beam creation is crucial for understanding and predicting beam dynamics. We measure a velocity space modulation to initialize the phase space of first-principles simulations and capture beam statistics and internal beam structure—resembling a cruciform—with high fidelity. This contrasts with both of the first-principles models employed—which do not account for beam structure, as they assume a uniform beam cross section—and with simulations using ideal phase space distributions. Finally, we demonstrate sensitivity to beam and lattice parameters varied within experimental measurement error.

MEASUREMENT, SIMULATION, AND COMPACT MODELING OF COMPLEX ELECTRON DYNAMICS

by

Liam Alexander Pocher

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2025

Advisory Committee:
Professor Daniel P. Lathrop (Chair/Co-Advisor)
Professor Patrick G. O’Shea (Co-Advisor)
Professor Thomas M. Antonsen Jr.
Doctor Irving Haber
Professor Christopher Jarzynski (Dean’s Representative)
Doctor Advait Madhavan

© Copyright by Liam Alexander Pocher 2025

Acknowledgments

I owe gratitude to all those who made this dissertation possible and because of whom I persevered through my graduate studies. It is impossible to thank everyone, and I apologize to those I’ve inadvertently excluded. The first person I would like to thank is the chair of my committee and co-advisor, Daniel Lathrop, who always supported me through thick and thin on many research projects, through the ups and downs of funding situations, and through challenges both professional and personal, and who introduced me to my other co-advisor, Patrick O’Shea. Just like Dan, Pat has been steadfast in his support: he enabled me to collaborate with the broader accelerator physics community, provided guidance on technical writing and presentation, and always offered excellent professional advice.
In addition to my two co-advisors, I would also like to thank the rest of my committee members for agreeing to serve, each of whom contributed to this dissertation. Advait Madhavan and Irving Haber in particular have both dedicated significant time and effort to mentoring me during my graduate school tenure. Advait provided the amazing opportunity to work on an international project at NIST, a project which became my textbook example of an excellent research collaboration. Irving always pushed me to make models and simulation better agree with experiment, and to use one or the other to sniff out errors in both! Thomas Antonsen encouraged me to publish, provided insight and concrete suggestions on how to make it so, and persevered through the referee process. Chris Jarzynski provided feedback on my work involving stochastic processes, and helped steer that work in a fruitful research direction. Other staff and scientists at the University of Maryland (UMD) and the National Institute of Standards and Technology (NIST) who were indispensable to my research include: Matthew Daniels, for excellent research discussion, writing feedback, and for being an exemplary physicist; Mark Stiles, for clear, insightful explanations of complex technical concepts, amazing presentation critique, and for being a role-model theorist; Brian Beaudoin, for his skill as a technologist and for freely spending his hard-pressed-for time in the laboratory to push me over the finish line; Dan Abell, for precise editing; Dorothea Brosius, for this dissertation’s LaTeX template; the rest of the administrative staff of the Institute for Research in Electronics and Applied Physics (IREAP); and both Santiago Bernal and David Sutter for their setup of my final experiment. A boon to any student is the environment created by fellow students who are pursuing the same goal of earning a PhD. I would like to say thanks to all the students I have had the privilege to work with and be supported by.
In no particular order they are: John Rose, Eduardo Lozano, Heidi Komkov, Ian DesJardin, Ella Vicente, Sidra Gibeault, Elaine Jaross, Alex Pick-Alaus, and Shiyi (Shelley) Wang. A significant reason I pursued graduate school and came to the University of Maryland is the mentors I had before graduate school at the Colorado School of Mines (CSM) and Los Alamos National Laboratory (LANL). Each, in their own way, taught me something more about how to become a better scientist and person. Special thanks is due to Jonathan Mace, who has supported me as a young physicist since my undergraduate studies. Travis Peery is another individual I would especially like to thank: he, along with Jonathan, enabled me to successfully apply to graduate school and be admitted to my desired program, provided advice for years, and was always ready to talk physics over coffee (and long emails). Other mentors of mine from LANL whom I feel it necessary to acknowledge (and thank) are Dan Borovina, Michael Murphy, and Nathaniel Morgan. They each contributed in some way to shaping my view on how to be a good leader and scientist. My mentors at the Colorado School of Mines, Nils Tilton and Charles (Chip) Durfee, introduced me to research, sparked my curiosity to be a scientist, and enabled me to establish connections with scientists at LANL. They also each introduced me to the University of Maryland, and specifically the Chaos program, which set me on the path to UMD years before I applied. Three hobbies that helped me through graduate school were coffee, yoga, and rock climbing. Having ready, and often too tempting, access to amazing coffee is all a graduate student can ask for. The pastime that helped me focus was practicing yoga. The last hobby, rock climbing, regularly provided physical and emotional focus and release. Another round of special thanks is due to my parents. They have always been supportive, and always gave me more good reasons to keep pursuing this degree.
This work would not have been possible without funding from the US DOE-HEP grant DE-SC0022009, the National Science Foundation via NSF grant number CCF2121957, and the Agence Nationale de la Recherche StochNet Project ANR-21-CE94-0002-01.

Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations

Chapter 1: Introduction
1.1 Examples of Complex Electron Dynamics
  1.1.1 Dissertation Outline
1.2 An Overview of Deterministic Dynamics
  1.2.1 The Lorenz Attractor
1.3 An Overview of Stochastic Dynamics
  1.3.1 Mathematical Modeling
  1.3.2 Pseudorandom number generators
1.4 Associated Publications
  1.4.1 Publications
  1.4.2 Conference Proceedings and Talks

Chapter 2: Measurement-driven Langevin modeling of superparamagnetic tunnel junctions
2.1 Motivation
2.2 Model Construction Roadmap
2.3 Time Trace Statistics
2.4 One-dimensional Langevin model
2.5 Results
2.6 Kramers-Moyal Coefficient Visualization
2.7 Memory
2.8 Other parameterizations of D̃2
2.9 Characteristic dwell times
2.10 Discussion

Chapter 3: Data-Driven Discovery of Beam Centroid Dynamics
3.1 Introduction
3.2 Methods
  3.2.1 Accelerator Lattice and First Principles Model
  3.2.2 Proposing Data-Driven, Interpretable Models
3.3 SINDy Results
  3.3.1 Fit 1 - Periodic Forcing
  3.3.2 Fit 2 - SHO and Periodic Forcing
  3.3.3 Fit 3 - SHO, Periodic Forcing, and Nonlinear Interaction
  3.3.4 Prediction Ability Case Study
3.4 Comparisons Between Centroid Model, SINDy Fit, and Simulation Data
3.5 Beam Moment Evolution Equations
3.6 Testing for a Necessary but not Sufficient Chaos Condition
3.7 Fourier Transform Analysis
3.8 Discussion

Chapter 4: Investigating Beam Dynamics with Measurement-Driven Initialization
4.1 Motivation
4.2 Experimental Setup
  4.2.1 Experimental Results
4.3 Beam Modeling
  4.3.1 Axisymmetric Envelope Model
  4.3.2 Single-Particle Model
4.4 First Principles Simulation
  4.4.1 Accept-Reject Sampling
  4.4.2 Comparing Experiment to Simulations
4.5 Discussion

Chapter 5: Conclusion

Appendix A: Tables of SINDy Model Coefficients
Appendix B: Dipole Parameters for UMER Lattice Simulation
Appendix C: Solenoid Magnetic Fields
C.1 Magnetic Field Expansion
C.2 Solenoid Magnetic Field Theory
C.3 Remotely Measuring the Solenoid Field with Larmor Angle Rotation
Appendix D: Configuration Space at the Slit Plane
D.1 Axisymmetric Envelope Equation Solution to the Slit Plane
D.2 Comparison to Experiment
D.3 Relating Between Statistical Beam Quantities
Appendix E: Direct Velocity Space Measurement
E.1 Pinhole Scan Data
E.2 Pinhole Centroid Motion
E.3 Measuring Velocity Space Sizes

Bibliography

List of Tables

3.1 UMER lattice parameters
3.2 Power spectra peaks from Warp data
4.1 Experimental Parameters
4.2 Simulation Parameters
A.1 Coefficients for the various dxc/dz models
A.2 Coefficients for the various dyc/dz models
B.1 UMER dipole field strengths and locations
D.1 Location and descriptions of items in the experimental lattice as shown in Fig. 4.1. Appendix C details the transfer functions of the solenoids relating applied current to maximum axial magnetic field along the beam line Bz(r = 0, z).
E.1 Maximum measured radial divergence r′max and statistical beam size in velocity space from Fig. E.3.

List of Figures

1.1 Integration of Experiment, Simulation, and Theory Cartoon
1.2 Lorenz 1963 Equation Solution
1.3 Matlab PRNG comparison
2.1 Visual outline of the 1D Langevin model generation process
2.2 Typical bi-stable SMTJ voltage fluctuations
2.3 Histogram of SMTJ state
2.4 Fit to the effective energy
2.5 Time-delayed drift coefficient and diffusion coefficients in experiment and simulation
2.6 Dwell time distributions extracted from experiment and simulation
2.7 Kramers-Moyal Coefficients Distribution and Voltage Cuts
2.8 Voltage transition probability Pij(Φ) as a function of voltage
2.9 Voltage transition probabilities for simulation and experiment measured at different subsampling rates in (P↓↑, P↑↑) space
2.10 Experimental and Langevin simulation results for the time-delayed, drift and diffusion coefficients
2.11 Dwell time distributions between experiment and simulation
2.12 Mean dwell times of experiment and simulation as a function of sampling time
3.1 UMER diagram and parametric centroid trajectory over 1 lattice turn
3.2 Hard-edge quadrupole and dipole profile of the UMER lattice
3.3 Regularized power spectra for centroid trajectory over one turn
3.4 SINDy centroid trajectory training results
3.5 SINDy centroid power spectra training results
3.6 Different SINDy optimization hyperparameter results for the centroid trajectory over three lattice turns
3.7 Different SINDy optimization hyperparameter error fraction results
3.8 Horizontal centroid power spectra comparison between Warp simulation data and first-principles centroid model
3.9 Parametric plot comparing Warp data, first principles modeling, and SINDy’s result
3.10 Two dimensional histogram for centroid trajectory over 100 turns for Warp data, first principles model, and SINDy fit
3.11 Power spectra for the horizontal centroid for Warp data, first principles model, and SINDy fit
3.12 Smoothed horizontal centroid trajectory difference
3.13 Exponential slope of horizontal centroid trajectory difference
3.14 Matrix plot of two dimensional histograms for centroid trajectory over various length scales
3.15 Unregularized power spectra for the solution to sample differential equations
3.16 Labeled, regularized power spectrum of horizontal centroid in UMER
4.1 Beamline Schematic
4.2 Experimental Beamline
4.3 Blackbody Cathode Light with and without Filter
4.4 Experimental Moment Data with Raw Images
4.5 Beam Envelope Comparison between Experiment, Simulation, and Modeling
4.6 Envelope Comparison between Experiment, Model, and Simulation
4.7 Dipole Comparison
4.8 Single-Particle Current Fractions and Trajectories Comparison
4.9 Single-Particle Final Position as a Function of Initial Conditions
4.10 Smoothed, Experimental Pinhole Data Compared to Accept-Reject Sampling Result
4.11 Accept-Reject Sampling Histograms Using Pinhole Scan Data
4.11 Experiment to Simulation Comparison for PH Results
4.11 Experiment to Simulation Comparison for Varying Simulation Hyperparameters
4.12 Phase Space from Simulation Compared to Experiment
C.1 Axial Profiles of Solenoid M1
C.2 Br(r, z) and Bz(r, z) Profiles for Solenoid M1
C.3 Pinhole Beam Rotation
C.4 Measured versus Theoretical Larmor Angle Rotations for Short Solenoid M1
C.5 Measured versus Theoretical Larmor Angle Rotations for Short Solenoid M2
D.1 Beamline Schematic
D.2 Beam size at the slit plane z = 43.54 cm as a function of solenoid current I1 and I2
D.3 Beam-edge slope sensitivity to changes in applied solenoid current along the waist-line
D.4 Dimensionless change in beam edge radius R for different envelope model parameters
D.5 Change in beam edge radius slope R′ for different envelope model parameters
D.6 Experimental images shown in (I1, I2) parameter space
D.7 Model Beam Size Validation to Experiment
E.1 Pinhole Scan Schematic
E.2 Pinhole Scan Data at the Gatevalve and Boxscreen
E.3 Pinhole Scan Data at the Box Screen and Gate Valve
E.4 Pinhole Scan Data Absolute Centroid Motion and Relative Divergence
List of Abbreviations

CERN Conseil Européen pour la Recherche Nucléaire
CSM Colorado School of Mines
IREAP Institute for Research in Electronics and Applied Physics
KM Kramers-Moyal
KV Kapchinskij-Vladimirskij
LANL Los Alamos National Laboratory
LHC Large Hadron Collider
ND N-dimensional
NIST National Institute of Standards and Technology
NP non-deterministic polynomial-time (computational problem)
ODE Ordinary Differential Equation
P polynomial-time (computational problem)
PCB Printed Circuit Board
PH Pinhole
PRNG Pseudo-Random Number Generator
SG Semi-Gaussian
SDE Stochastic Differential Equation
SINDy Sparse Identification of Nonlinear Dynamics
SMTJ Superparamagnetic Tunnel Junction
UMD University of Maryland
rms root mean square

Chapter 1: Introduction

This chapter provides a brief overview of the history of the study of deterministic and stochastic processes, and ties together the—at first glance—disparate chapters of this dissertation. In summary, this dissertation integrates experimental, numerical, and theoretical physics to answer the question: What is the minimum necessary detail in measurement or data-driven modeling and simulation to capture complex electron dynamics?

1.1 Examples of Complex Electron Dynamics

Science is the study of the natural world. Many phenomena across great scales in space and time exhibit behavior manifesting from complex processes: stars gravitationally interacting in the galactic morass, electrons pinging off molecules in semiconductors, hurricanes brewed in the Gulf of Mexico, etc. These are examples of deterministic-stochastic processes whose complex dynamics are best understood through the interdisciplinary interweaving of theoretical, numerical, and experimental physics. Examples of deterministic and stochastic processes are chaos and noise, respectively.
In this dissertation, we take chaos as the result of deterministic, rule-based behavior whose dynamics are exponentially sensitive—on the length scale of the underlying attractor—to naturally occurring perturbations or differences in initial conditions (e.g. measurement error in experiment or numerical round-off in computation). Noise, on the other hand, while potentially arising from high-dimensional dynamics (e.g. Brownian motion of particles in a background fluid), cannot be described with deterministic rules, but rather with the underlying statistical moments of the distribution producing said noise. The complex electron dynamics studied in this dissertation come from the seemingly disparate fields of probabilistic computing and accelerator physics and exhibit dynamics including chaos and noise. While these fields are separated by a gulf of application, they share the common thread of being systems containing large numbers of electrons. The systems of electrons that we examine are (i) a superparamagnetic tunnel junction (SMTJ) that uses not just electron charge but also electron spin, combined with material properties, to thermally induce bi-stable fluctuations constituting random bit streams for Ising machines, and (ii) an electron beam in either an electron ring or a linear accelerator. We ask: What is the minimum detail necessary in either measurement or data-driven modeling and simulation to capture complex dynamics? Auxiliary questions include: What are the limits of these models? Why include (or exclude) certain terms? How can they be integrated with the other disciplines of experiment, simulation, or theory, and can we attribute correct reasons for successes or limitations? In this dissertation, we integrate experiment, simulation, and theory to understand the complex dynamics manifesting from these systems.
Figure 1.1 shows a cartoon of how each discipline of experimental, computational, and theoretical physics is necessary to create a foundation for understanding complex dynamics.

Figure 1.1: A cartoon of a sitting-stool (a sample body of scientific work) fully supported by the interweaving of experimental, numerical, and theoretical physics. [The stool’s three legs are labeled Experiment, Simulation, and Theory.]

1.1.1 Dissertation Outline

In Chapter 2, we pose these questions in creating a measurement-driven model for the mixed deterministic-stochastic behavior of the constant-current-driven SMTJ. We use an empirical measurement to seed a 1D Langevin model—a simple stochastic differential equation—with both short-time and long-time statistics, and we show where that model breaks down while still capturing macroscopic statistics, outperforming first-principles modeling in accuracy and first-principles simulation in computational expense. We assume a 1D Langevin model, and not a higher-order one, as the simplest equation for modeling a system with random forcing.

We then progress in Chapter 3 to using virtual data from a well-validated and well-verified simulation to motivate an interpretable, data-driven model using an algorithm from the nonlinear dynamics community called Sparse Identification of Nonlinear Dynamics (SINDy). This model is a coupled system of first-order ordinary differential equations which uses the underlying data to populate physically motivated terms in the model, enabling simulation of environments where first-principles modeling is inadequate. After generating the data-driven model, we compare it to an accurate first-principles model and to the simulation, and we note the various discrepancies and the reasons for them.

We evolve our approach in Chapter 4 to use empirical data to motivate not just terms in stochastic/ordinary differential equations, but an entire phase space distribution.
We integrate all three disciplines of experimental, numerical, and theoretical physics to ascertain model, experimental, and simulation limitations. We initialize the phase space of an electron beam using experimental measurements and compare data for the beam’s configuration space dynamics¹ to both simulation and theory. The complex structural evolution in the experiment—resembling a cruciform—is reconstructed in the simulation because of the accurate initialization of the beam’s phase space. This contrasts with both first-principles models—which do not account for beam structure, as they assume a uniform beam cross section—and simulations using ideal velocity distributions. The initial velocity space of a space-charge-dominated beam has a large impact on its internal structural evolution, which this study has shown to be highly sensitive to small variations in beam parameters within measurement error. Finally, in Chapter 5 we summarize the conclusions of each chapter and propose steps for further research.

1.2 An Overview of Deterministic Dynamics

Humanity has studied the heavens since time immemorial. This study has produced: creation myths; ancient timekeeping across the world in the form of the Egyptian, Mayan, Chinese, etc. calendars; and, not least, celestial mechanics. The predominant Ptolemaic (geocentric) view of the world was challenged in the Renaissance by scientists such as Galileo, who championed the Copernican (heliocentric) model of the solar system, predicated upon observational evidence from the solar system.

¹Configuration space (x, y) is but one of the 6 unique 2D projections of the total 4D transverse phase space.

Fifty years after Galileo, Newton applied rigorous science to the study of two gravitating bodies in the mid-17th century.
In the late 19th century, Poincaré expanded this analysis to the three-body problem (the sun, earth, and moon), but could not find analytic solutions to the underlying equations of motion [1]. Instead, he resorted to a geometric approach. Thus the study of chaos was born, if it was not yet named as such. Poincaré’s approach to the three-body problem produced the eponymous Poincaré recurrence theorem, which states that such a system returns arbitrarily close to its starting state after a sufficiently long time.

Further work in the burgeoning field of chaos and nonlinear dynamics included a technique developed at the turn of the 20th century: the stability analysis of dynamical systems, exemplified by Lyapunov’s dissertation published in 1892 [2]. More foundational work came from Birkhoff, who proved the ergodic theorem in 1931 [3]; it became a foundation of modern statistical mechanics.

Other work in the field of nonlinear dynamics and chaos included fundamental contributions by Russian scientists such as Kolmogorov [4], Arnold [5], and Moser [6], which culminated in the Kolmogorov-Arnold-Moser (KAM) theorem. The KAM theorem describes how invariant tori corresponding to integrals (constants) of motion are robust to small perturbations.

Advances in computing also accelerated the study of nonlinear problems, prime examples being the Fermi-Pasta-Ulam-Tsingou (FPUT) problem [7] and Lorenz’s study of convection in the atmosphere [8]. The FPUT problem was motivated by the study of thermalization in nonlinear dynamical systems, specifically a chain of nonlinearly coupled oscillators. The paper’s authors expected a rapid thermalization of the system energy, yet they observed a Poincaré recurrence of the initial condition’s energy distribution among the system’s normal modes. This sparked a revolution of interest in nonlinear oscillators continuing to this day.
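The FPUT numerical experiment is small enough to reproduce directly. The following is a minimal, illustrative sketch rather than a reproduction of the original code: it integrates the FPUT α-chain (a quadratic correction to the linear spring force) with a velocity-Verlet scheme, starting from the lowest normal mode with fixed ends; N = 32 and α = 0.25 are assumed parameters in the spirit of the 1955 report.

```python
import numpy as np

def fput_alpha(N=32, alpha=0.25, dt=0.01, steps=10000):
    """Integrate the FPUT alpha-chain with a velocity-Verlet scheme.

    N interior masses with fixed walls, initialized in the lowest normal mode.
    """
    x = np.sin(np.pi * np.arange(1, N + 1) / (N + 1))  # mode-1 displacement
    v = np.zeros(N)

    def force(x):
        y = np.concatenate(([0.0], x, [0.0]))  # pad with the fixed walls
        d_right = y[2:] - y[1:-1]              # stretch of the spring to the right
        d_left = y[1:-1] - y[:-2]              # stretch of the spring to the left
        return (d_right - d_left) + alpha * (d_right**2 - d_left**2)

    a = force(x)
    for _ in range(steps):
        v += 0.5 * dt * a
        x += dt * v
        a = force(x)
        v += 0.5 * dt * a
    return x, v

def total_energy(x, v, alpha=0.25):
    """Kinetic energy plus the quadratic and cubic spring potential terms."""
    y = np.concatenate(([0.0], x, [0.0]))
    d = np.diff(y)
    return 0.5 * np.sum(v**2) + np.sum(0.5 * d**2 + (alpha / 3.0) * d**3)
```

Projecting the evolving state onto the chain's normal modes, and watching the mode energies over much longer runs, reproduces the recurrence the authors observed in place of the expected thermalization.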
1.2.1 The Lorenz Attractor

The Lorenz system is the quintessential example of chaos; it was the motivation for the phrase “the butterfly effect” [9]. The differential system Lorenz derived is

\begin{align}
\frac{dx}{dt} &= \sigma\,(y - x) \,, \tag{1.1a}\\
\frac{dy}{dt} &= x\,(\rho - z) - y \,, \tag{1.1b}\\
\frac{dz}{dt} &= xy - \beta z \,, \tag{1.1c}
\end{align}

with x, y, and z being coefficients of low-order modes of the stream function and the temperature profile, σ being the Prandtl number, ρ being a modified Rayleigh number, and β being a geometric factor associated with the characteristic period of the assumed convection cells. Equations (1.1a) to (1.1c) describe the colloquially named “butterfly” attractor [9], as seen in Fig. 1.2.

Figure 1.2: Lorenz equations solution. Panel (a) is the x−y projection of the total [x(t), y(t), z(t)] solution, panel (b) is the x−z projection, and panel (c) is the y−z projection. The butterfly shape is readily visible in panel (b), while the characteristic double lobes of the attractor are visible in each panel.

Deterministic dynamics may be described for point particles with the (non-relativistic) Newton’s second law

\[
\vec{F} = m\,\ddot{\vec{x}} = m\,\vec{a} \,, \tag{1.2}
\]

or for an entire ensemble through Liouville’s equation (for conservative or non-conservative systems)

\[
\frac{\partial \rho}{\partial t} + \sum_{i}^{n} \left( \frac{\partial \rho}{\partial q_i}\frac{dq_i}{dt} + \frac{\partial \rho}{\partial p_i}\frac{dp_i}{dt} \right) = 0 \,, \tag{1.3}
\]

where each qi is the coordinate of the i’th particle and each pi is the i’th particle’s momentum associated with coordinate qi. Equations (1.2) and (1.3) thus describe, respectively, the deterministic evolution of a single particle and of an ensemble of particles.

1.3 An Overview of Stochastic Dynamics

Random, or seemingly so, motion has captured the imagination for millennia. The poem “De Rerum Natura,” written in the first century BC by the Roman poet Lucretius [10], discusses the apparent random motion of dust in sunlight. Lines 114 to 140 of Book II from the Rouse translation of “De Rerum Natura”:

...
apply your scrutiny whenever the sun's rays are let in and pour their light through a dark room: you will see many minute particles mingling in many ways throughout the void in the light itself of the rays ... give attention to these bodies which are seen to be in turmoil within the sun's rays, because such turmoil indicates that there are secret and unseen motions also hidden in matter... then the bodies that form a small combination and, as one may say, are atoms, nearest to the powers of the first-beginnings, are set moving, driven by the unseen blows of these, while they in their turn attack those that are a little larger. Thus the movement ascends from the first-beginnings and by successive degrees emerges upon our senses, so that those bodies also are moved which we are able to perceive in the sun's light, yet it does not openly appear by what blows they are made to do so.

While the motion Lucretius describes is not random, but rather arises from the turbulent motion of air, it does bear resemblance to the first recorded stochastic dynamics, observed by Robert Brown [11] and quantified by Louis Bachelier [12]: Brownian motion (i.e. a Wiener process). The early field of stochastic dynamics was developed theoretically by Einstein [13], Smoluchowski [14], and Ornstein and Uhlenbeck [15], and experimentally by Perrin [16] (winning Perrin the 1926 Nobel Prize in Physics "for his work on the discontinuous structure of matter, and especially for his discovery of sedimentation equilibrium").

Later developments included the work of Itô, who developed the calculus of stochastic dynamics, which may be applied to the general equations governing stochastic dynamics called stochastic differential equations (SDEs). A generic form for a first-order SDE is

$$dx = \mu(x, t)\,dt + \sigma(x, t)\,dW \, , \qquad (1.4)$$

where x is the dependent (state) variable (e.g.
the position of a Brownian particle), µ(x, t) is a drift term which can depend on both the state x and time t, and dW represents a diffusion term with zero expectation value and unit standard deviation. Typically, this diffusion term is taken to be white noise η = dW/dt with ⟨η(t)η(t′)⟩_t = 2δ(t′ − t) and ⟨η(t)⟩_t = 0. The prefactor σ(x, t) scales the amplitude of the noise. It is typically assumed that the process is Markovian: its evolution at time t depends only on the current state, not on states at earlier times. In the limit of no drift, µ(x, t) = 0, Eq. (1.4) becomes a Wiener process reproducing Brownian motion. If the drift term is linear in x, then Eq. (1.4) becomes an Ornstein-Uhlenbeck process, commonly used in mathematical finance or as the model for a particle in a harmonic potential U(x) ∼ x² subject to a random driving force.

Treating the motion of all microscopic variables becomes intractable as the number of particles increases; typically, macroscopic variables are the desired quantities. The process through which microscopic detail is washed out has been extensively studied for decades [17–20]. Randomness is required to obtain the important macroscopic variables (e.g. a voltage, or a particle's position in a fluid) from first-principles models of short-time-scale fluctuating processes (e.g. magnetic spins in semiconductors, thermal particle velocities). This assumption is usually referred to as the "Stosszahlansatz," or molecular chaos, and is the fundamental reason for irreversible processes [21]. It is intimately linked with Loschmidt's paradox, also referred to as the "Arrow of Time."

1.3.1 Mathematical Modeling

Two equations commonly used to describe either a trajectory or an ensemble of trajectories (i.e. a probability distribution) are the Langevin equation and the Fokker-Planck equation, respectively [21–24].
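Before turning to those two equations, the Ornstein-Uhlenbeck process mentioned above can be integrated numerically with the Euler-Maruyama scheme. The following is a minimal sketch with illustrative parameter values (theta, sigma, dt are arbitrary choices, not quantities from this dissertation); the known stationary variance provides a check on the integration.

```python
import numpy as np

# Euler-Maruyama integration of an Ornstein-Uhlenbeck process,
#   dx = -theta * x * dt + sigma * dW,
# a linear-drift instance of Eq. (1.4). All parameter values here are
# illustrative choices. Standard Wiener increments (variance dt) are used,
# so the stationary variance is sigma**2 / (2 * theta).
rng = np.random.default_rng(1)
theta, sigma = 1.0, 0.5
dt, n_steps = 1e-3, 1_000_000

x = np.empty(n_steps)
x[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps - 1)  # Wiener increments
for i in range(n_steps - 1):
    x[i + 1] = x[i] - theta * x[i] * dt + sigma * dW[i]

# Sample variance after discarding a transient; close to 0.125 here.
print(np.var(x[n_steps // 10:]))
```

Replacing the linear drift −θx with a nonlinear µ(x, t) and a state-dependent σ(x, t) gives the general Eq. (1.4); the Euler-Maruyama update step is unchanged.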
The equation of motion for a single particle undergoing some random force is the Langevin equation [25], analogous to Newton's second law with a stochastic force [21],

$$\frac{dx}{dt} = \underbrace{f(x, t)}_{\text{drift}} + \underbrace{g(x, t)\,\eta_x}_{\text{diffusion}} \, , \qquad (1.5)$$

where, similar to the generic equation for an SDE, Eq. (1.4), we have expressions for the drift term (now attached to a deterministic term) and the diffusion (stochastic/random/noise) term. The Langevin equation is commonly used to describe the motion of a particle in a potential well with thermal forces, or a noisy circuit equation [22, 23].

To describe not just the motion of a single particle interacting with a noisy (or high-dimensional) environment, one can use the Fokker-Planck equation to describe an ensemble of particles (i.e. a probability distribution) [22]. The Fokker-Planck equation is derived from the master equation via the Kramers-Moyal expansion [26, 27]². In one spatial dimension, the Kramers-Moyal expansion is

$$\frac{\partial \rho}{\partial t} = \sum_{n=1}^{\infty} \left( -\frac{\partial}{\partial x} \right)^n \left[ D_n(x, t)\,\rho(x, t) \right] \, , \qquad (1.6)$$

where x is the usual spatial (state) variable, t the time, ρ(x, t) the probability density, and D_n the n'th-order Kramers-Moyal coefficient. The Fokker-Planck equation in one spatial dimension is

$$\frac{\partial \rho}{\partial t} = \frac{\partial}{\partial x} \left[ -D_1(x)\,\rho(x, t) + \frac{\partial}{\partial x} D_2(x)\,\rho(x, t) \right] \, , \qquad (1.7)$$

with D_1(x, t) the drift and D_2(x, t) the diffusion, which are the first- and second-order Kramers-Moyal coefficients. There are no third- or higher-order derivatives, as truncating at any such finite order results in unphysical negative probability densities; only in the infinite limit of the derivative expansion does one recover nonnegative probability densities³. If the quantity within the square brackets is interpreted as a probability current j_ρ, then Eq. (1.7) can be rewritten as a continuity equation for the probability

²The Kramers-Moyal expansion is similar to a Taylor series expansion for a function, though here the expansion is applied to the evolution of a probability distribution.
³This infinite limit of derivatives is reminiscent of the BBGKY hierarchy [28], but the limit of taking infinite approximations is where the similarities end. The Kramers-Moyal expansion (which is what this process is called for the Fokker-Planck equation) is an expression for successive approximations to the probability density ρ, whereas the BBGKY hierarchy is for successively higher orders of approximation for N-body interactions predicated upon the Boltzmann equation.

density ρ,

$$\frac{\partial \rho}{\partial t} + \frac{\partial j_\rho}{\partial x} = 0 \, .$$

The Fokker-Planck and Langevin equations are related via their drift and diffusion terms. Within this dissertation, we interpret the Langevin equation, Eq. (1.5), in the Stratonovich sense [29]. In the Stratonovich interpretation, the Langevin equation is related to the Kramers-Moyal coefficients of the Fokker-Planck equation, Eq. (1.7), by f(x) = D_1(x) − D_2′(x)/2 and g(x) = √(D_2(x)). The second term on the right-hand side of f(x) comes from gradients in the diffusion, which give rise to an effective drift term independent of the usual drift coefficient D_1(x). A system with multiplicative noise—where D_2′(x) ≠ 0—exhibits mixing between the stochastic terms and the deterministic part; a nonzero diffusion gradient introduces a stochastic drift term. A system with additive noise—D_2′(x) = 0—is simpler to treat and models systems whose noise terms do not depend on the system state.

The stationary distribution ρ_0(x) of the Fokker-Planck equation can be found by setting ∂ρ_0/∂t = 0. We then may write

$$D_1 \rho_0 = \frac{d}{dx} (D_2 \rho_0) \, ,$$

with ′ = d/dx and all expressions assumed to be functions of x. Expanding the right-hand side we have

$$D_1 \rho_0 = D_2' \rho_0 + D_2 \rho_0' \, ,$$

which may be rearranged to

$$\frac{D_1 - D_2'}{D_2} = \frac{\rho_0'}{\rho_0} \, .$$

The above expression has two quantities of the form y′/y that can be written as d/dx ln[y(x)]. Substituting these into the above equation we have

$$\frac{D_1}{D_2} - \frac{d}{dx} \ln(D_2) = \frac{d}{dx} \ln(\rho_0) \, ,$$

which may be integrated with a dummy variable x′ from −∞ to x.
After integrating and solving for ρ_0, we obtain the stationary distribution solely in terms of the drift and diffusion coefficients,

$$\rho_0(x) = \frac{C}{D_2(x)} \exp \left[ \int_{-\infty}^{x} dy\, \frac{D_1(y)}{D_2(y)} \right] \, ,$$

with C a normalization constant.

1.3.2 Pseudorandom number generators

Physical stochastic processes are not readily accessible for simulation. To overcome this, pseudo-random number generators (PRNGs) are used to produce numbers that behave as if they were stochastic. These PRNGs are not truly random but rather are periodic with a very long period, as they follow a deterministic set of rules. PRNGs can be treated as random on time scales short compared to their period. A variety of pseudo-random number generators are readily available in common programming languages. Matlab, for example, offers the following algorithms: the Mersenne twister; the single instruction, multiple data (SIMD) Mersenne twister; combined multiple recursive; multiplicative lagged Fibonacci; the Philox 4x32 generator with 10 rounds; and the Threefry 4x64 generator with 20 rounds [30]. Figure 1.3 shows sample trajectories for several selected pseudo-random number generators available in Matlab with a random seed value of 1, each algorithm producing a different trajectory.

Figure 1.3: A comparison between selected PRNGs in Matlab.

The commonly used Mersenne twister algorithm—the default used by Python's NumPy module—has a period of 2¹⁹⁹³⁷ − 1 iterations [31]. The algorithm we use in Chapter 2 is Mathematica's default extended cellular automaton generator, "ExtendedCA". Typical pseudo-random number generator issues are detailed in Press et al. [32].

1.4 Associated Publications

1.4.1 Publications

Below is a list of papers linked to the work presented in this dissertation; the majority of the chapter material is drawn from the published papers, with the remainder from papers currently in preparation.
Papers 1 and 2 are related to Chapter 2, paper 3 is related to Chapter 3, and paper 4 is related to Chapter 4.

1. Sidra Gibeault, Temitayo N Adeyeye, Liam A Pocher, Daniel P Lathrop, Matthew W Daniels, Mark D Stiles, Jabez J McClelland, William A Borders, Jason T Ryan, Philippe Talatchian, et al. Programmable electrical coupling between stochastic magnetic tunnel junctions. Physical Review Applied, 21(3):034064, 2024.

2. Liam A Pocher, Temitayo N Adeyeye, Sidra Gibeault, Philippe Talatchian, Ursula Ebels, Daniel P Lathrop, Jabez J McClelland, Mark D Stiles, Advait Madhavan, and Matthew W Daniels. Measurement-driven Langevin modeling of superparamagnetic tunnel junctions. Physical Review Applied, 22(1):014057, 2024.

3. Liam A Pocher, Irving Haber, Thomas M Antonsen Jr, and Patrick G O'Shea. Data-Driven Discovery of Beam Centroid Dynamics. In preparation.

4. Liam A Pocher, Shiyi Wang, Kevin Hermstein, Brian L. Beaudoin, Dan T. Abell, Thomas M. Antonsen, Irving Haber, and Patrick G O'Shea. Controlling Beam Dynamics with Measurement-Driven Morphology Modulation. In preparation.

1.4.2 Conference Proceedings and Talks

Below is a list of conference proceedings and talks that are linked to the work presented in this dissertation. Items 1, 3, and 6 are related to the work presented in Chapter 2. Items 2, 4, and 5 are related to the work presented in Chapter 3.

1. Matthew W Daniels, William A Borders, Nitin Prasad, Advait Madhavan, Sidra Gibeault, Temitayo Adeyeye, Liam Pocher, Lei Wan, Michael Tran, Jordan A Katine, et al. Neural networks three ways: unlocking novel computing schemes using magnetic tunnel junction stochasticity. In Spintronics XVI, volume 12656, pages 84–94. SPIE, 2023.

2. L. A. Pocher, T. M. Antonsen, L. Dovlatyan, I. Haber, and P. G. O'Shea. Optimizing the Discovery of Underlying Nonlinear Beam Dynamics. In Proc. NAPAC'22, North American Particle Accelerator Conference, pages 335–338.
JACoW Publishing, Geneva, Switzerland, 11 2022.

3. Liam Pocher, Sidra Gibeault, Advait Madhavan, Mark Stiles, Matthew Daniels, Nhat-Tan Phan, Philippe Talatchian, Ursula Ebels, Daniel Lathrop, et al. Empirical modeling of superparamagnetic magnetic tunnel junctions with application to probabilistic computing. In APS March Meeting Abstracts, volume 2023, pages T02–005, 2023.

4. L. Pocher, I. Haber, T. Antonsen, and P. O'Shea. Optimizing the discovery of underlying nonlinear beam dynamics and moment evolution. In Proc. IPAC'23, number 14 in International Particle Accelerator Conference, pages 4503–4506. JACoW Publishing, Geneva, Switzerland, 9 2023.

5. L. Pocher et al. Discovering transient models of emittance growth via mode interaction of phase space nonuniformities. In Proc. IPAC'24, number 15 in IPAC'24 - 15th International Particle Accelerator Conference, pages 921–924. JACoW Publishing, Geneva, Switzerland, 05 2024.

6. Sidra Gibeault, Temitayo N Adeyeye, Liam A Pocher, Daniel P Lathrop, Matthew W Daniels, Mark D Stiles, Jabez J McClelland, William A Borders, Jason T Ryan, Philippe Talatchian, et al. Hardware implementation of a scalable two-spin Ising computer. In Spintronics XVII, page PC1311924. SPIE, 2024.

Chapter 2: Measurement-driven Langevin modeling of superparamagnetic tunnel junctions

This chapter is based on work in two published papers [33, 34] and conference proceedings [35–37]. In Refs. [34, 35], I was the first author and was the prime conceptualizer, investigator, and writer. In Refs. [33, 36, 37], I was a contributing author and played the role of a collaborating scientist in the group.

2.1 Motivation

Since the widespread adoption of the computer, algorithms have been used to solve computational problems. Developing efficient algorithms to solve these problems is of considerable interest.
In the broadest sense, there are two classes of computational problems: those solvable in polynomial time (P) and those solvable in non-deterministic polynomial time (NP). P-class computational problems remain readily tractable as the size of the problem increases (e.g. sorting numbers or determining if a number is prime [38]). Conversely, NP-class problems can quickly become intractable as problem size increases, as their solution time is often exponential or greater. Examples of NP computational problems include (prime) factorization, the traveling salesman problem, the max-cut problem, circuit layout design, etc. [39–42]. Many NP problems are mathematically equivalent and can be recast as the Ising model [43, 44]. For example, the max-cut (i.e. graph partitioning) problem can be reformulated into the Ising problem by encoding graph edges as couplings between spins, which then define the energy landscape of the Ising Hamiltonian [45, 46].

To illustrate this, consider a graph G with vertices V and edges E, with N vertices in total. The graph partitioning problem is solved by separating G into two subsets while minimizing the number of connecting edges [40]. One way to implement this solution is to transform the graph G into the Ising model formulation, with the N vertices becoming N spins σ_i and each edge in E between vertices i and j becoming an interaction term J_ij. The Ising Hamiltonian is

$$H(\sigma) = -\sum_{i,j} J_{ij} \sigma_i \sigma_j - \mu \sum_j h_j \sigma_j \, , \qquad (2.1)$$

with H the Hamiltonian (i.e. total system energy), σ_i the spin (±1), J_ij the coupling between spin i and spin j (in the traditional Ising problem, only nearest neighbors interact; this is atypical in graph partitioning problems), µ the magnetic moment of the material, and h_j the external magnetic field applied at spin σ_j. In the case of graph partitioning, the external field h_j at every node is set to zero and the sum is taken over the edges E. Equation (2.1) is then

$$H(\sigma) = -\sum_{i,j \in E(G)} J_{ij} \sigma_i \sigma_j \, .$$
(2.2)

The sum over all edges weighted by the mutual interaction of spins i and j in Eq. (2.2) can be decomposed into sums over pairs of spins of the same sign and of different signs,

$$H(\sigma) = -\sum_{\text{same group}} J_{ij} + \sum_{\text{diff. group}} J_{ij} \, ,$$

and, using the relation that the sum over all interactions is the sum of the interactions within the same group and between different groups, we write

$$H(\sigma) = -\sum_{ij} J_{ij} + 2 \sum_{\text{diff. group}} J_{ij} \, ,$$

which is minus the sum of all the weights plus twice the sum of the edge weights between different groups. Depending on the couplings J_ij, the second term on the right-hand side can decrease or increase the system energy. Finding analytic solutions to this problem quickly becomes intractable for increasing N.

An appealing technique to solve this problem, even as it grows intractable with increasing N, is simulated annealing [47, 48]. Simulated annealing is an optimization technique that stochastically searches between spin configurations σ = (σ_1, ..., σ_N) to minimize the system energy H(σ). The spin configuration can evolve in time if each spin state is probabilistic. A new spin configuration is accepted if it lowers the overall system energy; if the proposed spin configuration raises the system energy, it is accepted with a probability that grows with the system temperature. For systems with high (low) temperature, this acceptance probability is high (low). In the context of the graph partitioning model, the effective system temperature is the relative strength of the interaction weights (deterministic) to the probabilistic evolution (stochastic) of each spin. As interaction strengths increase, spins are coupled more strongly, corresponding to a lower system temperature.

Pseudo-random number generators (PRNGs) can be used to search for these optimized energy configurations; however, PRNGs produce periodic and cross-correlated bitstreams, an undesirable feature for computation [49, 50].
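The PRNG-driven annealing search described above can be sketched in a few lines. The random graph, unit couplings, single-spin-flip move set, and geometric cooling schedule below are all illustrative choices, not taken from this dissertation.

```python
import numpy as np

# Simulated-annealing sketch for the coupling-only Ising energy of Eq. (2.2),
# driven by a PRNG. Graph, couplings, and schedule are illustrative choices.
rng = np.random.default_rng(0)

n = 20                                     # number of vertices / spins
J = np.triu(rng.random((n, n)) < 0.3, 1)   # random edge set (upper triangle)
J = (J | J.T).astype(float)                # symmetric coupling matrix, J_ii = 0

def energy(sigma):
    # H(sigma) = -sum_{i<j} J_ij sigma_i sigma_j (external field h = 0)
    return -0.5 * sigma @ J @ sigma

sigma = rng.choice([-1, 1], size=n)        # random initial spin configuration
T, cooling = 2.0, 0.999
for _ in range(20_000):
    i = rng.integers(n)                    # propose flipping one spin
    dE = 2.0 * sigma[i] * (J[i] @ sigma)   # energy change of that flip
    # Metropolis rule: always accept downhill moves; accept uphill moves
    # with probability exp(-dE / T), which grows with temperature T.
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        sigma[i] = -sigma[i]
    T *= cooling                           # anneal the temperature downward

print(energy(sigma))                       # a low (negative) energy state
```

Note that with all couplings positive the minimum here is simply the aligned configuration, and a real graph-partitioning run would also constrain the two groups to equal sizes; the sketch only illustrates the stochastic search itself.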
Practical random bitstreams can be generated by devices called superparamagnetic tunnel junctions (SMTJs) in a process called stochastic computing [50–52]. Individual SMTJs used to solve NP problems such as graph partitioning with algorithms like simulated annealing are probabilistic bits, or p-bits. A collection of interacting p-bits forms a probabilistic computer. Unlike a conventional Boolean bit, which takes only deterministic 1 and 0 values, a p-bit uses its room-temperature instability to encode a probability value. Probabilistic computing with SMTJs uses the instantaneous states of a number of random bitstreams to sample a distribution of states encoded to match problems like the Ising model, and it has been applied to a variety of problems including the traveling salesman problem [53], combinatorial optimization [54], and integer factorization [55]. The technological relevance of SMTJs, and the attendant need to design scaled-up circuits that include these devices, has made the task of finding appropriate dynamical models an engineering priority.

Currently, three classes of models are used for different levels of granularity in representing SMTJ physics. The simplest is the Néel-Brown model [56–58], a two-state Markov model in which the transition rates are exponential functions of the current and field applied to a device. These models fit the transition rates of the Markov model to experiment [59] and can sometimes be aligned with the measurable parameters of the system in a physics-driven way [60, 61]. Their discrete state spaces, however, hide any intermediate analog dynamics, and experiments to date indicate that this analog magnetic behavior is visibly non-negligible [59, 62–66]. Néel-Brown models implicitly assume an exponential distribution of the dwell times.

The most detailed form of modeling is achieved through micromagnetic simulations.
Some micromagnetics packages can explicitly solve the stochastic Landau-Lifshitz-Gilbert (sLLG) equation over large magnetic domains using finite-element methods [67]. Simulating devices in this manner can lead to physically realistic results, but the required computational time renders such models inappropriate for single elements in large circuit simulations. The authors of Ref. [68] show that micromagnetic simulations qualitatively reproduce measured two-level fluctuations but are not able to collect enough statistics with the full simulations to make a robust statistical comparison.

The intermediate level between these approaches is that of analog compact modeling. Compact models – relatively low-dimensional differential equations that capture the essential analog behaviors of a device without full physical realism – have been developed for many nanodevices [69–72], including magnetic tunnel junctions [73–75]. When SMTJs are incorporated into integrated circuits, modeling the interaction between the SMTJs and the transistors will require a statistically accurate description of the SMTJ dynamics on the time scale of switching events in both, but that description cannot be significantly slower to compute than the rest of the integrated-circuit model. These restrictions force the development and use of a compact model that accurately describes the transition dynamics. A Markov model, while fast, does not accurately capture the dynamics of the transition, whereas a micromagnetic model would capture those dynamics but is far too slow.

Facing these limitations of simple macrospin models, we propose a data-driven method to capture the dynamics of interest in circuit simulations. We turn to generic overdamped 1D Langevin models for the voltage across the device and fit such models to experimental measurements.
Similar data-driven approaches have been applied in other fields of physics to learn the underlying physics of stochastic processes through a variety of methods [76–78]. We show that our approach leads to high-fidelity matching of the voltage histograms and the drift and diffusion coefficients of the device between model and experiment while maintaining the analog nature of macrospin approaches. We also show that our model correctly predicts the drift dynamics and dwell-time distributions of the experiment without explicitly encoding these in the simulation, confirming a level of self-consistency between the model and the underlying physics.

We organize the remainder of this chapter as follows. In Section 2.2 we give an overview of the approach we use to fit a time trace measured for an SMTJ so that we can reproduce a statistically identical time trace in simulation. In Section 2.3 we describe the different statistical measures of such voltage-time traces that provide reductions of the data to relevant dynamical quantities. Section 2.4 introduces an overdamped, 1D Langevin model and describes the method that we use to fit the experimental data. Section 2.5 compares the Langevin modeling to experiment and shows that the model reproduces the experimental data used to determine it and exhibits self-consistency with the assumed Fokker-Planck dynamics. Section 2.6 shows the short-time experimental data used in our generation of the data-driven Langevin model. Section 2.7 compares a metric for inertia/memory between the experimental and simulation data, and shows how the experimental data's memory changes as a function of the subsampling rate. Section 2.8 describes alternate fitting schemes for our ansatz of D2, the measured noise term. Section 2.9 describes the extraction of mean dwell times from time traces. Finally, in Sec.
2.10, we discuss possible extensions to our compact modeling approach, orienting future research in the directions we view as most useful for supporting the device-circuit-system codesign practices needed to realize large-scale computational systems based on SMTJs.

2.2 Model Construction Roadmap

Our goal is to develop a data-driven, compact, semi-analytic model that produces dynamics statistically identical to those of the measured voltage-time trace. We refer to the model as data-driven because we use some of the statistical properties of the data to produce the model. In contrast to other, physically motivated models, such as a macrospin approximation, our models are not derived from approximations based on device physics. Not being tied to approximations of the physics can be an advantage, as it allows the model to capture behavior that is neglected by physically motivated models that may be oversimplified compared to experimental reality.

Figure 2.1 shows the modeling flowchart for this chapter. From the experiment with the SMTJ device we obtain a voltage-time trace. The statistics of those data are used to determine our model, a one-dimensional equation of motion (directed, overdamped Brownian motion). This equation of motion describes the state of the system with drift (deterministic) and diffusion (stochastic) terms, with parameterizations that can be calculated directly from the experimental voltage-time trace. We integrate the model forward in time to produce a simulated time trace, perform the same statistical analysis that we had performed for the experiment, and compare the experimental and simulated statistics.

Figure 2.2 shows a representative segment of a voltage-time trace of our experimental device. The models we use to reproduce this behavior are one-dimensional stochastic differential equations whose drift and diffusion parts are determined by both short-time statistics and long-time statistics.

Figure 2.1: Visual outline of the model generation process developed in this chapter. Top row: we start with an experimental device from which a voltage-time trace is extracted under fixed experimental conditions. The histogram and the voltage-dependent diffusion coefficient (second Kramers-Moyal coefficient D2) are extracted numerically from the time trace. The drift coefficient and dwell-time distributions are also extracted for later use. Middle row: we fit drift and diffusion characteristics for our model to the histogram and diffusion characteristics found in experiment, in a way that compensates for the high-frequency cutoff of the experimental data. Bottom row: we then use these fitted drift and diffusion characteristics to simulate a Langevin equation from which we extract a voltage-time trace and its attendant statistics. Far right: the suite of statistics from the model is validated against the experiment. In future work with more complex fits, one may need to introduce a self-consistent fitting procedure that uses observed error between theory and experiment to inform iterative refinements of the model (dashed line).
The resulting equation is the Langevin equation

$$\dot{\Phi} = \underbrace{f(\Phi)}_{\text{drift}} + \underbrace{g(\Phi)\,\eta_\Phi(t)}_{\text{diffusion}} \, , \qquad (2.3)$$

where Φ is the dependent variable (voltage), t the independent variable (time), f(Φ) the function characterizing the drift, g(Φ) the function characterizing the diffusion, and η_Φ(t) a Gaussian white noise term¹ with the properties ⟨η_Φ(t)η_Φ(t′)⟩_t = 2δ(t′ − t) and ⟨η_Φ(t)⟩_t = 0. The drift and diffusion terms of the Langevin equation are related to each other through the voltage-time-trace histogram, which gives the long-time average of the stationary state of the system for a fixed current flowing through the SMTJ; see Section 2.4 for further details.

The one-dimensional Langevin models are completely determined by fitting the histogram and either the first, n = 1, or second, n = 2, Kramers-Moyal coefficient

$$D_n(\Phi) = \frac{1}{n!} \lim_{\tau \to 0} \frac{\langle [\Phi(t+\tau) - \Phi(t)]^n \rangle_t}{\tau} \, , \qquad (2.4)$$

with τ the sampling time, the angle brackets denoting a time average over the voltage-time trace, and [Φ(t + τ) − Φ(t)] a τ-delayed difference in voltage. The first and second Kramers-Moyal coefficients encode short-time dynamics into our model. The D1 coefficient captures the drift behavior of the system while D2 captures the diffusive behavior. Fitting the histogram and one of the Kramers-Moyal coefficients then determines the functional coefficients in the Langevin model, Eq. (2.3), fully characterizing it and allowing us to perform simulations that mimic the

¹We note that in reality the driving noise terms likely possess additional structure beyond mere white noise. However, as we will show in Sec. 2.5, the white noise assumption does very well in reproducing the statistics observed in experiment. Physically, the choice of white noise is the unique choice that ensures the dissipative mechanisms of the system are local in time [79].
Figure 2.2: Typical bi-stable voltage fluctuations of an SMTJ in our experimental setup, measured at a sampling rate of τ_exp = 50 ps. Shown is a small subset – a 100 ns window – of the full 1 ms trace captured in the experiment. The total number of transitions between the two states in the full voltage-time trace is on the order of 3 × 10⁵. The dashed line indicates the boundary between the parallel and antiparallel wells. That value is chosen from the local minimum in the distribution of measured voltages given in Fig. 2.3. The mean of the raw voltage measurement has been removed.

experimental measurement. As will be shown in Sec. 2.5 and discussed in Section 2.10, this data-driven approach captures multiple statistical and dynamical metrics with high fidelity. Specifically, it captures the Kramers-Moyal coefficients, the histogram, the dwell-time distributions, and the power spectral density. The agreement for the dwell-time distributions is perhaps the most significant validation of the model because the experimental input to the model does not directly encode these time scales. This ability to recover higher-order statistics is necessary so that we can use the model to design scaled-up circuits in engineering applications.

2.3 Time Trace Statistics

In this section, we describe in more detail the statistical properties that we use to set up the Langevin model. These properties and others are compared to those of simulations of the resulting model in Section 2.5. The first statistical reduction of the voltage-time trace we use is the histogram of the device state over the entire 1 ms measurement window; see Fig. 2.3. We assume that the system giving rise to the voltage-time trace is in quasiequilibrium, so the histogram primarily depends on the effective energy and entropy of the system at each voltage and less on the short-time dynamics of the system.
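To make the connection between a Langevin equation and such a histogram concrete, the following minimal sketch integrates an equation of the form of Eq. (2.3) with an assumed double-well drift. The drift f(Φ) = Φ − Φ³, the constant diffusion amplitude g, and all other parameters are illustrative choices, not the fitted model of this chapter.

```python
import numpy as np

# Sketch: a bistable Langevin equation of the form of Eq. (2.3) with an
# assumed double-well drift f(phi) = phi - phi**3 and constant diffusion
# amplitude g. These are illustrative choices; the chapter instead fits
# f and g to experimental data.
rng = np.random.default_rng(7)
g, dt, n_steps = 0.45, 1e-2, 400_000

phi = np.empty(n_steps)
phi[0] = 1.0
# Noise convention <eta(t) eta(t')> = 2 delta(t - t'): kicks of variance 2*dt.
kicks = rng.normal(0.0, np.sqrt(2.0 * dt), size=n_steps - 1)
for i in range(n_steps - 1):
    phi[i + 1] = phi[i] + (phi[i] - phi[i] ** 3) * dt + g * kicks[i]

# The histogram of the trace is bimodal, with wells near phi = -1 and +1
# separated by a sparsely occupied barrier region near phi = 0.
counts, edges = np.histogram(phi, bins=60, range=(-2, 2), density=True)
```

Under this convention the stationary density is ρ0(Φ) ∝ exp(−U(Φ)/g²) with U(Φ) = Φ⁴/4 − Φ²/2, so for these parameters the wells at Φ = ±1 are favored over the barrier at Φ = 0 by roughly a factor of three, a toy analog of the bimodal SMTJ histogram.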
The binning resolution along the voltage axis, used for the histogram and the other statistical properties described below, impacts the fit quality and the observed characteristics. If the bins are too small, statistical uncertainty prevents good fits; if they are too large, the details of the dynamics are obscured. The bin size we use for the histogram is a compromise between these factors at approximately 200 µV. Each bin captures approximately 50 quantized voltage levels out of the approximately 2¹² unique levels reported by the oscilloscope.

Figure 2.3: Histogram of SMTJ state. The black circles give the experimental results for the entire voltage-time trace of 160 ms, part of which is shown in Fig. 2.2, using bins 200 µV wide. The blue rectangles give the results of a simulation, discussed in Section 2.5, using a data-driven Langevin model. Note the excellent recovery of the experimental histogram's values in the simulation, demonstrating the capacity of the first-order Langevin model to capture coarse-grained statistics. Statistical error bars are smaller than the symbols. The inset plot shows the ratio between the simulation and experiment histogram values. Note the near-unity agreement over the entire domain save where both histograms' values approach zero. The local minimum near −1 mV is taken as the boundary between the parallel and antiparallel wells. The mean of the raw voltage measurement has been removed.

Though the histograms observed above can (together with an assumption of Boltzmann statistics) tell us about the effective energy landscape of the system, the detailed short-time dynamics generated by thermally induced magnetic fluctuations – which are by construction separate from the conservative forces on the system – remain hidden.
To determine their behaviors, we compute conditional moments² of the experimental voltage-time trace. The nth-order conditional moment M_n is

$$M_n(\Phi, \tau) = \langle [\Phi(t+\tau) - \Phi(t)]^n \rangle_t \, , \qquad (2.5)$$

with the angle brackets denoting a time average over an entire trajectory, and Φ(t + τ) − Φ(t) a τ-delayed difference in the system's state. We capture these conditional moments with the same bins as we use for the histogram.

At very small τ, Eq. (2.5) can be used to approximate the nth-order Kramers-Moyal coefficient [22, 80, 81] of the system. Formally, these Kramers-Moyal coefficients are connected to the conditional moments as

$$D_n(\Phi) = \frac{1}{n!} \lim_{\tau \to 0} \frac{M_n(\Phi, \tau)}{\tau} \, . \qquad (2.6)$$

The D1 and D2 terms, which we refer to as the drift and diffusion terms, are the specific Kramers-Moyal coefficients used in our first-order Langevin model and are the only nonzero Kramers-Moyal coefficients required to describe systems obeying the Fokker-Planck equation [82]. Due to the finite resolution inherent in experimental data, we cannot immediately take the limit as τ → 0 in our calculation of the Kramers-Moyal coefficients. We also find in practice that our experimental timestep was not quite small enough to approximate this limit; subsampling our data to compute the M_n at slightly longer timesteps indicates that we are not in the converged regime, and strongly so for M2.

²These conditional moments are reminiscent of Kolmogorov's structure functions in turbulence. However, structure functions are conditional moments that use a spatial separation and spatial ensemble average to obtain a scalar value for fluid velocity, whereas the Kramers-Moyal coefficients here are evaluated from a voltage-time trace (which by the ergodic hypothesis is equivalent to an ensemble average) that integrates solely over the probability distribution.
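The binned conditional-moment estimate of Eqs. (2.5) and (2.6) can be sketched on a synthetic trace whose drift is known in advance, so the estimate can be checked. Here the trace is an Ornstein-Uhlenbeck process with D1(x) = −x; all parameters are illustrative, not experimental values.

```python
import numpy as np

# Estimate the drift coefficient D1 from a time trace via the binned
# conditional moment M1 of Eq. (2.5) and the small-tau quotient of Eq. (2.6).
# Synthetic OU trace with known drift D1(x) = -x; parameters illustrative.
rng = np.random.default_rng(3)
dt, n_steps = 1e-3, 1_000_000
x = np.empty(n_steps)
x[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps - 1)
for i in range(n_steps - 1):
    x[i + 1] = x[i] - x[i] * dt + 0.7 * dW[i]   # D1(x) = -x

deltas = np.diff(x)                     # tau-delayed differences with tau = dt
bins = np.linspace(-1.0, 1.0, 21)       # state-axis bins, as for the histogram
which = np.digitize(x[:-1], bins)

centers, d1_est = [], []
for b in range(1, len(bins)):
    mask = which == b
    if mask.sum() > 1000:               # skip sparsely populated bins
        centers.append(0.5 * (bins[b - 1] + bins[b]))
        d1_est.append(deltas[mask].mean() / dt)   # M1(x, tau)/tau -> D1(x)

slope = np.polyfit(np.array(centers), np.array(d1_est), 1)[0]
print(slope)                            # close to the true drift slope of -1
```

The same binned average applied to the squared differences estimates M2 and hence D2; when the sampling time is not small compared to the dynamics, the finite-τ corrections become important.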
As τ increases from zero, however, finite-time corrections can be used to relate the conditional, time-delayed moments M_n to the underlying Kramers-Moyal coefficients that are needed to construct an analytic Langevin model [83, 84]. These corrections up to second order in the time delay τ are

M_1 = τ D_1 + (τ²/2)(D_1 D_1′ + D_2 D_1″) + O(τ³)  (2.7)

and

M_2 = 2τ D_2 + τ²(D_1² + D_1 D_2′ + D_2 D_2″ + 2 D_2 D_1′) + O(τ³),  (2.8)

with a prime denoting differentiation with respect to Φ, and each Kramers-Moyal coefficient and time-delayed moment understood to be a function of Φ. Equations (2.7) and (2.8) describe a second-order differential system for the Kramers-Moyal coefficients in terms of the calculated M_1 and M_2; these equations cannot be solved analytically in general. Our approach to extracting the Kramers-Moyal coefficients, which we require to parameterize our Langevin model, is to choose a parameterized ansatz for D_2; that is, we choose some functional form with a number of free parameters. Combining this ansatz with an analytic fit to the histogram induces an ansatz on D_1 via the stationary solution to the Fokker-Planck equation. We then choose the free parameters (see Eq. (2.12) below) by fitting the right-hand side of Eq. (2.8), evaluated with our ansatz, to the M_2 extracted from our experimentally measured voltage-time trace. We could fit both M_1 and M_2 simultaneously, but finding agreement between M_1 and the predictions of the model when only M_2 is fit argues for the appropriateness of using the one-dimensional Fokker-Planck equation.

2.4 One-dimensional Langevin model

As described in our published work [34], we note that macrospin models fail to capture qualitative metrics – let alone statistical metrics – associated with the experimental data.
Yet models that capture the statistics of experimental devices will be required for high-fidelity circuit simulations of SMTJ devices as their attendant technological applications scale. To address this discrepancy, we introduce a first-order Langevin model inspired by existing works on the modeling of fluctuating bistable processes [85, 86]. This approach may not be capable of predicting the behavior of uncharacterized devices, but we anticipate it could be used to characterize the devices from a particular manufacturing process, with those fit results then used in circuit simulations of those devices.

We take a data-driven approach to determining the details of the one-dimensional Langevin model. The first step is to find an analytic expression for the experimentally determined probability density ρ_0. We require an analytic expression because we will need to take its derivative to infer the deterministic forces in the system. To guarantee a positive-definite fit, we assume a Boltzmann distribution and then fit not the histogram itself but a dimensionless effective energy U_eff(Φ) defined so that ρ_0(Φ) = (1/Z) exp(−U_eff(Φ)), where Z ensures that ∫ dΦ ρ_0(Φ) = 1. A judicious choice of basis function is required to capture the distribution with high fidelity; we scale our data appropriately and then use the Chebyshev polynomials of the first kind T_n(Φ), which form an orthogonal, complete basis on [−1, 1] and are each bounded between −1 and 1 over this range³. Figure 2.4 shows progressively higher-order fits to U_eff; for the rest of the work presented in this chapter, we use the n = 20 fit for the effective energy and the stationary distribution. For practical applications, the choice of fit order would be a balance between fidelity to the data and speed of calculation.

The effective energy in Fig.
2.4, derived from the stationary distribution for the experimental data, exhibits several features that are inconsistent with a macrospin model as shown in Ref. [34].

³Legendre polynomials were also considered because of their similar properties, but we chose Chebyshev polynomials because they obtained more accurate results with fewer total coefficients.

Figure 2.4: Fit to the effective energy. (a) Residues (normalized l2 norm) between the Chebyshev polynomial fits and the data as a function of even order n, and (b) sample fits—n = 4, 10, 20—to the experimental effective energy U_eff. Error bars on experimental data indicate statistical uncertainties due to the number of counts per histogram bin; however, these uncertainties are smaller than the symbols. The mean of the raw voltage measurement has been removed.

These features include asymmetry between the interior and exterior sides of the P/AP states, disparate P/AP well widths, and exterior boundaries that are significantly rounded compared to the macrospin results. We believe that these discrepancies may share a common cause: the macrospin model neglects many degrees of freedom that may play an important role in the dynamics. The asymmetry between the parallel and antiparallel alignments could arise from the non-uniform fringing fields of the fixed layer and synthetic antiferromagnet. The tails in the distribution could have several explanations. At a basic level, it could be that the states with the minimum and maximum resistances are not the lowest-energy states. One may also note that for an open system (the SMTJ) in contact with a thermal reservoir, it is the free energy, rather than the internal energy, that is the relevant quantity [87] for making thermodynamic predictions like the probability distributions.
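The Chebyshev expansion of the effective energy described above can be sketched with numpy's polynomial module, which maps the data domain onto [−1, 1] internally. The function name and the synthetic asymmetric double-well test function below are ours; the dissertation fits the experimental U_eff.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def fit_effective_energy(phi, u_eff, order=20):
    """Least-squares fit of U_eff(phi) in Chebyshev polynomials of the
    first kind, returning the fit and the normalized l2 residue used to
    compare fit orders (cf. Fig. 2.4(a))."""
    fit = Chebyshev.fit(phi, u_eff, deg=order)
    resid = np.linalg.norm(u_eff - fit(phi)) / np.linalg.norm(u_eff)
    return fit, resid
```

On a smooth but non-polynomial double well, raising the order from n = 4 to n = 20 drops the residue sharply, mirroring the trend in Fig. 2.4(a).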
The measured distribution integrates over the unmeasured degrees of freedom (those that do not lie along the fixed-layer magnetization), so, even if the strictly parallel and strictly antiparallel states were energy minima, slightly higher-energy states may be significantly more prolific, shifting the peaks of the marginal distribution along the voltage, Φ, away from the energy minima. The desire to capture these features of the histograms and other statistical measures is what motivates us to develop a compact model beyond the macrospin approach. As we have elaborated above, our approach is to assume a 1D overdamped Langevin equation driven by white noise. The simplicity of this model makes our task tractable, and in fact we will show in Section 2.5 that the model gives very good agreement with experiment.

To determine the parameters for the Langevin model, we start with the Fokker-Planck equation, which describes the evolution of the probability density ρ(Φ, t) as

∂ρ/∂t = (∂/∂Φ)[ −D_1(Φ) ρ(Φ, t) + (∂/∂Φ)(D_2(Φ) ρ(Φ, t)) ],  (2.9)

with Φ a generic state variable (voltage, in the present work), t the time, D_1(Φ, t) the drift, and D_2(Φ, t) the diffusion. The terms D_1(Φ, t) and D_2(Φ, t) are precisely the first- and second-order Kramers-Moyal coefficients from Eq. (2.6). The Kramers-Moyal coefficients may be functions of time in general, but the measured probability distribution ρ_0 is stationary on the timescale of the experiment, and thus we assume the D_n(Φ, t) are not functions of time. In the steady-state limit ∂ρ/∂t = 0 we have

D_1(Φ) = D_2′(Φ) − D_2(Φ) dU_eff(Φ)/dΦ,  (2.10)

so that D_1 is uniquely determined given the diffusion coefficient and the steady-state distribution of the system. In passing from Eq. (2.9) to Eq. (2.10), we used the fact that the right-hand side of Eq. (2.9) is the divergence of the probability current J. In the steady-state limit, ∂_Φ J = 0, and thus the stationary current J_0 is constant.
For the system to be physically bounded, the constant value of J_0 must in fact be zero, which allows us to uniquely solve for D_1(Φ) as a function of D_2(Φ) and ρ_0(Φ).

In a system with additive noise – that is, D_2′(Φ) = 0 – the drift term would be determined through the derivative of the effective energy, in which case D_1 coincides with a typical conservative force, up to prefactors. A system with multiplicative noise – where D_2′(Φ) ≠ 0 – exhibits mixing between the stochastic terms and the deterministic part; here the nonzero diffusion gradient introduces a stochastic drift term.

The Langevin equation derived from the Fokker-Planck equation is

Φ̇ = f(Φ) + g(Φ) η_Φ(t),  (2.11)

with Φ the voltage state, f(Φ) the deterministic drift term, g(Φ) the stochastic diffusion term, and η_Φ white noise with ⟨η_Φ(t) η_Φ(t′)⟩_t = 2δ(t′ − t) and ⟨η_Φ(t)⟩_t = 0. We interpret this equation in the Stratonovich sense [29], as one would for the typical construction of the sLLG equation. In the Stratonovich interpretation, the Langevin equation is related to the Fokker-Planck equation's Kramers-Moyal coefficients by f(Φ) = D_1(Φ) − D_2′(Φ)/2 and g(Φ) = √(D_2(Φ)). The second term on the right-hand side of f(Φ) comes from gradients in the diffusion, which give rise to an effective drift term independent of the usual drift coefficient D_1(Φ).

The first step in determining a Langevin model that can reproduce aspects of the experimental voltage-time trace in Fig. 2.2 is to compute the conditional moments in Eq. (2.5). These moments are shown in Fig. 2.5. With these experimental conditional moments, we determine the Kramers-Moyal coefficients by making an ansatz D̃_2 for the diffusion part of the Fokker-Planck equation, which, when combined with the stationary distribution, yields an induced form for D_1 via Eq. (2.10). Measurements of M_2, shown in Fig. 2.5, inform our choice for D̃_2.
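The two relations just stated—the stationarity constraint of Eq. (2.10) and the Stratonovich conversion f = D_1 − D_2′/2, g = √D_2—can be composed into a small helper. This is a sketch with our own names; the callables stand in for the fitted diffusion ansatz, its derivative, and the derivative of the Chebyshev fit to U_eff.

```python
import numpy as np

def langevin_terms(phi, d2, d2_prime, ueff_prime):
    """Build the Langevin terms of Eq. (2.11) from a diffusion ansatz and
    the effective-energy fit.  Eq. (2.10): D1 = D2' - D2 * U_eff';
    Stratonovich convention: f = D1 - D2'/2, g = sqrt(D2)."""
    d1 = d2_prime(phi) - d2(phi) * ueff_prime(phi)  # stationary-state drift
    f = d1 - 0.5 * d2_prime(phi)                    # deterministic term
    g = np.sqrt(d2(phi))                            # noise amplitude
    return f, g
```

For additive noise (constant D_2) the correction term vanishes and f reduces to the conservative force −D_2 U_eff′, as the text notes.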
The observation that the data show larger diffusion in the AP well than in the P well leads us to propose

D̃_2(Φ; µ) = mΦ + b,  (2.12)

with µ = (m, b) the fit parameters. In Section 2.8 we show that fitting with a constant D̃_2 also provides an adequate fit, and that introducing more fitting parameters can give an even better fit. To ensure that D_2 is positive definite, we take D_2 = λ log(1 + exp(D̃_2/λ)) with λ = 1 V²/s; this softplus form leaves D_2 essentially unchanged wherever D̃_2 > 0. Combining this ansatz for D_2 with the analytical expressions for D_1 and the stationary distribution ρ_0 completely specifies the model. In the next section, we use this model to run stochastic Langevin equation simulations, that is, numerical integrations of Eq. (2.11).

We fit the parameters of the ansatz in Eq. (2.12) following the procedure outlined in Section 2.4. The fit parameters in Eq. (2.12) are determined by first computing an analytic form for D_1 from the ansatz for D_2, the fit to the histogram, and Eq. (2.10). Then an analytic form for M_2 is determined from Eq. (2.8). The parameters of this last expression are adjusted to fit the experimental data. The agreement between the model and the experimental data is shown in Fig. 2.5. The errors in the drift and diffusion coefficients were calculated from the standard error σ_i/√(N_i) of the underlying δΦ_n(Φ) distribution as a function of the binned Φ_i levels shown in the figure, where σ_i is the standard deviation and N_i is the number of counts within the ith Φ_i level. The binning level used for Φ_i here was approximately 295 µV, giving exactly 64 total bins over the sample. The maximum error shown in the plots occurs at the exterior well boundaries, where the number of counts N_i is on the order of hundreds of data points, instead of millions nearer the wells' interiors.
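As a sanity check on the finite-time corrections used in this fit, the second-order expression of Eq. (2.7) can be compared against an exactly solvable case. For an Ornstein-Uhlenbeck process (D_1 = −θΦ, D_2 constant) the exact finite-τ moment is M_1 = Φ(e^{−θτ} − 1), whose Taylor series the correction reproduces. All numbers below are illustrative, not device values.

```python
import numpy as np

# Ornstein-Uhlenbeck test case: D1 = -theta*phi, D2 = const (so D1'' = 0).
theta, phi, tau = 2.0, 0.3, 0.05

m1_exact = phi * (np.exp(-theta * tau) - 1.0)     # exact finite-tau moment
m1_first = tau * (-theta * phi)                   # naive estimate M1 ~ tau*D1
# Eq. (2.7): M1 = tau*D1 + (tau^2/2)*(D1*D1' + D2*D1''); here D1' = -theta.
m1_second = m1_first + 0.5 * tau**2 * ((-theta * phi) * (-theta))
```

The second-order correction reduces the error from O(τ²) to O(τ³), which is exactly what licenses fitting M_2 at the experimentally accessible τ rather than in the unreachable τ → 0 limit.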
Both drift and diffusion coefficients (that is, the induced ansatzes for M_n) agree well with the conditional moments extracted from experiment except at the extreme voltage values, where the statistical certainty is poor. Figure 2.5 shows the functional forms determined for the model. The model is constructed so that M_2/2τ is fit to the experimental data; the consequent agreement between M_1/τ and the experimental data speaks to the appropriateness of the underlying Fokker-Planck model as a description of this system.

Figure 2.5: Time-delayed drift coefficient (a) and diffusion coefficient (b), showing experiment (black circles) and Langevin simulation (blue squares), computed at a sampling time τ_samp = 50 ps equivalent to the experimental measurement time, and the analytic, fitted results that were inserted into the model, D_n (solid green line) and M_n/nτ (dashed red line), where n = 1 for the drift term and n = 2 for the diffusion term. Error bars on the experimental data indicate single-standard-deviation uncertainties in the mean. They are smaller than the plot symbols except at the extreme voltages.

2.5 Results

Our ultimate goal is to develop a stochastic differential equation of motion for SMTJs for use in circuit simulators. Though our current model is restricted to a particular biasing condition, we can still explore how well our model matches the device. We simulate the Langevin equation (Eq. (2.11)) and compare simulation results both to the experimental data used to fit the model and to selected statistical metrics of the experimental data.

After obtaining the Kramers-Moyal coefficients D_1 and D_2, we insert them into Eq. (2.11). We set the integration timestep to a fifth of the experimental sampling time, dt = τ_exp/5 = 10 ps. Integration is performed using the Euler-Maruyama method. Simulation histogram results are compared to the experimental histogram in Fig.
2.3, confirming that the first-order, data-driven Langevin model captures this aspect of the experiment well. The histogram displays asymmetric energy-well probabilities and widths, greater-than-exponential decay of the probability density at the exterior data boundaries, and a large energy barrier between the two well peaks. We assert that the reproduction of statistical features such as these is key to developing an engineering-appropriate model for the device. For purely analog circuits in particular—such as those proposed in Refs. [37] or [88]—correct modeling of these distributions may be crucial for capturing emergent statistics of scaled-up circuitry based on superparamagnetic tunnel junctions.

Comparing the conditional moments extracted from experiment and simulation demonstrates that the first-order Langevin model gives dynamics similar to those seen in experiment. Figure 2.5 confirms excellent agreement for M_1(Φ, τ) and M_2(Φ, τ) calculated for a time delay equal to the experimental sampling time (τ_samp = τ_exp = 50 ps).

The statistics that have received perhaps the most attention in the literature are the mean dwell times of the parallel and antiparallel states. These are captured directly by construction in Néel-Brown models, and macrospin models are generally fit to ensure these dwell times are accurate [75]. In circuits and systems where the SMTJ state is binarized into a two-level telegraph signal, the mean dwell times, together with the assumption that the switching events are independent of past history – the Markov property – completely determine the circuit output. With a voltage threshold chosen to demarcate the boundary between the P and AP states, the dwell times imbue the data we have already captured in the histogram with a characteristic time scale. Crucially, this timescale information is carried into the model through our fitted ansatz on D_2.
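The Euler-Maruyama integration described above can be sketched as follows. This is an illustrative sketch with our own helper names; it integrates the Itô form equivalent to the Fokker-Planck Eq. (2.9), dΦ = D_1 dt + √(2D_2) dW, which describes the same process as the Stratonovich Eq. (2.11). The drift, diffusion, timestep, and units below are placeholders, not the fitted device values.

```python
import numpy as np

def euler_maruyama(d1, d2, phi0, dt, nsteps, seed=0):
    """Euler-Maruyama integration of dPhi = D1*dt + sqrt(2*D2)*dW,
    the Ito process whose Fokker-Planck equation is Eq. (2.9)."""
    rng = np.random.default_rng(seed)
    phi = np.empty(nsteps)
    phi[0] = phi0
    for k in range(1, nsteps):
        dw = rng.normal(0.0, np.sqrt(dt))        # Wiener increment
        phi[k] = (phi[k - 1] + d1(phi[k - 1]) * dt
                  + np.sqrt(2.0 * d2(phi[k - 1])) * dw)
    return phi

# Usage with a single-well (OU-like) drift and constant diffusion, in
# arbitrary units; the chapter's simulations use dt = 10 ps instead:
trace = euler_maruyama(lambda p: -2.0 * p, lambda p: 0.5, 0.0, 1e-2, 200_000)
```

For this test case the stationary variance should approach D_2/θ = 0.25, a quick check that the noise normalization ⟨ηη⟩ = 2δ is implemented consistently.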
For the purposes of this work, we consider a transition between the P and AP states to occur when the voltage crosses the barrier location determined by the local maximum of the energy landscape in Fig. 2.4. We define a dwell time as the amount of time the system spends in the positive or negative half of state space before crossing that ∂_Φ U_eff = 0 threshold into the other half of state space. The main panels in Fig. 2.6 show simulation and experimental dwell times for a sampling rate of τ_samp = τ_exp on log-log and semi-log plots for the two distinct P and AP states. These figures show that the simulation captures the full distribution of dwell times for both states. This agreement for the dwell time distributions is perhaps the most significant validation of the model, because the experimental input to the model does not directly encode these time scales. The model uses just the short-time dynamics captured by the Kramers-Moyal coefficients and the stationary limit of the time-averaged time series in the form of the histogram.

One unusual feature of both the experimental and simulated distributions is the crossover from the familiar exponential behavior to ∼ t^(-3/2) power-law behavior at short times. We attribute this behavior to fluctuations around the threshold we defined between the states. Recall that we chose the threshold between states to correspond to the local maximum of the effective energy landscape; linearizing about this threshold therefore reveals a locally flat energy landscape where the dynamics are driven entirely by stochastic fluctuations. In other words, the dynamics in this small neighborhood around the threshold are given by a random walk. It is well known that the dwell-time (return-time) distribution of a random walk is proportional to √(τ/t³), where τ is the timestep of the walk [89]. As τ decreases, more and more probability mass accumulates at smaller and smaller t, dominating the dwell time distribution.
This suggests that the transition between the power-law and exponentially distributed dwell times is controlled by the curvature of the energy barrier: the flatter the energy near the threshold, the longer the system can behave as an unbiased random walk. In the physical SMTJ system, we expect some physical mechanism to impose an ultraviolet cutoff—but whatever this cutoff may be, it clearly resides at a higher frequency than our experiment can access. Since this power-law behavior is simply a property of stochastic dynamics around a threshold, we expect it to be found in other stochastic magnetic systems as well. In our published work [34], we found that the behavior is also reproduced in the macrospin model, but note that it is impossible to capture in a simple Néel-Brown model, which is based on a two-state Markov model that assumes exponential distributions of dwell times; such a system has no notion of dynamics in the barrier, which is the essential cause of the power-law effect.

Traditionally, mean dwell times are extracted from experimental dwell-time histograms by examining the slope of the exponential dwell-time distributions on a semi-log scale. Because of the power-law behavior at small times in our data, these inverse slopes—which we will call characteristic dwell times—are clearly very different from the literal mean dwell times. Yet merely isolating the slope of the curve where it does appear exponential in Fig. 2.6 does not lead to a physically invariant characteristic dwell time; this slope can change significantly under minor subsampling of the signal, because subsampling changes the probability-dominating power-law part. We elaborate more on the topic of characteristic dwell times in Section 2.9.

Finally, we compare the power spectral densities of the experimental signal to those produced by the model. Accurate power spectra are crucial in an electrical engineering context.
Unlike the diffusion characteristics and histograms, power spectra are known to exhibit the correct qualitative (Lorentzian) form in macrospin models [67] and in Néel-Brown models [90].

Figure 2.6: Dwell-time distributions as extracted from experiment (black) and simulation (blue) in (a) a log-log plot and (b) a semi-log plot. The log-log plot uses a sampling time bin size of 100 ps, while the semi-log plot uses a sampling time bin size of 1 ns. The experimental data and simulation both have on the order of 300,000 transitions. The straight-line behavior of both the simulation and experiment at short times in the upper panel indicates a power-law distribution, and at long times in the lower panel an exponential distribution.

2.6 Kramers-Moyal Coefficient Visualization

The Kramers-Moyal coefficients introduced in Section 2.3 are instrumental to our data-driven modeling. These coefficients encode short-time information into the model, allowing it to capture both the background thermal noise driving SMTJ fluctuations and the deterministic restoring force of the double-well potential. Figure 2.7 shows the probability density distribution of the explicit δΦ^n/n!τ calculated from the experimental data, together with cuts across the distributions at specified voltage levels where the raw count is the vertical axis.

Figure 2.7: Drift and diffusion Kramers-Moyal coefficient probability density distributions and cuts across the distribution at specified voltage levels. Panel (a) is the drift term's probability per bin size 0.916 V²/ms (299 µV bin size for voltage and 3.06 mV/ns bin size for δΦ/τ). The colorbar shows a logarithmic scale for the probability density.
Panel (b) is a semi-log plot showing six voltage cuts across the probability distribution for δΦ/τ. The width of the distribution increases as the voltage sweeps from positive to negative, and the distribution also drifts from positive δΦ/τ to negative, corresponding to the double-well potential. Panel (c) is the diffusion term's probability per bin size 1.14 V³/s (299 µV bin size for voltage and 3.81 mV²/ns bin size for δΦ²/2τ). Panel (d) is a semi-log plot showing six voltage cuts across the probability distribution for δΦ²/τ.

The skewness of the distribution in Fig. 2.7(a) gives rise to the deterministic drift term D_1, while the width of that distribution (evaluated explicitly as the average of the square of δΦ) at every voltage Φ gives rise to the stochastic/noise term D_2. In Fig. 2.7(b) and Fig. 2.7(d), values that fall below 1 correspond to bins that were not measured in the experimental data.

2.7 Memory

The overdamped, first-order Langevin model used throughout this chapter assumes the underlying dynamics are Markovian. To test the validity of this assumption in describing the experimental data, we construct a metric capturing the inertia of observed trajectories. For each time step t collected in experiment, we compute the probability of the system to increase (go up in voltage) or decrease (go down in voltage) at time t + τ, conditioned on whether it increased or decreased from time t − τ to t. From these observed quantities we compute four distinct probabilities: up after up, P↑↑; down after down, P↓↓; up after down, P↑↓; and down after up, P↓↑; in these expressions, the left arrow denotes the second change and the right arrow the first change. The sum of P↑↑ and P↓↑ equals one, since the system must either increase or decrease in value. Likewise, the sum of P↓↓ and P↑↓ also equals one. If the system is Markovian, we expect that probabilities with the same second-step direction will be independent of the direction of their previous steps, and depend only on the current system state (i.e.
P↑↓(Φ) = P↑↑(Φ) and P↓↓(Φ) = P↓↑(Φ)). Figure 2.8 shows each possible P_ij plotted as a function of the measured voltage Φ. As expected, the simulation transition probabilities satisfy P↑↓ = P↓↓ and P↑↑ = P↓↑ in Fig. 2.8(b), verifying that the metric correctly identifies Markovian behavior. On the other hand, the experimental data shown in Fig. 2.8(a) do not have equal transition probabilities, indicating non-Markovian behavior.

Figure 2.8: Voltage transition probability P_ij(Φ) shown for experiment in panel (a) and simulation in panel (b). The sum at a particular voltage Φ of all the transition probabilities is always 2 because the voltage moves twice. The time step τ was taken as the experimental measurement time of 50 ps.

To assess the effects of memory and inertia in the experiment and simulation, we parametrically plot (P↑↓, P↑↑) through a series of voltage bins for the experimental and simulated voltage-time traces in Fig. 2.9. The experimental data exhibit non-Markovian behavior—most strongly in the barrier-crossing region between −5 mV and 5 mV—that cannot be captured with our first-order (memoryless) Langevin model. Inclusion of memory and higher-order differential terms would seem a necessary next step toward higher-fidelity models. We discuss this further in Section 2.10.

2.8 Other parameterizations of D̃_2

The ansatz chosen in Section 2.4 has only two fit parameters that can be optimized to fit the experiment's calculated M_2/2τ Kramers-Moyal coefficient. This section describes the results of using a higher and a lower number of fitting parameters for the D̃_2 ansatz.

The higher-order fit differs from the linear fit by including an additional Gaussian term.

Figure 2.9: Voltage transition probability from Fig. 2.8 plotted parametrically as a function of voltage Φ.
Panel (a) shows the experiment as black circles vs. the Langevin simulation as blue hollow squares, measured at the experimental acquisition frequency of 20 GHz (τ_samp = 50 ps), with an equiprobability trendline shown in dashed grey. Panel (b) shows the experimental data subsampled at τ_samp ∈ (50, 250, 500) ps. The black circles are the reproduced experimental data from panel (a). Note how, for longer subsampling times, the parametric probability line approaches the dashed grey equiprobability line.

This Gaussian term seeks to capture the difference between the Langevin model's calculated M_2/2τ and the experiment's calculated M_2/2τ seen in Fig. 2.5. The analytic form for this fit is

D̃_2(Φ; µ) = mΦ + b + a exp(−(Φ − µ_0)²/2σ²).

Note we ensure that D_2 remains positive, as discussed in the main text. The lower-order fit assumes a constant diffusion, D̃_2(Φ; µ) = b. Figure 2.10 compares the calculated M_2/2τ coefficients from experiment and from simulations using the Gaussian-plus-linear fit, the linear fit, and the constant fit for D̃_2. With progressively higher-order fits, the simulation fidelity increases, indicating that the actual noise term in the SMTJ device has a more complicated structure than these simple analytic models. Even so, the models' fidelity to the empirical data is remarkable and demonstrates the flexibility and robustness of our data-driven method in capturing the gross statistics of the SMTJ device.

In addition to comparing calculated Kramers-Moyal coefficients, we also compare the dwell time distributions of each model. Figure 2.11 shows the dwell time distributions in the

Figure 2.10: Experimental and Langevin simulation results for the time-delayed drift (panel (a)) and diffusion (panel (b)) coefficients computed at a sampling time τ_samp = 50 ps equivalent to the experimental measurement time.
The various lines in the figures are as follows: black circles show experimental data; blue squares show first-order, data-driven Langevin simulations with a Gaussian-plus-linear D̃_2; yellow circles show first-order, data-driven Langevin simulations with a linear D̃_2, as reproduced from the main chapter body; green triangles show first-order, data-driven Langevin simulations with a constant D̃_2.

Figure 2.11: Log-log (panel (a)) and semi-log (panel (b)) dwell-time distributions for the P state τ_P as calculated in experiment and produced by simulation: black circles show experimental data; the blue dashed line shows the first-order, data-driven Langevin simulation with a Gaussian-plus-linear D̃_2; the yellow dotted line shows the simulation with a linear D̃_2, as reproduced from the main chapter body; the green dot-dashed line shows the simulation with a constant D̃_2. The line-like behavior of the simulation and experiment in the upper plot at short times indicates a power-law dwell-time distribution in that regime. In the lower plot, the line-like behavior of the simulation and experiment at longer times indicates an exponential dwell-time distribution in that regime. The same sampling time τ_samp is used as in Fig. 2.6, but the overall dwell-time distribution is measured over only the first 2,000,000 measurement points of the experiment.

Figure 2.12: Mean dwell times of experiment and simulation as a function of sampling time.

P state for the experiment and simulations measured using 2 × 10⁶ points. This number of points is fewer than used in Fig. 2.6, which is why there is a difference between the experimental curves in the two figures. This was done to ensure magnitude agreement in the dwell-time distribution between the experiment and simulations, which were limited in total length because of computational expense.
The dwell-time distribution is unchanged for these different assumed fits for D_2.

2.9 Characteristic dwell times

In the discussion of Section 2.5, we described how the short-time power-law behavior of the dwell-time histograms, in experiment and in its fitted simulations [Fig. 2.6], can be attributed to the scale-free, fractal-like nature of random walks around the switching threshold at high sampling frequencies. These features are not mere artifacts of the experiment, but additional structure that is usually neglected in dwell-time analysis of magnetic devices, and structure that we expect to arise similarly in any magnetic nanodevice.

Calculating the mean dwell times from the entire distributions gives very small dwell times dominated by these short-time power-law behaviors. Figure 2.12 shows good agreement between the mean dwell times ⟨τ⟩ extracted from the simulation and from the experiment. Even when subsampled, the simulation and experimental average dwell times agree moderately well. Recall that D_2, which is fit to M_2 via finite-time corrections, holds all of this timescale information in the model. The mean dwell times increase as the subsampling time increases because more and more short-time events are eliminated from the distribution.

Generally speaking, one expects that if a dwell time is relevant for a particular application, it is likely the characteristic dwell time of the exponential distribution that is most meaningful. If we ignore the power-law behavior of the very short dwell times, we can extract these from the slope of the exponential in the semi-log plot, panel (b) of Fig. 2.6. The resulting fits give τ_P,exp = 7.10 ns; τ_AP,exp = 11.5 ns; τ_P,sim = 5.95 ns; τ_AP,sim = 9.51 ns. The one-standard-deviation ranges for these slopes are τ_P,exp ± σ = [6.82, 7.39] ns; τ_AP,exp ± σ = [11.23, 11.85] ns; τ_P,sim ± σ = [5.79, 6.12] ns; and τ_AP,sim ± σ = [9.33, 9.69] ns.
The agreement is good even though both simulation dwell-time slopes slightly underestimate the corresponding experimental values.

It is important to note that different fitting schemes will give different results that lie outside these ranges, which are strictly determined by the statistics of the points that have been fit. Indeed, it is not clear that these extracted times represent anything of experimental relevance. By fine-tuned subsampling or RC filtering of the data, we can artificially flatten the power-law behavior so that the resulting distribution is exponential. Using this procedure to obtain a plausibly exponential distribution gives τ_P,sim ≈ 25 ns and τ_AP,sim ≈ 30 ns. Going back to Fig. 2.2 qualitatively illustrates this effect. Keeping all short-time events gives on the order of 35 crossings of the zero line, which would correspond to mean dwell times of approximately 3 ns. On the other hand, if the rapid fluctuations were filtered out, there would be approximately six transitions, for a mean dwell time of about 17 ns.

2.10 Discussion

While the sLLG equation for a macrospin is physically motivated and contains terms directly identifiable with distinct physics, it cannot capture certain statistical metrics associated with our experimental device. This is partly due to the sLLG model's drift and diffusion coefficients; fitting the former to the observed drift would require including sufficiently "data-driven" terms as to lose the simple plausibility of the model, while directly fitting the diffusion is simply impossible in general. Indeed, even single-domain MTJs may show diffusion properties that do not generally agree with the macrospin.
Theory indicates that the Gilbert damping (an overall prefactor for D_2) in common magnetic materials has strong directional dependence [91], while recent experiments on SMTJ switching suggest that the Suhl instability may create dynamically variable damping mechanisms during single-domain reversal [92]. These considerations suggest that we must move beyond the simple macrospin model to accurately capture SMTJ physics in compact models in general, and the data-driven approach explored in this work is our initial attempt at this enterprise. We show that our data-driven approach captures multiple statistical and dynamical metrics with high fidelity in an experimentall