ABSTRACT Title of Dissertation: MACHINE LEARNING FOR PREDICTING NON-STATIONARY DYNAMICAL SYSTEMS, AND GLOBAL SUBSEASONAL-TO-SEASONAL WEATHER PHENOMENA Dhruvit Paresh Patel Doctor of Philosophy, 2024 Dissertation Directed by: Professor Edward Ott Department of Physics In this thesis, we are interested in modeling (1) the long-term statistical behavior of non-stationary dynamical systems, and (2) global weather patterns on the Subseasonal-to-Seasonal (S2S) time scale (2 weeks to 6 months). The first part of this thesis is primarily concerned with the situation where we have available to us measured time series data of the past states of the system of interest and, in some cases, a (perhaps inaccurate) scientific-knowledge-based model of the system. The central problem here lies in predicting a future behavior of the system that may be fundamentally different from that observed in the measured time series of its past. We develop machine-learning-based methods for accomplishing this task and test them in various challenging scenarios (e.g., predicting future abrupt changes in dynamics mediated by bifurcations not encountered in the training data). We also investigate the effects of dynamical noise in the training data on the predictability of such systems. For the second part of this thesis, we modify a machine-learning-based global climate model to predict various weather phenomena on the S2S time scale. The model is a hybrid between a purely data-driven machine learning component and an atmospheric general circulation model. We begin by formulating a purely machine learning approach that utilizes the measured time series of the past states of the target non-stationary system, as well as knowledge of the time dependence of the non-stationarity-inducing system parameter.
We demonstrate that this method can enable the prediction of the future behavior of the non-stationary system even in situations where the future behavior is qualitatively and quantitatively different from the behavior in the training data. For situations where the training data contain dynamical noise, we develop a scheme that enables the trained machine learning model to predict trajectories which mimic the effects of dynamical noise on typical trajectories of the target system. We test our methods on the discrete-time logistic map, the continuous-time Lorenz system, and the spatiotemporal Kuramoto-Sivashinsky system, for a variety of non-stationary scenarios. Next, we study the ability of our approach to extrapolate not only to previously unseen dynamics, but also to regions of the target system's state space unexplored by the training data. We find that while machine learning models can exhibit some capability to extrapolate in state space, they fail quickly as the amount of extrapolation required increases (as expected of any purely data-driven extrapolation method). We explore ways in which such failures can be mitigated. For instance, we show that a hybrid model which combines machine learning with a knowledge-based component can provide substantial improvements in extrapolation. We test our methods on the Ikeda map, the Lorenz system, and the Kuramoto-Sivashinsky system under challenging scenarios (e.g., predicting future hysteretic transitions in dynamical behavior). Finally, we modify a machine-learning-based hybrid global climate model to forecast global weather patterns on the S2S time scale. Prediction on this time scale is crucial for many domains (e.g., land, water, and energy management), yet it remains a difficult range over which to obtain useful forecasts.
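The parameter-aware prediction idea summarized above can be illustrated with a minimal reservoir computing sketch: a random recurrent network is driven in open loop by the measured time series together with the known time-dependent parameter, a linear readout is fit by ridge (Tikhonov) regression, and prediction then runs in closed loop while the future parameter values continue to be supplied. All sizes, hyperparameter values, and the drifting logistic map used here are illustrative placeholders, not the configurations used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data from a non-stationary logistic map x_{n+1} = lam_n * x_n * (1 - x_n),
# whose slowly drifting parameter lam_n is assumed to be known (illustrative drift).
N_train, N_pred = 3000, 500
lam = 3.7 + 5e-5 * np.arange(N_train + N_pred)   # drifts from 3.7 to ~3.875
x = np.empty(N_train + N_pred)
x[0] = 0.5
for n in range(N_train + N_pred - 1):
    x[n + 1] = lam[n] * x[n] * (1.0 - x[n])

# A small sparse random reservoir (illustrative size and hyperparameters).
Nr = 300
A = rng.uniform(-1, 1, (Nr, Nr)) * (rng.random((Nr, Nr)) < 0.05)
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))  # rescale spectral radius to 0.9
W_in = rng.uniform(-0.5, 0.5, Nr)                # coupling for the measured state
W_c = rng.uniform(-0.05, 0.05, Nr)               # coupling for the known parameter

def step(r, u, c):
    """One reservoir update driven by state input u and control (parameter) input c."""
    return np.tanh(A @ r + W_in * u + W_c * c)

# Open-loop training: drive the reservoir with the measured series and the known
# parameter, discard a transient, then fit the linear readout by ridge regression.
r = np.zeros(Nr)
states, targets = [], []
for n in range(N_train - 1):
    r = step(r, x[n], lam[n])
    if n >= 100:                                 # washout of initial transient
        states.append(r)
        targets.append(x[n + 1])
R, Y = np.array(states), np.array(targets)
beta = 1e-6                                      # Tikhonov regularization strength
W_out = np.linalg.solve(R.T @ R + beta * np.eye(Nr), R.T @ Y)
train_rmse = float(np.sqrt(np.mean((R @ W_out - Y) ** 2)))

# Closed-loop prediction: feed predictions back as input while continuing to
# supply the (assumed known) future parameter values lam_n.
u = x[N_train - 1]
preds = []
for n in range(N_train - 1, N_train + N_pred - 1):
    r = step(r, u, lam[n])
    u = float(W_out @ r)
    preds.append(u)
preds = np.array(preds)
```

Because the parameter (control) signal keeps drifting during closed-loop prediction, the trained model can be pushed into parameter regimes absent from its training data, which is the extrapolation setting studied in the thesis.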
We demonstrate that our model has useful skill in predicting a number of phenomena including global precipitation anomalies, the El Niño-Southern Oscillation and its related teleconnections, and various equatorial waves. MACHINE LEARNING FOR PREDICTING NON-STATIONARY DYNAMICAL SYSTEMS, AND GLOBAL SUBSEASONAL-TO-SEASONAL WEATHER PHENOMENA by Dhruvit Paresh Patel Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2024 Advisory Committee: Professor Edward Ott, Chair/Advisor Professor Brian Hunt Professor Michelle Girvan Professor Rajarshi Roy Professor Thomas M. Antonsen, Jr. © Copyright by Dhruvit Paresh Patel 2024 Foreword This dissertation contains work that was co-authored with faculty members and colleagues at the University of Maryland and at other institutions. The author of this thesis made substantial contributions to the relevant aspects of the jointly authored work included in the dissertation. Dedication To Shanta. For everything. Acknowledgments I have met and interacted with countless people, and many of these interactions have helped me, directly or indirectly, to arrive at this milestone. I would like to deeply thank all such people. First and foremost, I would like to thank my advisor Professor Edward Ott for guiding me through this research journey consisting of many challenging problems, and for helping instill in me foundational research skills that I will carry with me from here onwards. He has helped shape my understanding of how to conduct, manage, and present research in fundamental ways. He has always been sensitive to my needs as a student, a researcher, and a person who has a life and commitments outside of academics. I sincerely wish for every young researcher to have a mentor as competent and understanding as him.
Next, I would like to thank Professor Brian Hunt and Professor Michelle Girvan for always providing useful and meaningful input and feedback on my work. Their insights have often encouraged me to reformulate my own understanding of core research concepts in very beneficial ways. I have learned a lot from them, and I hope I was of at least some utility to them in return. My many thanks to them and to Professors Rajarshi Roy and Thomas Antonsen for spending their valuable time and effort in reviewing this dissertation manuscript and in serving on my thesis defense committee. I have had the honor and pleasure of collaborating with many knowledgeable researchers. I would like to give special thanks to Alex Wikner for the many helpful discussions and for introducing me to our mutual advisor, Professor Ott. I owe many thanks to Professor Istvan Szunyogh at Texas A&M University and Troy Arcomano, work with whom has helped shape my future research trajectory. In addition, I would also like to thank Andrew Pomerance, Declan Norton, Ravi Chepuri, and Shan Leng for their helpful suggestions and comments on this research. Many thanks are due to Josiland Chambers and Nancy Boone for helping me to seamlessly navigate the requirements of the graduate program and for facilitating so many of the processes that were a necessary part of this journey. I would also like to thank Professor Ki-Yong Kim for being my mentor in the earlier part of this journey; if it weren't for him, I would likely not be here at the University of Maryland. I would also like to thank Scott Hancock, Robert Schwartz, and Daniel Woodbury for dealing so patiently with my inexperience and mentoring me in the ways of experimental physics. Any words I can think of to thank them would fall short of describing the gratitude I feel towards them.
This journey was made so much more feasible, enjoyable, meaningful, and possible thanks to the deep bonds of friendship I made along the way with incredibly smart and talented peers: Adam, Alex, Andrew, Dawid and Aneta, Jake, Jared, Martin, Matt, Nick, Stefano, and Zach. I want to especially thank my friends from home, from whom I have not only acquired so much comfort and support during this program, but with whom I have grown up and experienced the different stages of life so far: Ankeet, Eshil, John, Mithulesh, Nikhil, Nisarg, Panth, Priyan, Rohan, Sahil, Shivam, and Swapneel. You guys have been and continue to be an immense source of joy in my life. This section would be entirely incomplete, and this thesis utterly worthless to me, without acknowledging my family: Shanta, Baa, Pappa, Mummi, Sneha, and Jatin. I will not attempt to express my gratitude to you all in words, because I'm afraid that any attempt at that would do immeasurable injustice to how much I owe you. Table of Contents Foreword ii Dedication iii Acknowledgments iv Table of Contents vii List of Tables x List of Figures xiii List of Abbreviations xxi Chapter 1: Introduction and Background 1 1.1 The “What” and the “Why” . . . 1 1.2 A Short History of Numerical Climate (Change) Modeling . . . 3 1.3 Challenges Faced by Conventional Climate Models . . . 6 1.4 An Alternative Approach: Machine Learning . . . 7 1.5 Thesis Outline . . . 8 1.5.1 Machine Learning for Modeling Non-Stationary Dynamical Systems . . . 8 1.5.2 How Well Can Machine Learning Models Extrapolate Dynamics? . . . 9 1.5.3 A Hybrid Machine Learning-Based Model for Subseasonal-to-Seasonal Forecasting . . .
10 Chapter 2: Using Machine Learning To Predict Statistical Properties of Non-Stationary Dynamical Processes 11 2.1 Introduction . . . 11 2.2 Review of Reservoir Computing (RC) and Its Use for Prediction of Stationary Systems . . . 16 2.3 The Non-Stationary Logistic Map . . . 20 2.4 Application of RC to Climate Prediction of the Non-Stationary Logistic Map . . . 25 2.4.1 The Noiseless Non-stationary Logistic Map . . . 25 2.4.2 The Noisy Non-stationary Logistic Map . . . 35 2.5 Application of RC to Predicting the Lorenz 63 System . . . 38 2.6 Application of RC to Predicting the Low-Dimensional Dynamics of the Kuramoto-Sivashinsky System . . . 47 2.7 Application of RC to Predicting the High-Dimensional Dynamics of the Kuramoto-Sivashinsky System . . . 53 2.8 Conclusion . . . 61 2.9 Acknowledgments . . . 63 2.10 Data Availability . . . 63 Chapter 3: Using Machine Learning to Anticipate Tipping Points and Extrapolate to Post-Tipping Dynamics of Non-Stationary Dynamical Systems 64 3.1 Introduction . . . 64 3.2 Reservoir Computing Background . . . 70 3.2.1 Setup and Training . . . 70 3.2.2 Choosing Hyperparameters for Prediction of Non-Stationary Systems . . .
73 3.3 Predicting the Tipping Point and Post-Tipping-Point Dynamics in Non-Stationary Systems . . . 77 3.3.1 Predicting tipping in the Lorenz System . . . 78 3.3.2 Predicting tipping points and evolution of extreme event frequency in the non-stationary Ikeda Map . . . 87 3.3.3 Predicting tipping in the Kuramoto-Sivashinsky equations . . . 91 3.3.4 Hysteretic tipping in the Lorenz System: Hybrid ML/knowledge-based prediction . . . 97 3.4 Conclusion . . . 104 3.5 Acknowledgments . . . 106 3.6 Data Availability Statement . . . 106 3.7 Supplementary Material . . . 106 3.7.1 Choosing appropriate hyperparameters for non-stationary systems . . . 106 3.7.2 Reservoir Computer Setup for the Ikeda Map Example, Sec. 3.3.2 . . . 109 3.7.3 Hybrid Reservoir Computer Setup and training for the Lorenz system, Sec. 3.3.4 . . . 110 3.7.4 Hyperparameters used for the examples in the main text . . . 111 Chapter 4: Exploring the Potential of a Hybrid Model That Combines Machine Learning With Physics-Based Modeling for Subseasonal-to-Seasonal (S2S) Prediction 121 4.1 Introduction . . . 121 4.2 Methods . . . 123 4.2.1 The Global and Local State Vectors . . . 124 4.2.2 ML Model Dynamics . . . 126 4.2.3 Model Training . . .
129 4.2.4 Synchronization and Prediction . . . 130 4.2.5 Experimental Design and Choice of Model Hyperparameters . . . 131 4.3 Data . . . 134 4.4 Results . . . 135 4.4.1 Forecast Verification Metrics . . . 136 4.4.2 El Niño-Southern Oscillation (ENSO) and its Teleconnections . . . 137 4.4.3 Global Precipitation . . . 143 4.4.4 Equatorial Waves . . . 145 4.5 Conclusions and Outlooks . . . 147 4.6 Data Availability Statement . . . 148 Chapter 5: Concluding Remarks 149 A.1 Analysis of the Noise-Injection Scheme . . . 154 A.2 Comment on the numerical solution of Eqs. (3.6 a-c) with dynamical noise . . . 156 Bibliography 157 List of Tables 2.1 Hyperparameters for predicting the logistic map. . . . 27 2.2 Hyperparameters for predicting the noiseless Lorenz system. . . . 42 2.3 Hyperparameters for predicting the noisy Lorenz system. . . . 42 2.4 Hyperparameters for predicting the noiseless Kuramoto-Sivashinsky system. . . . 49 2.5 Hyperparameters for predicting the noisy Kuramoto-Sivashinsky system. . . . 52 2.6 Hyperparameters for predicting the noiseless high-dimensional Kuramoto-Sivashinsky system dynamics. . . . 58 2.7 Hyperparameters for predicting the noisy high-dimensional Kuramoto-Sivashinsky system dynamics. . . .
60 3.1 Hyperparameters for predicting the noiseless non-stationary Lorenz system in Fig. 3.3. The spectral radius was searched on the grid [0.50, 0.60, 0.70, 0.80, 0.90, 1.00]. The input coupling strength was searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. The observational noise strength was searched on the grid [1 × 10^−5, 5 × 10^−5, 1 × 10^−4, 5 × 10^−4]. . . . 112 3.2 Hyperparameters for predicting the noiseless non-stationary Lorenz system (in Fig. 3.5) with an insufficient parameter sweep. The spectral radius and input coupling strength were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. The observational noise strength was searched on the grid [1 × 10^−5, 5 × 10^−5, 1 × 10^−4, 5 × 10^−4]. . . . 113 3.3 Hyperparameters for predicting the noisy non-stationary Lorenz system (in Fig. 3.5) with an insufficient parameter sweep. The spectral radius and input coupling strength were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00].
The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. . . . 114 3.4 Hyperparameters for predicting the Ikeda map. The spectral radius and input coupling strength of the parameter input to the reservoir were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00], and the input coupling strength of the dynamical variable input to the reservoir was searched on the grid [0.10, 0.25, 0.50, ..., 2.00, 2.25]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−11 and 1 × 10^−6. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−7 and 1 × 10^−5. The observational noise strength was searched on the grid [1 × 10^−5, 5 × 10^−5, ..., 1 × 10^−2, 5 × 10^−2]. . . . 115 3.5 Hyperparameters for predicting the Kuramoto-Sivashinsky equation. The spectral radius and input coupling strengths were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched on the grid [1 × 10^−7, 1 × 10^−6, 1 × 10^−5, 1 × 10^−4]. . . . 116 3.6 Hyperparameters for predicting the noiseless Lorenz system (subcritical Hopf bifurcation). The spectral radius and input coupling strength were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00].
The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. The observational noise strength was searched on the grid [1 × 10^−5, 5 × 10^−5, 1 × 10^−4, 5 × 10^−4]. . . . 117 3.7 Hyperparameters for predicting the noisy Lorenz system (subcritical Hopf bifurcation). The spectral radius and input coupling strength were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. The observational noise strength was searched on the grid [1 × 10^−5, 5 × 10^−5, 1 × 10^−4, 5 × 10^−4]. . . . 118 3.8 Hyperparameters for predicting the noisy Lorenz system (subcritical Hopf bifurcation) using the hybrid ML model. The spectral radius and input coupling strength were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. . . . 119 3.9 Hyperparameters for predicting the Lorenz equations in Fig. 3.15.
The spectral radius and input coupling strength were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. The observational noise strength was searched on the grid [1 × 10^−5, 5 × 10^−5, 1 × 10^−4, 5 × 10^−4]. . . . 120 List of Figures 2.1 Schematic of the reservoir computer. (a) The “open-loop” configuration used for training the output weight matrix, W_out. An input sequence, u(t), is coupled into a reservoir of nodes via a matrix W_in. The excited reservoir state r(t) is mapped to an output, ũ(t), via a matrix of adjustable weights W_out. (b) The “closed-loop” configuration used for prediction. The reservoir output at one time step is fed as input to the reservoir at the next time step. The dashed red box denotes the “reservoir system”, made up of W_in, the reservoir, and W_out. . . . 19 2.2 (a) Bifurcation diagram of the logistic map. (b) Zoomed-in view of the period-3 window of the logistic map bifurcation diagram. . . . 22 2.3 A trajectory evolving under the non-stationary logistic map (a) given by Eq. (2.3) for γ = 10^−5, (c) given by Eq. (2.3) for γ = 5 × 10^−5, and (e) given by Eq. (2.4) for γ = 10^−5 and ϵ = 0.002. (b) and (d) show a zoomed-in view of the red boxed areas in (a) and (c), respectively, near the period-doubling bifurcation from a period-3 orbit to a period-6 orbit. . . . 24 2.4 (a) Comparison of the actual (black dots) and the reservoir-predicted (red dots) trajectories of the logistic map for Setup 1.
(b) The logistic map for λ = 3.8 (black) and the map obtained from the reservoir-predicted trajectory (red dots) of (a). (c) and (d) compare the actual and the reservoir-predicted trajectories for γ = 10^−5 and γ = 5 × 10^−5, respectively, for Setup 2. . . . 25 2.5 (a) Open-loop training phase and (b) closed-loop prediction phase of Setup 2. The rectangular boxes, as for the dashed rectangles in Figs. 2.1(a,b), contain the RC system consisting of the input layer (W_in), the reservoir, and the output layer (W_out). . . . 28 2.6 (a) Γ(λ) for a set of λ in the period-3 window and ∆λ = 0.001. (b) Comparison of the approximate cumulative probability distributions obtained from real trajectories (solid black, purple, and green curves) and predicted trajectories (dashed red, yellow, and blue curves) for λ*_1 = 3.825, λ*_2 = 3.845, and λ*_3 = 3.865 (vertical dashed lines in (a)). The numerical approximations to f_a(x, λ) and f_p(x, λ) are taken as the fraction of sample values less than x. The agreement between the actual and predicted cumulative distributions indicates that the reservoir-predicted trajectories accurately represent the dynamics of the different regimes (i.e., the pre-transition chaos, the periodic orbit, and the post-transition chaos). . . . 32 2.7 (a) Time-dependence of λ. (b) Comparison of an actual trajectory of the system (black dots) and the corresponding predicted trajectory (red dots). (c) Comparison of the approximate cumulative probability distributions obtained from real trajectories (solid black, purple, and green curves) and predicted trajectories (dashed red, yellow, and blue curves) for θ*_1 = π, θ*_2 = 1.5π, and θ*_3 = 2.25π (λ*_1 = 3.8253, λ*_2 = 3.8111, and λ*_3 = 3.8858, respectively) corresponding to the vertical green dashed lines in (d). (d) Γ(θ) over the prediction interval. . . .
34 2.8 A comparison of trajectories evolving under the noisy logistic map (black dots) and those predicted by the reservoir (red dots that overlay the black dots) for two noise levels, (a) ϵ = 0.0005 and (b) ϵ = 0.002. . . . 36 2.9 (a) Diagrammatic representation of the action of the noisy logistic map, using the intermediate variable v_n. (b) Diagram depicting how to approximate the dynamics of the noisy logistic map using a trained reservoir. . . . 37 2.10 (a,b) Comparison of trajectories evolving under the noisy logistic map (black dots) and those predicted by the reservoir (red dots that overlay the black dots) operating under the modified scheme with noise-injection (Fig. 2.9(b)) for two noise levels in the actual system, (a) ϵ = 0.0005 and (b) ϵ = 0.002. (c) Γ(λ) over the prediction interval. (d) Approximate cumulative probability distributions from actual trajectories (solid curves) and predicted trajectories (dashed curves) for λ*_1 = 3.8325, λ*_2 = 3.845, and λ*_3 = 3.865 and ∆λ = 0.001 for ϵ = 0.002. . . . 38 2.11 (a) Lorenz system bifurcation diagram for the interval ρ = [25, 200]. (b) A zoomed-in view of the window in the dotted red oval of (a): (1) indicates the crisis transition from a one-piece chaotic attractor to a two-piece chaotic attractor; (2) (and the translucent yellow region) indicates the region of pairwise chaotic band splittings; (3) (and the translucent blue region) shows the region of period-halving cascade; (4) indicates the intermittency transition from a period-2 orbit to chaos. . . . 40 2.12 (a) (z_{m+1} − z_m) versus ρ for the true system (black dots) compared with that predicted by the reservoir (red dots). (b) Zoomed-in view of the region enclosed by the dashed blue oval in (a).
(c) Cumulative probability distributions of z maxima for the true trajectories (solid curves) and the reservoir-predicted trajectories (dashed curves) for ρ*_1 = 139, ρ*_2 = 149, ρ*_3 = 159, ρ*_4 = 169.4, and ∆ρ = 0.5. (d) Γ(ρ) over the prediction interval. The vertical dashed lines indicate the points at which the cumulative probability distributions are shown in (c). . . . 43 2.13 (a) (z_{m+1} − z_m) versus ρ from numerical solution of the Lorenz equations (black dots) compared with the reservoir prediction (red dots that overlay the black dots). (b) Zoomed-in view of the region enclosed by the dashed blue oval in (a). (c) Cumulative probability distributions of z_m from an ensemble of true (solid curves) and reservoir-predicted trajectories (dashed curves) for ρ*_1 = 139, ρ*_2 = 149, ρ*_3 = 159, ρ*_4 = 168.5, and ∆ρ = 0.5. (d) Γ(ρ) versus ρ over the prediction interval (for the case with ρ(t) decreasing in time). The vertical dashed lines indicate the ρ values for which the cumulative probability distributions are shown in (c). (e) Γ(ρ) for the case of the same noisy Lorenz system, but with ρ(t) increasing with time (γ = 0.1). . . . 45 2.14 (a) ρ(t) as a function of t. (b) (z_{m+1} − z_m) versus γt for the true trajectory (black dots) and for the reservoir prediction (red dots that overlay the black dots). (c) A zoomed-in view of the oval region indicated in (b). (d) Γ(θ) over the prediction interval. The vertical dashed lines indicate the points at which the cumulative probability distributions are shown in (e). (e) Cumulative distributions of the map obtained from the real trajectory (solid curves) and from the reservoir-predicted trajectory (dashed curves) at θ*_1 = π, θ*_2 = 1.8π, and θ*_3 = 2.9π with ∆θ = 0.05. . . . 46 2.15 (a) Bifurcation diagram of the stationary Kuramoto-Sivashinsky system (Eq. (2.10)). (b) Comparison of the actual a_6(t_n) (black dots) for a numerically integrated trajectory of Eq.
(2.13) and the corresponding reservoir-predicted a_6(t_n) (red dots overlaying the black dots) for a linear drift in the parameter κ. (c) Γ(κ) over the period 5 prediction window. The dashed vertical green lines indicate the points at which the cumulative probability distributions are shown in (d). (d) Cumulative probability distributions of the map obtained from the numerical solutions of Eq. (2.13) (solid curves) and from the reservoir-predicted trajectories (dashed curves) at κ*_1 = 0.0298400, κ*_2 = 0.0298500, κ*_3 = 0.0298565, and ∆κ = 5 × 10^−7. (e) a_6(t_n) versus κ for a numerically determined trajectory of Eq. (2.13) (black dots) and that predicted by the trained reservoir (red dots). (f) Γ(κ) over the prediction window for the case in (e). . . . 51 2.16 A typical solution of the stationary KS system (Eqs. (2.16)) for κ = 0.0816. Time is plotted horizontally, x vertically, and the value of w is color-coded (and normalized by its root-mean-square over time). . . . 54 2.17 Bifurcation diagram of the stationary KS system (Eq. (2.16)) obtained by observing the value of w at the 40th spatial grid point (corresponding to x = 3.9270) for w = 0 at the 1st spatial grid point (corresponding to x = 0.0982). . . . 54 2.18 (a) A typical solution of the non-stationary, noiseless KS system (Eqs. (2.16) and (2.17)) obtained by numerical integration (top panel) and the corresponding reservoir-computer-predicted orbit (bottom panel) trained on the numerically integrated trajectory for t < 0. The horizontal axis shows time and the vertical axis shows x. (b) Poincaré map of w_40 of the numerically integrated KS trajectory and reservoir-computer-predicted trajectory in (a), shown in black and red dots, respectively. (c) Cumulative probability distributions of the Poincaré map points obtained from the numerical solutions of Eqs.
(2.16) and (2.17) (solid curves) and from the reservoir-predicted trajectories (dashed curves) at t*_1 = 25, t*_2 = 83.5, t*_3 = 125 (which correspond to κ*_1 ≈ 0.08034, κ*_2 ≈ 0.07948, and κ*_3 ≈ 0.07888, respectively) and ∆t = 8.4. (d) Γ(t) versus t over the prediction window. . . . 57 2.19 (a) A typical solution of the non-stationary, noisy KS system (Eqs. (2.16) and (2.17)) obtained by numerical integration (top panel) and the corresponding reservoir-computer-predicted orbit (bottom panel) trained on the numerically integrated trajectory for t < 0. The horizontal axis shows time and the vertical axis shows x. (b) Poincaré map of w_40 of the numerically integrated KS trajectory and reservoir-computer-predicted trajectory in (a), shown in black and red dots, respectively. (c) Cumulative probability distributions of the Poincaré map points obtained from the numerical solutions of Eqs. (2.16) and (2.17) (solid curves) and from the reservoir-predicted trajectories (dashed curves) at t*_1 = 25, t*_2 = 66.8, t*_3 = 125 (which correspond to κ*_1 ≈ 0.08034, κ*_2 ≈ 0.07973, and κ*_3 ≈ 0.07888, respectively) and ∆t = 8.4. (d) Γ(t) versus t over the prediction window. . . . 59 3.1 Partition of the training data into the long validation set (t ∈ [−T_3, −T_2)), training set (t ∈ [−T_2, −T_1]), and short validation set (t ∈ (−T_1, 0]) for the proposed hyperparameter optimization scheme. . . . 73 3.2 The three-dimensional trajectory of the noiseless non-stationary Lorenz system (given by Eqs. (3.6) for ρ_0 = 154, ρ_1 = 8, and τ = 100 over the time interval t ∈ [−600, 120]) from a randomly chosen initial condition. The red and black curves correspond to the periodic trajectory for t ≤ 0 and the chaotic trajectory for t > 0, respectively. . . . 80 3.3 (a) Γ(t) versus time over the prediction window.
Vertical dashed blue lines labeled (c), (d), (e), and (f) indicate the locations for which the true and predicted z′m point cumulative probability distributions are shown in panels (c), (d), (e), and (f), respectively. (b) The z′m points of an example true system trajectory (black dots) and the corresponding ML prediction (red dots). The vertical dashed green line indicates the starting point of the ML prediction. The vertical dashed blue lines indicate the same times as in panel (a). Panels (c), (d), (e), and (f) show the true (solid black curves) and ML-predicted (dashed red curves) z′m cumulative probability distributions at t = 5, 60, 80, and 110, respectively. . . . . . . . . . . 82
3.4 (a) Γ(t) versus time over the prediction window. Vertical dashed blue lines labeled (c), (d), (e), and (f) indicate the locations for which the true and predicted z′m point cumulative probability distributions are shown in panels (c), (d), (e), and (f), respectively. (b) The z′m points of an example true system trajectory (black dots) and the corresponding ML prediction (red dots). The vertical dashed green line indicates the starting point of the ML prediction. The vertical dashed blue lines indicate the same times as in panel (a). Panels (c), (d), (e), and (f) show the true (solid black curves) and ML-predicted (dashed red curves) z′m point cumulative probability distributions at t = 50, 120, 200, and 300, respectively. . . . . . . . . . . 84
3.5 (a) Γ(t) versus time over the prediction window. Vertical dashed blue lines labeled (c), (d), (e), and (f) indicate the locations for which the true and predicted z′m point cumulative probability distributions are shown in panels (c), (d), (e), and (f), respectively. (b) The z′m points of an example true system trajectory (black dots) and the corresponding ML prediction (red dots). The vertical dashed green line indicates the starting point of the ML prediction. The vertical dashed blue lines indicate the same times as in panel (a).
Panels (c), (d), (e), and (f) show the true (solid black curves) and ML-predicted (dashed red curves) z′m map point cumulative probability distributions at t = 50, 120, 200, and 300, respectively. . . . . . . . . . . 86
3.6 Phase portraits of the snapshot attractor of the true system at (a) n = −15000, (b) n = −5000, (c) n = 2500, (e) n = 6000, (g) n = 12500, and the corresponding ML prediction at (d) n = 2500, (f) n = 6000, and (h) n = 12500. . . . . . . . . . . 90
3.7 (a) Γ(n) versus n over the prediction window. Vertical dashed blue lines labeled (c), (d), (e), and (f) indicate the locations for which the true and predicted y′n cumulative probability distributions are shown in panels (c), (d), (e), and (f), respectively. (b) x′n (top) and y′n (bottom) of an example true system trajectory (black dots) and the corresponding ML prediction (red dots). The vertical dashed green line indicates the starting point of the ML prediction. The vertical dashed blue lines indicate the same times as in panel (a). Panels (c), (d), (e), and (f) show the true (solid black curves) and ML-predicted (dashed red curves) y′n cumulative probability distributions at n = 2500, 6000, 10000, and 12500, respectively. . . . . . . . . . . 92
3.8 Bifurcation diagram of the stationary Kuramoto-Sivashinsky equation for κ ∈ [0.076, 0.0816] obtained by the w40-Poincaré map points. . . . . . . . . . . 93
3.9 (a) (Top panel) Γ(t) over the prediction window. Vertical dashed blue lines labeled (b), (c), (d), and (e) indicate the locations for which the true and predicted w′40-Poincaré map point cumulative probability distributions are shown in panels (b), (c), (d), and (e), respectively. (Middle panel) An example of a typical numerically integrated system trajectory. (Bottom panel) The ML prediction corresponding to the example trajectory in the middle panel.
(b), (c), (d), and (e) show the true (solid black curves) and ML-predicted (dashed red curves) w′40-Poincaré map point cumulative probability distributions at t = 7.51, 37.64, 60.23, and 97.88, respectively. . . . . . . . . . . 95
3.10 (a) (Top panel) Γ(t) over the prediction window. Vertical dashed blue lines labeled (b), (c), (d), and (e) indicate the locations for which the true and predicted w′40-Poincaré map point cumulative probability distributions are shown in panels (b), (c), (d), and (e), respectively. (Middle panel) An example of a typical numerically integrated system trajectory. (Bottom panel) The ML prediction corresponding to the example trajectory in the middle panel. (b), (c), (d), and (e) show the true (solid black curves) and ML-predicted (dashed red curves) w′40-Poincaré map point cumulative probability distributions at t = 7.51, 30.10, 60.23, and 97.88, respectively. . . . . . . . . . . 97
3.11 (a) For ρ slightly less than the critical bifurcation value, an attracting chaotic orbit of the Lorenz system (black curve) coexists with two fixed point attractors (blue dots) (representing the convective fluid flow patterns of the Lorenz model in panels (b) and (c)). . . . . . . . . . . 99
3.12 (a) z′(t) of an example true system orbit (black curve) and of the corresponding ML prediction (red curve) for the case of the noiseless Lorenz system. The vertical green dashed line indicates the start of the ML prediction. To describe panel (b), we first define the quantity z′m as the z-coordinate values that occur when the noisy trajectory crosses the surface xy = βz. This surface corresponds to the surface of section of the chaotic noiseless system at local z(t) maxima of the noiseless trajectory, which, from Eq. (3.6), occur at dz/dt = xy − βz = 0.
Panel (b) shows the z′m values of an example true noisy Lorenz system orbit (black dots) versus t and of the corresponding ML prediction (red dots). The vertical green dashed line marks the start of the ML prediction, and the vertical blue dashed line marks the time near which the ML-predicted trajectory leaves the plotted range due to instability of the ML prediction dynamics. Note that, in these panels, the two noiseless fixed points (x = y = ±√β(ρ − 1), z = ρ − 1) have the same z-coordinate and would appear as a single curve. . . . . . . . . . . 100
3.13 Stationary bifurcation diagrams showing the hysteretic subcritical Hopf bifurcation for the true noiseless Lorenz system (top panel) and the inaccurate knowledge-based model (bottom panel) for ρ ∈ [22.5, 45], where the bifurcation plots are constructed by plotting z∗ against ρ. For fixed values of ρ, z∗ is either the maximum z value over long trajectories (for orbits on the chaotic attractor) or the z value of the fixed point attractor. The bottom horizontal axis labeled t gives the times at which the non-stationary system's ρ value passes through the ρ values on the ρ axis immediately above this t axis. . . . . . . . . . . 102
3.14 (a) (Top panel) Γ(t) over the prediction window. Vertical dashed blue lines labeled (b), (c), (d), and (e) indicate the locations for which the true and predicted z′m point cumulative probability distributions are shown in panels (b), (c), (d), and (e), respectively. (See the caption of Fig. 3.12(b) for the definition of z′m.) (Bottom panel) The z′m points of a typical true system trajectory (black dots) and the corresponding ML/knowledge-based model prediction (red dots). (b), (c), (d), and (e) show the true (solid black curves) and ML/knowledge-based model predicted (dashed red curves) z′m point cumulative probability distributions at t = 10, 40, 65, and 110, respectively. . . . . . . . . . .
104
3.15 (a) Γ(t) over the prediction window. Vertical dashed blue lines labeled (c), (d), (e), and (f) indicate the locations for which the true and predicted z′m point cumulative probability distributions are shown in panels (c), (d), (e), and (f), respectively. (b) The z′m points of an example true system trajectory (black dots) and the corresponding ML prediction (red dots). The vertical dashed green line indicates the starting point of the ML prediction. The vertical dashed blue lines indicate the same times as in panel (a). (c), (d), (e), and (f) show the true (solid black curves) and ML-predicted (dashed red curves) z′m point cumulative probability distributions at t = 20, 60, 85, and 110, respectively. . . . . . . . . . . 107
3.16 Data-driven-knowledge-based hybrid ML model architecture. . . . . . . . . . . 110
4.1 Illustration of the geographic domain decomposition for a local input vector u(in)l (with grid points enclosed by a red box) and a local output vector u(out)l (with grid points enclosed by a blue box). . . . . . . . . . . 125
4.2 Model architecture flow chart. The notation is explained in Sec. 4.2.1. Red boxes represent operations that are performed locally (and in parallel) on each of the L = 1,152 sub-domains. Blue boxes represent operations for which the outputs are global state vectors. The appropriate input data from (uaG(t), vaG(t)) during training, or (uHG(t), vG(t)) during prediction, is fed into SPEEDY, the atmospheric ML, and the ocean ML. For both the hybrid atmospheric and the ML-only ocean models, the local forecasts (uHl(t + ∆t), vl(t + ∆t′)) are stitched together to obtain the global forecasts (uHG(t + ∆t), vG(t + ∆t′)). During prediction, the global forecasts are fed back in as inputs at the next time step.
In practice, the ocean model time step is larger than the atmospheric model time step, and for time steps where the ocean model is not advanced, the ocean model state is held constant. . . . . . . . . . . 127
4.3 The all-season Pearson correlation coefficients and root-mean-square errors between the predicted and observation-based Niño 3.4 Index (a), Niño 3 Index (b), and Niño 4 Index (c) as a function of lead time in months. Solid (dashed) blue curves show the PCC for our model's (persistence model's) predictions. The dashed green curve shows the RMSE for the climatology model's predictions. Solid (dashed) red curves show the RMSE for our model's (persistence model's) predictions. (d), (e), and (f) show the PCC for our model's prediction as a function of the prediction lead time (in months) and the prediction start month for the Niño 3.4 Index, Niño 3 Index, and Niño 4 Index, respectively. . . . . . . . . . . 139
4.4 The top row shows the all-season concurrent PCC between the Niño 3.4 Index and the 3-monthly averaged SST anomalies. The bottom row shows the all-season lagged PCC between the Niño 3.4 Index and the 3-monthly averaged SST anomalies lagging by 3 months. The left column (with panels titled "Hybrid") shows the correlation between the predicted Niño 3.4 Index and predicted SST anomaly values, and the right column (with panels titled "ERA5") shows it for observation-based data. Black dots indicate grid points for which the PCC was found to be statistically significant at the 95% confidence level. . . . . . . . . . . 141
4.5 The left (right) column, with the label "Hybrid" ("ERA5"), shows the Pearson correlation coefficients between the precipitation anomalies and the Niño 3.4 Index predicted by our model (obtained from ERA5 data) at a prediction lead time of 1 season.
The rows, in order from the top-most to the bottom-most, correspond to results for the December-January-February (DJF), March-April-May (MAM), June-July-August (JJA), and September-October-November (SON) seasons. Black dots indicate grid points for which the PCC was found to be statistically significant at the 95% confidence level. . . . . . . . . . . 143
4.6 The Pearson correlation coefficients between the precipitation anomalies predicted by our model and those obtained from ERA5 data at prediction lead times of 1, 2, 3, and 4 weeks, from left to right in that order. . . . . . . . . . . 144
4.7 The top (bottom) row shows the Wheeler-Kiladis diagram obtained by using the precipitation in the 15◦S-15◦N latitude band predicted by our model (obtained from ERA5 data). The left (right) column shows the normalized symmetric (antisymmetric) components of the wavenumber-frequency analysis. The horizontal axis of each plot represents the zonal wavenumber and the vertical axis represents the frequency (cycles per day). The solid black lines indicate solutions to the dispersion relations from an analytical model of shallow water waves. . . . . . . . . . . 146
1 (a) and (b) show comparisons of ζ̃ (black lines), ζ (blue line), and ζ∆ (green line) for different noise levels (widths of ζ(ϵ)), 0.002 and 0.0005, respectively. . . . . . . . . . .
154

List of Abbreviations

AGCM Atmospheric General Circulation Model
CFSv2 (NCEP's) Coupled Forecast System Model version 2
CHyPP Combined Hybrid-Parallel Prediction
CMIP Coupled Model Intercomparison Project
DARPA Defense Advanced Research Projects Agency
DJF December-January-February
DL Deep Learning
E3SM (Department of Energy's) Energy Exascale Earth System Model
ECMWF European Centre for Medium-Range Weather Forecasts
ENSO El Niño Southern Oscillation
ER Equatorial Rossby
ERA5 ECMWF Reanalysis v5
GCM General Circulation Model
HPC High Performance Computing
IFS (ECMWF's) Integrated Forecasting System
IPCC Intergovernmental Panel on Climate Change
JJA June-July-August
KS Kuramoto-Sivashinsky (equation)
MAM March-April-May
MJO Madden-Julian Oscillation
ML Machine Learning
NCEP National Centers for Environmental Prediction
NSF National Science Foundation
NWP Numerical Weather Prediction
ORAS5 Ocean Reanalysis System 5
PCC Pearson Correlation Coefficient
RC Reservoir Computing
RMSE Root-Mean-Square Error
S2S Subseasonal-to-Seasonal
SON September-October-November
SOTA State-of-the-art
SPEEDY Simplified Parameterization, primitive-Equation Dynamics (model)
SST Sea Surface Temperature

Chapter 1: Introduction and Background

1.1 The "What" and the "Why"

This thesis consists of two parts:

1. The first part is concerned with developing, implementing, and testing machine learning (ML) -based approaches for modeling the long-term time evolution of non-stationary dynamical systems.

2. The second part deals with implementing and assessing an ML-based approach for predicting the global atmospheric and oceanic dynamics on the Subseasonal-to-Seasonal (S2S) time scale (2 weeks - 6 months) [1, 2].

In this thesis, a "non-stationary" system refers to a system which is itself changing with time due to, e.g., time-dependent system parameters.
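As a concrete illustration of such a parameter-driven non-stationary system, consider the logistic map (one of the test systems used later in this thesis) with a map parameter that drifts linearly in time. The following is a minimal sketch, not code from this thesis; the function name, drift rate, and parameter range are illustrative choices:

```python
import numpy as np

def nonstationary_logistic(x0, r_start, r_end, n_steps):
    """Iterate x_{n+1} = r(n) x_n (1 - x_n) with a linearly drifting r(n).

    This is an illustrative, hypothetical example of a non-stationary
    system: the map itself changes with time through its parameter.
    """
    rs = np.linspace(r_start, r_end, n_steps + 1)  # time-dependent parameter
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for n in range(n_steps):
        xs[n + 1] = rs[n] * xs[n] * (1.0 - xs[n])
    return rs, xs

# Drifting r from a periodic regime (r = 3.5) into a chaotic regime (r = 3.9)
# yields a trajectory whose future behavior differs qualitatively from its past.
rs, xs = nonstationary_logistic(x0=0.4, r_start=3.5, r_end=3.9, n_steps=10000)
```

The late portion of such a trajectory is governed by parameter values never realized in its early portion, which is precisely the situation that makes training an ML model on the system's past challenging.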
While modeling such systems is a generally important problem in many fields, the ultimate goal of the work in this thesis is to devise and test methods for their eventual potential application to the monumental challenge of predicting global climate change. In this way, the two parts of this thesis are related, since we would like to devise methods for predicting non-stationary dynamical systems that can then be extended/applied to models (like the one used in the second part of this thesis) for predicting the global atmospheric and oceanic dynamics under climate change.

Many real-world systems are non-stationary, and therefore may exhibit widely different behaviors, on time scales of interest. For example, the dynamics of financial markets may change significantly over days and months due to politics, policy, or even illegal activities such as market manipulation [3–6]; engineering systems may undergo structural/mechanical/electrical failures over periods of time shorter than the expected system lifetime due to sudden or gradual changes in the environment [7–13]; and ecosystems may experience abrupt changes due to human activity [14–23]. Depending on the context, predicting such behavioral shifts in systems sufficiently ahead of time is critical for mitigating financial damage, promoting societal well-being, and preserving the natural world.

The terrestrial climate system, defined by the Intergovernmental Panel on Climate Change (IPCC) as consisting of "... the atmosphere, the hydrosphere, the cryosphere, the lithosphere and the biosphere and the interactions between them" [24], is a prominent example of a non-stationary system. This non-stationarity, whether attributed to natural internal processes or to external forcings such as volcanic eruptions and persistent anthropogenic activities (e.g., greenhouse gas emissions and deforestation), is termed climate change [24].
The effects of climate change, such as an increase in the frequency of extreme weather events (e.g., extreme precipitation, heat waves, droughts, floods, and tropical cyclones), are already being felt in many parts of the world [25]. For instance, in the United States, there are now three times as many heatwaves per year as there were in the 1960s - and, on average, they last longer and are more intense [26]. In addition, simultaneously occurring extreme events can compound to amplify each other's severity, and such compounding events may become more common under climate change [27–29]. Climate change is expected to have a significant impact on key sectors like water (e.g., due to increased drought/flood risks), food (e.g., due to the impacts of altered precipitation patterns on agriculture), and health (e.g., increased deaths due to natural disasters and increased spread of vector-borne diseases) [30, 31]. In addition to an increase in the frequency of extreme weather events in many regions of the world, climate change may also drive various sub-systems of the Earth system past tipping points [32] beyond which the dynamics of those sub-systems may change abruptly (e.g., the shutdown of the thermohaline circulation in the Atlantic Ocean [33, 34]). Thus, planning for long-term climate adaptation requires us to be able to predict the Earth's climate in the far future, when the atmospheric and oceanic conditions may be significantly different from those of the present and those observed in the past. At the same time, effective planning for mitigating damages due to weather extremes also requires forecasting weather conditions at lead times in the S2S time scale. Being able to predict at this time scale is especially crucial for land, water, and energy management.
For instance, S2S forecasts are heavily relied upon for the management of fisheries, water supply, agricultural resources, energy generation, coastal flooding preparedness, and wildfire risk [35, 36]. This thesis aims to investigate crucial aspects of both of these issues by developing methods that can enable prediction of the long-term statistics of non-stationary dynamical systems (with the aim of eventually applying those methods to generating climate projections), and by developing a global climate model to perform predictions on the S2S time scale.

1.2 A Short History of Numerical Climate (Change) Modeling

Since one of the objectives of this thesis is to devise and test ML methods for modeling non-stationary systems (which is central to modeling climate change), and the ultimate goal of this research is to extend the results in this thesis to the problem of climate change prediction, we will begin by giving a brief history of numerical climate modeling. The aim is to arrive at an understanding of some of the challenges faced by current state-of-the-art (SOTA) conventional climate models and of how the work presented in this thesis can help guide future research.

The early stages of numerical climate modeling mirrored those of numerical weather prediction (NWP), which itself closely shadowed developments in computer technology. In the early 1900s, the Norwegian physicist Vilhelm Bjerknes and the American meteorologist Cleveland Abbe independently recognized that the principles of hydrodynamics and thermodynamics can be used to forecast the time evolution of the atmospheric state [37, 38]. They independently formulated a two-step approach: a diagnostic step to obtain information about the initial state, and a prognostic step to use the governing equations to advance the initial state to a future state [39].
In 1908, Felix Exner made a first attempt at using the laws of physics to produce a quantitative weather forecast by developing a reduced model with restricted capabilities. Shortly thereafter, Lewis Fry Richardson published the first description of a method to directly solve differential equations to produce weather forecasts [39, 40]. The computing technology necessary to implement such a scheme, however, lagged behind by a couple of decades. As a result, Richardson's work was largely forgotten by his contemporaries. Several decades later, advances in technology for making observations of the atmosphere (e.g., the invention of the radiosonde), and advances in the theory of meteorology, in numerical analysis (e.g., the design of numerically stable algorithms), and in the design of digital computers all led to the first weather prediction using a computer [41]. Due to his contributions, Richardson is known by many in the atmospheric science community as the father of modern climate models. Indeed, he had key insights that would be critical to climate modeling; e.g., while components of the Earth system such as the ocean may be ignored for weather forecasting, they would need proper treatment when addressing climate modeling.

One might erroneously assume that the chronology of climate models is such that the simpler models were developed first and then, successively, more complex models were formulated. Some of the first climate models (developed in the 1960s) were in fact atmospheric general circulation models (AGCMs) derived directly from NWP models [40]. Scientists' interest in expanding these models to encompass hemispheric or global domains (combined with studies concerning the radiative and thermal equilibrium of the Earth system) led to the development of Radiative-Convective models [42].
Later on, Budyko [43] and Sellers [44] proposed Energy Balance Models, which were much simpler and required fewer computing resources (owing to their coarser grids and simpler parameterizations in place of solving the dynamical flow equations). These relatively lightweight models enabled scientists to perform long-term climate studies, some of which led to the discovery that there may exist multiple stable climate states [40]. They were also used to probe the climate response to atmospheric disturbances like volcanic eruptions and increases in the atmospheric CO2 concentration [45]. With increasing computing capabilities, by the 1980s it appeared as though AGCMs were beginning to replace simpler climate models. By the 1990s, AGCM ensemble methods were being used to study internal climate variability [46]. As computer technology has advanced, so has the complexity of modern climate models. For instance, more detailed components such as multi-layered oceans, atmospheric chemistry, and carbon budgeting are now commonly added to create coupled ocean-atmosphere general circulation models (GCMs). Despite such innovation, in order to feasibly run GCMs over climate-scale integration times, modern GCMs are still simplifications of our knowledge of oceanic and atmospheric dynamics. Such simplifications manifest as, e.g., parameterization schemes for processes which evolve on temporal and spatial scales that are shorter or longer than the scales explicitly accounted for in developing the model. Such schemes usually involve a number of tunable parameters that must be determined from theory or observations.

1.3 Challenges Faced by Conventional Climate Models

Substantial progress has been made in developing GCMs in the last several decades, and GCMs now encode numerous highly complicated processes in the Earth system involving the atmosphere, the ocean, and the marine and terrestrial biosphere [40, 47].
In addition, modern GCMs are often operated at much higher spatial resolutions (by at least a factor of about 10) in the horizontal and vertical directions than the model used by Manabe and Wetherald in 1975 to study the effects of doubling the concentration of atmospheric CO2 [48]. Despite such advances, many scientists are wondering if GCMs are really the way forward [47]. Encoding the different processes into GCMs is done by introducing highly complex parameterization schemes with many tunable parameters. Tuning these parameters is a computationally challenging task (sometimes requiring on the order of 100 years of simulations [47]), and is often a source of significant parametric uncertainty. In addition, structural uncertainty can be introduced if the form of the parameterization is itself flawed [47]. Indeed, uncertainty bounds due to parametric and structural uncertainties have increased over the last two climate model assessment cycles [49]. Faced with such challenges, many scientists in the field are suggesting that further improvements in spatial resolution are needed (e.g., to resolve small-scale turbulence), but this would require a significant boost to current computing capabilities [50]. To make matters worse, once such a model is developed and tuned, in order to make useful climate projections, it will need to produce an ensemble forecast on climate time scales. Large ensemble sizes may be required for characterizing the probability of weather/climate extremes in the future. For instance, while conventional climate models (like GCMs) typically use only a small number of ensemble members to make climate projections (e.g., Coupled Model Intercomparison Project (CMIP) models typically use 3-10 members), much larger ensemble sizes (on the order of at least several hundred) are needed to sample internal variability at regional spatial scales [51, 52].
In addition, because many hypothetical future climate scenarios need to be tested for climate adaptation planning, it is not feasible to use conventional climate models for this task given their computational costs. (The models used for S2S forecasting are typically similar to conventional climate models [53], and therefore they inherit many of the same problems described above for climate models.)

1.4 An Alternative Approach: Machine Learning

Recent advances in ML have enabled the creation of deep learning (DL) weather forecasting models that are on par with (and in some cases superior to) physics-based NWP models in the short- to medium-range (7-14 days) regime, and can require up to 10,000× fewer computing resources during inference [54–57]. In principle, this can allow one to efficiently generate an ensemble of long-term climate projections to obtain climate statistics. In practice, however, there are limitations. Most SOTA DL weather forecasting models are not long-term stable, or at least not physical, and so their behavior at long integration times has not been studied [58]. Even if they are modified to be long-term stable, there is no guarantee, given their purely data-driven nature, that they will produce accurate long-term statistics of the Earth's climate system. We note, however, that a class of physics/ML-based hybrid models has been proposed which are long-term stable and are able to produce realistic climate statistics [59, 60]. While these models are able to generate long-term statistics that match past observed climate data, it is generally unclear how well ML models will perform when they are required to extrapolate to future climate scenarios for which the atmospheric and oceanic dynamics are significantly different from those the models were trained on.
In other words, while ML-based models offer a computationally cheap alternative for climate modeling, little attention has been given to their ability to extrapolate to out-of-distribution dynamics - a key component of extrapolating to future climate scenarios.

1.5 Thesis Outline

In this thesis, we attempt to execute the following two research agendas: (1) devise ML-based methods for modeling the long-term statistical properties of non-stationary dynamical systems, test the limitations of such methods, and, when they fail, investigate how these failures can be addressed; and (2) modify a physics/ML-based hybrid global climate model for generating useful forecasts on the S2S time scale.

1.5.1 Machine Learning for Modeling Non-Stationary Dynamical Systems

In Chapter 2, we develop and test ML-based methods for predicting the long-term statistical properties of non-stationary dynamical systems. We study the utility of our devised methods in scenarios where the future behavior of the target system to be predicted is fundamentally different (e.g., periodic orbits) from the behavior on which the ML models were trained (e.g., chaotic orbits), due to non-stationarity induced by the time-dependence of a parameter of the system. We also investigate the effect on the predictability of such non-stationary systems when dynamical noise is present in the training data. We introduce a method that, for dynamical noise of small amplitude, enables our trained models to reproduce the effects of that dynamical noise on typical trajectories of the system during inference. We begin by illustrating our methods on the one-dimensional discrete-time logistic map [61]. We further test our methods on the continuous-time three-dimensional Lorenz system [62] and the spatiotemporal Kuramoto-Sivashinsky system [63, 64]. This chapter was published in the journal Chaos [65].

1.5.2 How Well Can Machine Learning Models Extrapolate Dynamics?
In Chapter 3, we begin by exploring the limitations of our methods from Chapter 2 when faced with the combined difficulties of having to generalize to previously unseen dynamics (as done in Chapter 2) and to previously unexplored regions of the system state space. We find that while the ML methods we devised have some utility in the first case, they can easily fail in the second case, especially (as one would expect of any method designed for extrapolation) as the amount of extrapolation required increases. We test these scenarios using the three-dimensional Lorenz system [62], the two-dimensional Ikeda map [66], and the spatiotemporal Kuramoto-Sivashinsky system [63, 64]. In addition, we introduce a hybrid approach to modeling non-stationary systems that combines an ML component with a possibly faulty knowledge-based component. We find that such a hybrid approach can substantially improve the trained model's ability to generate useful predictions even in situations which require significant amounts of extrapolation. We also explore the effect of dynamical noise on the predictability of future tipping points and on extrapolation to post-tipping-point dynamics, and find that the presence of dynamical noise can be beneficial. This chapter was published in the journal Chaos [67].

1.5.3 A Hybrid Machine Learning -based Model for Subseasonal-to-Seasonal Forecasting

In Chapter 4, we modify a physics/ML-based hybrid global climate model developed by Arcomano et al. (see [59]) and assess its utility in generating forecasts at the S2S time scale. The model consists of an atmospheric component that is a hybrid between a low-resolution AGCM and an ML model, and an ML-only ocean component.
The two components are coupled by exchanging various variables; e.g., the atmospheric component receives the sea surface temperatures (SSTs) from the ocean model as input, while the ocean component receives the two components of the horizontal wind vector, the temperature, the specific humidity, and the surface pressure from the lowest atmospheric model level as input. It was previously found that such a hybrid model can not only perform better than either the physics-based or the ML-only version operating alone [68], but can also capture processes not captured by the physics-based model component [59]. We investigate the ability of our model to capture various ocean-atmosphere phenomena on the S2S time scale, e.g., the El Niño Southern Oscillation (ENSO) and its teleconnections, precipitation anomalies, and equatorial waves. We find that our model has useful skill in predicting such phenomena on the S2S time scale.

Chapter 2: Using Machine Learning To Predict Statistical Properties of Non-Stationary Dynamical Processes: System Climate, Regime Transitions, and the Effect of Stochasticity

2.1 Introduction

Predicting the time evolution of a dynamical system is a central requirement for enabling many applications. Moreover, it is often the case that first-principles knowledge of the dynamical system to be predicted is unavailable. Instead, one may have access to measured time series data of the previous states of the system. For situations where the system to be predicted is stationary in time, machine learning has proven effective in using available time series data to accomplish prediction tasks [69–94]. In this stationary context, particularly for systems with chaotic behavior, it is useful to partition the problem of prediction into (i) short-term prediction of the state of the system, and (ii) long-term prediction/replication of the statistical/ergodic dynamical behavior of the system.
That is, even after short-term prediction fails (e.g., due to the sensitive dependence of chaos), we may still desire to replicate the statistical properties of the orbits of the real system. We have in mind the situation in which, following an initial transient, the orbit from an initial condition goes to an “attractor”, i.e., a subset of the state space on which the orbit continues to reside, while, perhaps, evolving chaotically. In such a stationary situation, a measurement of the state at a random instant of time can be thought of as corresponding to a sample taken from an underlying probability distribution which is invariant under the flow of time. For such systems, the ergodic theorem allows us to characterize the long-term statistical properties of the orbits by equating the time average of a state function over a single typical orbit to the state space average of that function taken with respect to an invariant measure [95]. If we succeed in predicting trajectories which have the same long-term statistical properties as those produced by trajectories of the actual system, then we say we have succeeded in replicating the “climate” of that system.

In contrast to the situation described in the above paragraph, we are interested in prediction of non-stationary dynamical systems, i.e., systems which themselves vary with time. However, in the case of a non-stationary system, ergodicity no longer holds. Thus, for non-stationary systems, we cannot meaningfully define a representative statistical quantity in the above way [96], and we wish to extend our concept of “climate” to the non-stationary case. To do this we consider an ensemble of trajectories originating from many randomly chosen initial conditions in the past. At any subsequent given time, after a sufficiently large convergence time has passed since initialization, the ensemble traces out a so-called “snapshot attractor” [96–98].
This attractor possesses a probability distribution that (under appropriate general conditions) is independent of the set of initial conditions chosen for the ensemble and thus provides an appropriate measure representing the probability distribution of states on the snapshot attractor [96]. (The collection of all snapshot attractors (at every time instance) is referred to, in the mathematical and terrestrial-climate-related literature, as the pullback attractor of the system [99].) This time-dependent snapshot probability distribution can then be used to obtain expectation values of state-dependent functions, thereby defining the time-dependent climate of a non-stationary system.

The goal of this chapter will be to devise and test machine learning approaches for the prediction of the future evolution of the climate of non-stationary dynamical systems. We note that, although machine learning has previously been extensively investigated and applied for the task of short-term state prediction of stationary dynamical systems (especially chaotic systems) (see Refs. [69–71, 82–91]), and has also been shown to be effective for replicating the climate of stationary systems [72, 73, 92, 93], machine learning has previously received little attention as a general approach to the important task of predicting climate evolution in non-stationary dynamical systems. For recent papers on using machine learning for short-term state prediction of non-stationary systems, see Refs. [88, 94, 100]. In the context of our stated goal, an important issue is that of “regime transitions”. Specifically, for different system parameters, stationary dynamical systems can generally exhibit a variety of different types of motions with different climates, and, correspondingly, slowly time-varying non-stationary systems can experience regime transitions where, once the system crosses some critical condition, the system’s behavior rapidly becomes qualitatively and quantitatively different.
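The snapshot-attractor construction just described can be sketched numerically. The minimal example below (a sketch, not the code used in this thesis) evolves an ensemble of random initial conditions of the non-stationary logistic map, studied later in this chapter, under a drifting parameter; the ensemble size, drift rate, and initial parameter value are illustrative assumptions:

```python
import numpy as np

def snapshot_ensemble(n_steps, n_members=10_000, lam0=3.81, gamma=1e-5, seed=0):
    """Evolve an ensemble of logistic-map trajectories under a slowly drifting
    parameter lambda_n and return the ensemble state at the final time.

    After a sufficiently long convergence time, the distribution of the
    returned states approximates the measure on the snapshot attractor at
    that instant, independent of the initial ensemble."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n_members)  # random initial conditions
    lam = lam0
    for _ in range(n_steps):
        x = lam * x * (1.0 - x)
        lam += gamma
    return x

# A histogram of `snap` approximates the snapshot distribution at this time.
snap = snapshot_ensemble(n_steps=2000)
```

Repeating this for a sequence of times yields the collection of snapshot distributions, i.e., a numerical approximation of the pullback attractor.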
There are a variety of mechanisms for regime transitions, ranging from basic bifurcations of periodic orbits (e.g., Hopf, saddle-node, and period doubling bifurcations) to transitions of chaotic attractors (e.g., period doubling cascades [101, 102], intermittency [103], and crises [104, 105]). For a very recent work on using machine learning to predict system collapse and chaotic transients associated with crisis transitions, see Ref. [106].

Thus, in addition to slow, temporally continuous climate change, regime transitions are important in many fields. For example, in the context of terrestrial climate, reference is typically made to “tipping points” representing abrupt transitions in the Earth’s climate [32, 107]. Geological records indicate that the Earth has experienced such transitions in the past; for example, Greenland ice-core records indicate a series of abrupt climatic events in the Last Glacial period [108], and there is growing concern that anthropogenic activities such as greenhouse gas emissions could drive the climate system to a regime transition [109, 110], such as the possible shutdown of the thermohaline circulation in the Atlantic Ocean [33, 34], and the potential transition from a wet to a dry Indian summer monsoon [111]. Such regional transitions could also affect the climate of geographically distant locations via teleconnection [112–114]. Naturally, both temporally continuous behavior evolution, as well as regime change events, are of immense consequence, not only for atmospheric climate, but also in many other dynamical contexts, e.g., economics [115, 116], ecology [14–23, 117, 118], medicine [119, 120], technology [11, 12], and sociology [121].

The goal of predicting the climate of non-stationary systems from time series state data, in principle, requires some knowledge of the future evolution of the underlying dynamical system that produces the data.
Such knowledge might be directly available or derivable from the available past state time series data. The need for this type of knowledge can be seen by considering the example where the underlying system time dependence experiences an abrupt change such that the system after the change is unrelated to the system before the change, thus making predictions of the state changes past the time of the abrupt system change untenable. With this in mind, we restrict our considerations to the case where the underlying system evolution is smooth and slow compared to the time scale (τs) of the state evolution (τns ≫ τs, where τns is the “non-stationarity time scale” over which the underlying system evolves). We emphasize that we are interested in climate prediction over the long time scale τns. In this chapter we consider the scenario where the non-stationarity of the system to be predicted is due to time-dependent system parameters where the time dependence is known, although the system itself is otherwise unknown. Discussion of the importance of this scenario in applications is given in Sec. 2.4.1. (For simplicity, in what follows, we will only consider the case where there is one relevant time-dependent system parameter.)

The main conclusion of the work in this chapter is that machine learning offers a surprisingly promising avenue for using measurements of past system state time series to accomplish the important general goal of predicting the long-term dynamical behavior (climate) of non-stationary dynamical systems, including prediction of apparently continuous change and of more sudden regime transitions.

In what follows, we use Reservoir Computing [122–124] as our machine learning platform of choice for practical reasons. Reservoir computing is a computationally inexpensive framework since it requires training only its output layer via a simple linear regression.
The resulting reduced training time allows for relatively rapid testing of various methodologies and parameter regimes. However, we expect other types of machine learning, e.g., deep learning [125], to also work well for this task via the methods we develop and test in this chapter.

Another issue we address is that of predicting the time evolution of a non-stationary dynamical system evolving under the influence of dynamical noise, which is often unavoidable in real situations. We present a simple method which uses the residual error distribution from the training procedure to inject appropriately determined noise into the reservoir prediction scheme to perturb the reservoir output in a way which mimics the effects of the dynamical noise present in the real system, thus enabling prediction of the noise-influenced climate.

The rest of the chapter is organized as follows. In Sec. 2.2 we give a brief overview of Reservoir Computing and of a widely employed basic scheme for using it for prediction of stationary dynamical systems. In Sec. 2.3 we discuss noiseless and noisy non-stationary versions of the logistic map and their behaviors. In Sec. 2.4.1, using the noiseless non-stationary logistic map, we introduce and test our method. In Sec. 2.4.2 we generalize the method of Sec. 2.4.1 to include dynamical noise. Sections 2.5 and 2.6–2.7 test our methods on non-stationary versions of the Lorenz 63 system [62] and the Kuramoto-Sivashinsky system [63, 64], respectively. Section 2.8 presents conclusions. Referring to Secs. 2.4–2.7, we note that the example systems tested include a simple discrete time map (Sec. 2.4), a low-dimensional continuous time system (three ordinary differential equations; Sec. 2.5), and a partial differential equation potentially exhibiting spatiotemporal chaos (Secs. 2.6 and 2.7).
2.2 Review of Reservoir Computing (RC) and Its Use for Prediction of Stationary Systems

Reservoir computing (e.g., see the review paper [122]) is a machine learning approach that has proven to be effective in tasks involving dynamical time series data. Although there are various ways of implementing RC [126–131], in this chapter we consider the widely used implementation which utilizes a network of neuron-like nodes with recurrent connections, the “reservoir”, to process temporal/sequential data [123, 124]. The main idea is to drive the reservoir with an input signal and form a trainable linear combination of the nonlinear responses of the individual network nodes to obtain an output sequence which closely approximates some desired sequence.

For illustrative and background purposes, in this section we describe the application of reservoir computing to predicting the time evolution of an unknown stationary system. Given a set of measurements of the state of a system from some time in the past t = −T up to some time labeled t = 0, we desire to predict the state of the system for t > 0. Even after prediction of the state of the system fails, we may still desire that the predicted trajectory replicates the climate of the actual system in the long-time limit (Refs. [72, 73]).

We denote the state of the dynamical system (from which the measurements are obtained) at time t by a K-dimensional vector u(t) = [u1(t), u2(t), ..., uK(t)]^T. If our reservoir consists of N nodes, then the K-dimensional input vector at each iteration is coupled into the reservoir via an (N × K)-dimensional input-to-reservoir coupling matrix, Win (see Fig. 2.1). We choose to construct this matrix by randomly selecting one element of each row to be a nonzero number chosen randomly from a uniform distribution on the interval [−χ, χ]. Each reservoir node i has a scalar state ri(t). The state of the reservoir at time t is denoted by the N-dimensional vector r(t) = [r1(t), r2(t), ..., rN(t)]^T.
The reservoir evolves dynamically, given the linearly coupled input and a network adjacency matrix A, according to

r(t + ∆t) = tanh(Ar(t) + Win u(t)),    (2.1)

where the hyperbolic tangent function is applied to each element of its vector argument. The directed adjacency matrix is a randomly generated sparse matrix such that it has a predetermined average number of connections per node, ⟨d⟩. For our task the desired output time sequence is of the same dimension as the input sequence, so a (K × N)-dimensional reservoir-to-output coupling matrix Wout is used to linearly map the reservoir state vector r(t) to the output, denoted ũ, at the next time step, t + ∆t. That is, the output is the K-vector Wout r(t) = ũ(t + ∆t). These steps are diagrammatically represented in Fig. 2.1(a). The matrix elements of Wout are determined by the training so that the output of the reservoir computer closely approximates the desired output for −T ≤ t ≤ 0. To do this, we minimize a “cost function”,

∑_{−T ≤ t ≤ 0} ||Wout r(t) − u(t + ∆t)||² + α ||Wout||²,

where we include a Tikhonov regularization parameter α to prevent overfitting by penalizing large output weights. This minimization is a simple linear regression. The set of coefficients of Wout which minimizes this L2 norm is taken to be the “trained” set of output weights for the network. The choice of α and of the other “hyperparameters” (such as the average degree ⟨d⟩, the spectral radius ρs of A (i.e., the largest eigenvalue magnitude of A), and the maximal strength χ of the input-to-reservoir coupling) is made empirically so as to minimize the discrepancy between a target prediction and the reservoir prediction over a validation data set (e.g., by performing a grid search over the space of these hyperparameters). Further, we constrain our search by restricting our attention to a set of hyperparameters which satisfy the “echo state condition” (see, e.g., Ref.
[123]), which requires that, starting from any two different reservoir states r1 and r2, iteration of Eq. (2.1), after a transient phase, yields the same time series for the reservoir state r(t); i.e., the reservoir eventually “forgets” its initial condition, meaning that its evolution depends only on its input u(t) and is independent of its initial state. If N is sufficiently large, it can be expected that the output ũ(t + ∆t) will be extremely close to the desired output u(t + ∆t), i.e., the reservoir will give an accurate one-time-step prediction.

Once we have trained the reservoir on previous measurements of the system states, using the “open-loop” configuration for −T ≤ t ≤ 0 shown in Fig. 2.1(a), we predict the system evolution for t > 0 by using the “closed-loop” configuration shown in Fig. 2.1(b). In this configuration, the output from the reservoir computer at one step is fed back as input at the next iteration. Thus, on the first iteration the input is u(t) and (neglecting the small difference between u(t) and ũ(t)) the output is ideally u(t + ∆t), which, when fed back, yields an output u(t + 2∆t), which, when fed back, yields an output u(t + 3∆t), and so on. However, due to the small deviation between u(t) and ũ(t), errors build up with multiple circuits of the feedback loop, and eventually the prediction fails.

Figure 2.1: Schematic of the reservoir computer. (a) The “open-loop” configuration used for training the output weight matrix, Wout. An input sequence, u(t), is coupled into a reservoir of nodes via a matrix Win. The excited reservoir state r(t) is mapped to an output, ũ(t), via a matrix of adjustable weights Wout. (b) The “closed-loop” configuration used for prediction. The reservoir output at one time step is fed as input to the reservoir at the next time step. The dashed red box denotes the “reservoir system”, made up of Win, the reservoir, and Wout.
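The open-loop training (ridge regression for Wout) and closed-loop prediction described above can be sketched compactly in code. The following is a minimal illustration, not the exact implementation used in this thesis: the values of N, ρs, and χ follow the text, while the average degree, regularization strength, seeds, and sparse-matrix construction are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 500, 1                      # reservoir nodes, input dimension
rho_s, chi, alpha = 0.55, 0.23, 1e-6
avg_deg = 3                        # average connections per node (illustrative)

# Sparse random adjacency matrix A, rescaled to spectral radius rho_s.
A = np.where(rng.random((N, N)) < avg_deg / N,
             rng.uniform(-1.0, 1.0, (N, N)), 0.0)
A *= rho_s / np.max(np.abs(np.linalg.eigvals(A)))

# W_in: one nonzero entry per row, drawn uniformly from [-chi, chi].
W_in = np.zeros((N, K))
W_in[np.arange(N), rng.integers(0, K, N)] = rng.uniform(-chi, chi, N)

def step(r, u):
    """Eq. (2.1): r(t + dt) = tanh(A r(t) + W_in u(t))."""
    return np.tanh(A @ r + W_in @ u)

def train(U):
    """Open-loop phase (Fig. 2.1(a)): drive the reservoir with U[:-1] and
    ridge-regress the reservoir states onto the one-step-ahead targets U[1:]."""
    r = np.zeros(N)
    R = np.empty((len(U) - 1, N))
    for t in range(len(U) - 1):
        r = step(r, U[t])
        R[t] = r
    # Tikhonov-regularized least squares for W_out.
    W_out = np.linalg.solve(R.T @ R + alpha * np.eye(N), R.T @ U[1:]).T
    return W_out, r

def predict(W_out, r, u, n_steps):
    """Closed-loop phase (Fig. 2.1(b)): feed each output back as input."""
    out = []
    for _ in range(n_steps):
        r = step(r, u)
        u = W_out @ r
        out.append(u)
    return np.array(out)
```

Note that `train` returns the final reservoir state along with Wout, so that closed-loop prediction can continue seamlessly from the end of the training data.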
Nevertheless, it is observed that this scheme often produces predictions that are useful for a limited time. One notable point is that, because the reservoir network is “recurrent” (i.e., has directed links forming closed paths cycling between nodes), the reservoir system retains memory of its past states and inputs.

As mentioned in the introduction, Sec. 2.1, we are interested in both discrete-time dynamical systems (maps; Secs. 2.3 and 2.4), as well as continuous-time dynamical systems (flows; Secs. 2.5, 2.6, and 2.7). In the case of maps, t is an integer, t = n (n = 0, 1, 2, ...), and the reservoir time step is ∆t = 1. In the case of flows we will choose ∆t to be small compared to the characteristic time for variation of u(t) so that the predicted discrete outputs, separated by ∆t, give a good representation of the continuous time evolution. It is important to note (e.g., Sec. 2.4) that the closed-loop configuration of Fig. 2.1(b) is itself a stationary dynamical system whose dynamics, through the training of Wout, has essentially been devised so as to mimic the dynamics of the unknown system producing the measurement data (see Ref. [72]). In Sec. 2.4.1, we modify this set-up (applicable for the prediction of stationary dynamical evolution processes) for application to non-stationary, possibly stochastic, processes.

2.3 The Non-Stationary Logistic Map

In this section we consider the one-dimensional logistic map, which provides a simple platform for formulating, illustrating, and testing the application of RC to the task of predicting the time evolution of a non-stationary discrete time dynamical system. Because of its simplicity, the logistic map allows us to rapidly test various methodologies and techniques. Furthermore, we will subsequently show (Secs. 2.5, 2.6 and 2.7) that our results and methods, demonstrated for the simple example of the logistic map, carry over to other, less simple systems.
Before we demonstrate the reservoir computer’s ability to predict the non-stationary climate of the logistic map, we briefly review three variations of this system: the stationary logistic map, a non-stationary logistic map, and a stochastic non-stationary logistic map. We demonstrate that the dynamics and regime transitions have different characteristics for each of these three variations of this system.

We will first briefly review the behavior of the stationary logistic map system,

xn+1 = λxn(1 − xn),    (2.2)

for different, but fixed, values of λ. We randomly initiate a trajectory in the interval (0, 1), allow it to evolve under the action of the map for fixed λ, and consider only its long-term behavior (after any transients associated with the choice of initial condition have died away). Plotting the iterations of x in this long-term limit for each value of λ, we generate a bifurcation diagram, as shown in Fig. 2.2(a). For λ ∈ (1, 3), a typical initial condition x0 ∈ (0, 1) will evolve to a fixed point attractor, x* = 1 − 1/λ. As λ is increased through λ = 3, the stationary point x* bifurcates to a period 2 orbit. As λ is increased further, the attracting orbit undergoes a period doubling cascade [101, 102] terminating at λ = λ∞ ≈ 3.569.... In the range λ∞ < λ < 4, chaos occurs, accompanied by an infinite number of windows of different periodicity, e.g., see Ref. [132] (Sec. 2.2). For this section, in order to conveniently address a variety of regime transition types, we focus on the period 3 window λ ∈ [3.82, 3.87], as shown in Fig. 2.2(b). As λ increases through a critical value λc1 ≈ 3.828..., the period 3 window is initiated from a chaotic situation below λc1 through an intermittency transition [103] associated with a period 3 tangent bifurcation. As λ increases beyond λc2, this period 3 orbit undergoes a period doubling cascade leading to chaos cycling through three bands (with interspersed windows).
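The stationary bifurcation-diagram construction just described can be reproduced in a few lines; the sketch below records the long-term iterates for each fixed λ (the transient and sampling lengths are illustrative assumptions):

```python
import numpy as np

def bifurcation_points(lam_values, n_transient=500, n_keep=100, seed=0):
    """For each fixed lambda, discard transients and record the long-term
    iterates of the stationary logistic map x -> lambda * x * (1 - x)."""
    rng = np.random.default_rng(seed)
    pts = {}
    for lam in lam_values:
        x = rng.uniform(0.0, 1.0)        # random initial condition in (0, 1)
        for _ in range(n_transient):     # let transients die away
            x = lam * x * (1.0 - x)
        orbit = []
        for _ in range(n_keep):          # sample the long-term behavior
            x = lam * x * (1.0 - x)
            orbit.append(x)
        pts[lam] = orbit
    return pts

# E.g., sampling the period 3 window of Fig. 2.2(b):
pts = bifurcation_points(np.linspace(3.82, 3.87, 6))
```

Plotting each orbit sample against its λ value yields diagrams like Figs. 2.2(a,b); for λ = 2.5, for instance, all recorded iterates collapse onto the fixed point x* = 1 − 1/λ = 0.6.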
This three-piece chaotic attractor widens for larger λ until it collides with the unstable period 3 orbit generated at the start of the window. This collision results in a crisis transition [104, 133] causing an abrupt broadening of the three-piece chaotic attractor into a single-band chaotic attractor, thereby terminating the period 3 window (at λc3).

In the tests and examples in the rest of this chapter, we will often consider situations with windows, like the period 3 window of Fig. 2.2(b). We emphasize, however, that we do this not necessarily because we want to treat windows, but rather because, as noted above for the case of Fig. 2.2(b), windows contain within them a variety of general types of common regime transitions (intermittency transitions [103], period doubling cascades, and crises [104, 133]). Thus windows provide a convenient platform for studying and testing the effectiveness of our methods when regime transitions are present.

Figure 2.2: (a) Bifurcation diagram of the logistic map. (b) Zoomed-in view of the period 3 window of the logistic map bifurcation diagram.

Now we modify the stationary system to create a non-stationary situation where the parameter λ drifts linearly with time,

xn+1 = λnxn(1 − xn),    (2.3a)
λn+1 = λn + γ.    (2.3b)

(Later in this chapter we will consider the case where the linear drift with time, Eq. (2.3b), is replaced by a nonlinear drift with time.) To discuss the dynamics of this system, we consider the situation where λn varies very slowly (|γ| ≪ λn). For γ = 10−5, Fig. 2.3(a) shows a typical orbit evolved under the action of Eq. (2.3) over the interval of the period 3 window. Notice that the period doubling bifurcation, from period 3 to period 6, looks different, more “abrupt” (see blow-up in Fig. 2.3(b)), in this non-stationary case than in the stationary case shown in Fig. 2.2(b). This can be understood as follows. Just before the bifurcation, the orbit is approximately moving on the period 3 attractor.
When λn increases past λc2, the period 3 orbit points become unstable, but the trajectory remains, for a few iterations, very close to the now unstable period 3 orbit; then, after a few more time steps, the trajectory becomes sufficiently displaced from the unstable period 3 orbit (since λn is changing), is repelled from it, and quickly moves to the now stable period 6 orbit. Thus, we obtain the observed delayed, but rapid, transition from a period 3 orbit to a period 6 orbit. As the drift rate of λ increases, the transition from a period 3 orbit to a period 6 orbit is further delayed, and the details of the period doubling cascade become less discernible and are eventually washed out, as seen from Figs. 2.3(c,d), which apply for γ = 5 × 10−5. The main point illustrated in Figs. 2.3(a-d) is that non-stationarity can qualitatively alter the dynamical behavior of a system, especially during a regime transition.

Similarly, dynamical noise can also significantly alter the climate and features of regime transitions. To illustrate this, we consider the non-stationary stochastic logistic map [134], given by Eq. (2.4),

xn+1 = λnxn(1 − xn) + ϵn,    (2.4a)
λn+1 = λn + γ,    (2.4b)

where ϵn (the dynamical noise) is a number randomly and independently chosen from a distribution ζ(ϵ). For simplicity, we will consider the case in which the noise distribution is uniform on the interval [−ϵ, ϵ]. As shown in Ref. [135] for the stationary system bifurcation diagram, the dynamical noise acts to blur the fine structure of the period doubling cascade (near λ∞) and appears to cause the period doubling cascade to terminate after a few period doublings.

Figure 2.3: A trajectory evolving under the non-stationary logistic map (a) given by Eq. (2.3) for γ = 10−5, (c) given by Eq. (2.3) for γ = 5 × 10−5, and (e) given by Eq. (2.4) for γ = 10−5 and ϵ = 0.002. (b) and (d) show a zoomed-in view of the red boxed areas in (a) and (c), respectively, near the period doubling bifurcation from a period 3 orbit to a period 6 orbit.

Turning to the non-stationary case, in Fig. 2.3(e) we have plotted a typical trajectory in the period 3 window evolving under the action of Eq. (2.4) with a noise of strength ϵ = 0.002 and λ drifting linearly at a rate γ = 10−5. We see that the dynamical noise “blurs” the period doubling bifurcation from a period 3 orbit to a period 6 orbit. We also see, before the occurrence of the internal crisis at the upper λ range of the window, that the noisy trajectory intermittently leaves (and returns to) the region of the state space which corresponds to the three-piece chaotic attractor of the stationary system. Thus, dynamical noise can cause significant changes in the features of regime transitions, including blurring of the noiseless sharp location of the point at which a transition occurs. We desire to accurately predict and replicate, from past time series data, the effects of both noiseless and noisy non-stationarity on the climate associated with the time evolution of the system.

2.4 Application of RC to Climate Prediction of the Non-Stationary Logistic Map

Figure 2.4: (a) Comparison of the actual (black dots) and the reservoir-predicted (red dots) trajectories of the logistic map for Setup 1. (b) The logistic map for λ = 3.8 (black) and the map obtained from the reservoir-predicted trajectory (red dots) of (a). (c) and (d) compare the actual and the reservoir-predicted trajectories for γ = 10−5 and γ = 5 × 10−5, respectively, for Setup 2.

2.4.1 The Noiseless Non-Stationary Logistic Map

Now we turn to our stated goal: given a finite set of measurements of the state of an unknown non-stationary dynamical system from some time in the past t = −T up to some time t = 0, predict the climate and regime transitions for t > 0.
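For reference in what follows, training and test trajectories of the target systems, Eqs. (2.3) and (2.4), can be generated as in the sketch below (parameter values are illustrative assumptions; setting ϵ = 0 recovers the noiseless system of Eq. (2.3)):

```python
import numpy as np

def nonstationary_logistic(n_steps, lam0=3.81, gamma=1e-5, eps=0.0, seed=0):
    """Iterate Eq. (2.4): x_{n+1} = lam_n x_n (1 - x_n) + eps_n, with
    lam_{n+1} = lam_n + gamma and eps_n uniform on [-eps, eps].
    eps = 0 recovers the noiseless system of Eq. (2.3)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0)          # random initial condition in (0, 1)
    lam = lam0
    xs, lams = [], []
    for _ in range(n_steps):
        x = lam * x * (1.0 - x) + rng.uniform(-eps, eps)
        lam += gamma
        xs.append(x)
        lams.append(lam)
    return np.array(xs), np.array(lams)

# Noiseless drift through the period 3 window (cf. Fig. 2.3(a)):
xs, lams = nonstationary_logistic(5000, gamma=1e-5)
```

The returned pair (xs, lams) corresponds directly to the measurement set {xn, λn} assumed available for training in this section.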
More precisely, when the prediction is performed over an ensemble of trajectories, each initiated from a randomly chosen initial condition, we want our predicted ensemble of trajectories to replicate the non-stationary climate of the actual system. For work previously applying initial condition ensembles for studying and understanding non-stationary dynamics, see Refs. [136–138].

First, we demonstrate how naively applying RC using the measurements of the previous states of the system as the only input to the reservoir (i.e., following the procedure of Sec. 2.2) fails to accomplish our goal. We will call this setup “Setup 1”. Considering the noiseless system given by Eq. (2.3) for γ = 10−5, we train the reservoir in the “open-loop” configuration (Fig. 2.1(a)) using a set of training data, {xn} for n ≤ 0 (1000 data points), and then allow the reservoir to autonomously predict a trajectory, xn for n > 0, forward in time using the “closed-loop” configuration (Fig. 2.1(b)). Table 2.1 displays the set of hyperparameters used in all tasks in this chapter involving the prediction of the logistic map. Figure 2.4(a) compares the actual trajectory of the system (black dots) and the reservoir-predicted trajectory (red dots). We see that the reservoir-predicted trajectory does not undergo any of the regime transitions that the actual trajectory undergoes. Instead, the reservoir-predicted trajectory appears to evolve according to a fixed stationary map. Going back to the description of the prediction in the stationary case, Sec. 2.2, this is to be expected because the memory of the reservoir is short compared to the timescale for change of λn (i.e., γ−1), and the training produces a single fixed Wout which, when used in the closed-loop prediction configuration, produces a stationary reservoir dynamical system. As such, we can think of the training procedure of Fig.
2.1(a) acting on non-stationary training data as choosing a Wout that in some sense yields the stationary system that is most consistent with the non-stationary training data −T < t < 0. Figure 2.4(b) shows the map traced out by the reservoir-predicted trajectory (red dots) and by the actual trajectory (black dots) from Fig. 2.4(a).

Table 2.1: Hyperparameters for predicting the logistic map.
    Reservoir size N: 500
    Average number of connections per node ⟨d⟩: 1
    Spectral radius of the reservoir’s adjacency matrix ρs: 0.55
    Maximal coupling strength of the input to the reservoir χ: 0.23
    Tikhonov regularization parameter α: 2.15 × 10−11

In order to take into account non-stationarity, we must break the stationarity of the closed-loop dynamical system of Fig. 2.1. Instead of providing only the state of the dynamical system as input to the reservoir, we also provide the reservoir with the value of the driving parameter λn of the system during both the training and the prediction. We call this setup, illustrated in Fig. 2.5, “Setup 2”. (A similar scheme was very recently independently formulated in the preprint Ref. [106] for studying crises.) We have coupled this parameter to the reservoir in the same way we couple the measured states of the dynamical system, i.e., each reservoir node is coupled to a single randomly selected element of the state vector u(t) or to the driving parameter. Using Setup 2, we will now demonstrate the successful application of RC to the prediction of the noiseless non-stationary logistic map. With an additional modification to this method, we will later extend its capabilities to successfully predict the stochastic logistic map as well. We assume we have a set of measurements of the state of the system, as well as knowledge of the time variation of the parameter, {xn, λn} for n = −T, ..., 0, from some time in the past n = −T to the present n = 0, and we desire to predict the future climate of this non-stationary system.
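The essential change in Setup 2 can be sketched as follows: the reservoir input is the augmented vector [xn, λn], and in the closed-loop phase only the predicted state is fed back, while the known parameter sequence λn continues to be supplied externally. In this minimal illustration the small reservoir and the readout are random stand-ins (not trained weights), chosen only to show the data flow:

```python
import numpy as np

def setup2_closed_loop(step, W_out, r, x0, lam_schedule):
    """Closed-loop prediction for Setup 2 (cf. Fig. 2.5(b)): the predicted
    state x is fed back as input, while the known drifting parameter lam_n
    is always supplied from outside the loop."""
    x, out = x0, []
    for lam in lam_schedule:
        r = step(r, np.array([x, lam]))    # augmented input [x_n, lam_n]
        x = float((W_out @ r)[0])          # feed back the predicted state only
        out.append(x)
    return np.array(out)

# Minimal illustration with a small random reservoir and an untrained readout:
rng = np.random.default_rng(0)
N = 50
A = 0.5 * rng.uniform(-1.0, 1.0, (N, N)) / np.sqrt(N)
W_in = rng.uniform(-0.2, 0.2, (N, 2))      # couples [x, lam] into the reservoir
step = lambda r, v: np.tanh(A @ r + W_in @ v)
W_out = rng.uniform(-0.1, 0.1, (1, N))     # a trained readout would go here
traj = setup2_closed_loop(step, W_out, np.zeros(N), 0.5,
                          3.81 + 1e-5 * np.arange(100))
```

Because λn enters the loop from outside, the closed-loop system is no longer autonomous and stationary, which is precisely what allows it to track the prescribed non-stationarity.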
We note that the task of prediction where non-stationarity is due to time variation of a known parameter is often of great practical interest. For instance, suppose we are trying to predict how the Earth’s climate will change over time and we have a set of measurements which describe the climate from the past, as well as of relevant quantities that we think may directly influence the dynamical behavior of the terrestrial climate system, such as atmospheric CO2 concentration [109, 110]. We may be interested in forecasting how the climate would change in the future for different rates of increase of CO2 concentration in the atmosphere. In that case we may treat CO2 concentration levels as a known time-dependent driving parameter of the system.

Figure 2.5: (a) Open-loop training phase and (b) closed-loop prediction phase of Setup 2. The rectangular boxes, as for the dashed rectangles in Figs. 2.1(a,b), contain the RC system consisting of the input layer (Win), the reservoir, and the output layer (Wout).

With this in mind, we train the reservoir, as shown in Fig. 2.5(a), and then we employ the “closed-loop” configuration of Fig. 2.5(b) to predict the future states of the system. For the case with γ = 10−5, ϵ = 0, the results are shown in Fig. 2.4(c). The black dots represent the actual trajectory evolving under Eqs. (2.3a,b), while the red dots represent the trajectory predicted by the reservoir computer. The reservoir computer was trained on a portion of the actual trajectory corresponding to the interval λ ≈ [3.81, 3.82] (T = 1000 data points).
Although the reservoir quickly loses the ability to predict the precise time evolution of the true system orbit, as expected for a chaotic orbit, we see that it was able to replicate and predict the climate, including the various regime transitions: the intermittency transition near λ = 3.828 from a chaotic orbit to a stable period 3 orbit; the cascade of period doubling bifurcations starting at λ ≈ 3.845 leading to chaos near λ = 3.857; and the crisis transition at the upper boundary of the period 3 window (λ ≈ 3.856). Notice that the reservoir-predicted trajectory was able to accurately capture the effects of non-stationarity, for example, in the sudden transition from a period 3 orbit to a period 6 orbit discussed previously. We have also considered the reverse bifurcations (obtained as λn decreases with time), and they are well predicted by the machine learning method; we omit these figures here but will show the corresponding figures for the next example (the Lorenz system, Sec. 2.5). Figure 2.4(d) shows a similar result, but here the system is drifting at a faster rate (γ = 5 × 10−5). We see that this faster drift considerably alters the details of the transition (e.g., we cannot clearly see a period doubling cascade), while the reservoir, this time trained on T = 250 data points (over a similar range of λ as for the previous case), again accurately mimics the dynamics.

We can more quantitatively verify that the long-term dynamical behavior predicted by the reservoir computer is indeed characteristic of that of the true system as follows. We initiate a trajectory xn from an initial condition randomly selected from the interval (0, 1). We train the reservoir over a portion of the actual trajectory and then predict forward in time. We repeat this many times, each time using a trajectory initiated from a different randomly chosen initial condition.
(The number of iterations of this step is made large enough that Γ (defined below) appears to be well converged, i.e., Γ does not change appreciably when there is an increase in the number of iterates.) After prediction has started, for each of the actual and predicted trajectories, we collect those points in the trajectory which are contained in the segment of the trajectory where λ ∈ [λ∗, λ∗ + ∆λ] for a predetermined small ∆λ and set of λ∗. For each such interval we then record the orbit points for the actual and predicted trajectories, from which we approximat