ABSTRACT Title of Dissertation: MACHINE LEARNING FOR PREDICTING NON-STATIONARY DYNAMICAL SYSTEMS, AND GLOBAL SUBSEASONAL-TO-SEASONAL WEATHER PHENOMENA Dhruvit Paresh Patel Doctor of Philosophy, 2024 Dissertation Directed by: Professor Edward Ott Department of Physics In this thesis, we are interested in modeling (1) the long-term statistical behavior of non-stationary dynamical systems, and (2) global weather patterns on the Subseasonal-to-Seasonal (S2S) time scale (2 weeks to 6 months). The first part of this thesis is primarily concerned with the situation where we have available to us measured time series data of the past states of the system of interest and, in some cases, a (perhaps inaccurate) scientific-knowledge-based model of the system. The central problem here lies in predicting a future behavior of the system that may be fundamentally different from that observed in the measured time series of its past. We develop machine-learning-based methods for accomplishing this task and test them in various challenging scenarios (e.g., predicting future abrupt changes in dynamics mediated by bifurcations not encountered in the training data). We also investigate the effects of dynamical noise in the training data on the predictability of such systems. For the second part of this thesis, we modify a machine-learning-based global climate model to predict various weather phenomena on the S2S time scale. The model is a hybrid between a purely data-driven machine learning component and an atmospheric general circulation model. We begin by formulating a purely machine learning approach that utilizes the measured time series of the past states of the target non-stationary system, as well as knowledge of the time dependence of the non-stationarity-inducing system parameter.
We demonstrate that this method can enable the prediction of the future behavior of the non-stationary system even in situations where the future behavior is qualitatively and quantitatively different from the behavior in the training data. For situations where the training data contain dynamical noise, we develop a scheme that enables the trained machine learning model to predict trajectories which mimic the effects of dynamical noise on typical trajectories of the target system. We test our methods on the discrete-time logistic map, the continuous-time Lorenz system, and the spatiotemporal Kuramoto-Sivashinsky system, for a variety of non-stationary scenarios. Next, we study the ability of our approach to extrapolate not only to previously unseen dynamics, but also to regions of the target system's state space unexplored by the training data. We find that while machine learning models can exhibit some capability to extrapolate in state space, they fail quickly as the amount of extrapolation required increases (as expected of any purely data-driven extrapolation method). We explore ways in which such failures can be mitigated. For instance, we show that a hybrid model which combines machine learning with a knowledge-based component can provide substantial improvements in extrapolation. We test our methods on the Ikeda map, the Lorenz system, and the Kuramoto-Sivashinsky system under challenging scenarios (e.g., predicting future hysteretic transitions in dynamical behavior). Finally, we modify a machine-learning-based hybrid global climate model to forecast global weather patterns on the S2S time scale. Prediction on this time scale is crucial for many domains (e.g., land, water, and energy management), yet it remains a difficult range over which to obtain useful forecasts.
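The parameter-aware prediction idea summarized above can be illustrated with a minimal reservoir computing sketch: a random recurrent network is driven in open loop by the measured time series together with the known time-dependent parameter, a linear readout is fit by ridge (Tikhonov) regression, and prediction then runs in closed loop while the future parameter values continue to be supplied. All sizes, hyperparameter values, and the drifting logistic map used here are illustrative placeholders, not the configurations used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data from a non-stationary logistic map x_{n+1} = lam_n * x_n * (1 - x_n),
# whose slowly drifting parameter lam_n is assumed to be known (illustrative drift).
N_train, N_pred = 3000, 500
lam = 3.7 + 5e-5 * np.arange(N_train + N_pred)   # drifts from 3.7 to ~3.875
x = np.empty(N_train + N_pred)
x[0] = 0.5
for n in range(N_train + N_pred - 1):
    x[n + 1] = lam[n] * x[n] * (1.0 - x[n])

# A small sparse random reservoir (illustrative size and hyperparameters).
Nr = 300
A = rng.uniform(-1, 1, (Nr, Nr)) * (rng.random((Nr, Nr)) < 0.05)
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))  # rescale spectral radius to 0.9
W_in = rng.uniform(-0.5, 0.5, Nr)                # coupling for the measured state
W_c = rng.uniform(-0.05, 0.05, Nr)               # coupling for the known parameter

def step(r, u, c):
    """One reservoir update driven by state input u and control (parameter) input c."""
    return np.tanh(A @ r + W_in * u + W_c * c)

# Open-loop training: drive the reservoir with the measured series and the known
# parameter, discard a transient, then fit the linear readout by ridge regression.
r = np.zeros(Nr)
states, targets = [], []
for n in range(N_train - 1):
    r = step(r, x[n], lam[n])
    if n >= 100:                                 # washout of initial transient
        states.append(r)
        targets.append(x[n + 1])
R, Y = np.array(states), np.array(targets)
beta = 1e-6                                      # Tikhonov regularization strength
W_out = np.linalg.solve(R.T @ R + beta * np.eye(Nr), R.T @ Y)
train_rmse = float(np.sqrt(np.mean((R @ W_out - Y) ** 2)))

# Closed-loop prediction: feed predictions back as input while continuing to
# supply the (assumed known) future parameter values lam_n.
u = x[N_train - 1]
preds = []
for n in range(N_train - 1, N_train + N_pred - 1):
    r = step(r, u, lam[n])
    u = float(W_out @ r)
    preds.append(u)
preds = np.array(preds)
```

Because the parameter (control) signal keeps drifting during closed-loop prediction, the trained model can be pushed into parameter regimes absent from its training data, which is the extrapolation setting studied in the thesis.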
We demonstrate that our model has useful skill in predicting a number of phenomena including global precipitation anomalies, the El Niño-Southern Oscillation and its related teleconnections, and various equatorial waves. MACHINE LEARNING FOR PREDICTING NON-STATIONARY DYNAMICAL SYSTEMS, AND GLOBAL SUBSEASONAL-TO-SEASONAL WEATHER PHENOMENA by Dhruvit Paresh Patel Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2024 Advisory Committee: Professor Edward Ott, Chair/Advisor Professor Brian Hunt Professor Michelle Girvan Professor Rajarshi Roy Professor Thomas M. Antonsen, Jr. © Copyright by Dhruvit Paresh Patel 2024 Foreword This dissertation contains work that was co-authored with faculty members and colleagues at the University of Maryland and at other institutions. The author of this thesis made substantial contributions to the relevant aspects of the jointly authored work included in the dissertation. Dedication To Shanta. For everything. Acknowledgments I have met and interacted with countless people, and many of these interactions have helped me, directly or indirectly, to arrive at this milestone. I would like to deeply thank all such people. First and foremost, I would like to thank my advisor Professor Edward Ott for guiding me through this research journey consisting of many challenging problems, and for helping instill in me foundational research skills that I will carry with me from here onwards. He has helped shape my understanding of how to conduct, manage, and present research in fundamental ways. He has always been sensitive to my needs as a student, a researcher, and a person who has a life and commitments outside of academics. I sincerely wish for every young researcher to have a mentor as competent and understanding as him.
Next, I would like to thank Professor Brian Hunt and Professor Michelle Girvan for always providing useful and meaningful input and feedback on my work. Their insights have often encouraged me to reformulate my own understanding of core research concepts in very beneficial ways. I have learned a lot from them, and I hope I was of at least some utility to them in return. My many thanks to them and to Professors Rajarshi Roy and Thomas Antonsen for spending their valuable time and effort in reviewing this dissertation manuscript and in serving on my thesis defense committee. I have had the honor and pleasure of collaborating with many knowledgeable researchers. I would like to give special thanks to Alex Wikner for the many helpful discussions and for introducing me to our mutual advisor, Professor Ott. I owe many thanks to Professor Istvan Szunyogh at Texas A&M University and Troy Arcomano, work with whom has helped shape my future research trajectory. In addition, I would also like to thank Andrew Pomerance, Declan Norton, Ravi Chepuri, and Shan Leng for their helpful suggestions and comments on this research. Many thanks are due to Josiland Chambers and Nancy Boone for helping me to seamlessly navigate the requirements of the graduate program and for facilitating so many of the processes that were a necessary part of this journey. I would also like to thank Professor Ki-Yong Kim for being my mentor in the earlier part of this journey; if it weren't for him, I would likely not be here at the University of Maryland. I would also like to thank Scott Hancock, Robert Schwartz, and Daniel Woodbury for dealing so patiently with my inexperience and mentoring me in the ways of experimental physics. Any words I can think of to thank them would fall short of describing the gratitude I feel towards them.
This journey was made so much more feasible, enjoyable, meaningful, and possible thanks to the deep bonds of friendship I made along the way with incredibly smart and talented peers: Adam, Alex, Andrew, Dawid and Aneta, Jake, Jared, Martin, Matt, Nick, Stefano, and Zach. I want to especially thank my friends from home, from whom I have not only acquired so much comfort and support during this program, but with whom I have grown up and experienced the different stages of life so far: Ankeet, Eshil, John, Mithulesh, Nikhil, Nisarg, Panth, Priyan, Rohan, Sahil, Shivam, and Swapneel. You guys have been and continue to be an immense source of joy in my life. This section would be entirely incomplete, and this thesis utterly worthless to me, without acknowledging my family: Shanta, Baa, Pappa, Mummi, Sneha, and Jatin. I will not attempt to express my gratitude to you all in words, because I'm afraid that any attempt at that would do immeasurable injustice to how much I owe you. Table of Contents Foreword ii Dedication iii Acknowledgments iv Table of Contents vii List of Tables x List of Figures xiii List of Abbreviations xxi Chapter 1: Introduction and Background 1 1.1 The “What” and the “Why” . . . 1 1.2 A Short History of Numerical Climate (Change) Modeling . . . 3 1.3 Challenges Faced by Conventional Climate Models . . . 6 1.4 An Alternative Approach: Machine Learning . . . 7 1.5 Thesis Outline . . . 8 1.5.1 Machine Learning for Modeling Non-Stationary Dynamical Systems . . . 8 1.5.2 How Well Can Machine Learning Models Extrapolate Dynamics? . . . 9 1.5.3 A Hybrid Machine Learning-Based Model for Subseasonal-to-Seasonal Forecasting . . .
10 Chapter 2: Using Machine Learning To Predict Statistical Properties of Non-Stationary Dynamical Processes 11 2.1 Introduction . . . 11 2.2 Review of Reservoir Computing (RC) and Its Use for Prediction of Stationary Systems . . . 16 2.3 The Non-Stationary Logistic Map . . . 20 2.4 Application of RC to Climate Prediction of the Non-Stationary Logistic Map . . . 25 2.4.1 The Noiseless Non-stationary Logistic Map . . . 25 2.4.2 The Noisy Non-stationary Logistic Map . . . 35 2.5 Application of RC to Predicting the Lorenz 63 System . . . 38 2.6 Application of RC to Predicting the Low-Dimensional Dynamics of the Kuramoto-Sivashinsky System . . . 47 2.7 Application of RC to Predicting the High-Dimensional Dynamics of the Kuramoto-Sivashinsky System . . . 53 2.8 Conclusion . . . 61 2.9 Acknowledgments . . . 63 2.10 Data Availability . . . 63 Chapter 3: Using Machine Learning to Anticipate Tipping Points and Extrapolate to Post-Tipping Dynamics of Non-Stationary Dynamical Systems 64 3.1 Introduction . . . 64 3.2 Reservoir Computing Background . . . 70 3.2.1 Setup and Training . . . 70 3.2.2 Choosing Hyperparameters for Prediction of Non-Stationary Systems . . .
73 3.3 Predicting the Tipping Point and Post-Tipping-Point Dynamics in Non-Stationary Systems . . . 77 3.3.1 Predicting tipping in the Lorenz System . . . 78 3.3.2 Predicting tipping points and evolution of extreme event frequency in the non-stationary Ikeda Map . . . 87 3.3.3 Predicting tipping in the Kuramoto-Sivashinsky equations . . . 91 3.3.4 Hysteretic tipping in the Lorenz System: Hybrid ML/knowledge-based prediction . . . 97 3.4 Conclusion . . . 104 3.5 Acknowledgments . . . 106 3.6 Data Availability Statement . . . 106 3.7 Supplementary Material . . . 106 3.7.1 Choosing appropriate hyperparameters for non-stationary systems . . . 106 3.7.2 Reservoir Computer Setup for the Ikeda Map Example, Sec. 3.3.2 . . . 109 3.7.3 Hybrid Reservoir Computer Setup and training for the Lorenz system, Sec. 3.3.4 . . . 110 3.7.4 Hyperparameters used for the examples in the main text . . . 111 Chapter 4: Exploring the Potential of a Hybrid Model That Combines Machine Learning With Physics-Based Modeling for Subseasonal-to-Seasonal (S2S) Prediction 121 4.1 Introduction . . . 121 4.2 Methods . . . 123 4.2.1 The Global and Local State Vectors . . . 124 4.2.2 ML Model Dynamics . . . 126 4.2.3 Model Training . . .
129 4.2.4 Synchronization and Prediction . . . 130 4.2.5 Experimental Design and Choice of Model Hyperparameters . . . 131 4.3 Data . . . 134 4.4 Results . . . 135 4.4.1 Forecast Verification Metrics . . . 136 4.4.2 El Niño-Southern Oscillation (ENSO) and its Teleconnections . . . 137 4.4.3 Global Precipitation . . . 143 4.4.4 Equatorial Waves . . . 145 4.5 Conclusions and Outlooks . . . 147 4.6 Data Availability Statement . . . 148 Chapter 5: Concluding Remarks 149 A.1 Analysis of the Noise-Injection Scheme . . . 154 A.2 Comment on the numerical solution of Eqs. (3.6 a-c) with dynamical noise . . . 156 Bibliography 157 List of Tables 2.1 Hyperparameters for predicting the logistic map. . . . 27 2.2 Hyperparameters for predicting the noiseless Lorenz system. . . . 42 2.3 Hyperparameters for predicting the noisy Lorenz system. . . . 42 2.4 Hyperparameters for predicting the noiseless Kuramoto-Sivashinsky system. . . . 49 2.5 Hyperparameters for predicting the noisy Kuramoto-Sivashinsky system. . . . 52 2.6 Hyperparameters for predicting the noiseless high-dimensional Kuramoto-Sivashinsky system dynamics. . . . 58 2.7 Hyperparameters for predicting the noisy high-dimensional Kuramoto-Sivashinsky system dynamics. . . .
60 3.1 Hyperparameters for predicting the noiseless non-stationary Lorenz system in Fig. 3.3. The spectral radius was searched on the grid [0.50, 0.60, 0.70, 0.80, 0.90, 1.00]. The input coupling strength was searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. The observational noise strength was searched on the grid [1 × 10^−5, 5 × 10^−5, 1 × 10^−4, 5 × 10^−4]. . . . 112 3.2 Hyperparameters for predicting the noiseless non-stationary Lorenz system (in Fig. 3.5) with an insufficient parameter sweep. The spectral radius and input coupling strength were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. The observational noise strength was searched on the grid [1 × 10^−5, 5 × 10^−5, 1 × 10^−4, 5 × 10^−4]. . . . 113 3.3 Hyperparameters for predicting the noisy non-stationary Lorenz system (in Fig. 3.5) with an insufficient parameter sweep. The spectral radius and input coupling strength were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00].
The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. . . . 114 3.4 Hyperparameters for predicting the Ikeda map. The spectral radius and input coupling strength of the parameter input to the reservoir were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00], and the input coupling strength of the dynamical variable input to the reservoir was searched on the grid [0.10, 0.25, 0.50, ..., 2.00, 2.25]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−11 and 1 × 10^−6. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−7 and 1 × 10^−5. The observational noise strength was searched on the grid [1 × 10^−5, 5 × 10^−5, ..., 1 × 10^−2, 5 × 10^−2]. . . . 115 3.5 Hyperparameters for predicting the Kuramoto-Sivashinsky equation. The spectral radius and input coupling strengths were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched on the grid [1 × 10^−7, 1 × 10^−6, 1 × 10^−5, 1 × 10^−4]. . . . 116 3.6 Hyperparameters for predicting the noiseless Lorenz system (subcritical Hopf bifurcation). The spectral radius and input coupling strength were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00].
The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. The observational noise strength was searched on the grid [1 × 10^−5, 5 × 10^−5, 1 × 10^−4, 5 × 10^−4]. . . . 117 3.7 Hyperparameters for predicting the noisy Lorenz system (subcritical Hopf bifurcation). The spectral radius and input coupling strength were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. The observational noise strength was searched on the grid [1 × 10^−5, 5 × 10^−5, 1 × 10^−4, 5 × 10^−4]. . . . 118 3.8 Hyperparameters for predicting the noisy Lorenz system (subcritical Hopf bifurcation) using the hybrid ML model. The spectral radius and input coupling strength were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. . . . 119 3.9 Hyperparameters for predicting the Lorenz equations in Fig. 3.15.
The spectral radius and input coupling strength were searched on the grid [0.10, 0.25, 0.50, 0.75, 1.00]. The leakage parameter was searched on the grid [0.50, 1.00]. The reservoir activation bias was searched on the grid [0.00, 1.00]. The Tikhonov regularization parameter was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−13 and 1 × 10^−8. The slope of the linear control signal was searched over a logarithmically spaced interval of 10 grid points between 1 × 10^−6 and 1 × 10^−4. The observational noise strength was searched on the grid [1 × 10^−5, 5 × 10^−5, 1 × 10^−4, 5 × 10^−4]. . . . 120 List of Figures 2.1 Schematic of the reservoir computer. (a) The “open-loop” configuration used for training the output weight matrix, W_out. An input sequence, u(t), is coupled into a reservoir of nodes via a matrix W_in. The excited reservoir state r(t) is mapped to an output, ũ(t), via a matrix of adjustable weights W_out. (b) The “closed-loop” configuration used for prediction. The reservoir output at one time step is fed as input to the reservoir at the next time step. The dashed red box denotes the “reservoir system”, made up of W_in, the reservoir, and W_out. . . . 19 2.2 (a) Bifurcation diagram of the logistic map. (b) Zoomed-in view of the period-3 window of the logistic map bifurcation diagram. . . . 22 2.3 A trajectory evolving under the non-stationary logistic map (a) given by Eq. (2.3) for γ = 10^−5, (c) given by Eq. (2.3) for γ = 5 × 10^−5, and (e) given by Eq. (2.4) for γ = 10^−5 and ϵ = 0.002. (b) and (d) show a zoomed-in view of the red boxed areas in (a) and (c), respectively, near the period-doubling bifurcation from a period-3 orbit to a period-6 orbit. . . . 24 2.4 (a) Comparison of the actual (black dots) and the reservoir-predicted (red dots) trajectories of the logistic map for Setup 1.
(b) The logistic map for λ = 3.8 (black) and the map obtained from the reservoir-predicted trajectory (red dots) of (a). (c) and (d) compare the actual and the reservoir-predicted trajectories for γ = 10^−5 and γ = 5 × 10^−5, respectively, for Setup 2. . . . 25 2.5 (a) Open-loop training phase and (b) closed-loop prediction phase of Setup 2. The rectangular boxes, as for the dashed rectangles in Figs. 2.1(a,b), contain the RC system consisting of the input layer (W_in), the reservoir, and the output layer (W_out). . . . 28 2.6 (a) Γ(λ) for a set of λ in the period-3 window and ∆λ = 0.001. (b) Comparison of the approximate cumulative probability distributions obtained from real trajectories (solid black, purple, and green curves) and predicted trajectories (dashed red, yellow, and blue curves) for λ*_1 = 3.825, λ*_2 = 3.845, and λ*_3 = 3.865 (vertical dashed lines in (a)). The numerical approximations to f_a(x, λ) and f_p(x, λ) are taken as the fraction of sample values less than x. The agreement between the actual and predicted cumulative distributions indicates that the reservoir-predicted trajectories accurately represent the dynamics of the different regimes (i.e., the pre-transition chaos, the periodic orbit, and the post-transition chaos). . . . 32 2.7 (a) Time-dependence of λ. (b) Comparison of an actual trajectory of the system (black dots) and the corresponding predicted trajectory (red dots). (c) Comparison of the approximate cumulative probability distributions obtained from real trajectories (solid black, purple, and green curves) and predicted trajectories (dashed red, yellow, and blue curves) for θ*_1 = π, θ*_2 = 1.5π, and θ*_3 = 2.25π (λ*_1 = 3.8253, λ*_2 = 3.8111, and λ*_3 = 3.8858, respectively) corresponding to the vertical green dashed lines in (d). (d) Γ(θ) over the prediction interval. . . .
34 2.8 A comparison of trajectories evolving under the noisy logistic map (black dots) and those predicted by the reservoir (red dots that overlay the black dots) for two noise levels, (a) ϵ = 0.0005 and (b) ϵ = 0.002. . . . 36 2.9 (a) Diagrammatic representation of the action of the noisy logistic map, using the intermediate variable v_n. (b) Diagram depicting how to approximate the dynamics of the noisy logistic map using a trained reservoir. . . . 37 2.10 (a,b) Comparison of trajectories evolving under the noisy logistic map (black dots) and those predicted by the reservoir (red dots that overlay the black dots) operating under the modified scheme with noise-injection (Fig. 2.9(b)) for two noise levels in the actual system, (a) ϵ = 0.0005 and (b) ϵ = 0.002. (c) Γ(λ) over the prediction interval. (d) Approximate cumulative probability distributions from actual trajectories (solid curves) and predicted trajectories (dashed curves) for λ*_1 = 3.8325, λ*_2 = 3.845, and λ*_3 = 3.865 and ∆λ = 0.001 for ϵ = 0.002. . . . 38 2.11 (a) Lorenz system bifurcation diagram for the interval ρ = [25, 200]. (b) A zoomed-in view of the window in the dotted red oval of (a): (1) indicates the crisis transition from a one-piece chaotic attractor to a two-piece chaotic attractor; (2) (and the translucent yellow region) indicates the region of pairwise chaotic band splittings; (3) (and the translucent blue region) shows the region of period-halving cascade; (4) indicates the intermittency transition from a period-2 orbit to chaos. . . . 40 2.12 (a) (z_{m+1} − z_m) versus ρ for the true system (black dots) compared with that predicted by the reservoir (red dots). (b) Zoomed-in view of the region enclosed by the dashed blue oval in (a).
(c) Cumulative probability distributions of z maxima for the true trajectories (solid curves) and the reservoir-predicted trajectories (dashed curves) for ρ*_1 = 139, ρ*_2 = 149, ρ*_3 = 159, ρ*_4 = 169.4, and ∆ρ = 0.5. (d) Γ(ρ) over the prediction interval. The vertical dashed lines indicate the points at which the cumulative probability distributions are shown in (c). . . . 43 2.13 (a) (z_{m+1} − z_m) versus ρ from numerical solution of the Lorenz equations (black dots) compared with the reservoir prediction (red dots that overlay the black dots). (b) Zoomed-in view of the region enclosed by the dashed blue oval in (a). (c) Cumulative probability distributions of z_m from an ensemble of true (solid curves) and reservoir-predicted trajectories (dashed curves) for ρ*_1 = 139, ρ*_2 = 149, ρ*_3 = 159, ρ*_4 = 168.5, and ∆ρ = 0.5. (d) Γ(ρ) versus ρ over the prediction interval (for the case with ρ(t) decreasing in time). The vertical dashed lines indicate the ρ values for which the cumulative probability distributions are shown in (c). (e) Γ(ρ) for the case of the same noisy Lorenz system, but with ρ(t) increasing with time (γ = 0.1). . . . 45 2.14 (a) ρ(t) as a function of t. (b) (z_{m+1} − z_m) versus γt for the true trajectory (black dots) and for the reservoir prediction (red dots that overlay the black dots). (c) A zoomed-in view of the oval region indicated in (b). (d) Γ(θ) over the prediction interval. The vertical dashed lines indicate the points at which the cumulative probability distributions are shown in (e). (e) Cumulative distributions of the map obtained from the real trajectory (solid curves) and from the reservoir-predicted trajectory (dashed curves) at θ*_1 = π, θ*_2 = 1.8π, and θ*_3 = 2.9π with ∆θ = 0.05. . . . 46 2.15 (a) Bifurcation diagram of the stationary Kuramoto-Sivashinsky system (Eq. (2.10)). (b) Comparison of the actual a_6(t_n) (black dots) for a numerically integrated trajectory of Eq.
(2.13) and the corresponding reservoir-predicted a_6(t_n) (red dots overlaying the black dots) for a linear drift in the parameter κ. (c) Γ(κ) over the period 5 prediction window. The dashed vertical green lines indicate the points at which the cumulative probability distributions are shown in (d). (d) Cumulative probability distributions of the map obtained from the numerical solutions of Eq. (2.13) (solid curves) and from the reservoir-predicted trajectories (dashed curves) at κ*_1 = 0.0298400, κ*_2 = 0.0298500, κ*_3 = 0.0298565, and ∆κ = 5 × 10^−7. (e) a_6(t_n) versus κ for a numerically determined trajectory of Eq. (2.13) (black dots) and that predicted by the trained reservoir (red dots). (f) Γ(κ) over the prediction window for the case in (e). . . . 51 2.16 A typical solution of the stationary KS system (Eqs. (2.16)) for κ = 0.0816. Time is plotted horizontally, x vertically, and the value of w is color-coded (and normalized by its root-mean-square over time). . . . 54 2.17 Bifurcation diagram of the stationary KS system (Eq. (2.16)) obtained by observing the value of w at the 40th spatial grid point (corresponding to x = 3.9270) for w = 0 at the 1st spatial grid point (corresponding to x = 0.0982). . . . 54 2.18 (a) A typical solution of the non-stationary, noiseless KS system (Eqs. (2.16) and (2.17)) obtained by numerical integration (top panel) and the corresponding reservoir-computer-predicted orbit (bottom panel) trained on the numerically integrated trajectory for t < 0. The horizontal axis shows time and the vertical axis shows x. (b) Poincaré map of w_40 of the numerically integrated KS trajectory and reservoir-computer-predicted trajectory in (a), shown in black and red dots, respectively. (c) Cumulative probability distributions of the Poincaré map points obtained from the numerical solutions of Eqs.
(2.16) and (2.17) (solid curves) and from the reservoir-predicted trajectories (dashed curves) at t*_1 = 25, t*_2 = 83.5, t*_3 = 125 (which correspond to κ*_1 ≈ 0.08034, κ*_2 ≈ 0.07948, and κ*_3 ≈ 0.07888, respectively) and ∆t = 8.4. (d) Γ(t) versus t over the prediction window. . . . 57 2.19 (a) A typical solution of the non-stationary, noisy KS system (Eqs. (2.16) and (2.17)) obtained by numerical integration (top panel) and the corresponding reservoir-computer-predicted orbit (bottom panel) trained on the numerically integrated trajectory for t < 0. The horizontal axis shows time and the vertical axis shows x. (b) Poincaré map of w_40 of the numerically integrated KS trajectory and reservoir-computer-predicted trajectory in (a), shown in black and red dots, respectively. (c) Cumulative probability distributions of the Poincaré map points obtained from the numerical solutions of Eqs. (2.16) and (2.17) (solid curves) and from the reservoir-predicted trajectories (dashed curves) at t*_1 = 25, t*_2 = 66.8, t*_3 = 125 (which correspond to κ*_1 ≈ 0.08034, κ*_2 ≈ 0.07973, and κ*_3 ≈ 0.07888, respectively) and ∆t = 8.4. (d) Γ(t) versus t over the prediction window. . . . 59 3.1 Partition of the training data into the long validation set (t ∈ [−T_3, −T_2)), training set (t ∈ [−T_2, −T_1]), and short validation set (t ∈ (−T_1, 0]) for the proposed hyperparameter optimization scheme. . . . 73 3.2 The three-dimensional trajectory of the noiseless non-stationary Lorenz system (given by Eqs. (3.6) for ρ_0 = 154, ρ_1 = 8, and τ = 100 over the time interval t ∈ [−600, 120]) from a randomly chosen initial condition. The red and black curves correspond to the periodic trajectory for t ≤ 0 and the chaotic trajectory for t > 0, respectively. . . . 80 3.3 (a) Γ(t) versus time over the prediction window.
Vertical dashed blue lines labeled (c), (d), (e), and (f) indicate the locations for which the true and predicted z′m point cumulative probability distributions are shown in panels (c), (d), (e), and (f), respectively. (b) The z′m points of an example true system trajectory (black dots) and the corresponding ML prediction (red dots). The vertical dashed green line indicates the starting point of the ML prediction. The vertical dashed blue lines indicate the same times as in panel (a). Panels (c), (d), (e), and (f) show the true (solid black curves) and ML-predicted (dashed red curves) z′m cumulative probability distributions at t = 5, 60, 80, and 110, respectively. . . . . . . . . . . 82
3.4 (a) Γ(t) versus time over the prediction window. Vertical dashed blue lines labeled (c), (d), (e), and (f) indicate the locations for which the true and predicted z′m point cumulative probability distributions are shown in panels (c), (d), (e), and (f), respectively. (b) The z′m points of an example true system trajectory (black dots) and the corresponding ML prediction (red dots). The vertical dashed green line indicates the starting point of the ML prediction. The vertical dashed blue lines indicate the same times as in panel (a). Panels (c), (d), (e), and (f) show the true (solid black curves) and ML-predicted (dashed red curves) z′m point cumulative probability distributions at t = 50, 120, 200, and 300, respectively. . . . . . . . . . . 84
3.5 (a) Γ(t) versus time over the prediction window. Vertical dashed blue lines labeled (c), (d), (e), and (f) indicate the locations for which the true and predicted z′m point cumulative probability distributions are shown in panels (c), (d), (e), and (f), respectively. (b) The z′m points of an example true system trajectory (black dots) and the corresponding ML prediction (red dots). The vertical dashed green line indicates the starting point of the ML prediction. The vertical dashed blue lines indicate the same times as in panel (a).
Panels (c), (d), (e), and (f) show the true (solid black curves) and ML-predicted (dashed red curves) z′m map point cumulative probability distributions at t = 50, 120, 200, and 300, respectively. . . . . . . . . . . 86
3.6 Phase portraits of the snapshot attractor of the true system at (a) n = −15000, (b) n = −5000, (c) n = 2500, (e) n = 6000, (g) n = 12500, and the corresponding ML prediction at (d) n = 2500, (f) n = 6000, and (h) n = 12500. . . . . . . . . . . 90
3.7 (a) Γ(n) versus n over the prediction window. Vertical dashed blue lines labeled (c), (d), (e), and (f) indicate the locations for which the true and predicted y′n cumulative probability distributions are shown in panels (c), (d), (e), and (f), respectively. (b) x′n (top) and y′n (bottom) of an example true system trajectory (black dots) and the corresponding ML prediction (red dots). The vertical dashed green line indicates the starting point of the ML prediction. The vertical dashed blue lines indicate the same times as in panel (a). Panels (c), (d), (e), and (f) show the true (solid black curves) and ML-predicted (dashed red curves) y′n cumulative probability distributions at n = 2500, 6000, 10000, and 12500, respectively. . . . . . . . . . . 92
3.8 Bifurcation diagram of the stationary Kuramoto-Sivashinsky equation for κ ∈ [0.076, 0.0816] obtained by the w40-Poincaré map points. . . . . . . . . . . 93
3.9 (a) (Top panel) Γ(t) over the prediction window. Vertical dashed blue lines labeled (b), (c), (d), and (e) indicate the locations for which the true and predicted w′40-Poincaré map point cumulative probability distributions are shown in panels (b), (c), (d), and (e), respectively. (Middle panel) An example of a typical numerically integrated system trajectory. (Bottom panel) The ML prediction corresponding to the example trajectory in the middle panel.
(b), (c), (d), and (e) show the true (solid black curves) and ML-predicted (dashed red curves) w′40-Poincaré map point cumulative probability distributions at t = 7.51, 37.64, 60.23, and 97.88, respectively. . . . . . . . . . . 95
3.10 (a) (Top panel) Γ(t) over the prediction window. Vertical dashed blue lines labeled (b), (c), (d), and (e) indicate the locations for which the true and predicted w′40-Poincaré map point cumulative probability distributions are shown in panels (b), (c), (d), and (e), respectively. (Middle panel) An example of a typical numerically integrated system trajectory. (Bottom panel) The ML prediction corresponding to the example trajectory in the middle panel. (b), (c), (d), and (e) show the true (solid black curves) and ML-predicted (dashed red curves) w′40-Poincaré map point cumulative probability distributions at t = 7.51, 30.10, 60.23, and 97.88, respectively. . . . . . . . . . . 97
3.11 (a) For ρ slightly less than the critical bifurcation value, an attracting chaotic orbit of the Lorenz system (black curve) coexists with two fixed point attractors (blue dots) (representing the convective fluid flow patterns of the Lorenz model in panels (b) and (c)). . . . . . . . . . . 99
3.12 (a) z′(t) of an example true system orbit (black curve) and of the corresponding ML prediction (red curve) for the case of the noiseless Lorenz system. The vertical green dashed line indicates the start of the ML prediction. To describe panel (b), we first define the quantity z′m as the z-coordinate values that occur when the noisy trajectory crosses the surface xy = βz. This surface corresponds to the surface of section of the chaotic noiseless system at local z(t) maxima of the noiseless trajectory, which, from Eq. (3.6), occur at dz/dt = xy − βz = 0.
Panel (b) shows the z′m values of an example true noisy Lorenz system orbit (black dots) versus t and of the corresponding ML prediction (red dots). The vertical green dashed line marks the start of the ML prediction, and the vertical blue dashed line marks the time near which the ML-predicted trajectory leaves the plotted range due to instability of the ML prediction dynamics. Note that, in these panels, the two noiseless fixed points (x = y = ±√β(ρ − 1), z = ρ − 1) have the same z-coordinate and would appear as a single curve. . . . . . . . . . . 100
3.13 Stationary bifurcation diagrams showing the hysteretic subcritical Hopf bifurcation for the true noiseless Lorenz system (top panel) and the inaccurate knowledge-based model (bottom panel) for ρ ∈ [22.5, 45], where the bifurcation plots are constructed by plotting z∗ against ρ. For fixed values of ρ, z∗ is either the maximum z value over long trajectories (for orbits on the chaotic attractor) or the z value of the fixed point attractor. The bottom horizontal axis labeled t gives the times at which the non-stationary system's ρ value passes through the ρ values on the ρ axis immediately above this t axis. . . . . . . . . . . 102
3.14 (a) (Top panel) Γ(t) over the prediction window. Vertical dashed blue lines labeled (b), (c), (d), and (e) indicate the locations for which the true and predicted z′m point cumulative probability distributions are shown in panels (b), (c), (d), and (e), respectively. (See the caption of Fig. 3.12(b) for the definition of z′m.) (Bottom panel) The z′m points of a typical true system trajectory (black dots) and the corresponding ML/knowledge-based model prediction (red dots). (b), (c), (d), and (e) show the true (solid black curves) and ML/knowledge-based model predicted (dashed red curves) z′m point cumulative probability distributions at t = 10, 40, 65, and 110, respectively. . . . . . . . . . .
104
3.15 (a) Γ(t) over the prediction window. Vertical dashed blue lines labeled (c), (d), (e), and (f) indicate the locations for which the true and predicted z′m point cumulative probability distributions are shown in panels (c), (d), (e), and (f), respectively. (b) The z′m points of an example true system trajectory (black dots) and the corresponding ML prediction (red dots). The vertical dashed green line indicates the starting point of the ML prediction. The vertical dashed blue lines indicate the same times as in panel (a). (c), (d), (e), and (f) show the true (solid black curves) and ML-predicted (dashed red curves) z′m point cumulative probability distributions at t = 20, 60, 85, and 110, respectively. . . . . . . . . . . 107
3.16 Data-driven-knowledge-based hybrid ML model architecture. . . . . . . . . . . 110
4.1 Illustration of the geographic domain decomposition for a local input vector u(in)l (with grid points enclosed by a red box) and a local output vector u(out)l (with grid points enclosed by a blue box). . . . . . . . . . . 125
4.2 Model architecture flow chart. The notation is explained in Sec. 4.2.1. Red boxes represent operations that are performed locally (and in parallel) on each of the L = 1,152 sub-domains. Blue boxes represent operations for which the outputs are global state vectors. The appropriate input data from (uaG(t), vaG(t)) during training, or (uHG(t), vG(t)) during prediction, is fed into SPEEDY, the atmospheric ML, and the ocean ML. For both the hybrid atmospheric and the ML-only ocean models, the local forecasts (uHl(t + ∆t), vl(t + ∆t′)) are stitched together to obtain the global forecasts (uHG(t + ∆t), vG(t + ∆t′)). During prediction, the global forecasts are fed back in as inputs at the next time step.
In practice, the ocean model time step is larger than the atmospheric model time step, and for time steps where the ocean model is not advanced, the ocean model state is held constant. . . . . . . . . . . 127
4.3 The all-season Pearson correlation coefficients and root-mean-square errors between the predicted and observation-based Niño 3.4 Index (a), Niño 3 Index (b), and Niño 4 Index (c) as a function of lead time in months. Solid (dashed) blue curves show the PCC for our model's (persistence model's) predictions. The dashed green curve shows the RMSE for the climatology model's predictions. Solid (dashed) red curves show the RMSE for our model's (persistence model's) predictions. (d), (e), and (f) show the PCC for our model's prediction as a function of the prediction lead time (in months) and the prediction start month for the Niño 3.4 Index, Niño 3 Index, and Niño 4 Index, respectively. . . . . . . . . . . 139
4.4 The top row shows the all-season concurrent PCC between the Niño 3.4 Index and the 3-monthly averaged SST anomalies. The bottom row shows the all-season lagged PCC between the Niño 3.4 Index and the 3-monthly averaged SST anomalies lagging by 3 months. The left column (with panels titled "Hybrid") shows the correlation between the predicted Niño 3.4 Index and predicted SST anomaly values, and the right column (with panels titled "ERA5") shows it for observation-based data. Black dots indicate grid points for which the PCC was found to be statistically significant at the 95% confidence level. . . . . . . . . . . 141
4.5 The left (right) column, with the label "Hybrid" ("ERA5"), shows the Pearson correlation coefficients between the precipitation anomalies and the Niño 3.4 Index predicted by our model (obtained from ERA5 data) at a prediction lead time of 1 season.
The rows, in order from the top-most to the bottom-most, correspond to results for the December-January-February (DJF), March-April-May (MAM), June-July-August (JJA), and September-October-November (SON) seasons. Black dots indicate grid points for which the PCC was found to be statistically significant at the 95% confidence level. . . . . . . . . . . 143
4.6 The Pearson correlation coefficients between the precipitation anomalies predicted by our model and those obtained from ERA5 data at prediction lead times of 1, 2, 3, and 4 weeks, from left to right in that order. . . . . . . . . . . 144
4.7 The top (bottom) row shows the Wheeler-Kiladis diagram obtained by using the precipitation in the 15◦S-15◦N latitude band predicted by our model (obtained from ERA5 data). The left (right) column shows the normalized symmetric (antisymmetric) components of the wavenumber-frequency analysis. The horizontal axis of each plot represents the zonal wavenumber and the vertical axis represents the frequency (cycles per day). The solid black lines indicate solutions to the dispersion relations from an analytical model of shallow water waves. . . . . . . . . . . 146
1 (a) and (b) show comparisons of ζ̃ (black lines), ζ (blue line), and ζ∆ (green line) for different noise levels (widths of ζ(ϵ)), 0.002 and 0.0005, respectively. . . . . . . . . . .
154

List of Abbreviations

AGCM Atmospheric General Circulation Model
CFSv2 (NCEP's) Coupled Forecast System Model version 2
CHyPP Combined Hybrid-Parallel Prediction
CMIP Coupled Model Intercomparison Project
DARPA Defense Advanced Research Projects Agency
DJF December-January-February
DL Deep Learning
E3SM (Department of Energy's) Energy Exascale Earth System Model
ECMWF European Centre for Medium-Range Weather Forecasts
ENSO El Niño Southern Oscillation
ER Equatorial Rossby
ERA5 ECMWF Reanalysis v5
GCM General Circulation Model
HPC High Performance Computing
IFS (ECMWF's) Integrated Forecasting System
IPCC Intergovernmental Panel on Climate Change
JJA June-July-August
KS Kuramoto-Sivashinsky (equation)
MAM March-April-May
MJO Madden-Julian Oscillation
ML Machine Learning
NCEP National Centers for Environmental Prediction
NSF National Science Foundation
NWP Numerical Weather Prediction
ORAS5 Ocean Reanalysis System 5
PCC Pearson Correlation Coefficient
RC Reservoir Computing
RMSE Root-Mean-Square Error
S2S Subseasonal-to-Seasonal
SON September-October-November
SOTA State-of-the-art
SPEEDY Simplified Parameterization, primitive-Equation Dynamics (model)
SST Sea Surface Temperature

Chapter 1: Introduction and Background

1.1 The "What" and the "Why"

This thesis consists of two parts:

1. The first part is concerned with developing, implementing, and testing machine learning (ML) -based approaches for modeling the long-term time evolution of non-stationary dynamical systems.

2. The second part deals with implementing and assessing an ML-based approach for predicting the global atmospheric and oceanic dynamics on the Subseasonal-to-Seasonal (S2S) time scale (2 weeks - 6 months) [1, 2].

In this thesis, a "non-stationary" system refers to a system which is itself changing with time due to, e.g., time-dependent system parameters.
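As a concrete illustration of such a parameter-driven non-stationary system, consider the logistic map (one of the test systems used later in this thesis) with a map parameter that drifts linearly in time. The following is a minimal sketch, not code from this thesis; the function name, drift rate, and parameter range are illustrative choices:

```python
import numpy as np

def nonstationary_logistic(x0, r_start, r_end, n_steps):
    """Iterate x_{n+1} = r(n) x_n (1 - x_n) with a linearly drifting r(n).

    This is an illustrative, hypothetical example of a non-stationary
    system: the map itself changes with time through its parameter.
    """
    rs = np.linspace(r_start, r_end, n_steps + 1)  # time-dependent parameter
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for n in range(n_steps):
        xs[n + 1] = rs[n] * xs[n] * (1.0 - xs[n])
    return rs, xs

# Drifting r from a periodic regime (r = 3.5) into a chaotic regime (r = 3.9)
# yields a trajectory whose future behavior differs qualitatively from its past.
rs, xs = nonstationary_logistic(x0=0.4, r_start=3.5, r_end=3.9, n_steps=10000)
```

The late portion of such a trajectory is governed by parameter values never realized in its early portion, which is precisely the situation that makes training an ML model on the system's past challenging.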
While modeling such systems is a generally important problem in many fields, the ultimate goal of the work in this thesis is to devise and test methods for their eventual potential application to the monumental challenge of predicting global climate change. In this way, the two parts of this thesis are related, since we would like to devise methods for predicting non-stationary dynamical systems that can then be extended/applied to models (like the one used in the second part of this thesis) for predicting the global atmospheric and oceanic dynamics under climate change.

Many real-world systems are non-stationary, and therefore may exhibit widely different behaviors, on time scales of interest. For example, the dynamics of financial markets may change significantly over days and months due to politics, policy, or even illegal activities such as market manipulation [3–6]; engineering systems may undergo structural/mechanical/electrical failures over periods of time shorter than the expected system lifetime due to sudden or gradual changes in the environment [7–13]; and ecosystems may experience abrupt changes due to human activity [14–23]. Depending on the context, predicting such behavioral shifts in systems sufficiently ahead of time is critical for mitigating financial damage, promoting societal well-being, and preserving the natural world.

The terrestrial climate system, defined by the Intergovernmental Panel on Climate Change (IPCC) as consisting of "... the atmosphere, the hydrosphere, the cryosphere, the lithosphere and the biosphere and the interactions between them" [24], is a prominent example of a non-stationary system. This non-stationarity, whether attributed to natural internal processes or to external forcings such as volcanic eruptions and persistent anthropogenic activities (e.g., greenhouse gas emissions and deforestation), is termed climate change [24].
The effects of climate change, such as an increase in the frequency of extreme weather events (e.g., extreme precipitation, heat waves, droughts, floods, and tropical cyclones), are already being felt in many parts of the world [25]. For instance, in the United States, there are now three times as many heatwaves per year as there were in the 1960s - and, on average, they last longer and are more intense [26]. In addition, simultaneously occurring extreme events can compound to amplify each other's severity, and such compounding events may become more common under climate change [27–29]. Climate change is expected to have a significant impact on key sectors like water (e.g., due to increased drought/flood risks), food (e.g., due to the impacts of altered precipitation patterns on agriculture), and health (e.g., increased deaths due to natural disasters and increased spread of vector-borne diseases) [30, 31]. In addition to an increase in the frequency of extreme weather events in many regions of the world, climate change may also drive various sub-systems of the Earth system past tipping points [32] beyond which the dynamics of those sub-systems may change abruptly (e.g., the shutdown of the thermohaline circulation in the Atlantic Ocean [33, 34]). Thus, planning for long-term climate adaptation requires us to be able to predict the Earth's climate in the far future, when the atmospheric and oceanic conditions may be significantly different from those of the present and those observed in the past. At the same time, effective planning for mitigating damages due to weather extremes also requires forecasting weather conditions at lead times in the S2S time scale. Being able to predict at this time scale is especially crucial for land, water, and energy management.
For instance, S2S forecasts are heavily relied upon for the management of fisheries, water supply, agricultural resources, energy generation, coastal flooding preparedness, and wildfire risk [35, 36]. This thesis aims to investigate crucial aspects of both of these issues by developing methods that can enable prediction of the long-term statistics of non-stationary dynamical systems (with the aim of eventually applying those methods to generating climate projections), and by developing a global climate model to perform predictions on the S2S time scale.

1.2 A Short History of Numerical Climate (Change) Modeling

Since one of the objectives of this thesis is to devise and test ML methods for modeling non-stationary systems (which is central to modeling climate change), and the ultimate goal of this research is to extend the results in this thesis to the problem of climate change prediction, we will begin by giving a brief history of numerical climate modeling. The aim is to arrive at an understanding of some of the challenges faced by current state-of-the-art (SOTA) conventional climate models and of how the work presented in this thesis can help guide future research.

The early stages of numerical climate modeling mirrored those of numerical weather prediction (NWP), which itself closely shadowed developments in computer technology. In the early 1900s, the Norwegian physicist Vilhelm Bjerknes and the American meteorologist Cleveland Abbe independently recognized that the principles of hydrodynamics and thermodynamics can be used to forecast the time evolution of the atmospheric state [37, 38]. They independently formulated a two-step approach: a diagnostic step to obtain information about the initial state, and a prognostic step to use the governing equations to advance the initial state to a future state [39].
In 1908, Felix Exner made a first attempt at using the laws of physics to produce a quantitative weather forecast by developing a reduced model with restricted capabilities. Shortly thereafter, Lewis Fry Richardson published the first description of a method to directly solve differential equations to produce weather forecasts [39, 40]. The computing technology necessary to implement such a scheme, however, lagged behind by a couple of decades. As a result, Richardson's work was largely forgotten by his contemporaries. Several decades later, advances in technology for making observations of the atmosphere (e.g., the invention of the radiosonde), and advances in the theory of meteorology, in numerical analysis (e.g., the design of numerically stable algorithms), and in the design of digital computers all led to the first weather prediction using a computer [41]. Due to his contributions, Richardson is known by many in the atmospheric science community as the father of modern climate models. Indeed, he had key insights that would be critical to climate modeling; e.g., while components of the Earth system such as the ocean may be ignored for weather forecasting, they would need proper treatment when addressing climate modeling.

One might erroneously assume that the chronology of climate models is such that the simpler models were developed first and then, successively, more complex models were formulated. Some of the first climate models (developed in the 1960s) were in fact atmospheric general circulation models (AGCMs) derived directly from NWP models [40]. Scientists' interest in expanding these models to encompass hemispheric or global domains (combined with studies concerning the radiative and thermal equilibrium of the Earth system) led to the development of Radiative-Convective models [42].
Later on, Budyko [43] and Sellers [44] proposed Energy Balance Models, which were much simpler and required fewer computing resources (owing to their coarser grids and simpler parameterizations in place of solving the dynamical flow equations). These relatively lightweight models enabled scientists to perform long-term climate studies, some of which led to the discovery that there may exist multiple stable climate states [40]. They were also used to probe the climate response to atmospheric disturbances like volcanic eruptions and increases in the atmospheric CO2 concentration [45]. With increasing computing capabilities, by the 1980s it appeared as though AGCMs were beginning to replace simpler climate models. By the 1990s, AGCM ensemble methods were being used to study internal climate variability [46]. As computer technology has advanced, so has the complexity of modern climate models. For instance, more detailed components such as multi-layered oceans, atmospheric chemistry, and carbon budgeting are now commonly added to create coupled ocean-atmosphere general circulation models (GCMs). Despite such innovation, in order to feasibly run GCMs over climate-scale integration times, modern GCMs are still simplifications of our knowledge of oceanic and atmospheric dynamics. Such simplifications manifest as, e.g., parameterization schemes for processes which evolve on temporal and spatial scales that are shorter or longer than the scales explicitly accounted for in developing the model. Such schemes usually involve a number of tunable parameters that must be determined from theory or observations.

1.3 Challenges Faced by Conventional Climate Models

Substantial progress has been made in developing GCMs in the last several decades, and GCMs now encode numerous highly complicated processes in the Earth system involving the atmosphere, the ocean, and the marine and terrestrial biosphere [40, 47].
In addition, modern GCMs are often operated at much higher spatial resolutions (by at least a factor of about 10) in the horizontal and vertical directions than the model used by Manabe and Wetherald in 1975 to study the effects of doubling the concentration of atmospheric CO2 [48]. Despite such advances, many scientists are wondering if GCMs are really the way forward [47]. Encoding the different processes into GCMs is done by introducing highly complex parameterization schemes with many tunable parameters. Tuning these parameters is a computationally challenging task (sometimes requiring on the order of 100 years of simulations [47]), and is often a source of significant parametric uncertainty. In addition, structural uncertainty can be introduced if the form of the parameterization is itself flawed [47]. Indeed, uncertainty bounds due to parametric and structural uncertainties have increased over the last two climate model assessment cycles [49]. Faced with such challenges, many scientists in the field are suggesting that further improvements in spatial resolution are needed (e.g., to resolve small-scale turbulence), but this would require a significant boost to current computing capabilities [50]. To make matters worse, once such a model is developed and tuned, in order to make useful climate projections, it will need to produce an ensemble forecast on climate time scales. Large ensemble sizes may be required for characterizing the probability of weather/climate extremes in the future. For instance, while conventional climate models (like GCMs) typically use only a small number of ensemble members to make climate projections (e.g., Coupled Model Intercomparison Project (CMIP) models typically use 3-10 members), much larger ensemble sizes (on the order of at least several hundred) are needed to sample internal variability at regional spatial scales [51, 52].
In addition, because many hypothetical future climate scenarios need to be tested for climate adaptation planning, it is not feasible to use conventional climate models for this task given their computational costs. (The models used for S2S forecasting are typically similar to conventional climate models [53], and therefore they inherit many of the same problems described above for climate models.)

1.4 An Alternative Approach: Machine Learning

Recent advances in ML have enabled the creation of deep learning (DL) weather forecasting models that are on par with (and in some cases superior to) physics-based NWP models in the short- to medium-range (7-14 days) regime, and can require up to 10,000× fewer computing resources during inference [54–57]. In principle, this can allow one to efficiently generate an ensemble of long-term climate projections to obtain climate statistics. In practice, however, there are limitations. Most SOTA DL weather forecasting models are not long-term stable, or at least not physical, and so their behavior at long integration times has not been studied [58]. Even if they are modified to be long-term stable, there is no guarantee, given their purely data-driven nature, that they will produce accurate long-term statistics of the Earth's climate system. We note, however, that a class of physics/ML-based hybrid models has been proposed which are long-term stable and are able to produce realistic climate statistics [59, 60]. While these models are able to generate long-term statistics that match past observed climate data, it is generally unclear how well ML models will perform when they are required to extrapolate to future climate scenarios for which the atmospheric and oceanic dynamics are significantly different from those the models were trained on.
In other words, while ML-based models offer a computationally cheap alternative for climate modeling, little attention has been given to their ability to extrapolate to out-of-distribution dynamics - a key component of extrapolating to future climate scenarios.

1.5 Thesis Outline

In this thesis, we attempt to execute the following two research agendas: (1) devise ML-based methods for modeling the long-term statistical properties of non-stationary dynamical systems, test the limitations of such methods, and, when they fail, investigate how these failures can be addressed; and (2) modify a physics/ML-based hybrid global climate model for generating useful forecasts on the S2S time scale.

1.5.1 Machine Learning for Modeling Non-Stationary Dynamical Systems

In Chapter 2, we develop and test ML-based methods for predicting the long-term statistical properties of non-stationary dynamical systems. We study the utility of our devised methods in scenarios where the future behavior of the target system to be predicted is fundamentally different (e.g., periodic orbits) from the behavior on which the ML models were trained (e.g., chaotic orbits), due to non-stationarity induced by the time-dependence of a parameter of the system. We also investigate the effect on the predictability of such non-stationary systems when dynamical noise is present in the training data. We introduce a method that, for dynamical noise of small amplitude, enables our trained models to reproduce the effects of that dynamical noise on typical trajectories of the system during inference. We begin by illustrating our methods on the one-dimensional discrete-time logistic map [61]. We further test our methods on the continuous-time three-dimensional Lorenz system [62] and the spatiotemporal Kuramoto-Sivashinsky system [63, 64]. This chapter was published in the journal Chaos [65].

1.5.2 How Well Can Machine Learning Models Extrapolate Dynamics?
In Chapter 3, we begin by exploring the limitations of our methods from Chapter 2 when faced with the combined difficulties of having to generalize to previously unseen dynamics (as done in Chapter 2) and to previously unexplored regions of the system state space. We find that while the ML methods we devised have some utility in the first case, they can easily fail in the second case, especially (as one would expect of any method designed for extrapolation) as the amount of extrapolation required increases. We test these scenarios using the three-dimensional Lorenz system [62], the two-dimensional Ikeda map [66], and the spatiotemporal Kuramoto-Sivashinsky system [63, 64]. In addition, we introduce a hybrid approach to modeling non-stationary systems that combines an ML component with a possibly faulty knowledge-based component. We find that such a hybrid approach can substantially improve the trained model's ability to generate useful predictions even in situations which require significant amounts of extrapolation. We also explore the effect of dynamical noise on the predictability of future tipping points and on extrapolation to post-tipping-point dynamics, and find that the presence of dynamical noise can be beneficial. This chapter was published in the journal Chaos [67].

1.5.3 A Hybrid Machine Learning -based Model for Subseasonal-to-Seasonal Forecasting

In Chapter 4, we modify a physics/ML-based hybrid global climate model developed by Arcomano et al. (see [59]) and assess its utility in generating forecasts at the S2S time scale. The model consists of an atmospheric component that is a hybrid between a low-resolution AGCM and an ML model, and an ML-only ocean component.
The two components are coupled by exchanging various variables; e.g., the atmospheric component receives the sea surface temperatures (SSTs) from the ocean model as input, while the ocean component receives the two components of the horizontal wind vector, the temperature, the specific humidity, and the surface pressure from the lowest atmospheric model level as input. It was previously found that such a hybrid model can not only perform better than either the physics-based or the ML-only version operating alone [68], but can also capture processes not captured by the physics-based model component [59]. We investigate the ability of our model to capture various ocean-atmosphere phenomena on the S2S time scale, e.g., the El Niño Southern Oscillation (ENSO) and its teleconnections, precipitation anomalies, and equatorial waves. We find that our model has useful skill in predicting such phenomena on the S2S time scale.

Chapter 2: Using Machine Learning To Predict Statistical Properties of Non-Stationary Dynamical Processes: System Climate, Regime Transitions, and the Effect of Stochasticity

2.1 Introduction

Predicting the time evolution of a dynamical system is a central requirement for enabling many applications. Moreover, it is often the case that first-principles knowledge of the dynamical system to be predicted is unavailable. Instead, one may have access to measured time series data of the previous states of the system. For situations where the system to be predicted is stationary in time, machine learning has proven effective in using available time series data to accomplish prediction tasks [69–94]. In this stationary context, particularly for systems with chaotic behavior, it is useful to partition the problem of prediction into (i) short-term prediction of the state of the system, and (ii) long-term prediction/replication of the statistical/ergodic dynamical behavior of the system.
That is, even after short-term prediction fails (e.g., due to the sensitive dependence of chaos), we may still desire to replicate the statistical properties of the orbits of the real system. We have in mind the situation in which, following an initial transient, the orbit from an initial condition goes to an “attractor”, i.e., a subset of the state space on which the orbit continues to reside, while, perhaps, evolving chaotically. In such a stationary situation, a measurement of the state at a random instant of time can be thought of as corresponding to a sample taken from an underlying probability distribution which is invariant under the flow of time. For such systems, the ergodic theorem allows us to characterize the long-term statistical properties of the orbits by equating the time average of a state function over a single typical orbit to the state space average of that function taken with respect to an invariant measure [95]. If we succeed in predicting trajectories which have the same long-term statistical properties as those produced by trajectories of the actual system, then we say we have succeeded in replicating the “climate” of that system.

In contrast to the situation described in the above paragraph, we are interested in prediction of non-stationary dynamical systems, i.e., systems which themselves vary with time. However, in the case of a non-stationary system, ergodicity no longer holds. Thus, for non-stationary systems, we cannot meaningfully define a representative statistical quantity in the above way [96], and we wish to extend our concept of “climate” to the non-stationary case. To do this we consider an ensemble of trajectories originating from many randomly chosen initial conditions in the past. At any subsequent given time, after a sufficiently large convergence time has passed since initialization, the ensemble traces out a so-called “snapshot attractor” [96–98].
This attractor possesses a probability distribution that (under appropriate general conditions) is independent of the set of initial conditions chosen for the ensemble and thus provides an appropriate measure representing the probability distribution of states on the snapshot attractor [96]. (The collection of all snapshot attractors (at every time instance) is referred to, in the mathematical and terrestrial-climate-related literature, as the pullback attractor of the system [99].) This time-dependent snapshot probability distribution can then be used to obtain expectation values of state-dependent functions, thereby defining the time-dependent climate of a non-stationary system.

The goal of this chapter will be to devise and test machine learning approaches for the prediction of the future evolution of the climate of non-stationary dynamical systems. We note that, although machine learning has previously been extensively investigated and applied for the task of short-term state prediction of stationary dynamical systems (especially chaotic systems) (see Refs. [69–71, 82–91]), and has also been shown to be effective for replicating the climate of stationary systems [72, 73, 92, 93], machine learning has previously received little attention as a general approach to the important task of predicting climate evolution in non-stationary dynamical systems. For recent papers on using machine learning for short-term state prediction of non-stationary systems, see Refs. [88, 94, 100]. In the context of our stated goal, an important issue is that of “regime transitions”. Specifically, for different system parameters, stationary dynamical systems can generally exhibit a variety of different types of motions with different climates, and, correspondingly, slowly time-varying non-stationary systems can experience regime transitions where, once the system crosses some critical condition, the system’s behavior rapidly becomes qualitatively and quantitatively different.
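The snapshot-attractor construction just described can be sketched numerically. The minimal example below (a sketch, not the code used in this thesis) evolves an ensemble of random initial conditions of the non-stationary logistic map, studied later in this chapter, under a drifting parameter; the ensemble size, drift rate, and initial parameter value are illustrative assumptions:

```python
import numpy as np

def snapshot_ensemble(n_steps, n_members=10_000, lam0=3.81, gamma=1e-5, seed=0):
    """Evolve an ensemble of logistic-map trajectories under a slowly drifting
    parameter lambda_n and return the ensemble state at the final time.

    After a sufficiently long convergence time, the distribution of the
    returned states approximates the measure on the snapshot attractor at
    that instant, independent of the initial ensemble."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n_members)  # random initial conditions
    lam = lam0
    for _ in range(n_steps):
        x = lam * x * (1.0 - x)
        lam += gamma
    return x

# A histogram of `snap` approximates the snapshot distribution at this time.
snap = snapshot_ensemble(n_steps=2000)
```

Repeating this for a sequence of times yields the collection of snapshot distributions, i.e., a numerical approximation of the pullback attractor.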
There are a variety of mechanisms for regime transitions, ranging from basic bifurcations of periodic orbits (e.g., Hopf, saddle-node, and period doubling bifurcations) to transitions of chaotic attractors (e.g., period doubling cascades [101, 102], intermittency [103], and crises [104, 105]). For a very recent work on using machine learning to predict system collapse and chaotic transients associated with crisis transitions, see Ref. [106].

Thus, in addition to slow, temporally continuous climate change, regime transitions are important in many fields. For example, in the context of terrestrial climate, reference is typically made to “tipping points” representing abrupt transitions in the Earth’s climate [32, 107]. Geological records indicate that the Earth has experienced such transitions in the past; for example, Greenland ice-core records indicate a series of abrupt climatic events in the Last Glacial period [108], and there is growing concern that anthropogenic activities such as greenhouse gas emissions could drive the climate system to a regime transition [109, 110], such as the possible shutdown of the thermohaline circulation in the Atlantic Ocean [33, 34], and the potential transition from a wet to a dry Indian summer monsoon [111]. Such regional transitions could also affect the climate of geographically distant locations via teleconnection [112–114]. Naturally, both temporally continuous behavior evolution, as well as regime change events, are of immense consequence, not only for atmospheric climate, but also in many other dynamical contexts, e.g., economics [115, 116], ecology [14–23, 117, 118], medicine [119, 120], technology [11, 12], and sociology [121].

The goal of predicting the climate of non-stationary systems from time series state data, in principle, requires some knowledge of the future evolution of the underlying dynamical system that produces the data.
Such knowledge might be directly available or derivable from the available past state time series data. The need for this type of knowledge can be seen by considering the example where the underlying system time dependence experiences an abrupt change such that the system after the change is unrelated to the system before the change, thus making predictions of the state changes past the time of the abrupt system change untenable. With this in mind, we restrict our considerations to the case where the underlying system evolution is smooth and slow compared to the time scale (τs) of the state evolution (τns ≫ τs, where τns is the “non-stationarity time scale” over which the underlying system evolves). We emphasize that we are interested in climate prediction over the long time scale τns. In this chapter we consider the scenario where the non-stationarity of the system to be predicted is due to time-dependent system parameters where the time dependence is known, although the system itself is otherwise unknown. Discussion of the importance of this scenario in applications is given in Sec. 2.4.1. (For simplicity, in what follows, we will only consider the case where there is one relevant time-dependent system parameter.)

The main conclusion of the work in this chapter is that machine learning offers a surprisingly promising avenue for using measurements of past system state time series to accomplish the important general goal of predicting the long-term dynamical behavior (climate) of non-stationary dynamical systems, including prediction of apparently continuous change and of more sudden regime transitions.

In what follows, we use Reservoir Computing [122–124] as our machine learning platform of choice for practical reasons. Reservoir computing is a computationally inexpensive framework since it requires training only its output layer via a simple linear regression.
The resulting reduced training time allows for relatively rapid testing of various methodologies and parameter regimes. However, we expect other types of machine learning, e.g., deep learning [125], to also work well for this task via the methods we develop and test in this chapter.

Another issue we address is that of predicting the time evolution of a non-stationary dynamical system evolving under the influence of dynamical noise, which is often unavoidable in real situations. We present a simple method which uses the residual error distribution from the training procedure to inject appropriately determined noise into the reservoir prediction scheme to perturb the reservoir output in a way which mimics the effects of the dynamical noise present in the real system, thus enabling prediction of the noise-influenced climate.

The rest of the chapter is organized as follows. In Sec. 2.2 we give a brief overview of Reservoir Computing and of a widely employed basic scheme for using it for prediction of stationary dynamical systems. In Sec. 2.3 we discuss noiseless and noisy non-stationary versions of the logistic map and their behaviors. In Sec. 2.4.1, using the noiseless non-stationary logistic map, we introduce and test our method. In Sec. 2.4.2 we generalize the method of Sec. 2.4.1 to include dynamical noise. Sections 2.5 and 2.6–2.7 test our methods on non-stationary versions of the Lorenz 63 system [62] and the Kuramoto-Sivashinsky system [63, 64], respectively. Section 2.8 presents conclusions. Referring to Secs. 2.4–2.7, we note that the example systems tested include a simple discrete time map (Sec. 2.4), a low-dimensional continuous time system (three ordinary differential equations; Sec. 2.5), and a partial differential equation potentially exhibiting spatiotemporal chaos (Secs. 2.6 and 2.7).
2.2 Review of Reservoir Computing (RC) and Its Use for Prediction of Stationary Systems

Reservoir computing (e.g., see the review paper [122]) is a machine learning approach that has proven to be effective in tasks involving dynamical time series data. Although there are various ways of implementing RC [126–131], in this chapter we consider the widely used implementation which utilizes a network of neuron-like nodes with recurrent connections, the “reservoir”, to process temporal/sequential data [123, 124]. The main idea is to drive the reservoir with an input signal and form a trainable linear combination of the nonlinear responses of the individual network nodes to obtain an output sequence which closely approximates some desired sequence.

For illustrative and background purposes, in this section we describe the application of reservoir computing to predicting the time evolution of an unknown stationary system. Given a set of measurements of the state of a system from some time in the past t = −T up to some time labeled t = 0, we desire to predict the state of the system for t > 0. Even after prediction of the state of the system fails, we may still desire that the predicted trajectory replicates the climate of the actual system in the long-time limit (Refs. [72, 73]).

We denote the state of the dynamical system (from which the measurements are obtained) at time t by a K-dimensional vector u(t) = [u1(t), u2(t), ..., uK(t)]^T. If our reservoir consists of N nodes, then the K-dimensional input vector at each iteration is coupled into the reservoir via an (N × K)-dimensional input-to-reservoir coupling matrix, Win (see Fig. 2.1). We choose to construct this matrix by randomly selecting one element of each row to be a nonzero number chosen randomly from a uniform distribution on the interval [−χ, χ]. Each reservoir node i has a scalar state ri(t). The state of the reservoir at time t is denoted by the N-dimensional vector r(t) = [r1(t), r2(t), ..., rN(t)]^T.
The reservoir evolves dynamically, given the linearly coupled input and a network adjacency matrix A, according to

r(t + ∆t) = tanh(Ar(t) + Win u(t)),    (2.1)

where the hyperbolic tangent function is applied to each element of its vector argument. The directed adjacency matrix is a randomly generated sparse matrix such that it has a predetermined average number of connections per node, ⟨d⟩. For our task the desired output time sequence is of the same dimension as the input sequence, so a (K × N)-dimensional reservoir-to-output coupling matrix Wout is used to linearly map the reservoir state vector r(t) to the output, denoted ũ, at the next time step, t + ∆t. That is, the output is the K-vector Wout r(t) = ũ(t + ∆t). These steps are diagrammatically represented in Fig. 2.1(a). The matrix elements of Wout are determined by the training so that the output of the reservoir computer closely approximates the desired output for −T ≤ t ≤ 0. To do this, we minimize a “cost function”,

∑_{−T ≤ t ≤ 0} ||Wout r(t) − u(t + ∆t)||² + α ||Wout||²,

where we include a Tikhonov regularization parameter α to prevent overfitting by penalizing large output weights. This minimization is a simple linear regression. The set of coefficients of Wout which minimizes this L2 norm is taken to be the “trained” set of output weights for the network. The choice of α and of the other “hyperparameters” (such as the average degree ⟨d⟩, the spectral radius ρs of A (i.e., the largest eigenvalue magnitude of A), and the maximal strength χ of the input-to-reservoir coupling) is made empirically so as to minimize the discrepancy between a target prediction and the reservoir prediction over a validation data set (e.g., by performing a grid search over the space of these hyperparameters). Further, we constrain our search by restricting our attention to a set of hyperparameters which satisfy the “echo state condition” (see, e.g., Ref.
[123]), which requires that, starting from any two different reservoir states r1 and r2, iteration of Eq. (2.1), after a transient phase, yields the same time series for the reservoir state r(t); i.e., the reservoir eventually “forgets” its initial condition, meaning that its evolution depends only on its input u(t) and is independent of its initial state. If N is sufficiently large, it can be expected that the output ũ(t + ∆t) will be extremely close to the desired output u(t + ∆t), i.e., the reservoir will give an accurate one-time-step prediction.

Once we have trained the reservoir on previous measurements of the system states, using the “open-loop” configuration for −T ≤ t ≤ 0 shown in Fig. 2.1(a), we predict the system evolution for t > 0 by using the “closed-loop” configuration shown in Fig. 2.1(b). In this configuration, the output from the reservoir computer at one step is fed back as input at the next iteration. Thus, on the first iteration the input is u(t) and (neglecting the small difference between u(t) and ũ(t)) the output is ideally u(t + ∆t), which, when fed back, yields an output u(t + 2∆t), which, when fed back, yields an output u(t + 3∆t), and so on. However, due to the small deviation between u(t) and ũ(t), errors build up with multiple circuits of the feedback loop, and eventually the prediction fails.

Figure 2.1: Schematic of the reservoir computer. (a) The “open-loop” configuration used for training the output weight matrix, Wout. An input sequence, u(t), is coupled into a reservoir of nodes via a matrix Win. The excited reservoir state r(t) is mapped to an output, ũ(t), via a matrix of adjustable weights Wout. (b) The “closed-loop” configuration used for prediction. The reservoir output at one time step is fed as input to the reservoir at the next time step. The dashed red box denotes the “reservoir system”, made up of Win, the reservoir, and Wout.
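The open-loop training (ridge regression for Wout) and closed-loop prediction described above can be sketched compactly in code. The following is a minimal illustration, not the exact implementation used in this thesis: the values of N, ρs, and χ follow the text, while the average degree, regularization strength, seeds, and sparse-matrix construction are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 500, 1                      # reservoir nodes, input dimension
rho_s, chi, alpha = 0.55, 0.23, 1e-6
avg_deg = 3                        # average connections per node (illustrative)

# Sparse random adjacency matrix A, rescaled to spectral radius rho_s.
A = np.where(rng.random((N, N)) < avg_deg / N,
             rng.uniform(-1.0, 1.0, (N, N)), 0.0)
A *= rho_s / np.max(np.abs(np.linalg.eigvals(A)))

# W_in: one nonzero entry per row, drawn uniformly from [-chi, chi].
W_in = np.zeros((N, K))
W_in[np.arange(N), rng.integers(0, K, N)] = rng.uniform(-chi, chi, N)

def step(r, u):
    """Eq. (2.1): r(t + dt) = tanh(A r(t) + W_in u(t))."""
    return np.tanh(A @ r + W_in @ u)

def train(U):
    """Open-loop phase (Fig. 2.1(a)): drive the reservoir with U[:-1] and
    ridge-regress the reservoir states onto the one-step-ahead targets U[1:]."""
    r = np.zeros(N)
    R = np.empty((len(U) - 1, N))
    for t in range(len(U) - 1):
        r = step(r, U[t])
        R[t] = r
    # Tikhonov-regularized least squares for W_out.
    W_out = np.linalg.solve(R.T @ R + alpha * np.eye(N), R.T @ U[1:]).T
    return W_out, r

def predict(W_out, r, u, n_steps):
    """Closed-loop phase (Fig. 2.1(b)): feed each output back as input."""
    out = []
    for _ in range(n_steps):
        r = step(r, u)
        u = W_out @ r
        out.append(u)
    return np.array(out)
```

Note that `train` returns the final reservoir state along with Wout, so that closed-loop prediction can continue seamlessly from the end of the training data.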
Nevertheless, it is observed that this scheme often produces predictions that are useful for a limited time. One notable point is that, because the reservoir network is “recurrent” (i.e., has directed links forming closed paths cycling between nodes), the reservoir system retains memory of its past states and inputs.

As mentioned in the introduction, Sec. 2.1, we are interested in both discrete-time dynamical systems (maps; Secs. 2.3 and 2.4), as well as continuous-time dynamical systems (flows; Secs. 2.5, 2.6, and 2.7). In the case of maps, t is an integer, t = n (n = 0, 1, 2, ...), and the reservoir time step is ∆t = 1. In the case of flows we will choose ∆t to be small compared to the characteristic time for variation of u(t) so that the predicted discrete outputs, separated by ∆t, give a good representation of the continuous time evolution. It is important to note (e.g., Sec. 2.4) that the closed-loop configuration of Fig. 2.1(b) is itself a stationary dynamical system whose dynamics, through the training of Wout, has essentially been devised so as to mimic the dynamics of the unknown system producing the measurement data (see Ref. [72]). In Sec. 2.4.1, we modify this set-up (applicable for the prediction of stationary dynamical evolution processes) for application to non-stationary, possibly stochastic, processes.

2.3 The Non-Stationary Logistic Map

In this section we consider the one-dimensional logistic map, which provides a simple platform for formulating, illustrating, and testing the application of RC to the task of predicting the time evolution of a non-stationary discrete time dynamical system. Because of its simplicity, the logistic map allows us to rapidly test various methodologies and techniques. Furthermore, we will subsequently show (Secs. 2.5, 2.6 and 2.7) that our results and methods, demonstrated for the simple example of the logistic map, carry over to other, less simple systems.
Before we demonstrate the reservoir computer’s ability to predict the non-stationary climate of the logistic map, we briefly review three variations of this system: the stationary logistic map, a non-stationary logistic map, and a stochastic non-stationary logistic map. We demonstrate that the dynamics and regime transitions have different characteristics for each of these three variations of this system.

We will first briefly review the behavior of the stationary logistic map system,

xn+1 = λxn(1 − xn),    (2.2)

for different, but fixed, values of λ. We randomly initiate a trajectory in the interval (0, 1), allow it to evolve under the action of the map for fixed λ, and consider only its long-term behavior (after any transients associated with the choice of initial condition have died away). Plotting the iterations of x in this long-term limit for each value of λ, we generate a bifurcation diagram, as shown in Fig. 2.2(a). For λ ∈ (1, 3), a typical initial condition x0 ∈ (0, 1) will evolve to a fixed point attractor, x* = 1 − 1/λ. As λ is increased through λ = 3, the stationary point x* bifurcates to a period 2 orbit. As λ is increased further, the attracting orbit undergoes a period doubling cascade [101, 102] terminating at λ = λ∞ ≈ 3.569.... In the range λ∞ < λ < 4, chaos occurs, accompanied by an infinite number of windows of different periodicity, e.g., see Ref. [132] (Sec. 2.2). For this section, in order to conveniently address a variety of regime transition types, we focus on the period 3 window λ ∈ [3.82, 3.87], as shown in Fig. 2.2(b). As λ increases through a critical value λc1 ≈ 3.828..., the period 3 window is initiated from a chaotic situation below λc1 through an intermittency transition [103] associated with a period 3 tangent bifurcation. As λ increases beyond λc2, this period 3 orbit undergoes a period doubling cascade leading to chaos cycling through three bands (with interspersed windows).
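The stationary bifurcation-diagram construction just described can be reproduced in a few lines; the sketch below records the long-term iterates for each fixed λ (the transient and sampling lengths are illustrative assumptions):

```python
import numpy as np

def bifurcation_points(lam_values, n_transient=500, n_keep=100, seed=0):
    """For each fixed lambda, discard transients and record the long-term
    iterates of the stationary logistic map x -> lambda * x * (1 - x)."""
    rng = np.random.default_rng(seed)
    pts = {}
    for lam in lam_values:
        x = rng.uniform(0.0, 1.0)        # random initial condition in (0, 1)
        for _ in range(n_transient):     # let transients die away
            x = lam * x * (1.0 - x)
        orbit = []
        for _ in range(n_keep):          # sample the long-term behavior
            x = lam * x * (1.0 - x)
            orbit.append(x)
        pts[lam] = orbit
    return pts

# E.g., sampling the period 3 window of Fig. 2.2(b):
pts = bifurcation_points(np.linspace(3.82, 3.87, 6))
```

Plotting each orbit sample against its λ value yields diagrams like Figs. 2.2(a,b); for λ = 2.5, for instance, all recorded iterates collapse onto the fixed point x* = 1 − 1/λ = 0.6.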
This three-piece chaotic attractor widens for larger λ until it collides with the unstable period 3 orbit generated at the start of the window. This collision results in a crisis transition [104, 133] causing an abrupt broadening of the three-piece chaotic attractor into a single-band chaotic attractor, thereby terminating the period 3 window (at λc3).

In the tests and examples in the rest of this chapter, we will often consider situations with windows, like the period 3 window of Fig. 2.2(b). We emphasize, however, that we do this not necessarily because we want to treat windows, but rather because, as noted above for the case of Fig. 2.2(b), windows contain within them a variety of general types of common regime transitions (intermittency transitions [103], period doubling cascades, and crises [104, 133]). Thus windows provide a convenient platform for studying and testing the effectiveness of our methods when regime transitions are present.

Figure 2.2: (a) Bifurcation diagram of the logistic map. (b) Zoomed-in view of the period 3 window of the logistic map bifurcation diagram.

Now we modify the stationary system to create a non-stationary situation where the parameter λ drifts linearly with time,

xn+1 = λnxn(1 − xn),    (2.3a)
λn+1 = λn + γ.    (2.3b)

(Later in this chapter we will consider the case where the linear drift with time, Eq. (2.3b), is replaced by a nonlinear drift with time.) To discuss the dynamics of this system, we consider the situation where λn varies very slowly (|γ| ≪ λn). For γ = 10−5, Fig. 2.3(a) shows a typical orbit evolved under the action of Eq. (2.3) over the interval of the period 3 window. Notice that the period doubling bifurcation, from period 3 to period 6, looks different, more “abrupt” (see blow-up in Fig. 2.3(b)), in this non-stationary case than in the stationary case shown in Fig. 2.2(b). This can be understood as follows. Just before the bifurcation, the orbit is approximately moving on the period 3 attractor.
When λn increases past λc2, the period 3 orbit points become unstable, but the trajectory remains, for a few iterations, very close to the now unstable period 3 orbit; then, after a few more time steps, the trajectory becomes sufficiently displaced from the unstable period 3 orbit (since λn is changing), is repelled from it, and quickly moves to the now stable period 6 orbit. Thus, we obtain the observed delayed, but rapid, transition from a period 3 orbit to a period 6 orbit. As the drift rate of λ increases, the transition from a period 3 orbit to a period 6 orbit is further delayed, and the details of the period doubling cascade become less discernible and are eventually washed out, as seen from Figs. 2.3(c,d), which apply for γ = 5 × 10−5. The main point illustrated in Figs. 2.3(a-d) is that non-stationarity can qualitatively alter the dynamical behavior of a system, especially during a regime transition.

Similarly, dynamical noise can also significantly alter the climate and features of regime transitions. To illustrate this, we consider the non-stationary stochastic logistic map [134], given by Eq. (2.4),

xn+1 = λnxn(1 − xn) + ϵn,    (2.4a)
λn+1 = λn + γ,    (2.4b)

where ϵn (the dynamical noise) is a number randomly and independently chosen from a distribution ζ(ϵ). For simplicity, we will consider the case in which the noise distribution is uniform on the interval [−ϵ, ϵ]. As shown in Ref. [135] for the stationary system bifurcation diagram, the dynamical noise acts to blur the fine structure of the period doubling cascade (near λ∞) and appears to cause the period doubling cascade to terminate after a few period doublings.

Figure 2.3: A trajectory evolving under the non-stationary logistic map (a) given by Eq. (2.3) for γ = 10−5, (c) given by Eq. (2.3) for γ = 5 × 10−5, and (e) given by Eq. (2.4) for γ = 10−5 and ϵ = 0.002. (b) and (d) show a zoomed-in view of the red boxed areas in (a) and (c), respectively, near the period doubling bifurcation from a period 3 orbit to a period 6 orbit.

Turning to the non-stationary case, in Fig. 2.3(e) we have plotted a typical trajectory in the period 3 window evolving under the action of Eq. (2.4) with a noise of strength ϵ = 0.002 and λ drifting linearly at a rate γ = 10−5. We see that the dynamical noise “blurs” the period doubling bifurcation from a period 3 orbit to a period 6 orbit. We also see, before the occurrence of the internal crisis at the upper λ range of the window, that the noisy trajectory intermittently leaves (and returns to) the region of the state space which corresponds to the three-piece chaotic attractor of the stationary system. Thus, dynamical noise can cause significant changes in the features of regime transitions, including blurring of the noiseless sharp location of the point at which a transition occurs. We desire to accurately predict and replicate, from past time series data, the effects of both noiseless and noisy non-stationarity on the climate associated with the time evolution of the system.

2.4 Application of RC to Climate Prediction of the Non-Stationary Logistic Map

Figure 2.4: (a) Comparison of the actual (black dots) and the reservoir-predicted (red dots) trajectories of the logistic map for Setup 1. (b) The logistic map for λ = 3.8 (black) and the map obtained from the reservoir-predicted trajectory (red dots) of (a). (c) and (d) compare the actual and the reservoir-predicted trajectories for γ = 10−5 and γ = 5 × 10−5, respectively, for Setup 2.

2.4.1 The Noiseless Non-Stationary Logistic Map

Now we turn to our stated goal: given a finite set of measurements of the state of an unknown non-stationary dynamical system from some time in the past t = −T up to some time t = 0, predict the climate and regime transitions for t > 0.
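For reference in what follows, training and test trajectories of the target systems, Eqs. (2.3) and (2.4), can be generated as in the sketch below (parameter values are illustrative assumptions; setting ϵ = 0 recovers the noiseless system of Eq. (2.3)):

```python
import numpy as np

def nonstationary_logistic(n_steps, lam0=3.81, gamma=1e-5, eps=0.0, seed=0):
    """Iterate Eq. (2.4): x_{n+1} = lam_n x_n (1 - x_n) + eps_n, with
    lam_{n+1} = lam_n + gamma and eps_n uniform on [-eps, eps].
    eps = 0 recovers the noiseless system of Eq. (2.3)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0)          # random initial condition in (0, 1)
    lam = lam0
    xs, lams = [], []
    for _ in range(n_steps):
        x = lam * x * (1.0 - x) + rng.uniform(-eps, eps)
        lam += gamma
        xs.append(x)
        lams.append(lam)
    return np.array(xs), np.array(lams)

# Noiseless drift through the period 3 window (cf. Fig. 2.3(a)):
xs, lams = nonstationary_logistic(5000, gamma=1e-5)
```

The returned pair (xs, lams) corresponds directly to the measurement set {xn, λn} assumed available for training in this section.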
More precisely, when the prediction is performed over an ensemble of trajectories, each initiated from a randomly chosen initial condition, we want our predicted ensemble of trajectories to replicate the non-stationary climate of the actual system. For work previously applying initial condition ensembles for studying and understanding non-stationary dynamics, see Refs. [136–138].

First, we demonstrate how naively applying RC using the measurements of the previous states of the system as the only input to the reservoir (i.e., following the procedure of Sec. 2.2) fails to accomplish our goal. We will call this setup “Setup 1”. Considering the noiseless system given by Eq. (2.3) for γ = 10−5, we train the reservoir in the “open-loop” configuration (Fig. 2.1(a)) using a set of training data, {xn} for n ≤ 0 (1000 data points), and then allow the reservoir to autonomously predict a trajectory, xn for n > 0, forward in time using the “closed-loop” configuration (Fig. 2.1(b)). Table 2.1 displays the set of hyperparameters used in all tasks in this chapter involving the prediction of the logistic map. Figure 2.4(a) compares the actual trajectory of the system (black dots) and the reservoir-predicted trajectory (red dots). We see that the reservoir-predicted trajectory does not undergo any of the regime transitions that the actual trajectory undergoes. Instead, the reservoir-predicted trajectory appears to evolve according to a fixed stationary map. Going back to the description of the prediction in the stationary case, Sec. 2.2, this is to be expected because the memory of the reservoir is short compared to the timescale for change of λn (i.e., γ−1), and the training produces a single fixed Wout which, when used in the closed-loop prediction configuration, produces a stationary reservoir dynamical system. As such, we can think of the training procedure of Fig.
2.1(a) acting on non-stationary training data as choosing a Wout that in some sense yields the stationary system that is most consistent with the non-stationary training data −T < t < 0. Figure 2.4(b) shows the map traced out by the reservoir-predicted trajectory (red dots) and by the actual trajectory (black dots) from Fig. 2.4(a).

Table 2.1: Hyperparameters for predicting the logistic map.
    Reservoir size N: 500
    Average number of connections per node ⟨d⟩: 1
    Spectral radius of the reservoir’s adjacency matrix ρs: 0.55
    Maximal coupling strength of the input to the reservoir χ: 0.23
    Tikhonov regularization parameter α: 2.15 × 10−11

In order to take into account non-stationarity, we must break the stationarity of the closed-loop dynamical system of Fig. 2.1. Instead of providing only the state of the dynamical system as input to the reservoir, we also provide the reservoir with the value of the driving parameter λn of the system during both the training and the prediction. We call this setup, illustrated in Fig. 2.5, “Setup 2”. (A similar scheme was very recently independently formulated in the preprint Ref. [106] for studying crises.) We have coupled this parameter to the reservoir in the same way we couple the measured states of the dynamical system, i.e., each reservoir node is coupled to a single randomly selected element of the state vector u(t) or to the driving parameter. Using Setup 2, we will now demonstrate the successful application of RC to the prediction of the noiseless non-stationary logistic map. With an additional modification to this method, we will later extend its capabilities to successfully predict the stochastic logistic map as well. We assume we have a set of measurements of the state of the system, as well as knowledge of the time variation of the parameter, {xn, λn} for n = −T, ..., 0, from some time in the past n = −T to the present n = 0, and we desire to predict the future climate of this non-stationary system.
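The essential change in Setup 2 can be sketched as follows: the reservoir input is the augmented vector [xn, λn], and in the closed-loop phase only the predicted state is fed back, while the known parameter sequence λn continues to be supplied externally. In this minimal illustration the small reservoir and the readout are random stand-ins (not trained weights), chosen only to show the data flow:

```python
import numpy as np

def setup2_closed_loop(step, W_out, r, x0, lam_schedule):
    """Closed-loop prediction for Setup 2 (cf. Fig. 2.5(b)): the predicted
    state x is fed back as input, while the known drifting parameter lam_n
    is always supplied from outside the loop."""
    x, out = x0, []
    for lam in lam_schedule:
        r = step(r, np.array([x, lam]))    # augmented input [x_n, lam_n]
        x = float((W_out @ r)[0])          # feed back the predicted state only
        out.append(x)
    return np.array(out)

# Minimal illustration with a small random reservoir and an untrained readout:
rng = np.random.default_rng(0)
N = 50
A = 0.5 * rng.uniform(-1.0, 1.0, (N, N)) / np.sqrt(N)
W_in = rng.uniform(-0.2, 0.2, (N, 2))      # couples [x, lam] into the reservoir
step = lambda r, v: np.tanh(A @ r + W_in @ v)
W_out = rng.uniform(-0.1, 0.1, (1, N))     # a trained readout would go here
traj = setup2_closed_loop(step, W_out, np.zeros(N), 0.5,
                          3.81 + 1e-5 * np.arange(100))
```

Because λn enters the loop from outside, the closed-loop system is no longer autonomous and stationary, which is precisely what allows it to track the prescribed non-stationarity.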
We note that the task of prediction where non-stationarity is due to time variation of a known parameter is often of great practical interest. For instance, suppose we are trying to predict how the Earth’s climate will change over time and we have a set of measurements which describe the climate from the past, as well as of relevant quantities that we think may directly influence the dynamical behavior of the terrestrial climate system, such as atmospheric CO2 concentration [109, 110]. We may be interested in forecasting how the climate would change in the future for different rates of increase of CO2 concentration in the atmosphere. In that case we may treat CO2 concentration levels as a known time-dependent driving parameter of the system.

Figure 2.5: (a) Open-loop training phase and (b) closed-loop prediction phase of Setup 2. The rectangular boxes, as for the dashed rectangles in Figs. 2.1(a,b), contain the RC system consisting of the input layer (Win), the reservoir, and the output layer (Wout).

With this in mind, we train the reservoir, as shown in Fig. 2.5(a), and then we employ the “closed-loop” configuration of Fig. 2.5(b) to predict the future states of the system. For the case with γ = 10−5, ϵ = 0, the results are shown in Fig. 2.4(c). The black dots represent the actual trajectory evolving under Eqs. (2.3a,b), while the red dots represent the trajectory predicted by the reservoir computer. The reservoir computer was trained on a portion of the actual trajectory corresponding to the interval λ ≈ [3.81, 3.82] (T = 1000 data points).
Although the reservoir quickly loses the ability to predict the precise time evolution of the true system orbit, as expected for a chaotic orbit, we see that it was able to replicate and predict the climate, including the various regime transitions: the intermittency transition near λ = 3.828 from a chaotic orbit to a stable period 3 orbit; the cascade of period doubling bifurcations starting at λ ≈ 3.845 leading to chaos near λ = 3.857; and the crisis transition at the upper boundary of the period 3 window (λ ≈ 3.856). Notice that the reservoir-predicted trajectory was able to accurately capture the effects of non-stationarity, for example, in the sudden transition from a period 3 orbit to a period 6 orbit discussed previously. We have also considered the reverse bifurcations (obtained as λn decreases with time), and they are well predicted by the machine learning method; we omit these figures here but will show the corresponding figures for the next example (the Lorenz system, Sec. 2.5). Figure 2.4(d) shows a similar result, but here the system is drifting at a faster rate (γ = 5 × 10−5). We see that this faster drift considerably alters the details of the transition (e.g., we cannot clearly see a period doubling cascade), while the reservoir, this time trained on T = 250 data points (over a similar range of λ as for the previous case), again accurately mimics the dynamics.

We can more quantitatively verify that the long-term dynamical behavior predicted by the reservoir computer is indeed characteristic of that of the true system as follows. We initiate a trajectory xn from an initial condition randomly selected from the interval (0, 1). We train the reservoir over a portion of the actual trajectory and then predict forward in time. We repeat this many times, each time using a trajectory initiated from a different randomly chosen initial condition.
(The number of iterations of this step is made large enough that Γ (defined below) appears to be well converged, i.e., Γ does not change appreciably when there is an increase in the number of iterates.) After prediction has started, for each of the actual and predicted trajectories, we collect those points in the trajectory which are contained in the segment of the trajectory where λ ∈ [λ∗, λ∗ + ∆λ] for a predetermined small ∆λ and set of λ∗. For each such interval we then record the orbit points for the actual and predicted trajectories, from which we approximat