ABSTRACT

Title of Dissertation: EMERGENT BEHAVIORS IN ADAPTIVE DYNAMICAL NETWORKS WITH APPLICATIONS TO BIOLOGICAL AND SOCIAL SYSTEMS

Brandon Marc Alexander
Doctor of Philosophy, 2021

Dissertation Directed by: Professor Michelle Girvan, Department of Physics

In this thesis, we consider three network-based systems, focusing on emergent behaviors resulting from adaptive dynamical features. In our first investigation, we create a model for gene regulatory networks in which the network topology evolves over time to avoid instability of the regulatory dynamics. We consider and compare different rules of competitive edge addition that use topological and dynamical information from the network to determine how new network links are added. We find that aiming to keep connected components small is more effective at preventing widespread network failure than limiting the connections of genes with high sensitivity (i.e., potential for high variability across conditions). Finally, we compare our results to real data from several species and find a trend toward disassortativity over evolutionary time that is similar to our model for structure-based selection.

In our second investigation, we introduce a bidirectional coupling between a phase synchronization model and a cascade model to produce our "sync-contagion" model. The sync-contagion model is well-suited to describe a system in which a contagious signal alerts individuals to realign their orientations, where "orientation" can be in the literal sense (such as a school of fish escaping the threat of a predator) or a more abstract sense (such as a "political orientation" that changes in response to a hot topic). We find that success in realigning the population towards some desired target orientation depends on the relative strengths of contagion spread and synchronization coupling.
In our third and final investigation, we attempt to forecast the complex infection dynamics of the COVID-19 pandemic through a data-driven reservoir computing approach. We focus our attention on forecasting case numbers in the United States at the national and state levels. Despite producing adequate short-term predictions, we find that a simple reservoir computing approach does not perform significantly better than a linear extrapolation. The biggest challenge is the lack of data quantity normally required for machine learning success. We discuss methods to augment our limited data, such as through a "library-based" method or a hybrid modeling approach.

EMERGENT BEHAVIORS IN ADAPTIVE DYNAMICAL NETWORKS WITH APPLICATIONS TO BIOLOGICAL AND SOCIAL SYSTEMS

by

Brandon Marc Alexander

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2021

Advisory Committee:
Professor Michelle Girvan, Chair
Professor Brian Hunt
Professor Edward Ott
Professor Derek Paley
Professor Bill Fagan

© Copyright by Brandon Marc Alexander 2021

Acknowledgments

First and foremost, I would like to thank my committee, especially my advisor Prof. Michelle Girvan. Without her knowledge and guidance, this research and my graduate school career would not have been possible.

Second, I would like to thank my family and loved ones. It was my mother's dream before it was mine that I would some day earn a PhD. She raised me into who I am today and deserves the majority of the credit for getting me this far. This work would not have been possible without the constant love and support of my mother Cathy, my partner Dominick, my sister Jaclynn, my step-father John, my father Carman, and my grandparents Ken and Patricia.

Next, I would like to thank my colleagues.
My office-mates Graham and Sarah helped pass many hours with random discussions, giving necessary distractions for our overworked brains. We went through this journey together, and I look forward to seeing them too reach this milestone.

Finally, I would like to thank my close friends. Although we haven't been able to see each other for over a year, we've been able to enjoy countless nights online. Thank you to Abbey, Chip, Collin, Dan, Forrest, Hannah, Joe, Josh, Nick, Nikki, Robert, Scott, Shawn, and Viggy.

Table of Contents

Preface  ii
Acknowledgements  ii
Table of Contents  iii
List of Tables  v
List of Figures  vi
List of Abbreviations  xiv

Chapter 1: Introduction  1
  1.1 Phase transitions and assortativity in models of gene regulatory networks evolved under different selection processes  3
  1.2 The sync-contagion model of dual information spread  5
  1.3 A reservoir computing approach to forecasting COVID-19 data  7

Chapter 2: Phase transitions and assortativity in models of gene regulatory networks evolved under different selection processes  10
  2.1 Introduction  10
  2.2 Methods  14
    2.2.1 Modeling failure propagation in gene networks as a percolation problem  14
    2.2.2 Tracking network components  17
    2.2.3 Network growth via selection  18
  2.3 Results  20
    2.3.1 Visualizing networks evolved via different selection rules  20
    2.3.2 Growth of giant components  23
    2.3.3 Maximizing yield  26
    2.3.4 Correlations
  28
    2.3.5 Connecting with data  31
  2.4 Discussion  33

Chapter 3: The sync-contagion model of dual information spread  36
  3.1 Introduction  36
  3.2 Model and methods  39
    3.2.1 Sync model  39
    3.2.2 Contagion model  41
    3.2.3 Combining the models: the sync-contagion model  43
    3.2.4 Applications  47
    3.2.5 Parameter restrictions  48
    3.2.6 Measures  49
  3.3 Results  52
    3.3.1 General behavior  52
    3.3.2 Modifying the width σ of the noise distribution  53
    3.3.3 Modifying the coupling strength κ  56
    3.3.4 Modifying the transmission coefficient β  59
    3.3.5 Behavior classification  61
  3.4 Discussion  62

Chapter 4: A reservoir computing approach to forecasting COVID-19 data  66
  4.1 Introduction  66
  4.2 Methods  68
    4.2.1 Data preprocessing  68
    4.2.2 Reservoir computer architecture  71
    4.2.3 Recursive least squares  76
    4.2.4 Baseline predictors
  76
    4.2.5 Library-augmented reservoir computing  77
    4.2.6 Predictor evaluation  78
  4.3 Results  80
    4.3.1 Recursive least squares and forgetting factor selection  81
    4.3.2 Predictor results  82
    4.3.3 Library-augmented reservoir computing applied to U.S. COVID-19 case data  92
    4.3.4 Autoencoder embedding of state-level COVID-19 data  95
  4.4 Discussion  100

Appendix A: Supplementary material for Chapter 2  102
  A.1 Locally treelike features  102
  A.2 Selection with m = N  103

Appendix B: Complications with data preprocessing of COVID-19 case data  105
  B.1 Negative number of cases reported  105
  B.2 Irregular reporting  106
  B.3 Zero cases reported  106
  B.4 Invertibility  107

Bibliography  109

List of Tables

2.1 Number of nodes, edges, and the in-out assortativity for the E. coli, yeast, fruit fly, mouse, and human transcriptional networks. These in-out assortativity values are also the final values in Fig. 2.6.  34

4.1 List of hyperparameter values used in training the reservoir computers, unless otherwise noted.  81

List of Figures

2.1 Structure of networks formed under our four different selection rules on a network of size N = 100, stopped after E/N = 1.75.
Nodes are sorted according to sensitivity with least sensitive at the top of the circle and increasing in sensitivity clockwise. If a node is damaged, it is filled in blue. Nodes in GOUT are enlarged and filled in black (if undamaged) or blue (if damaged); all other nodes are filled in white. Isolated nodes with no links are placed below each network with their corresponding locations in the circular layout left empty. Thick links connect nodes in GOUT. Links are blue if they connect two damaged nodes, black if they connect nodes in GOUT that are not both damaged, or gray otherwise. We see that GOUT is much larger for the random and sensitivity-based processes as compared with the structure-based and hybrid processes. Both the sensitivity-based and hybrid processes show a higher density of links between low sensitivity nodes, which is much more pronounced in the former.  21

2.2 Network structure for a network of size N = 100 grown under the four selection rules stopped after E/N = 5.75. As in Fig. 2.1, nodes are sorted according to sensitivity, with least sensitive at the top of the circle and increasing in sensitivity clockwise and with isolated nodes left out of the circular layout and placed below each network. We now focus on GOUT* rather than GOUT. Nodes belonging to GOUT* are enlarged and red with thick red links connecting between them. Damaged nodes that do not belong to GOUT* are blue, with blue links connecting pairs of damaged nodes. All other nodes are colored white, and all other links are colored gray. Even though the size of GOUT for the sensitivity-based process is similar to the random process, the former has a much smaller GOUT*. The structure-based and hybrid processes show even smaller GOUT*, despite a large number of connections between damaged nodes.  22

2.3 Growth of the connected components on the full network (left) and the damaged subnetwork (right).
We compare growth under random selection to our three competitive rules. Results shown are the averages of 100 runs on networks of size N = 10^5. We omit GIN and GIN* due to their symmetry with GOUT and GOUT*, respectively. Of the selection rules considered, we see that structure-based selection (blue curves) does best at delaying the phase transition for the full network, while the structure-based and hybrid selection (red curves) exhibit simultaneous transitions when we consider only the damaged subnetwork, with the hybrid selection showing greater suppression of the component size after the transition.  24

2.4 The average yield of networks grown under our four selection rules, as described by Eq. (2.5). Results are averaged over 100 runs on networks of size N = 10^5. Hybrid and structure-based selection processes show similar yield profiles, consistently outperforming both random and sensitivity-based yields.  27

2.5 In-out assortativity (top) and the degree-sensitivity correlation (bottom) for our four selection rules. Results shown are averages over 100 runs on networks of size N = 10^5. While the random and sensitivity-based processes are assortatively neutral (i.e., have near zero in-out assortativity), we see that both the structure-based and hybrid selection processes result in disassortative networks before the phase transition, with the former continuing to show weak disassortativity after the transition, while the latter is nearly neutral after the transition. The random and structure-based processes result in no correlation between degree and sensitivity while, as expected, the sensitivity-based selection process creates a negative correlation between a node's out-degree and its sensitivity. The hybrid process shows weaker and non-monotonic negative correlations.
  29

2.6 In-out assortativity for the transcriptional gene regulatory networks listed in Table 2.1, as a function of the number of links added. Links are ordered using gene age information [1], and a randomized ordering of the links in the final network is shown for comparison. Both orderings yield the same final network. E. coli is omitted, due to lack of gene age information from [1]. Note that the horizontal axis is scaled by total number of final links (E_final), rather than number of genes (N), to allow for comparison across the different networks.  33

3.1 Left: The critical boundary for the sync model (3.1) with Gaussian noise η_k^t ∼ N(0, σ²). Values of (κ, σ) above the blue curve lead to an incoherent system state at equilibrium (ρ∞ ≈ 0). Colored dots mark values of (κ, σ) used in the example steady-state distributions shown in the right figure, with colors of the dots matching colors of the distributions. Right: Example steady-state distributions for the sync model (3.1) with Gaussian noise η_k^t ∼ N(0, σ²) for different values of σ and fixed κ, generated from a simulation of N = 10^5 oscillators. Note that each distribution has been shifted to have mean 0. As σ decreases, the steady-state distribution changes from a uniform distribution (the incoherent state) to a symmetric, unimodal distribution (the coherent state), with the width of the distribution decreasing as σ continues to shrink.  40

3.2 Transition diagram for the contagion model.  43

3.3 Example outcomes of the contagion model for different choices of transmission coefficient β and recovery rate γ. In general, larger values of β lead to greater exponential growth of the infected population (and decay of the susceptible population), while larger values of γ lead to faster exponential decay of the infected population (and growth of the removed population). In the case where β/γ
< 1 (top left), the infection immediately dies out.  44

3.4 Transition diagram for the sync-contagion model. We have updated the transition diagram from Figure 3.2 with the effect of the acceptance function α(θ).  46

3.5 Plot of the acceptance function α(θ) used in our simulations. Individuals with orientations near the target orientation θ* are more likely to take on the infection when exposed.  46

3.6 Example behavior of an incoherent (left) and coherent (right) system. Simulations used a population of N = 10^4 oscillators. Top row: Fractions of the population in each category (susceptible, infected, and removed) over time. Middle row: Distribution of orientations over time. The red center line indicates θ* = 0. The mean angle ψ_t is marked by a yellow dot. Bottom row: The coherence of the full population ρ_t, the infected subpopulation ρ_t^I, and the uninfected subpopulation ρ_t^{S∪R}.  54

3.7 Distributions of orientations over time for coherent systems with different choices of noise distribution width σ and κ = 0.5, β = 0.1, and γ = 0.01. The mean angle ψ_t is marked by yellow dots, and the central red line indicates θ* = 0. As the width of the noise distribution increases, the uninfected population realigns towards θ* at a faster rate, at the cost of a less tight distribution.  56

3.8 Heatmaps of our measures in κβ-space for low noise (σ = π/18, left) and moderate noise (σ = π/9, right). Results are averaged over five realizations each with a system of N = 1000 oscillators. The vertical dashed red line indicates the critical value κ_c for each value of σ. Incoherent systems exist to the left of the red line, while coherent systems exist to the right. Top row: Absolute value of the final mean angle |ψ∞| scaled by π.
For incoherent systems, the final mean angle is a random value, leading to a mean near 0.5. Middle row: The maximum size of the infected population I_max as a fraction of the total population. Bottom row: The maximum mean cosine C_max, which takes into account both the mean orientation ψ_t and the coherence ρ_t of the distribution.  58

3.9 Our measures for fixed values of β = 0.3, σ = π/18, and γ = 0.01 as κ varies. The vertical dashed red line indicates the critical value κ_c for σ = π/18. For coherent systems (κ > κ_c), intermediate values of κ lead to a drop in performance according to all three measures. Within this region, the distribution is not wide enough (κ too high) and does not realign fast enough (κ too low) to be within an appropriate region for the infection to continue, as required by the acceptance function α(θ).  59

3.10 Our three measures for different values of the transmission coefficient β on incoherent systems with κ = 0.02, γ = 0.01, and σ = π/9. Results are simulated on a system of N = 1000 oscillators and averaged over 100 realizations. The large grey box marks the region where the initial infection does not spread. The final mean orientation in an incoherent system is a random value, which leads to |ψ∞|/π averaging out to 0.5 for all values of β. The maximum size of the infected population and the maximum mean cosine measure both increase as the transmission coefficient increases.  60

3.11 Distributions of orientation over time for β = 0.2 (left) and β = 0.9 (right) for two coherent systems with κ = 0.4, σ = π/18, and γ = 0.01. The mean orientation ψ_t is indicated by yellow dots, and the horizontal red line indicates θ* = 0. When β is made too large, the exposures occur too quickly, leading to most exposures turning into instant removals, due to α(θ).
The transmission coefficient must be small enough to allow for partial realignment of the uninfected population before the majority of exposures occur, while still large enough to sustain the infection at low levels.  61

3.12 Classifications of sync-contagion model behaviors, experimentally derived from the results shown in Figure 3.8. The classification regions are as follows: infectionless (black), temporary realignment (red), full permanent realignment (green), partial permanent realignment (yellow), and limited/no realignment (grey). Partial permanent realignment is based on ψ∞ being within π/6 of the target orientation. Similarly, limited/no realignment is based on ψ∞ being further than 5π/6 from the target orientation.  62

4.1 Top: New COVID-19 cases x(t) for the United States t days after March 1, 2020, shown from March 21, 2020 (t = 20) to February 28, 2021 (t = 364). Bottom: Transformations of the U.S. case data using the windowed average (blue) and the local linear fit (orange) methods for different choices of m. For m = 1, the two transformations are identical, but even for larger m the transformations give similar results. Note that larger m leads to greater "smoothing" of the data, causing increased suppression of the oscillations.  72

4.2 Top: New COVID-19 cases x(t) for New York (left) and Kansas (right) t days after March 1, 2020, shown from March 21, 2020 (t = 20) to February 28, 2021 (t = 364). Bottom: Transformations of the New York and Kansas case data using the windowed average (blue) and the local linear fit (orange) methods for different choices of m. Note that the two transformations are identical for m = 1.
For larger values of m, the transformations give similar results for well-behaved data, like the New York and United States data (Figure 4.1). For poor-quality data, such as for Kansas, the two transformations are more distinct. The windowed average approach can more effectively remove massive spikes, but is less effective at overall suppression of the resulting oscillations (see for example m = 6).  73

4.3 The structure of a standard reservoir computer. During "open loop" training, the switch on the left takes in the input u(t). The input is coupled to the reservoir through the input matrix W_in and update Eq. (4.9). The output is then generated by multiplying the reservoir state r(t + 1) by the output matrix W_out. After training, the switch on the left is rotated to accept the output of the reservoir computer as the input for the next time step, creating an autonomous "closed loop".  75

4.4 The two stages of the library-augmented reservoir computing process. In stage one (top), a library of trained reservoir computers is used to train an autoencoder to predict the pair (W_out, r_0), consisting of an output mapping W_out generated from training the RC and an associated reservoir state r_0 at the end of training. In stage two (bottom), a state from the embedded space of the autoencoder is decoded to generate a pair (Ŵ_out, r̂_0). This pair is used to construct a new reservoir computer that is otherwise identical to those in the library. The sample reservoir computer generates a signal that is compared to a new (short) signal that we wish to predict. Using an optimization scheme, information is sent back to the decoder to pick a new pair (Ŵ_out, r̂_0) that better matches the new data until some stopping criterion is reached.  79

4.5 Reservoir computer predictor results for different choices of forgetting factor λ. For each value of λ
shown, we increment through all prediction start times t_0 in [50, 150]. For each value of t_0, a reservoir computer was trained on the entire history (i.e., all data from t = 0 to t = t_0 − 1), with data transformed according to the percent daily change Eq. (4.7).  82

4.6 The true daily U.S. COVID-19 case data (black) and example predictions for the four prediction methods: reservoir computer with the windowed average transformation applied to daily cases with m = 6 (magenta), reservoir computer with the local linear fit transformation applied to daily cases with m = 6 (cyan), a linear fit applied to the daily case data using 10 points (orange), and a quadratic fit applied to the daily case data using 12 points. Predictions start at times t = 70 (top) and t = 110 (bottom). Vertical dashed lines mark the start and end of data points "visible" to the corresponding predictors.  84

4.7 State-level COVID-19 case data (black) compared to a reservoir computer prediction using the local linear fit transformation (cyan) with m = 6 and a linear extrapolation (orange) using 10 points. All predictions start on day 70, the first day after the vertical dashed black line. The vertical orange dashed line marks the start of points used by the linear extrapolation and the vertical cyan dashed line marks the start of points used to train the reservoir computer. Here we show the first twenty-six states alphabetically and the District of Columbia. The remaining states are shown in Figure 4.8.  85

4.8 The continuation of Figure 4.7, showing state-level COVID-19 case data (black) compared to a reservoir computer prediction using the local linear fit transformation (cyan) with m = 6 and a linear extrapolation (orange) using 10 points. All predictions start on day 70, the first day after the vertical dashed black line.
The vertical orange dashed line marks the start of points used by the linear extrapolation and the vertical cyan dashed line marks the start of points used to train the reservoir computer.  86

4.9 Predictor statistics for the four different predictor methods with different choices of m for the reservoir computer transformations (a,d) and number of points for the polynomial fits (b,e) applied to the U.S. COVID-19 daily case data. Statistics are taken by generating a prediction for each starting time t_0 in [50, 250]. In (c,f), we compare the distributions for the errors of the reservoir computer predictions using the local linear fit transformation with m = 6 and the linear extrapolation using 10 points.  88

4.10 Predictor statistics for the four different predictor methods with different choices of m for the reservoir computer transformations (a,d) and number of points for the polynomial fits (b,e) applied to the New York COVID-19 daily case data. Statistics are taken by generating a prediction for each starting time t_0 in [50, 250]. In (c,f), we compare the distributions for the errors of the reservoir computer predictions using the local linear fit transformation with m = 6 and the linear extrapolation using 10 points.  89

4.11 Predictor statistics for the four different predictor methods with different choices of m for the reservoir computer transformations (a,d) and number of points for the polynomial fits (b,e) applied to the Kansas COVID-19 daily case data. Statistics are taken by generating a prediction for each starting time t_0 in [50, 250]. In (c,f), we compare the distributions for the errors of the reservoir computer predictions using the local linear fit transformation with m = 6 and the linear extrapolation using 10 points. Note that the box plots for the reservoir computer predictors (a,d) with low values of m extend well beyond the vertical scale used. A zoomed-out view is provided in Figure 4.13.
  90

4.12 Scatter plots comparing the one-week cumulative error (left) and one-week NMAE (right) for the reservoir computer prediction using the local linear fit transformation with m = 6 and the linear extrapolation using 10 points. The coordinates of each point are the median values of the error metrics for the labeled locations. The red line marks y = x. The plot includes forty-seven states, the District of Columbia, and the United States as a whole. Not included are Connecticut, Rhode Island, and Vermont, which each had significantly higher median errors from both predictors.  91

4.13 Zoom out of Figure 4.11(a,d), showing the full distributions for low values of m.  92

4.14 Embedding of the U.S. COVID-19 library into a three-dimensional latent space using an autoencoder. The library consists of reservoir computers trained over 50-day overlapping windows of data from t = 30 to t = 300. Daily case data were transformed using the percent daily change Eq. (4.7) before input into the RCs. Points are colored according to the first day of the training window, with green points corresponding to starting in [30, 100], blue points corresponding to starting in (100, 200], and red points corresponding to starting in (200, 300]. Panels (a–c) are projections of the points onto the three coordinate planes. Panel (d) shows all points in three dimensions.  94

4.15 Predictions using the library-augmented reservoir computing method (red lines) using five points of training data compared to a single reservoir computer (blue lines) using 21 days of training data and a linear extrapolation (orange lines) using five days of data. Vertical red dashed lines indicate the start of the LARC training data and data used for the linear extrapolation, and vertical blue dashed lines mark the start of the single RC training data.
Predictions start on the first day after the green dots. Each bottom panel is a one-week zoom in of the predictions shown in its corresponding top panel. The library used for LARC consists of RCs trained on 50-day overlapping windows in [30, 125].  96

4.16 Embedding of the New York COVID-19 library into a twelve-dimensional latent space using an autoencoder. The library consists of reservoir computers trained over 50-day overlapping windows of data from t = 30 to t = 300. Daily case data were transformed using the percent daily change Eq. (4.7) before input into the RCs. Each panel is a projection onto a plane spanned by a pair of coordinate axes. Points are colored according to the first day of the training window, with green points corresponding to starting in [30, 100], blue points corresponding to starting in (100, 200], and red points corresponding to starting in (200, 300].  98

4.17 Embedding of the combined U.S. and New York COVID-19 library into a twelve-dimensional latent space using an autoencoder. The library consists of reservoir computers trained over 50-day overlapping windows of data from t = 30 to t = 300. Daily case data were transformed using the percent daily change Eq. (4.7) before input into the RCs. Each panel is a projection onto a plane spanned by a pair of coordinate axes. Top: Points are colored according to the first day of the training window, with green points corresponding to starting in [30, 100], blue points corresponding to starting in (100, 200], and red points corresponding to starting in (200, 300]. Bottom: Points are colored according to region, with red indicating New York and black indicating the U.S.  99

A.1 The fractions of nodes connected by a directed path of length two connected by at least one other directed path of length two.
  103

A.2 Growth of the connected components on the full network (left) and the damaged subnetwork (right). Results shown are the averages of 100 runs on networks of size N = 1000. We omit GIN and GIN* due to their symmetry with GOUT and GOUT*, respectively.  104

List of Abbreviations

BT    bow tie
GBT   giant bow tie
GIN   giant in-component
GOUT  giant out-component
GSCC  giant strongly connected component
IN    in-component
LARC  library-augmented reservoir computing
OUT   out-component
NMAE  normalized mean absolute error
RC    reservoir computer
RLS   recursive least squares
SCC   strongly connected component
SIR   susceptible-infected-recovered

Chapter 1: Introduction

Network models have proven to be useful tools in representing the complex interactions of many-agent systems, especially when combined with a dynamical system model to describe the state of the system over time. These systems become further complex when we no longer assume that the "rules" of the dynamics are fixed, and instead allow them to change over time. Such systems quickly become intractable to study analytically, but can instead be studied through computer-simulated experiments to uncover their unknown behaviors.

In this thesis, we focus on analyzing three dynamical networks that apply to biological or social systems (or both, in the case of epidemic models). Each dynamical network features rules that are ever-changing (adapting) in response to other properties of their systems. We briefly summarize each investigation below, with more detail in the remainder of this chapter:

1. In our first investigation, we consider a model of gene network evolution in which the topology of the network changes over evolutionary time as new links are added via a selection rule that uses some combination of structural and dynamical information of the network. The changing topology inherently affects the possible dynamics of the network, eventually leading to widespread network failure.
We are interested in the location of the critical point at which failure occurs and which selection rules perform best at delaying this critical point.
2. In our second investigation, we combine a phase synchronization model with a contagion model using a bidirectional feedback. In other words, the behavior of the synchronization model depends on an individual's infection status, and the spread of the infection depends on an individual's phase. We work to classify all possible behaviors of this system, considering both temporary and permanent consequences.
3. In our third investigation, we use a data-driven approach through reservoir computing to forecast COVID-19 cases in the United States. The disease dynamics depend on the constant change in human behavior, both naturally occurring and following government-mandated orders, in response to reported case numbers. In addition to generating case predictions with a reservoir computer, we observe the result of embedding several trained reservoir computers in a latent variable space through an autoencoder. We find that reservoir computers trained on similar time series form clusters in the latent space, indicating the possibility of a low-dimensional underlying system.
The work from investigation one has been published in J. R. Soc. Interface [2]. The work from investigation two will be submitted for publication to J. R. Soc. Interface with Brandon Alexander (author of this thesis) as the lead author and Michelle Girvan (thesis advisor) as the senior author. The work from investigation three was done in collaboration with Andrew Pomerance, Daniel Canaday, Brian Hunt, Edward Ott, and Michelle Girvan. Brandon Alexander took the lead on all work presented here. Daniel Canaday led the effort in developing the library-augmented reservoir computing approach, which will be published in a separate paper.
1.1 Phase transitions and assortativity in models of gene regulatory networks evolved under different selection processes

Gene regulatory networks are believed to exist near a critical point, where they have maximized dynamical versatility without risk of widespread network failure. While gene networks have been extensively studied as they appear today, not much is known about how they evolved to this point and what features may have guided this evolution. In Chapter 2, we propose a model of gene network evolution, based on a Boolean network model, that grows over evolutionary time based on the topological and dynamical properties of the network. In a Boolean network model for gene regulation [3], each gene is represented by a node, with regulatory interactions between pairs of genes represented by directed edges. A node is either in the state "1" (the gene is expressed) or "0" (the gene is not expressed). At each time step, all nodes are simultaneously updated according to the states of their inputs and a set of rules in the form of a truth table. For a finite-sized network with deterministic update rules, the dynamics will always reach a steady state consisting of a fixed point or a periodic orbit. The stability of the dynamics can be quantified by considering how much the steady states differ when beginning with two initially close network states (i.e., a single node is switched between 0 and 1). Determining the stability of a Boolean network through direct simulation is a computationally intensive task. However, if we consider a locally tree-like network under the "semi-annealed" approximation [4, 5], then the stability problem can be directly mapped onto a percolation problem, which is considerably faster to simulate.
Specifically, if we consider the sensitivity of a node, defined as the probability of the node changing state if at least one of its inputs changes, and set that as the probability that the node is "filled in," then the stability measure for the network as a whole is given by the size of the giant out-component of the filled-in subnetwork. For our model of gene network evolution, we begin with a network of nodes with no edges. Each node is assigned a sensitivity and filled in accordingly. For each generation of network evolution, we select an edge to add to the network according to a competitive selection rule, motivated by natural selection and modeled after the rules introduced by Achlioptas et al. [6]. After an edge has been added, we track the size of the giant out-component of the filled-in subnetwork, which is equal to the stability measure for the entire network. This process is repeated for several generations, until we sufficiently pass the critical point at which the network switches from stable to unstable dynamics (the percolation transition of the filled-in subnetwork). Our main focus is determining which selection rules are most effective at delaying the onset of percolation, and thereby delaying instability. We consider four types of rules: random selection, structure-based selection, sensitivity-based selection, and hybrid (structure and sensitivity) selection. We find that topological information, i.e., adding connections that keep components small, is more effective at delaying percolation than avoiding connections between highly sensitive (likely to be filled-in) nodes. For further validation, we analyze gene network data from five organisms. For each network, we approximate the order of edge addition by ordering the genes by their estimated ages.
By tracking the in-out assortativity of the networks over evolutionary time, we observe a striking similarity to the in-out assortativity of our simulated networks grown under structure-based and hybrid rules.

1.2 The sync-contagion model of dual information spread

Synchronization is a natural phenomenon found in social systems of both humans and animals. In many cases, the "status quo" synchronization of a group will adapt to the presence of some external stimulus. Consider, for instance, the swarming behavior of a group when presented with a threat. Whatever the default behavior may be, initial "leaders" aware of the threat will alert others, who will follow the leaders to escape. In such scenarios, two systems are at play: a synchronization system that determines how strongly individuals match each other's behaviors and a contagion system that controls the spread of the alert through the population. Furthermore, the two systems are coupled, where synchronization behavior changes based on one's knowledge of the alert. In Chapter 3, we present a coupled sync-contagion model that describes the interplay of these two systems. Other than group escape in response to a threat, we also describe an application to opinion dynamics, in which individuals' opinions adapt in response to a hot topic driven by a group of promoters. We base our model on that of Chicoli and Paley [7], originally used to represent fish school response to the presence of a predator. Among several minor adjustments, we make two key changes to their model. First, the Chicoli and Paley model only contains unidirectional feedback between the sync and contagion dynamics. Specifically, the contagion status of an individual (susceptible, infected, or removed) determines the synchronization behavior, but not the reverse. We keep this interaction but also add an "infection acceptance function"
that creates a bidirectional feedback by allowing the sync dynamics to influence the contagion spread. The acceptance function gives the probability that a susceptible individual becomes infected upon being exposed, depending on their orientation, with orientations further from some target orientation having lower acceptance probabilities. In the case of a failed infection upon exposure, the individual immediately becomes removed (not infected and never able to become infected). In the context of swarming, the acceptance function accounts for individuals who are "facing the wrong direction" and are unable either to see or to turn quickly enough to join in instantaneous realignment. For our opinion dynamics application, the acceptance function filters out individuals whose opinions differ too greatly from the hot topic and who therefore choose not to become promoters. The acceptance function introduces a "barrier" to realignment and drives several of the emergent behaviors we observe. For the second key change, we note that the Chicoli and Paley model distinguishes between naturally coherent systems (attentive obligate shoaling) and naturally incoherent systems (attentive facultative shoaling) by setting the initial orientations of the population to be either polarized or uniformly distributed, respectively. In both cases, the same coupling strength is used for the sync dynamics, which will result in synchronization of the systems over time. We choose to instead distinguish between coherent and incoherent systems by varying this coupling strength. While incoherent systems will no longer be able to permanently realign, we can determine the ability of these systems to temporarily realign while also actively dispersing. We find that the overall behavior of our sync-contagion system can be classified into five categories. The first category consists of the trivial case where no initial contagion spread occurs, due to the exposure rate being too slow.
The second category consists of all naturally incoherent systems, which are able to temporarily realign towards the target orientation but will eventually return to a uniform distribution. The final three categories are various levels of permanent realignment success in coherent systems: no/limited realignment, partial realignment, and full realignment. Which of these last three categories a system falls into depends primarily on the relative strengths of the contagion exposure rate and the sync coupling.

1.3 A reservoir computing approach to forecasting COVID-19 data

The COVID-19 pandemic presented the world with a global health crisis never before seen in our lifetime. In response, countless efforts have gone towards forecasting COVID-19 cases to better plan emergency response. While the most successful predictions have been model-based, considerable effort has gone into data-based, model-free approaches using machine learning. In Chapter 4, we use a reservoir computing approach to forecast United States COVID-19 cases at both the national and state levels. A reservoir computer is a system in which an input signal is mapped into a high-dimensional recurrent neural network (the reservoir), which is then trained to predict some output signal. When applied to time series forecasting, the output signal is the input at the next time step. Reservoir computers offer a level of simplicity in their training. The input mapping and reservoir topology are completely randomized, with the output mapping being the only piece affected by the training phase. In training, data are fed into the reservoir with the resulting sequence of reservoir states tracked. The mapping from the reservoir to the desired output is then determined through a least squares fit or similar method. Once trained, the output of the reservoir computer can be used as its input at each subsequent step, creating the "closed loop" fully autonomous prediction mode.
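The training and closed-loop procedure described above can be sketched in a few lines. The following is a minimal illustration only, with a random sparse reservoir, a ridge-regression readout, and a synthetic input signal standing in for preprocessed case data; the sizes and parameters here are illustrative choices, not those used in Chapter 4:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200                                   # reservoir size (illustrative)
u = np.sin(0.1 * np.arange(500))          # stand-in for a preprocessed case series

# Random input weights and sparse recurrent weights, rescaled to spectral radius 0.9.
W_in = rng.uniform(-0.5, 0.5, N)
W = rng.uniform(-1, 1, (N, N)) * (rng.random((N, N)) < 0.05)
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

# Open loop: drive the reservoir with the input signal and record its states.
r = np.zeros(N)
states = []
for u_t in u[:-1]:
    r = np.tanh(W @ r + W_in * u_t)
    states.append(r.copy())
R = np.array(states)

# Train only the readout: ridge regression mapping each state to the next input.
beta = 1e-6
W_out = np.linalg.solve(R.T @ R + beta * np.eye(N), R.T @ u[1:])

# Closed loop: feed each prediction back in as input to forecast autonomously.
preds = []
u_t = u[-1]
for _ in range(30):
    r = np.tanh(W @ r + W_in * u_t)
    u_t = r @ W_out
    preds.append(u_t)
```

Only `W_out` is learned; the random input mapping and reservoir are fixed, which is the simplicity referred to above.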
Reservoir computers have proven to be powerful tools in forecasting and mimicking properties of select chaotic systems. Most applications of reservoir computers have focused on forecasting model systems, in which training data is generated from a system of differential equations and can be produced with arbitrary size and resolution. By contrast, real-world data are rarely as abundant or as high-quality. In particular, with daily case reports, a full year of COVID-19 case data is substantially smaller than usually required for successful machine learning training. The case data are also heavily influenced by testing and reporting frequency, leading to a weekly oscillation in case numbers, or in some regions, several days of apparently zero cases interrupted by artificial "spikes" in cases. To address the reporting irregularity, we find it necessary to apply a preprocessing transformation that smooths out the data. The contributions from a single day are then spread out over multiple days. This transformation is especially needed in states that report cases once every few days, in order to remove the massive spikes that would otherwise be impossible to predict. In addition, the data must be detrended to remove the non-stationary behavior of the mean. We must choose an appropriate preprocessing transformation that will also render the data approximately stationary. We compare the short-term predictions from our reservoir computer to those from a linear and a quadratic fit. The quadratic fit performs the worst of these methods, due to its tendency to deviate (quadratically) from the true data whenever the fit is poor. We also find that the reservoir computer is able to occasionally replicate the oscillations present in real case data, but when considered as a whole, it is unable to significantly outperform a linear regression. While our preprocessing steps address the data quality, we also need to address the lack of data quantity.
As one potential solution, we consider the library-augmented reservoir computing (LARC) approach. In the LARC approach, we construct a "library" of reservoir computers trained on COVID-19 data. The library is then embedded in a low-dimensional latent space by an autoencoder. When presented with new training data, rather than directly training a new reservoir computer, a point in the latent space is selected and decoded to produce the reservoir computer. Through an optimization scheme, selection of the point in latent space can be refined until a best match is produced. Success of LARC requires reservoir computers produced from similar time series to be located close together in the latent space. We demonstrate that such clustering is present both temporally, where RCs trained on time series segments from similar start times are grouped together, and regionally, where time series from the same geographical zone are grouped together.

Chapter 2: Phase transitions and assortativity in models of gene regulatory networks evolved under different selection processes

This chapter has been published in the April 2021 issue of the Journal of the Royal Society Interface [2]: Brandon Alexander, Alexandra Pushkar, and Michelle Girvan. Phase transitions and assortativity in models of gene regulatory networks evolved under different selection processes. J. R. Soc. Interface, 18(177):20200790, 2021. (doi: 10.1098/rsif.2020.0790).

2.1 Introduction

Mathematical models of gene regulatory networks provide a powerful tool for understanding the complex features of genetic control. While various modeling efforts have been successful at explaining gene expression patterns, much less is known about how evolution shapes the structure of these networks. Recent studies suggest that evolutionary "tinkering" plays a large role in the organization of biological networks [8].
Further, these networks are often thought to exist near some critical point [9-11], where dynamic variability is maximized without reaching widespread network failure/breakdown. The phase transition between stability and instability in networks has been widely investigated, with studies focusing on the effects of topological features [3, 12-14], dynamical features [4, 15-17], or both [18] and their contributions to the location and behavior of the transition. Of particular interest is the evolutionary process that leads to this critical point and how this process depends on both the topological and dynamical properties of the network and its nodes. Some previous studies have considered the evolution of these networks but begin with already-established networks that are then allowed to change over time [19-21]. By contrast, we consider the process from the very beginning, starting with an "empty" network. In order to gain insight into this process, we study simple models of gene network evolution in which links are added according to various competitive selection rules. Similar rules were explored by Achlioptas et al. [6], who demonstrated that undirected networks grown following competitive link selection can have drastically delayed percolation transitions, leading to a seemingly discontinuous "explosive" transition, a topic which has since become popular in the network science literature [14, 22-27]. Squires et al. extended the idea of competitive selection to directed networks, where similar, but less drastic, "weakly explosive" transitions occurred [14]. Incrementally growing networks according to competitive selection rules is a natural choice for modeling biological networks, which are the result of generations of improvements through the complex process of natural selection.
For the same reason, biological networks share commonalities with systems that are said to exhibit highly optimized tolerance, in which a yield or fitness is to be maximized [28, 29]. In such systems, competitive selection rules can be used heuristically to maximize a yield dependent on the size of a giant component. Besides evolution through natural selection, biological systems develop over time through neutral evolution, or evolution through mutation without selection [30-33]. While we primarily focus on networks evolved via selection, we will also consider a simple case of neutral evolution for comparison. Other approaches have used a combination of mutation and selection to evolve networks that maximize the topological entropy [34] or minimize the distance to a time-dependent network output function [20]. By comparison, our modeling efforts focus on growth starting from a blank state, serving to highlight how simple differences in selection rules can give rise to very different network structures and change the nature of phase transitions in the network. Our models of network growth build upon various studies of Boolean models of gene regulation. From a dynamical perspective, Boolean networks have been widely used as mathematical models of gene regulation since their introduction by Kauffman in 1969 [3]. In these models, at any given time, nodes (genes) exist in one of two states: "on" (1), meaning that the gene is expressed, or "off" (0), meaning that the gene is unexpressed. In Kauffman's original model, gene states are synchronously updated at each time step according to a truth table, in which the state of a gene at time t is determined by the states of all of its input genes at time t - 1. (While we connect our evolutionary modeling approach to the case of synchronous update, we also discuss how the approach may be connected to cases of asynchronous update.)
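The synchronous truth-table update just described can be made concrete with a small sketch. This is a generic illustration only; the network size, in-degree, and random tables below are arbitrary choices, not parameters used elsewhere in this thesis:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N, K = 8, 2  # N genes, each with K randomly chosen regulators (illustrative sizes)

# Each gene gets K random inputs and a random truth table over those inputs.
inputs = [rng.choice(N, size=K, replace=False) for _ in range(N)]
tables = [
    {bits: int(rng.random() < 0.5) for bits in itertools.product((0, 1), repeat=K)}
    for _ in range(N)
]

def step(state):
    """Synchronously update every gene from its regulators' states at t - 1."""
    return tuple(tables[i][tuple(state[j] for j in inputs[i])] for i in range(N))

# With 2^N states and deterministic updates, every trajectory must eventually
# revisit a state, settling onto a fixed point or a periodic orbit.
state = tuple(rng.integers(0, 2, N))
seen = {}
t = 0
while state not in seen:
    seen[state] = t
    state = step(state)
    t += 1
period = t - seen[state]  # length of the attractor (1 for a fixed point)
```

The loop terminates in at most 2^N steps, illustrating why a finite deterministic Boolean network always reaches a fixed point or periodic orbit.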
By tuning system parameters, the models can exhibit a transition from stable dynamics to "chaotic" (unstable) dynamics [3, 4, 12]. Stability here is measured by the saturated average Hamming distance, or fraction of genes that differ in state after a long time, between the trajectories of two close initial conditions. Since a finite network of size N has at most 2^N possible states and the dynamics are deterministic, the trajectories will eventually reach an asymptotic state in the form of a fixed point or periodic orbit. If the state of a gene in the asymptotic state differs between these two trajectories at time t, the gene is said to be "damaged" at time t [5]. As demonstrated by Squires et al., we can map the stability of Boolean networks directly to a percolation problem under certain assumptions [5]. In a percolation problem, the size of a "giant component" is tracked as a certain tuning parameter is varied, typically passing through a critical value where the giant component transitions from being vanishingly small to being a non-trivial fraction of the size of the network. Following Squires et al., by setting a gene's sensitivity, i.e., the probability that the gene changes state if one of its inputs changes state, equal to its probability of being "damaged," the saturated average Hamming distance, which provides a measure of instability in the gene network dynamics, can be mapped directly onto the size of the largest out-component in the subnetwork of "damaged" nodes. As either the node sensitivities or network connections are tuned, the phase transition from dynamic stability to instability then corresponds to the percolation transition of this giant component. We extend this idea by allowing the topology of our network to change over time following different competitive selection rules, which works as a primitive model of biological networks evolving via natural selection.
As new links are added, at some critical point, percolation will occur in the damaged subnetwork, corresponding to the network's transition from stability to instability. To summarize, we are motivated by a model of gene regulatory dynamics in which there are benefits for the system to exist on the stable side of the transition between stability and instability. We utilize the mapping between a model of this kind and an appropriately defined percolation problem to explore simple models of evolution via growth. We note that the percolation problem can be considered as its own simple abstraction of the propagation of failure in gene regulatory networks. Using the percolation framework, we investigate the following growth processes: (1) random link addition, as a control; (2) sensitivity-based competitive selection; (3) structure-based competitive selection; and (4) hybrid competitive selection. We find that incorporating network structure into the selection rule is key to delaying percolation of the damaged subnetwork, even while having no knowledge of each gene's sensitivity, or probability of becoming damaged. In fact, incorporating information about sensitivity can backfire, leading to highly exclusive networks that would rather form large components than connect to high-risk genes. Finally, we compare the results of our simulations to data for several empirically derived gene regulatory networks and find similarities in structural properties.

2.2 Methods

2.2.1 Modeling failure propagation in gene networks as a percolation problem

We construct a model of gene regulation as follows: we first consider damage propagation on a network with a fixed topology, which will later be allowed to evolve. We are motivated by the traditional Boolean models of gene regulation in which each gene in the network has a fixed "truth table," or output rule, that determines its state (0 or 1) at time t based on the states of its inputs at time t - 1.
To simplify the problem for our purposes but still capture the effects of network topology, we apply the semi-annealed approximation [4], in which the network remains fixed, but the truth table for gene i is no longer fixed but rather randomly filled in at each time step according to some bias bi, the probability that its state is 1. We can think of bi as the expression bias of gene i, with genes having a wide distribution of expression biases. Working within the semi-annealed approximation, from the gene's bias, we can then calculate its sensitivity qi = 2bi(1 - bi), which represents the probability that a gene changes state if the state of one or more of its inputs change. This can also be thought of as the probability that a gene spreads "damage." Damaged genes are those whose states differ in the asymptotic states of two initially close trajectories of the network's dynamics, meaning a small perturbation in the state of the network can lead to unpredictable dynamics of such genes, likely causing improper regulatory behavior. Following [5], for a locally treelike network under the semi-annealed approximation, dynamic instability of the network, measured in terms of the average saturated Hamming distance between trajectories of two close initial conditions, is equivalently measured by the expected size of the giant out-component of the damaged subnetwork, which takes into account both the network topology and the genes' propensities for spreading damage, i.e., their sensitivities. In particular, the phase transition from stability to instability occurs simultaneously with the onset of a non-trivial out-component of the damaged subnetwork (i.e., a component whose size does not go to zero as the number of nodes gets larger). In Appendix A, we show that our simulated networks display the relevant locally treelike properties.
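The mapping above can be sketched numerically. The following is a simplified, hedged illustration: it assigns damage with probability qi = 2bi(1 - bi) and measures the largest out-component of the damaged subnetwork by breadth-first search from each damaged node, rather than via the giant strongly connected component and the efficient component-tracking algorithm used in the actual simulations; the network here is a plain random directed graph:

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(2)
N, E = 1000, 2000  # illustrative network size and link count

# Random directed network, uniform biases b_i, and sensitivities q_i = 2 b_i (1 - b_i).
edges = [(rng.integers(N), rng.integers(N)) for _ in range(E)]
b = rng.uniform(0, 1, N)
q = 2 * b * (1 - b)

# Each gene is damaged with probability equal to its sensitivity
# (expected damaged fraction is 1/3 for uniform biases).
damaged = rng.random(N) < q

# Adjacency restricted to the damaged subnetwork.
adj = [[] for _ in range(N)]
for i, j in edges:
    if damaged[i] and damaged[j]:
        adj[i].append(j)

def out_component_size(start):
    """BFS over damaged links gives the out-component of one damaged gene."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)

# Instability proxy: fractional size of the largest damaged out-component.
largest = max(out_component_size(i) for i in range(N) if damaged[i])
instability = largest / N
```

A stable network corresponds to `instability` vanishing as N grows; percolation of the damaged subnetwork marks the transition to instability.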
While we are motivated by a model of synchronously updated genes with locally treelike connectivity, we note, importantly, that the semi-annealed approximation works well even for an asynchronous update scheme and for the case of networks with many small loops. Although there are important differences between synchronous and asynchronous update models (e.g., the occurrence of limit cycles [35]), Refs. [4] and [18] demonstrate that the semi-annealed approach for determining the transition between global stability and instability (and the growth of perturbations) also applies for asynchronous update. Further, Ref. [4] shows that the approximation holds even for networks that deviate substantially from the locally treelike assumption. This is consistent with studies that show that calculations involving the locally treelike approximation provide accurate results even for networks that are very far from treelike [36, 37]. By using the semi-annealed approximation, we are able to work entirely in the realm of the percolation problem, without needing to directly simulate the complex dynamics of the Boolean network. In fact, we can think of the percolation problem as its own simple abstraction of gene regulatory networks, in which genes become damaged with different propensities and can propagate that damage to their target genes. To incorporate evolutionary changes into our percolation framework, we add new links (edges) individually to the network according to a given selection rule, described in Sec. 2.2.3. Each new link represents a new regulatory interaction that has evolved in the network, potentially changing the stability of the dynamics. We then repeat this procedure until a sufficient number of links have been added to the network. For our simulations, we initialize the network to consist of N genes and no links. We let E denote the number of links (edges) in the network at later steps. The gene biases are drawn from a uniform distribution bi ~
U[0, 1]. Note that for a single simulation, bias values do not change, meaning we are fixing an inherent dynamical property of each gene. This is a restrictive assumption, but it provides a nice starting point for comparing networks evolved under different selection rules. Before the simulated evolution process begins, each gene is assigned damaged status with probability qi, leading to an average of one-third of all genes being damaged (due to how biases were chosen). By averaging over multiple simulations, we found that keeping the same damage status constant throughout all steps of a single simulation produces very similar results to re-randomizing the statuses at each individual step, so we have chosen to keep the damaged statuses constant. Finally, at each step, we track the size of the giant components in both the full network and the damaged subnetwork, which also allows us to efficiently simulate each growth process using the algorithm suggested in [14].

2.2.2 Tracking network components

For a gene i, define the in-component IN(i) to be the set of all genes that can reach i via a directed path in the regulatory network. Similarly, define the out-component OUT(i) to be the set of all genes that i can reach via a directed path. The strongly connected component SCC(i) is the intersection of these two sets and the bow tie BT(i) their union. For a directed network, we say the giant strongly connected component (GSCC) is the largest strongly connected component. We then define the giant out-component GOUT = OUT(GSCC), the giant in-component GIN = IN(GSCC), and the giant bow tie GBT = BT(GSCC). In systems where percolation occurs, all four giant components form simultaneously at the same critical point [38]. Within our networks, we will also have a subset of genes that are "damaged." The damaged subnetwork is the directed network consisting of these genes and any links that exist between them.
We use an asterisk (*) to distinguish the components of the damaged subnetwork from those of the full network.

2.2.3 Network growth via selection

To decide which link to add at each step, we use a class of competitive selection rules based on those introduced by Achlioptas et al. [6] and extended to directed networks by Squires et al. [14]. The form of these rules is as follows:
1. Consider m potential "source" genes {i1, . . . , im} and m potential "target" genes {j1, . . . , jm}. All genes are sampled uniformly from the network.
2. Select the directed link i -> j such that
i = argmin_{i in {i1, . . . , im}} fs(i), j = argmin_{j in {j1, . . . , jm}} ft(j). (2.1)
3. If the link i -> j already exists in the network or if i = j, then a different link is selected starting back at step 1.
The functions fs(i) and ft(j) are chosen so as to delay the percolation transition of GOUT*. We consider three forms of competitive selection (sensitivity-based, structure-based, and hybrid) and the case of random link addition as a control. As in the original Achlioptas study, the selection rules we consider rely only on local information about the candidate links. In Appendix A, we study the extreme case of m = N for structure-based selection and demonstrate that, as expected, having access to global information when adding links serves to delay the transition for a much, much longer period of time.

2.2.3.1 Random selection

In the case of m = 1, we select a random link uniformly from all possible links. This is equivalent to the directed Erdős-Rényi process [14, 39]. Random selection serves as a simple baseline to which we can compare the performance and behavior of other selection rules. In addition, one can consider random selection as a model for neutral evolution.

2.2.3.2 Sensitivity-based selection

To consider the effect of sensitivity, we define
fs(i) = qi, ft(j) = qj. (2.2)
Genes with higher sensitivity values are more likely to be damaged.
The strategy used by sensitivity-based competition is then to prevent growth of the damaged subnetwork by preventing "risky" genes from connecting.

2.2.3.3 Structure-based selection

To consider the effect of network structure in delaying the onset of percolation, we define
fs(i) = |IN(i)|, ft(j) = |OUT(j)|. (2.3)
This rule is a generalization of the dCDGM rule applied to directed networks [40]. The strategy imposed by structure-based competition is to prevent percolation in the damaged subnetwork by preventing large components from forming in the full network.

2.2.3.4 Hybrid selection

The final rule is a combination of structure-based competition and sensitivity-based competition. We set
fs(i) = qi |IN(i)|, ft(j) = qj |OUT(j)|. (2.4)
The hybrid strategy will prevent components from becoming large in the full network, while also showing a preference towards connecting low-risk (low sensitivity) genes. High sensitivity genes will only be accepted if they belong to small components.

2.3 Results

2.3.1 Visualizing networks evolved via different selection rules

For each of our network measures, we use the average node degree E/N as the tuning parameter. We can first gain intuition about the structure of our networks by considering smaller networks of size N = 100, as is done in Figs. 2.1 and 2.2. We begin by considering the networks when E/N = 1.75, a point by which percolation of GOUT has occurred for both the random and sensitivity-based selection rules, but not the others. Compared to random selection, sensitivity-based competition has a preference towards connecting low sensitivity genes, even going so far as to exclude a large number of high sensitivity genes from the network, an effect that is more exaggerated for higher values of m. By comparison, both the structure-based and hybrid competition rules take much longer to form a GOUT component.

Figure 2.1: Structure of networks formed under our four different selection rules on a network of size N = 100, stopped after E/N = 1.75. Nodes are sorted according to sensitivity, with the least sensitive at the top of the circle and increasing in sensitivity clockwise. If a node is damaged, it is filled in blue. Nodes in GOUT are enlarged and filled in black (if undamaged) or blue (if damaged); all other nodes are filled in white. Isolated nodes with no links are placed below each network with their corresponding locations in the circular layout left empty. Thick links connect nodes in GOUT. Links are blue if they connect two damaged nodes, black if they connect nodes in GOUT that are not both damaged, or gray otherwise. We see that GOUT is much larger for the random and sensitivity-based processes as compared with the structure-based and hybrid processes. Both the sensitivity-based and hybrid processes show a higher density of links between low sensitivity nodes, which is much more pronounced in the former.

Figure 2.2: Network structure for a network of size N = 100 grown under the four selection rules, stopped after E/N = 5.75. As in Fig. 2.1, nodes are sorted according to sensitivity, with the least sensitive at the top of the circle and increasing in sensitivity clockwise, and with isolated nodes left out of the circular layout and placed below each network. We now focus on GOUT* rather than GOUT. Nodes belonging to GOUT* are enlarged and red, with thick red links connecting between them. Damaged nodes that do not belong to GOUT* are blue, with blue links connecting pairs of damaged nodes. All other nodes are colored white, and all other links are colored gray. Even though the size of GOUT for the sensitivity-based process is similar to the random process, the former has a much smaller GOUT*. The structure-based and hybrid processes show even smaller GOUT*, despite a large number of connections between damaged nodes.
These two methods are nearly indistinguishable, except that the hybrid method mimics some properties of sensitivity-based competition, namely a higher link density between low-sensitivity genes and the exclusion of high-sensitivity genes. We next consider a higher parameter value of E/N = 5.75, where GOUT* has now formed for random and sensitivity-based selection and is beginning to form for structure-based and hybrid selection. Because sensitivity-based selection avoids connecting to high-sensitivity genes overall, the GOUT* component mainly consists of lower-sensitivity genes. Since most damaged genes have higher sensitivities, this also leads to fewer links overall between damaged genes and a smaller GOUT* component than in the case of random selection. We also observe that in random selection and sensitivity-based selection, nearly every link between damaged genes belongs to GOUT*. By comparison, structure-based and hybrid selection produce a large number of links between damaged genes, but only a few belong to GOUT*, since the individual out-components are kept small and avoid connecting to each other.

2.3.2 Growth of giant components

Now equipped with better intuition about these selection methods, we turn our attention to the growth of the connected components in the N → ∞ limit. We can approximately achieve this goal by simulating on a network of size N = 10^5, shown in Fig. 2.3. As before, we begin by considering the growth of the full-network components, ignoring damage status. As expected, the formation of GOUT is delayed much further under the structure-based and hybrid selection methods, due to their nature of selecting links to maintain smaller component sizes overall. Meanwhile, sensitivity-based selection instead tends towards earlier formation of the giant component, with higher values of m pushing this transition earlier.
By increasing m in structure-based selection, we are more likely to find a smaller component, and thus are able to spread out links more efficiently and avoid forming larger components.

Figure 2.3: Growth of the connected components on the full network (left) and the damaged subnetwork (right). We compare growth under random selection to our three competitive rules. Results shown are the averages of 100 runs on networks of size N = 10^5. We omit GIN and GIN* due to their symmetry with GOUT and GOUT*, respectively. Of the selection rules considered, we see that structure-based selection (blue curves) does best at delaying the phase transition for the full network, while the structure-based and hybrid selection rules (red curves) exhibit simultaneous transitions when we consider only the damaged subnetwork, with hybrid selection showing greater suppression of the component size after the transition.

By comparison, in sensitivity-based selection, increasing m makes us more likely to find low-sensitivity genes (and avoid high-sensitivity genes altogether), resulting in a higher link density among the lower-sensitivity genes and a faster formation of a giant component. The exclusion of highly sensitive genes leading to faster emergence of a giant component also explains why hybrid selection transitions slightly earlier than structure-based selection. Of greater interest is the formation of GOUT*. We again have structure-based and hybrid selection leading to later percolation of GOUT*, but unlike before, sensitivity-based selection leads to formation of GOUT* later than random selection. The sensitivity-based selection rule is designed to prevent formation of GOUT* by preventing connections between damaged genes, which leads to slower formation of GOUT* but not of GOUT. Comparing structure-based and hybrid selection, we see that the critical point at which GOUT* forms is approximately the same when m = 2.
But when m is increased, structure-based selection can delay the transition much more easily. Again, the additional effect introduced by considering sensitivities appears to be at play. Hybrid selection is willing to form larger components, as long as the genes involved have relatively low sensitivity. Eventually, however, the high-sensitivity genes will have smaller components and be more favorable to gain a new link. It appears that this shift is enough to lead to faster growth of GOUT* than if all components were kept smaller regardless of sensitivity. Other than the location of the percolation transition for GOUT*, we can also look at the growth rate of this component post-transition. One advantage of the sensitivity-based and hybrid selection rules is the slow growth of GOUT*, despite the earlier transition. Since these rules tend to prevent connections between damaged genes, a certain number of highly sensitive genes will have trouble joining GOUT*, and the rate at which GOUT* approaches its maximum size is reduced. In particular, increasing m slows the growth even further in sensitivity-based and hybrid selection, since larger values of m lead to increased exclusion of genes.

2.3.3 Maximizing yield

Our model exhibits certain features common to systems developed through highly optimized tolerance [28, 29], namely incremental improvements in response to maximizing some yield or fitness function, leading to specialized structures. While we do not directly maximize any yield function, our selection methods can be thought of as heuristics for maximizing

   Y = (1/N) (n_active − |GOUT*|),   (2.5)

where n_active is the number of genes with at least one link. Only genes that interact with others will contribute to the dynamics in a meaningful way. Therefore, we should maximize the number of active genes to increase the dynamical variation of the system.
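As a quick numerical illustration of Eq. (2.5), a one-line sketch with made-up component sizes:

```python
def yield_Y(n_active, gout_star, N):
    """Yield of Eq. (2.5): fraction of active genes minus the fraction
    absorbed into the giant damaged out-component GOUT*."""
    return (n_active - gout_star) / N

# Made-up example: 90,000 active genes, 15,000 of them in GOUT*,
# on a network of N = 10^5 genes.
Y = yield_Y(90_000, 15_000, 100_000)  # 0.75
```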
At the same time, we do not want GOUT* to grow too big, since the size of GOUT* directly gives the number of genes that are damaged in the asymptotic state. We consider this yield for each of our selection methods, as shown in Fig. 2.4. Although avoiding connections to high-risk genes is a potentially useful feature for preventing connections between damaged genes, we do not want these genes to be kept entirely isolated from the network. For each selection rule, the yield quickly grows as the number of active genes increases, but reaches a maximum when either all genes have been assimilated into the network or the percolation transition of GOUT* has been reached.

Figure 2.4: The average yield of networks grown under our four selection rules, as described by Eq. (2.5). Results are averaged over 100 runs on networks of size N = 10^5. Hybrid and structure-based selection processes show similar yield profiles, consistently outperforming both random and sensitivity-based yields.

The yield then begins to decrease and approach 1 − N*/N, the fraction of undamaged genes in the network. Sensitivity-based selection is the only one of our rules that does not reach a maximum of Y = 1 before decreasing, due to a large number of inactive genes. One should note that after peaking, sensitivity-based selection does continue to assimilate genes into the network, but at a slower rate than the growth of GOUT*. For structure-based and hybrid selection, there is a long period during which the yield is maximized. Both rules are able to quickly incorporate genes into the main network, since small components of size one are much more preferable. Hybrid selection tends to grow more slowly at first, due to its slight exclusion of high-sensitivity genes. For m = 2, when both of these rules form GOUT* at approximately the same time, hybrid selection performs slightly better, since the slower growth of GOUT* leads to a higher yield after the transition.
Increasing to m = 3 and beyond, however, the longer delay in percolation makes structure-based selection significantly better.

2.3.4 Correlations

Finally, we consider two forms of correlations in our networks. The first is the in-out degree assortativity, given by

   r = Σ_{i→j} (d^i_in − d̄^s_in)(d^j_out − d̄^t_out) / sqrt( Σ_{i→j} (d^i_in − d̄^s_in)² · Σ_{i→j} (d^j_out − d̄^t_out)² ),   (2.6)

where the sums are taken over all directed links i → j, d^i_in (d^j_out) is the in- (out-) degree of node i (j), and d̄^s_in (d̄^t_out) is the average in- (out-) degree of all source (target) nodes, averaged over links. The in-out degree assortativity is equivalent to the Pearson correlation coefficient between the in-degree of source nodes and the out-degree of target nodes, with all averages taken over links. Strongly positive assortativity indicates a preference of large-degree nodes to connect to other large-degree nodes, while strongly negative assortativity, also called high disassortativity, indicates that large-degree nodes tend to connect to small-degree nodes. When the assortativity is near zero (no preference), the network is said to be assortatively neutral. The second relationship we consider is the correlation between node out-degree and node sensitivity, calculated as the Pearson correlation coefficient between out-degree and sensitivity, averaged only over active nodes. Due to the symmetry of our selection rules, the correlation between out-degree and sensitivity is equal to the correlation between in-degree and sensitivity. We will refer to either case as the degree-sensitivity correlation, since they are interchangeable. Both the in-out assortativity and the degree-sensitivity correlation are shown in Fig. 2.5.

Figure 2.5: In-out assortativity (top) and the degree-sensitivity correlation (bottom) for our four selection rules. Results shown are averages over 100 runs on networks of size N = 10^5. While the random and sensitivity-based processes are assortatively neutral (i.e., have near-zero in-out assortativity), we see that both the structure-based and hybrid selection processes result in disassortative networks before the phase transition, with the former continuing to show weak disassortativity after the transition, while the latter is nearly neutral after the transition. The random and structure-based processes result in no correlation between degree and sensitivity while, as expected, the sensitivity-based selection process creates a negative correlation between a node's out-degree and its sensitivity. The hybrid process shows weaker and non-monotonic negative correlations.

As one should expect, both random selection and sensitivity-based selection are assortatively neutral, since they do not directly consider component sizes or node degree in the decision process. Structure-based selection and hybrid selection are highly disassortative processes, since both tend to prevent large in-components from connecting to large out-components. Nodes with large in-components typically have large in-degree, and similarly for out-components and out-degree. In both of these rules, the assortativity continues to decrease until the critical point at which the GSCC percolates, after which the assortativity tends toward zero. Once the giant components are large enough, there is a high probability of selecting genes from the GSCC. If all m genes are from the GSCC, then they will each have the same component sizes, at which point a link will be chosen either randomly, in the case of structure-based selection, or based on sensitivity, in the case of hybrid selection.
Either approach will tend toward neutral assortativity, since the decisions are independent of component size, which relates to degree. Since hybrid selection leads to a faster-growing GSCC, its assortativity tends to zero much faster. For the case of m = 3, the trend is similar, though the assortativities of structure-based and hybrid selection decrease at a slower rate to a higher (less negative) minimum value. Since higher values of m are better at keeping components small, more components exist near the average component size. This trend is clarified by considering the degree-sensitivity correlation. First, we note that both random and structure-based selection have uncorrelated degrees and sensitivities, since they ignore sensitivity information. Sensitivity-based selection has a strong negative degree-sensitivity correlation, which is also to be expected, since low-sensitivity genes have a higher probability of gaining links. The most interesting case is hybrid selection. Initially, when all components are small, sensitivity is the larger influence in hybrid selection, causing a negative degree-sensitivity correlation. Once components are of moderate size, component size begins to take over, until the formation of the GSCC. As discussed previously, once the GSCC forms there is a greater chance that all m potential genes will come from the GSCC, in which case all m genes have equal component sizes and sensitivity again takes over.

2.3.5 Connecting with data

We compare our simulated networks to networks reconstructed from data. We consider several reconstructed transcriptional gene regulatory networks for different species, listed in Table 2.1. Using [1], we have estimates of the ages of most genes found in these networks, which allows us to construct an "ancestral ordering" of links. We do this by ordering the genes by age and adding a link once both of its genes have appeared.
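Since Eq. (2.6) is a Pearson correlation taken over the link list, it can be computed directly from an edge list as links are added in any ordering. A minimal sketch, using a small hypothetical edge list:

```python
import math

def in_out_assortativity(links, in_deg, out_deg):
    """Pearson correlation between the in-degree of link sources and the
    out-degree of link targets, averaged over directed links (Eq. 2.6)."""
    xs = [in_deg[i] for i, j in links]   # source in-degrees, one per link
    ys = [out_deg[j] for i, j in links]  # target out-degrees, one per link
    n = len(links)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical 4-node directed network
links = [(0, 1), (0, 2), (1, 2), (3, 2)]
in_deg = {k: sum(j == k for _, j in links) for k in range(4)}
out_deg = {k: sum(i == k for i, _ in links) for k in range(4)}
r = in_out_assortativity(links, in_deg, out_deg)  # negative: disassortative
```

Tracking r after each link addition, under either the ancestral or the random ordering, yields curves of the kind shown in Fig. 2.6.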
Admittedly, this way of capturing the network through evolutionary time provides only a very coarse view, since it relies on the strong assumption that a regulatory link between two genes appears as soon as the "younger" gene appears. While this assumption is almost certainly violated, we believe it provides a good starting point for connecting with data. For comparison, we also consider a random ordering of the links. In both the ancestral and random orderings, the final networks are identical; the only difference is the order in which the links were added. Due to the small size of the networks, which results because the empirical data is incomplete, the connected components do not have clear transitions. In addition, we do not have known values for the bias or sensitivity of each gene. The sensitivity of a gene may be approximated using expression data, but calculating accurate sensitivities is a challenging problem and beyond the scope of our investigation. We can, however, analyze the in-out assortativity of the network as a function of the number of links added, as shown in Fig. 2.6. The final values of the in-out assortativity are given in Table 2.1. As we observed in our simulated structure-based and hybrid networks, when we add links using the ancestral ordering that incorporates gene age, we see a general trend in in-out assortativity: a significant early drop to a moderately negative minimum, followed by a slow increase back toward zero. This pattern is clear for 3 of the 4 species for which Ref. [1] provides extensive gene age data: yeast, drosophila, and mouse. We see a similar, yet weaker, trend for human in Fig. 2.6(d), but only after an initial peak with elevated assortativity, which is especially pronounced for the ENCODE data. We believe that the difference in the pattern may be due to the small network size (the ENCODE network includes only 122 genes) and the biases involved in subsampling from the full human gene regulatory network.
Nonetheless, the difference between the ancestral ordering and the random ordering for the human data is similar to the other species: the assortativity for the ancestral ordering dips far lower than for the random ordering and stays significantly negative over a much larger range of links added. Perhaps interestingly, the two species with positively assortative final networks, yeast and E. coli, are also the only single-celled organisms considered. Unfortunately, incomplete gene age data from [1] means we cannot verify whether the assortativity of the E. coli network follows the same trend as the others.

Figure 2.6: In-out assortativity for the transcriptional gene regulatory networks listed in Table 2.1, as a function of the number of links added. Links are ordered using gene age information [1], and a randomized ordering of the links in the final network is shown for comparison. Both orderings yield the same final network. E. coli is omitted, due to lack of gene age information from [1]. Note that the horizontal axis is scaled by the total number of final links (E_final), rather than the number of genes (N), to allow for comparison across the different networks.

Species                     Nodes   Edges   In-out assort.
E. coli [41]                  201     460         0.0475
S. cerevisiae [42]            409    5068         0.0590
D. melanogaster [43]          329    3407        −0.0447
M. musculus [44]              827    2213        −0.010
H. sapiens (TRRUST) [44]      795    1882        −0.0516
H. sapiens (ENCODE) [45]      122     624        −0.0430

Table 2.1: Number of nodes, edges, and the in-out assortativity for the E. coli, yeast, fruit fly, mouse, and human transcriptional networks. These in-out assortativity values are also the final values in Fig. 2.6.

2.4 Discussion

We have seen that, using simple competitive rules, a network can be evolved to resist large-scale damage propagation significantly better than in the random case. Often this leads to specialized structure such as a higher link density between low-sensitivity genes, high disassortativity, and strong negative correlations between degree and sensitivity. Perhaps most surprising is that knowing a gene's sensitivity is not necessary to effectively delay the onset of a system-wide damaged component, although it does help to ensure that the components remain small. In fact, when we include sensitivity but not structure in the selection rule, more links form between a smaller subset of genes. Not only does this feature result in faster growth of giant components, but it also tends to exclude some high-sensitivity genes from contributing to the network at all. Further, the in-out assortativity of our simulated networks evolved using structure-based selection mimics the in-out assortativity of our reconstructed real networks, further corroborating the importance of structural information in the evolution of actual regulatory networks. Future investigations into the connection between this kind of model and real-world data would benefit tremendously from accurate estimates of gene biases.

Surprisingly, using gene sensitivity information along with structure does not provide a benefit, at least when applied through a simple multiplicative rule, when compared to using structure alone. This hybrid rule results in a three-phase growth process that switches between emphasizing low-sensitivity genes, then small-component genes (which at that point are high sensitivity), and then back to low-sensitivity genes. The drawbacks found in the purely sensitivity-based selection take effect, leading to earlier percolation than if sensitivity were ignored. One can construct more complicated selection rules that incorporate sensitivity and structure to perform better than structure alone, but such rules tend to be overly engineered, and hence we suspect less biological.
One missing aspect of this process is the ability of gene sensitivities to evolve over time. The sensitivity of a gene is an innate dynamical property and should not be expected to stay constant across generations. In fact, as the structure of the network evolves over time, the dynamical behavior of individual genes would likely change in response. Allowing for the coevolution of structure and sensitivity may lead to a different story, and represents a natural next step.

Data Accessibility

All relevant code and data are publicly available at https://github.com/bralex1/EvoNets.

Chapter 3: The sync-contagion model of dual information spread

3.1 Introduction

Many agent-based models have been studied to explore emergent dynamics in populations of interacting individuals. Examples range from modeling the synchronization of flocks of animals [7, 46–48] to the spread of epidemics or information along networks [49–52]. Most of these studies focus on a single type of dynamics, e.g., either the synchronization process or an information cascade, but not both simultaneously. A recent study [47] explored swarming and synchronization at the same time through a "swarmalator" model. The swarmalator model consists of coupled spatial and phase synchronization dynamics. With this model, O'Keeffe, Hong, and Strogatz discovered five types of pattern formation involving clustering in space and phase. In this chapter, we are similarly interested in a combined model, with our focus on the case of synchronization and information cascades coevolving in a population. Like the swarmalators of Ref. [47], the combined "sync-contagion" model has the potential to exhibit new behaviors not otherwise observable in the individual models alone. Synchronization is a phenomenon in which a group's behaviors converge over time towards some consensus.
Examples include the flocking and swarming of animals [7, 46, 53], the songs of Japanese tree frogs [54], humans escaping a fire [55, 56], and the flow of power in a power grid network [57]. One model of synchronization is the Kuramoto model [58, 59], which has been used to analyze the behavior of a network of coupled phase oscillators, but has also been applied to swarming [7, 47, 48] and opinion dynamics [60, 61]. The Kuramoto model is known to exhibit two main types of behavior, depending on the choice of coupling strength [59, 62, 63]. For coupling above some critical value, the oscillators will synchronize, forming one or more clusters. For coupling below the critical value, the oscillators will instead disperse into an incoherent state, in which the oscillator phases are evenly distributed around a circle. Contagion models, on the other hand, represent systems in which a signal or attribute cascades throughout a population. The archetypal contagion model is the well-studied SIR model of infection spread [49, 64, 65]. In the SIR model, each individual belongs to one of three groups: susceptible, infected, and removed/recovered. The infected individuals spread the infection to susceptible individuals and eventually recover, gaining long-term immunity. Beyond infectious diseases, the SIR model can easily be adapted to model the spread of a computer virus [50, 66] or the spread of information such as news and rumors [51, 52, 67]. When combining the sync and contagion models, we consider a system of individuals in which some initial small population is incited by an external stimulus. The presence of this stimulus requires some form of behavioral change of the population, represented by a target "orientation" known only to the initial group. The initial group must then send two kinds of information throughout the population: the existence of the triggering stimulus and the target orientation.
Specific applications include the escape of a group in the presence of a threat, in which the orientation is the literal directional facing of an individual, and the changing opinions of a population in response to a hot topic, where the orientation is an abstraction of one's "political orientation". A combined sync and contagion model was introduced by Chicoli and Paley [7] to model the shoaling of fish schools and their escape in the presence of a predator. We base our sync-contagion model on the model presented by Chicoli and Paley, but with two key differences. First, the model of Chicoli and Paley features only unidirectional feedback between the two models, where the contagion dynamics influence the sync dynamics but not the other way around. We introduce a bidirectional feedback, allowing the sync system to affect the contagion system, by including an "acceptance function" that determines the probability of successful contagion spread to an individual based on its orientation. For the second change, we note that Chicoli and Paley investigate the distinction between a naturally coherent and a naturally incoherent system through selection of the initial conditions, but in both cases choose system parameters that lead to synchronization over time. While this is valid for short timescales, we instead choose to distinguish between coherent and incoherent systems by setting the sync model parameters appropriately, which allows us to observe the longer-term behaviors of each. We discover five types of behavior exhibited by our sync-contagion model when considering the success of realignment of the system towards the target orientation. First, if the rate of contagion transmission is not fast enough, then we obtain the trivial scenario of no initial contagion spread. Second, for naturally incoherent systems, successful realignment is possible, but only temporary, as the system will return to total incoherence after sufficient time once no contagion is present.
Finally, the remaining three behaviors are various levels of permanent realignment in coherent systems: no/limited realignment, partial realignment, and full realignment. The distinction between these three behaviors results from the relative strengths of the contagion transmission and the sync coupling and their ability to overcome a "barrier" introduced by the acceptance function.

3.2 Model and methods

In this section, we first present an overview of the two models that we integrate to form our sync-contagion model: the Kuramoto model for synchronization and the SIR model for information cascades. We then describe our unified model that combines the two with bidirectional feedback. Finally, we define our initial conditions and discuss some considerations regarding parameter choices.

3.2.1 Sync model

The Kuramoto model describes the phenomenon of synchronization in a system of coupled oscillators [58, 59] as a function of the coupling strength. While normally a system of differential equations, the Kuramoto model is easily adapted to a discrete-time map [62, 63, 68] that exhibits similar behavior. The discrete version we use as a foundation most closely resembles the "sync map" presented in [63]. For our sync model, we consider a system of N coupled oscillators. At each time step t, oscillator k updates its orientation θ^k_t according to

   θ^k_{t+1} = θ^k_t + η^k_t + (κ/N) Σ_{j=1}^{N} sin(θ^j_t − θ^k_t),   (3.1)

where κ is the coupling strength between oscillators and η^k_t is i.i.d. random noise. After each step, all orientations are taken mod 2π and remapped to the interval [−π, π). For our purposes, the noise can be considered the error in each individual's ability to achieve its desired orientation.

Figure 3.1: Left: The critical boundary for the sync model (3.1) with Gaussian noise η^k_t ∼ N(0, σ²). Values of (κ, σ) above the blue curve lead to an incoherent system state at equilibrium (ρ∞ ≈ 0). Colored dots mark the values of (κ, σ) used in the example steady-state distributions shown in the right panel, with the colors of the dots matching the colors of the distributions. Right: Example steady-state distributions for the sync model (3.1) with Gaussian noise η^k_t ∼ N(0, σ²) for different values of σ and fixed κ, generated from a simulation of N = 10^5 oscillators. Note that each distribution has been shifted to have mean 0. As σ decreases, the steady-state distribution changes from a uniform distribution (the incoherent state) to a symmetric, unimodal distribution (the coherent state), with the width of the distribution decreasing as σ continues to shrink.

The macroscopic dynamics of the sync model can be captured by the order parameter

   r_t = (1/N) Σ_{k=1}^{N} exp(iθ^k_t) = ρ_t exp(iψ_t),   (3.2)

which can be broken down into the coherence, ρ_t = |r_t|, and the mean orientation, ψ_t = Arg(r_t). As shown in [63], in the large-N limit for a specified noise distribution, there is a critical value κ = κ_c such that a uniform distribution of oscillator orientations is stable (i.e., ρ_t ≈ 0) for κ < κ_c and unstable for κ > κ_c. In particular, for Gaussian noise η^k_t ∼ N(0, σ²), the critical value is given by κ_c = 2(exp(σ²/2) − 1), shown in Figure 3.1. If we restrict ourselves to κ ∈ [0, 1] and finite N, we end up with two classes of behavior, demonstrated in Figure 3.1:

1. For 0 ≤ κ < κ_c, the stable distribution is approximately uniform on [−π, π) with ρ∞ ≈ 0. We will call such systems incoherent.

2. For κ_c < κ ≤ 1, the stable distribution is unimodal and symmetric about ψ∞. As κ increases, the width of the distribution decreases, and therefore ρ∞ increases. We will call these systems coherent.

Note that it is possible to use a larger range of κ. We choose a lower bound of κ ≥
0 so that the system will "naturally" synchronize in the absence of noise, except in the exact case of κ = 0, where no dynamics occur. For the upper bound, values of κ ≥ 2 can potentially lead to the formation of multiple clusters in the steady state [62]. We are only interested in the single-cluster case, and find that a maximum of κ ≤ 1 is sufficient to illustrate the behaviors of our combined model.

3.2.2 Contagion model

Compartmental models have been the fundamental models for infectious disease spread [64, 65] and other forms of information propagation through a population [examples], with the SIR model likely being the most well known. In these models, each individual occupies one of several "compartments" (or groups) at each point in time, and transitions between the groups according to specified rules. Our contagion model is based on the discrete-time SIR model [65], described in this section. We consider the case of an infection spreading among a community of N individuals. At any point in time, an individual will belong to exactly one of three groups:

1. Susceptible (S) individuals have never been infected, but are capable of becoming infected.

2. Infected (I) individuals have the infection and are able to spread it to susceptible individuals.

3. Removed/Recovered (R) individuals were previously infected and cannot later become infected.

We begin with everyone in S at time t = 0, except for a few initially infected individuals, and as time progresses individuals transition between groups according to two rules. First, a susceptible individual can become infected upon being exposed to an infected individual. We assume a fully mixed population and define the transmission coefficient β as the contact rate times the transmission probability. A susceptible individual at time t then becomes infected with probability β|I_t|/N, where |I_t| denotes the size of the infected population at time t.
Second, any infected individual at time t becomes removed (or recovers) with probability γ. This process is summarized in Figure 3.2. The mean behavior of the contagion model can be described by the system of equations

    S_{t+1} = S_t(1 − βI_t),
    I_{t+1} = I_t(1 + βS_t − γ),        (3.3)
    R_{t+1} = R_t + γI_t,

where S_t = |S_t|/N, I_t = |I_t|/N, and R_t = |R_t|/N.

Figure 3.2: Transition diagram for the contagion model. [S → I with rate βI; I → R with rate γ.]

The dynamics of the contagion model are fully determined by the transmission coefficient β, the recovery rate γ, and the initial infection probability p_I0. In the case of I_0 ≈ 0, we have the restriction that β/γ > 1 is required for the initial infection to grow. Examples of the contagion dynamics over time for different β and γ are shown in Figure 3.3.

3.2.3 Combining the models: the sync-contagion model

To combine our sync and contagion models, we again consider a system of N individuals, each with an orientation θ_t^k, but also belonging to one of S, I, or R at each time t. We imagine that a few individuals in the population are exposed to special information and become "infected" by setting their orientation to the target orientation reflected by this special information, with all other individuals starting in the susceptible state. For schools of fish, the target orientation might correspond to the direction of escape from a predator. Susceptible individuals become infected based on interacting with infected individuals as well as the distance between their orientation and the target orientation. If a susceptible individual's orientation differs too greatly from the target orientation, then the probability of transmission is reduced.
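As a concrete (and purely illustrative) sketch, the mean-field contagion iteration of Eq. (3.3) can be written in a few lines of Python; the function and parameter names here are my own, not taken from the thesis code:

```python
def contagion_trajectory(beta, gamma, p_I0, T):
    """Iterate the mean-field contagion equations (3.3).

    beta  : transmission coefficient
    gamma : recovery rate
    p_I0  : initial infected fraction
    Returns lists of the population fractions S_t, I_t, R_t for t = 0..T.
    """
    S, I, R = [1.0 - p_I0], [p_I0], [0.0]
    for _ in range(T):
        S_t, I_t, R_t = S[-1], I[-1], R[-1]
        S.append(S_t * (1.0 - beta * I_t))
        I.append(I_t * (1.0 + beta * S_t - gamma))
        R.append(R_t + gamma * I_t)
    return S, I, R

# beta/gamma > 1 is required for the initial infection to grow;
# here beta/gamma = 5, so the infected fraction rises before dying out.
S, I, R = contagion_trajectory(beta=0.05, gamma=0.01, p_I0=0.01, T=1000)
```

Note that the three fractions sum to one at every step, since each update only moves probability mass from S to I and from I to R.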
Figure 3.3: Example outcomes of the contagion model for different choices of transmission coefficient β and recovery rate γ, showing |S_t|/N, |I_t|/N, and |R_t|/N over time. Panels: low transmission, moderate recovery (β = 0.005, γ = 0.01); moderate transmission, moderate recovery (β = 0.05, γ = 0.01); high transmission, moderate recovery (β = 0.5, γ = 0.01); moderate transmission, low recovery (β = 0.05, γ = 0.001). In general, larger values of β lead to greater exponential growth of the infected population (and decay of the susceptible population), while larger values of γ lead to faster exponential decay of the infected population (and growth of the removed population). In the case where β/γ < 1 (top left), the infection immediately dies out.

Specifically, we modify Eq. (3.1) to depend on an individual's infection status as

    θ_{t+1}^k = θ_t^k + η_t^k + (κ/N) Σ_{j=1}^N sin(θ_t^j − θ_t^k),   if k ∈ S_{t+1} ∪ R_{t+1},
    θ_{t+1}^k = ψ_t^I + η_t^k,                                        if k ∈ I_{t+1},        (3.4)

with

    ψ_t^I = Arg[ (1/|I_t|) Σ_{k∈I_t} exp(iθ_t^k) ].        (3.5)

In other words, individuals that are susceptible or removed follow the normal sync model rules, while individuals that are infected align to the mean orientation of the infected subpopulation, calculated from the previous time step.

The bidirectional feedback is achieved by making the transition probabilities between the contagion groups depend on an individual's orientation. Specifically, upon exposing a susceptible individual to the infection, there is a probability α(θ) that the infection takes hold and a 1 − α(θ) probability that the individual instead becomes immediately removed. The acceptance function α(θ) depends on the distance between the individual's orientation θ and the target orientation τ, with infection success increasing as this distance decreases.
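A minimal sketch of one acceptance function with these properties, close to 1 at the target orientation and vanishing at the diametrically opposite orientation. The Gaussian-well form, the width π/6, and all names here are illustrative assumptions:

```python
import numpy as np

def acceptance(theta, tau=0.0, width=np.pi / 6):
    """Acceptance probability alpha(theta) for a susceptible exposed to the
    infection: near 1 when theta is close to the target orientation tau,
    exactly 0 when theta is diametrically opposite (|theta - tau| = pi).
    """
    # Circular distance between theta and tau, in [0, pi].
    d = np.abs((theta - tau + np.pi) % (2 * np.pi) - np.pi)
    return 1.0 - np.exp(-((d - np.pi) ** 2) / (2.0 * width**2))

acceptance(0.0)      # ~1: aligned with the target, infection almost always accepted
acceptance(np.pi)    # 0: opposite orientation, exposure leads to removal instead
```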
The new transition diagram for the sync-contagion model is then given in Figure 3.4. The resulting mean behavior is summarized by

    S_{t+1} = S_t(1 − βI_t),
    I_{t+1} = I_t(1 + βS_t⟨α(θ_t^k)⟩_k − γ),        (3.6)
    R_{t+1} = R_t + I_t(βS_t(1 − ⟨α(θ_t^k)⟩_k) + γ).

Figure 3.4: Transition diagram for the sync-contagion model. We have updated the transition diagram from Figure 3.2 with the effect of the acceptance function α(θ). [S → I with rate βIα(θ); S → R with rate βI(1 − α(θ)); I → R with rate γ.]

Figure 3.5: Plot of the acceptance function α(θ) used in our simulations, as a function of |θ − τ|. Individuals with orientations near the target orientation τ are more likely to take on the infection when exposed.

We choose α(θ) to be defined as

    α(θ) = 1 − exp( −(|θ − τ| − π)² / (2(π/6)²) ),        (3.7)

as shown in Figure 3.5. We note that the specific choice of α(θ) is not too important, but Eq. (3.7) has the key features that α(θ) strictly increases as |θ − τ| decreases, α(τ) ≈ 1, and α(τ + π) = 0. Other choices of α(θ) with these features will produce similar qualitative results.

We define t = 0 as the time when the first infection is introduced into the system. Before this point, we allow the orientations to equilibrate according to the stable distribution determined by Eq. (3.1), and then redefine our coordinates so that the mean orientation at time t = 0 is set to ψ_0 = π. We introduce the infection at t = 0 by setting each individual as infected with probability p_I0, leading to I_0 ≈ p_I0. If an individual k is infected at t = 0 (i.e., k ∈ I_0), then we immediately reorient θ_0^k = τ + η_0^k. For our analysis, we choose τ = 0, causing the initial susceptible and infected populations to be as far apart as possible and presenting a "worst-case" scenario for the system.

3.2.4 Applications

Our sync-contagion model can be used to represent systems with two pieces of information to spread: awareness of an external stimulus (the "infection") and the initial response orientation, τ
, for the small set of individuals infected at t = 0. Infected individuals are the primary drivers of realignment of the system towards τ. Note that only the original infected population "knows" to redirect towards τ, but this knowledge is preserved within the infected population, since Eq. (3.4) leads to ψ_t^I ≈ τ as long as |I_t| > 0. Those who are uninfected may align towards τ either by becoming infected themselves, or through the base synchronization dynamics determined by Eq. (3.1).

Applying the model to escape in the presence of a threat, similar to Ref. [7], each individual moves in some direction θ. Knowledge of the threat is propagated as a startle, which eventually dies down, while the desired escape direction is τ. Individuals who are moving in a direction too different from τ do not react to their startled neighbors, either due to not seeing others moving in the opposite direction or because the change in direction would be too large to be feasible. This non-reaction to the startle is captured through the acceptance function α(θ).

As a model of opinion dynamics, the orientation may represent an individual's stance on a particular topic. The infection then represents interest in some current "hot topic" that might redirect an individual towards a new stance τ. Individuals who take interest become promoters of the stance τ and try to encourage others to become promoters. Eventually, interest in the hot topic dissipates. Individuals whose stances are too distant refuse to become promoters upon learning of the hot topic, as captured by α(θ). Take, for instance, θ to be one's opinion on social injustice and the existence of systemic racism. The killings of George Floyd, Breonna Taylor, Ahmaud Arbery, and countless others triggered a national dialogue on the topic, with many taking part in the Black Lives Matter protests. The protesters can be seen as promoters of the stance that systemic racism is real and is an issue that needs to be addressed.
Many individuals were convinced by the movement and joined the side of the promoters, further spreading the discussion. However, many others were not swayed, their political stances deviating too greatly from the intended position τ of the promoters, an effect captured by α(θ) in our model.

3.2.5 Parameter restrictions

For our combined sync-contagion system, we have five parameters: coupling strength κ, noise distribution width σ, transmission coefficient β, recovery rate γ, and initial infection probability p_I0. We can reduce this to three by first observing that realignment of ψ_t is primarily driven by the infected population. The susceptible and removed populations can only passively realign according to the sync model (3.1), with more weight given towards τ when |I_t| is larger. The two parameters that directly affect |I_t| are p_I0, for which larger values give the infected population a greater head start, and γ, for which smaller values allow the size of the infected population to be sustained for longer. Since the effects of these two parameters are fairly straightforward, we set them to reasonable values of p_I0 = 0.01 and γ = 0.01.

The effects of κ, σ, and β are less obvious. However, we can note that all three play a role in the requirement for growth of the initial infection. For |I_0| ≈ 0, the initial infection will grow on average only when βᾱ_0/γ > 1, where

    ᾱ_0 = (1/N) Σ_{k=1}^N α(θ_0^k).        (3.8)

Recall that the distribution of θ_0^k depends on the stable distribution for Eq. (3.1), which in turn depends on κ and σ.

3.2.6 Measures

We need to evaluate the emergent dynamics of our model in two ways. First, we need to determine the success of the sync model in realigning the system towards the target direction τ. Second, we need to determine how well infection information is spread through the contagion model. We describe our means of measuring success in this section.
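The growth condition βᾱ_0/γ > 1 of Eq. (3.8) can be checked numerically by averaging the acceptance function over sampled initial orientations. This sketch assumes the Gaussian-well acceptance function of Eq. (3.7); the function and variable names are my own:

```python
import numpy as np

def initial_growth_ok(theta0, beta, gamma, tau=0.0, width=np.pi / 6):
    """True if beta * alpha_bar_0 / gamma > 1 (Eq. 3.8), i.e., the initial
    infection grows on average, given initial orientations theta0."""
    # alpha(theta) of Eq. (3.7), evaluated on the sampled orientations.
    d = np.abs((theta0 - tau + np.pi) % (2 * np.pi) - np.pi)
    alpha = 1.0 - np.exp(-((d - np.pi) ** 2) / (2.0 * width**2))
    alpha_bar0 = alpha.mean()
    return beta * alpha_bar0 / gamma > 1.0

rng = np.random.default_rng(1)
theta0 = rng.uniform(-np.pi, np.pi, 100_000)   # incoherent initial state
initial_growth_ok(theta0, beta=0.1, gamma=0.01)
```

For a uniform initial distribution, ᾱ_0 is a constant somewhat below 1, so the infection grows only when β/γ exceeds 1/ᾱ_0 rather than 1.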
3.2.6.1 Coherence, mean orientation, and realignment

Just as with the original sync model, we can consider the order parameter r_t given in Eq. (3.2), which gives the coherence of the orientation distribution ρ_t = |r_t| and the mean orientation ψ_t = Arg(r_t). Furthermore, we can consider the coherence and mean orientation of the subpopulations S, I, and R by averaging over only members of those groups, as was done for ψ_t^I in Eq. (3.5).

We consider realignment to be successful if ψ_∞ becomes reasonably close to τ. Since the infection will always die out and the system will always return to the basic sync model (3.1), permanent realignment is only possible in coherent systems. For incoherent systems, only temporary (re)alignment is possible, since the system will naturally return to the totally incoherent state. In this case, the order parameter ρ_t is more useful, since it will generally increase only when a cluster forms around τ, with the strongest alignment likely occurring when ρ_t is maximized.

3.2.6.2 Maximum infection fraction

We can also consider the maximum infected fraction

    I_max = max_t I_t.        (3.9)

The value of I_max is especially meaningful in the context of opinion dynamics, where the size of the infected population represents the number of concurrent promoters of the stance τ. Even if realignment does not occur, maximizing the "visibility" of a hot topic is still desirable.

Furthermore, for incoherent systems, if we assume the sync model dynamics are fast enough that uninfected individuals are always uniformly distributed, then in the large-N limit, ρ_t is given by

    ρ_t = I_t ρ_t^I.        (3.10)

Note that for all t and large N, θ_t^k ~ N(0, σ²) for k ∈ I_t, meaning ρ_t^I ≈ ρ_0^I is constant, and ρ_t is then maximized precisely when I_t is maximized. Thus, I_max will directly give the alignment success for an incoherent system, removing the need to consider ρ_max at all. We find that this assumption requires σ
to be large enough for the choice of β, since increased noise leads to faster convergence to the incoherent state. In cases where this approximation is invalid, however, ρ_max still occurs approximately simultaneously with I_max in incoherent systems.

3.2.6.3 Maximum cosine mean

While the final mean orientation ψ_∞ works well to evaluate coherent systems, and ρ_max (or I_max) works well for incoherent systems, it is desirable to have a measure to compare coherent to incoherent systems. One measure that works in both cases is the maximum cosine mean

    C_max = max_t (1/N) Σ_{k=1}^N cos(θ_t^k) = max_t ρ_t cos(ψ_t).        (3.11)

The value C_max is larger when ψ_t is close to our target orientation τ = 0, but is penalized for wider distributions of orientations (ρ_t small). In incoherent systems, symmetry causes ψ_t = τ as long as |I_t| > 0, so that C_max = ρ_max. Meanwhile, for coherent systems, C_max will depend more on the final orientation ψ_∞. Thus, C_max gives a way to compare the primary measure of coherent systems (ψ_∞) to the measure for incoherent systems (ρ_max).

3.3 Results

In this section, we present the key results of the sync-contagion model. We begin by observing the general behavior of the system in the incoherent and coherent cases, and then consider the effects of modifying our three remaining parameters: noise distribution width σ, coupling strength κ, and transmission coefficient β.

3.3.1 General behavior

To begin our investigation, we take a look at a sample of the sync-contagion dynamics applied to both an incoherent and a coherent system. As described in Section 3.2.6.2, if we assume that the orientations of uninfected individuals are always uniformly distributed in an incoherent system, then the coherence ρ_t will be directly proportional to |I_t|. In the incoherent case shown in Figure 3.6, this assumption does not exactly hold, since ρ_t^{S∪R} becomes noticeably greater than zero; however, comparing the graphs of ρ_t^I and |I|/N still shows a close relationship regardless.
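The order parameter of Eq. (3.2) and the maximum cosine mean of Eq. (3.11) can be computed from a stored orientation history. A sketch, with names of my own choosing:

```python
import numpy as np

def measures(theta_history):
    """Summary measures from a (T, N) array of orientations.

    Returns the per-step coherence rho_t, mean orientation psi_t, and the
    maximum cosine mean C_max of Eq. (3.11) (with target tau = 0).
    """
    r = np.mean(np.exp(1j * theta_history), axis=1)    # order parameter r_t
    rho, psi = np.abs(r), np.angle(r)
    # (1/N) sum_k cos(theta_t^k) equals rho_t * cos(psi_t) identically.
    cos_mean = np.mean(np.cos(theta_history), axis=1)
    return rho, psi, cos_mean.max()

rho_t, psi_t, C_max = measures(np.zeros((5, 100)))   # fully aligned: C_max = 1
```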
As long as the infection exists, the mean orientation remains near τ, and ρ_t becomes more positive with the larger infection. However, once the infection dies out, the system returns to the incoherent state, leading to the mean orientation becoming a random value and ρ_t approximately zero.

For coherent systems, the rise of the infection actually causes a decrease in coherence, shown in Figure 3.6, as individuals are split between one distribution centered on the mean orientation ψ_t^I of the infected population and another distribution centered on the mean orientation ψ_t^{S∪R} of the uninfected population. Over time, ψ_t^{S∪R} is pulled towards ψ_t^I, leading to permanent realignment of the system, ideally keeping ψ_∞ near τ after the infection ends. In both cases, since ψ_t^I is essentially constant, the change in ψ_t follows the change in ψ_t^{S∪R}. Note that as the infection dies out, the coherence of the infected subpopulation becomes noisier, as fewer members remain.

3.3.2 Modifying the width σ of the noise distribution

We determined the effects of the recovery rate γ and initial infection probability p_I0 in Section 3.2.5. Of our three remaining parameters, the standard deviation σ of the noise term η_t^k has the most straightforward effect. First, recall that for fixed κ, the value of σ will establish whether the system is coherent or incoherent. Beyond this effect, the value of σ that best propagates the infection and realignment actually depends on which of these two synchronization tendencies the system has. By comparison, the ideal values of γ and p_I0 are independent of synchronization tendency.

For incoherent systems, the "goal" of the infection is to maximize ρ_t and I_max. If we
If we 53 1.0 = 0.1, = /6, = 0.1, = 0.01 1.0 = 0.5, = /6, = 0.1, = 0.01 | t|/N | t|/N | t|/N | t|/N0.8 | |/N 0.8t | t|/N 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0 200 400 600 800 1000 0 200 400 600 800 1000 t t = 0.1, = /6, = 0.1, = 0.01 = 0.5, = /6, = 0.1, = 0.01 3 3 2 2 1 1 0 0 1 1 2 2 3 3 0 200 400 600 800 1000 0 200 400 600 800 1000 t t 1.0 = 0.1, = /6, = 0.1, = 0.01 1.0 = 0.5, = /6, = 0.1, = 0.01 0.8 0.8 0.6 t 0.6 t 0.4 t 0.4 t 0.2 0.2 t t 0.0 0.0 0 200 400 600 800 1000 0 200 400 600 800 1000 t t Figure 3.6: Example behavior of an incoherent (left) and coherent (right) system. Simu- lations used a population of N = 104 oscillators. Top row: Fractions of the population in each category (susceptible, infected, and removed) over time. Middle row: Distribu- tion of orientations over time. The red center line indicates ? = 0. The mean angle ?t is marked by a yellow dot. Bottom row: The coherence of the full population ?t, the infected subpopulation ?It , and the uninfected subpopulation ? S?R t . 54 again assume that uninfected individuals in an incoherent system always have uniformly distributed orientations, then ??(?kt )?k = ?0 for all t. The average probability of a sus- ceptible becoming infected is then ??0|It|/N , where ?0 is the mean of ?(?) over the uniform distribution on [??, ?). Hence, the contagion dynamics, particularly Imax, are independent of ?. We do note, however, that ? |It| |It|?t ?I0 = exp(??2/2), (3.12)N N which is dependent on ? and decreases as ? increases. We can conclude that lower val- ues of ? are therefore beneficial to incoherent systems, as it allows the distribution of orientations of infected individuals to be tighter around ? . By comparison, coherent systems tend to form one cluster of infected individuals oriented around ?It near ? = 0 and one cluster of uninfected individuals around ? S?R t , initially near ?, the initial mean orientation. The infection needs sufficient susceptible individuals to be near ? 
for the infection to continue, due to α(θ). If too many individuals are far from τ, then the infection will end, and soon after realignment will end. Smaller values of σ are therefore detrimental to coherent systems, causing a tighter initial distribution of susceptible individuals around ψ = π and making it harder to sustain the infection.

Seen another way, larger σ, and therefore a wider distribution of susceptible individuals' orientations, will speed up the rate of realignment in coherent systems, as shown in Figure 3.7. This wider distribution allows α(θ) to capture more susceptible individuals, fueling the infection and consequently creating more weight towards ψ_t^I.

Figure 3.7: Distributions of orientations over time for coherent systems with different choices of noise distribution width σ (σ = π/36, π/12, and π/6) and κ = 0.5, β = 0.1, and γ = 0.01. The mean angle ψ_t is marked by yellow dots, and the central red line indicates τ = 0. As the width of the noise distribution increases, the uninfected population realigns towards τ at a faster rate, at the cost of a less tight distribution.

3.3.3 Modifying the coupling strength κ

We next consider the effect of the coupling strength κ. For fixed σ, the value of κ determines whether the system is coherent or incoherent. For incoherent systems, the coupling strength κ is largely unimportant, other than the requirement κ < κ_c. The dynamics are driven primarily by σ, as just discussed, and β, as we will see later.

In coherent systems, κ serves two roles. First, similar to the effect of small σ, larger κ in coherent systems will lead to a tighter distribution of uninfected orientations around ψ_t^{S∪R}, thus potentially preventing a spread of infection due to α(θ).
However, stronger coupling will also lead to a faster realignment rate of ψ_t^{S∪R} towards ψ_t^I. The balance of these two features is key to successful realignment.

We demonstrate the effect of adjusting both κ and β in Figure 3.8, and show a slice for constant β in Figure 3.9. We can see that, indeed, incoherent systems behave independently of κ (for κ < κ_c), while coherent systems exhibit more complex behavior. For κ slightly above the critical value, where the stable distribution for Eq. (3.1) is quite wide, the infection is easily sustained and ψ_t realigns towards τ. As κ increases, the stable distribution tightens up, making the infection harder to sustain. In order for the realignment speed to move the mean orientation towards τ within an acceptable range for α(θ) before the infection dies out, κ must be increased further. Note that once ψ_t^{S∪R} is within a certain proximity of τ, α(θ) ≈ 1 for all individuals, essentially guaranteeing successful realignment. As previously discussed, larger σ benefits realignment, to the point where κ does not matter.

The regions of realignment success also match up generally well with the higher values of I_max and C_max. In the case of large σ, where realignment is guaranteed for coherent systems, the values of I_max are generally higher, but do still exhibit the "dip" for intermediate values of κ. For C_max in high-σ coherent systems, larger κ is more beneficial. When realignment is guaranteed, the value of ρ_t is more influential on C_max, which is higher for larger κ.
Figure 3.8: Heatmaps of our measures in (κ, β)-space for low noise (σ = π/18, left) and moderate noise (σ = π/9, right), with γ = 0.01. Results are averaged over five realizations, each with a system of N = 1000 oscillators. The vertical dashed red line indicates the critical value κ_c for each value of σ. Incoherent systems exist to the left of the red line, while coherent systems exist to the right. Top row: Absolute value of the final mean angle ψ_∞ scaled by π. For incoherent systems, the final mean angle is a random value, leading to a mean near 0.5. Middle row: The maximum size of the infected population, I_max, as a fraction of the total population. Bottom row: The maximum mean cosine C_max, which takes into account both the mean orientation ψ_t and the coherence ρ_t of the distribution.

Figure 3.9: Our measures for fixed values of β = 0.3, σ = π/18, and γ = 0.01 as κ varies. The vertical dashed red line indicates the critical value κ_c for σ = π/18. For coherent systems (κ > κ_c), intermediate values of κ lead to a drop in performance according to all three measures. Within this region, the distribution is not wide enough (κ too high) and does not realign fast enough (κ too low) to be within an appropriate region for the infection to continue, as required by the acceptance function α(θ).
3.3.4 Modifying the transmission coefficient β

Finally, we study the effects of adjusting the transmission coefficient β. We again see that the contribution of α(θ) to the dynamics is key to understanding the effect of β. Recall that temporary alignment in incoherent systems is stronger when the infected population is larger. To grow the infection as large as possible, we simply need to grow it as fast as possible. If we again assume that orientations of uninfected individuals are uniformly distributed, then we can ignore the effects of α(θ) and increase the infection rate by increasing β. This effect is clearly shown in Figure 3.10, where larger values of β directly lead to an increase in I_max and C_max.

The dynamics when considering the effect of β in coherent systems are similar to our considerations for κ. By increasing β, we increase the rate of exposures between infected and susceptible individuals. To sustain the infection, we require β to be sufficiently large.

Figure 3.10: Our three measures for varying values of β on incoherent systems with κ = 0.02, γ = 0.01, and σ = π/9. Results are simulated on a system of N = 1000 oscillators and averaged over 100 realizations. The large grey box marks the region where βᾱ_0/γ < 1, where the initial infection does not spread. The final mean orientation in an incoherent system is a random value, which leads to |ψ_∞|/π averaging out to 0.5 for all values of β. The maximum size of the infected population and the maximum mean cosine measure both increase as the transmission coefficient increases.

However, because of α(θ), if the distribution of susceptible individuals is too far from τ during an exposure, then each exposure will likely lead to a transition from susceptible to removed instead.
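The per-individual exposure rule driving this effect can be sketched as follows; `expose`, its arguments, and the reuse of the Eq. (3.7) acceptance form are illustrative assumptions rather than the thesis implementation:

```python
import numpy as np

def expose(theta_k, beta, I_frac, rng, tau=0.0, width=np.pi / 6):
    """One step of the S -> I / S -> R rule for a susceptible individual.

    Exposure occurs with probability beta * I_frac; if exposed, the
    infection is accepted with probability alpha(theta_k), and otherwise
    the individual transitions directly to removed.
    """
    if rng.random() >= beta * I_frac:
        return "S"    # no exposure this step
    # alpha(theta) of Eq. (3.7), via the circular distance to tau.
    d = abs((theta_k - tau + np.pi) % (2 * np.pi) - np.pi)
    alpha = 1.0 - np.exp(-((d - np.pi) ** 2) / (2.0 * width**2))
    return "I" if rng.random() < alpha else "R"

rng = np.random.default_rng(2)
expose(np.pi, beta=1.0, I_frac=1.0, rng=rng)   # opposite orientation: returns "R"
```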
Consequently, it is necessary for the transmission coefficient to be high enough to prevent the infection from dying out (due to recoveries), but low enough to give time for partial realignment of the susceptible orientations to come into the proper range of τ for α(θ). We can see a comparison of a successful and a failed realignment in Figure 3.11, where the failure is due to increasing the transmission coefficient.

Increasing κ speeds up the sync dynamics, while increasing β speeds up the contagion dynamics. As seen in Figure 3.8, in order for the system to achieve realignment to the target direction, increasing β requires increasing κ so as to preserve the careful balance.

Figure 3.11: Distributions of orientation over time for β = 0.2 (left) and β = 0.9 (right) for two coherent systems with κ = 0.4, σ = π/18, and γ = 0.01. The mean orientation ψ_t is indicated by yellow dots, and the horizontal red line indicates τ = 0. When β is made too large, the exposures occur too quickly, leading to most exposures turning into instant removals due to α(θ). The transmission must be slow enough to allow for partial realignment of the uninfected population before the majority of exposures occur, while still fast enough to sustain the infection at low levels.

3.3.5 Behavior classification

In total, we can classify the behavior of the sync-contagion model into five categories: (a) infectionless, where βᾱ_0/γ < 1, leading to no initial infection; (b) temporary realignment, which includes all incoherent systems for which the initial infection spreads, causing ψ_t to temporarily align with τ before the system returns to an incoherent state; (c) full permanent realignment, where ψ_∞ is close to τ; (d) partial permanent realignment, where ψ_t shifts away from its initial orientation, but ψ_∞ is not close to τ
; and (e) limited/no realignment, where ψ_t does not move far, if at all. Permanent realignment only occurs in coherent systems. In Figure 3.12, we indicate example regions in (κ, β)-space corresponding to these five classifications.

Figure 3.12: Classifications of sync-contagion model behaviors for σ = π/18 (left) and σ = π/9 (right), with γ = 0.01, experimentally derived from the results shown in Figure 3.8. The classification regions are as follows: infectionless (black), temporary realignment (red), full permanent realignment (green), partial permanent realignment (yellow), and limited/no realignment (grey). Full permanent realignment is based on ψ_∞ being within π/6 of the target orientation. Similarly, limited/no realignment is based on ψ_∞ being further than 5π/6 from the target orientation.

3.4 Discussion

We have introduced a simple model for information propagation that combines the sync dynamics of a Kuramoto-like model with the cascading dynamics of a contagion model. Distinct from other models, we have introduced a bidirectional feedback between the two systems, where the sync dynamics affect the contagion dynamics, which in turn affect the sync dynamics.

Summarizing our findings, we first note the effects of the sync model parameters, coupling strength κ and noise distribution width σ, on the contagion dynamics. For fixed σ, the choice of κ relative to a critical value κ_c will determine whether the sync model exhibits coherent or incoherent behavior. These two behaviors carry over into the sync-contagion model, most notably in whether realignment will be permanent or temporary.

Furthermore, the choice of σ has opposite effects in coherent versus incoherent systems with regard to realignment strength. In incoherent systems, smaller values of σ allow for a tighter distribution around the target orientation τ
during the infection period, and increase the strength of the realignment. By contrast, the tighter distribution introduced by lowering σ in coherent systems prevents susceptible individuals from accepting early infections, due to the acceptance function α(θ), resulting in too many immediate transitions from susceptible to removed before sufficient realignment can occur.

We see a similar effect in varying the coupling strength κ for coherent systems. Larger values of κ lead to a tighter distribution of susceptible individuals' orientations, preventing early infections from being accepted. However, κ also controls the speed at which realignment occurs, meaning large enough κ can potentially overcome this obstacle. In incoherent systems, κ has no notable effect.

We also observed the effects of the contagion model parameters on the sync model. The infected population drives realignment of the system, so whichever choices of parameters lead to a large, sustainable infection are preferred for realignment. It is straightforward that a low recovery rate γ and a high initial infection probability p_I0 are always beneficial to realignment strength, due to the direct effect these have on the size of the infected population. For the remaining parameter, the transmission coefficient β, the effect also depends on incoherent versus coherent behavior. For an incoherent system, a larger transmission coefficient is generally preferred, since the susceptible population will usually have individuals in range of τ to accept the infection. Spreading the infection faster will then create a stronger temporary realignment.

In coherent systems, the effect of β again relates to the action of the acceptance function. Early infections are typically not accepted, since the initial susceptible distribution is far from τ. Increasing the transmission coefficient too far will then cause many of the early exposures to lead to immediate removal.
The transmission coefficient must therefore be large enough to sustain the infection, countering the recovery rate, but small enough that the system can realign closer to τ for infections to be accepted.

Finally, we compare our findings to those of a similar model presented by Chicoli and Paley [7]. While the two models differ in a number of ways, with ours introducing several generalizations that make the two difficult to compare, a key result of the Chicoli and Paley model is that the uniform distribution of an incoherent system can be beneficial to realignment due to a larger number of individuals already being near the target orientation. We come to a similar conclusion: the widespread distribution in an incoherent system is more reliable in propagating an infection and achieving a strong temporary realignment.

One key distinction, however, is that the Chicoli and Paley model only represents a coherent versus incoherent system through the initial conditions, and chooses a coupling strength that causes coherent dynamics in both cases. By relaxing this condition, so that the system returns to its natural coherent or incoherent state once the infection ends, we find that coherent systems have the potential to significantly outperform incoherent systems with regard to realignment success. In particular, incoherent systems can only achieve temporary realignment, since they return to an incoherent state once the infection passes. Within the context of swarm escape from a predator, though, temporary realignment towards the target orientation is likely sufficient.

One key assumption that we wish to relax in the future is the fully-mixed condition. Instead, we can introduce a network structure, where information about the infection and orientations can only be propagated between connected neighbors. Alternatively, we can introduce planar spatial dynamics in a Vicsek-like model [46].
In this case, the network structure is dynamically changing, as individuals move and can only "see" other individuals within a certain proximity. Combining a Vicsek-like model with a Kuramoto-like model has been shown to result in clustering of individuals based on orientation [47, 48].

Chapter 4: A reservoir computing approach to forecasting COVID-19 data

4.1 Introduction

The COVID-19 pandemic has undoubtedly become the largest world health emergency of the modern age. Since its onset, researchers around the world have focused on developing better methods for simulation and prediction of COVID-19 data. The most successful methods have proven to be model predictors, based on the traditional SIR epidemic model [64], but augmented to consider numerous additional subgroups [69, 70] or given a network structure to better represent social interaction [71]. Predicting future values of COVID-19 case numbers falls into the larger task of time series forecasting. While modeling methods are suitable for specific forecasting tasks, general model-free machine learning methods have become increasingly popular. Strategies have involved using support vector machines [72], deep neural networks [73, 74], recurrent neural networks [75], and auto-regressive moving average models [76–78]. In the context of COVID-19 case forecasting, techniques have ranged from simple regressive models [79, 80] to more sophisticated methods, such as auto-regressive integrated moving average models [81], modified autoencoders [82], and long short-term memory recurrent neural networks [83, 84]. Our focus is on the method of reservoir computing [85], which has gained popularity in recent years for the task of time series forecasting. Reservoir computers (RCs) have been shown to accurately forecast chaotic systems [86, 87] and have the ability to learn system features such as Lyapunov exponents [88] and other statistical properties, even in non-stationary systems [89], when trained solely on data.
"Hybrid" reservoir computing architectures, augmented with an accompanying knowledge-based model, have been shown to vastly outperform either the RC or the knowledge-based model used in isolation [90]. When applying RCs to time series forecasting with real data, two main issues arise. First, the data are unreliable in accurately representing the true state of the system. Reporting frequency leads to weekly oscillations in the reported COVID-19 case data. In the most extreme cases, where reporting occurs once every few days, the data indicate several days of no cases followed by a sudden spike on a single day, immediately followed once again by no cases. We introduce two transformations to the data to combat this irregularity and increase predictability. The second issue in applying machine learning to data-based time series prediction is the limited quantity of data. Machine learning success typically depends on having access to a large amount of training data. We perform our analysis using a full year of case data, but even 365 data points is small compared to the data sets typically used in machine learning. In addition, we should not assume that all these data are available at once, since such an assumption would be impractical in forecasting the early stages of a future pandemic, where only a few days or weeks of data are available. We attempt to resolve this issue by considering the library-augmented reservoir computing (LARC) [91] approach, in which the training data are supplemented by a "library" of similar example time series predictors. Finally, we discuss other approaches to supplement the limited amount of available data, such as considering past pandemics, increasing the number of input variables, or pairing the data-based RC with a model approach.

4.2 Methods

4.2.1 Data preprocessing

Successful prediction through a data-based machine learning approach requires reliable data to train the system.
We discuss several issues present in preprocessing the COVID-19 data set in Appendix B, and focus here on our resolutions. We retrieved our data from the Delphi COVIDcast Epidata API [92], which provides several U.S. COVID-19 statistics over time, including case reports, deaths, hospitalizations, and bar visits, among others. While the COVIDcast data set provides several useful statistics, we focus solely on the daily number of new cases. The data are presented at different levels of geographical resolution, as small as county-level in most regions, but we will focus on state-level data and the United States as a whole. Preprocessing the data comes in two steps: cleaning the data and transforming the data. To clean the data, we remove any invalid values. Specifically, for any day with a negative number of cases reported, we set the count to zero cases instead. (For the prediction results we present, none of the days involve negative case counts.) For the second step of preprocessing, we aim to use a transformation of the form

u(t) = \frac{\hat{x}'(t)}{\hat{x}(t)}, (4.1)

where \hat{x}(t) and \hat{x}'(t) are estimates of the underlying continuous signal for the number of new cases x(t) on day t and its derivative, respectively. The transformation Eq. (4.1) is suitable to detrend the data, removing the non-stationarity and generating a new, approximately constant, data series. Note that in the case of purely exponential growth, Eq. (4.1) would give the exponential growth rate of x(t). The connection between exponential growth and epidemic spread further indicates that Eq. (4.1) is a natural fit. However, Eq. (4.1) presents the possibility of division by zero, so we additionally increase all daily case numbers by one before applying the transformation, so that \hat{x}(t) = x(t) + 1.

4.2.1.1 Windowed averaging

For our first transformation, we directly smooth the data through an m-day windowed average. We define

x_{\mathrm{avg}}(t) = \frac{1}{m} \left( \hat{x}(t) + \hat{x}(t-1) + \cdots + \hat{x}(t-m+1) \right). (4.2)

We then take \hat{x}_{\mathrm{avg}}(t) = x_{\mathrm{avg}}(t-1) and \hat{x}'_{\mathrm{avg}}(t) = x_{\mathrm{avg}}(t) - x_{\mathrm{avg}}(t-1), giving a final transformation of

u_{\mathrm{avg}}(t) = \frac{x_{\mathrm{avg}}(t) - x_{\mathrm{avg}}(t-1)}{x_{\mathrm{avg}}(t-1)} = \frac{1}{m} \left( \frac{\hat{x}(t) - \hat{x}(t-m)}{x_{\mathrm{avg}}(t-1)} \right), (4.3)

which is equivalent to the daily percent change in the m-day average. The inverse process, to calculate x(t) given u_{\mathrm{avg}}(t) and \{x(t-1), \ldots, x(t-m+1)\}, is straightforward.

4.2.1.2 Local linear fit

As our second transformation method, we estimate \hat{x}(t) and \hat{x}'(t) through linear fits over an (m+1)-day window. Specifically, we take \hat{x}'(t) as the slope of the line fit to \{\hat{x}(t), \hat{x}(t-1), \ldots, \hat{x}(t-m)\}, given by

\hat{x}'_{\mathrm{loc}}(t) = \frac{(m+1) \sum_{k=-m}^{0} k \hat{x}(t+k) - \sum_{k=-m}^{0} k \sum_{k=-m}^{0} \hat{x}(t+k)}{(m+1) \sum_{k=-m}^{0} k^2 - \left( \sum_{k=-m}^{0} k \right)^2}. (4.4)

For the value of \hat{x}(t), we apply a linear fit to \{\hat{x}(t-1), \ldots, \hat{x}(t-m)\} and use the value of the resulting line at time t-1, so that

\hat{x}_{\mathrm{loc}}(t) = \frac{\sum_{k=-(m-1)}^{0} \hat{x}(t+k-1) \sum_{k=-(m-1)}^{0} k^2 - \sum_{k=-(m-1)}^{0} k \sum_{k=-(m-1)}^{0} k \hat{x}(t+k-1)}{m \sum_{k=-(m-1)}^{0} k^2 - \left( \sum_{k=-(m-1)}^{0} k \right)^2}, (4.5)

for m > 1. For m = 1, we take \hat{x}_{\mathrm{loc}}(t) = \hat{x}(t-1). Together, we have the transformation

u_{\mathrm{loc}}(t) = \frac{\hat{x}'_{\mathrm{loc}}(t)}{\hat{x}_{\mathrm{loc}}(t)}. (4.6)

The inverse transformation to recover x(t) is again straightforward. We have intentionally chosen to use one less data point in the determination of the denominator \hat{x}_{\mathrm{loc}}(t) to guarantee invertibility. Note that we have defined m such that for m = 1, both transformations give the daily percent change

u(t) = \frac{\hat{x}(t) - \hat{x}(t-1)}{\hat{x}(t-1)}. (4.7)

We present examples of both transformations in Figure 4.1 for the national-level data and Figure 4.2 for state-level data from New York and Kansas. As m increases, both transformations reduce the magnitude of the oscillations by approximately the same amount. Of special note are the large spikes present in the Kansas data, which are removed by increasing the smoothing parameter m.
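The two transformations are straightforward to compute in code. The following sketch is our own illustrative implementation (function and variable names are not from the text); it applies Eqs. (4.3) and (4.6) to a short synthetic series and checks that both reduce to the daily percent change, Eq. (4.7), when m = 1:

```python
import numpy as np

def u_avg(x, t, m):
    """Windowed-average transform, Eq. (4.3): daily percent change of the
    m-day average of x_hat(t) = x(t) + 1."""
    xh = np.asarray(x, dtype=float) + 1.0      # x_hat(t) = x(t) + 1
    avg_t = xh[t - m + 1 : t + 1].mean()       # m-day average ending at t
    avg_tm1 = xh[t - m : t].mean()             # m-day average ending at t - 1
    return (avg_t - avg_tm1) / avg_tm1

def u_loc(x, t, m):
    """Local-linear-fit transform, Eq. (4.6): fitted slope over an
    (m+1)-day window divided by the m-point fitted value at t - 1."""
    xh = np.asarray(x, dtype=float) + 1.0
    # Slope of the least-squares line through x_hat(t-m), ..., x_hat(t); Eq. (4.4)
    k = np.arange(-m, 1)
    slope = np.polyfit(k, xh[t - m : t + 1], 1)[0]
    if m == 1:
        denom = xh[t - 1]                      # Eq. (4.5) special case
    else:
        # Line through x_hat(t-m), ..., x_hat(t-1), evaluated at t - 1; Eq. (4.5)
        k2 = np.arange(-(m - 1), 1)
        denom = np.polyval(np.polyfit(k2, xh[t - m : t], 1), 0.0)
    return slope / denom

# Synthetic "daily cases" for illustration only.
x = [100, 103, 111, 108, 120, 118, 131]
t = 6
# For m = 1 both transforms reduce to the daily percent change, Eq. (4.7):
daily = (x[t] + 1 - (x[t - 1] + 1)) / (x[t - 1] + 1)
```

The inverse transformations follow by solving these expressions for x(t) given the previous m days of cases.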
The irregular reporting in the Kansas data that created these spikes would otherwise present an issue for predictability.

4.2.2 Reservoir computer architecture

Reservoir computing is a machine learning approach involving a large recurrent network (a "reservoir") that learns a mapping of a much smaller-scale system [85]. A main advantage of reservoir computing, compared with other machine learning techniques, is the simplicity of the training process. Specifically, the majority of the reservoir computer system parameters are set randomly, based on predetermined distributions, with only the map from the reservoir to the output value being changed through the training process. The details of the implementation of the reservoir computer can vary, particularly in the initialization of different features, so we focus only on our implementation.

Figure 4.1: Top: New COVID-19 cases x(t) for the United States t days after March 1, 2020, shown from March 21, 2020 (t = 20) to February 28, 2021 (t = 364). Bottom: Transformations of the U.S. case data using the windowed average (blue) and the local linear fit (orange) methods for different choices of m. For m = 1, the two transformations are identical, but even for larger m the transformations give similar results. Note that larger m leads to greater "smoothing" of the data, causing increased suppression of the oscillations.
Figure 4.2: Top: New COVID-19 cases x(t) for New York (left) and Kansas (right) t days after March 1, 2020, shown from March 21, 2020 (t = 20) to February 28, 2021 (t = 364). Bottom: Transformations of the New York and Kansas case data using the windowed average (blue) and the local linear fit (orange) methods for different choices of m. Note that the two transformations are identical for m = 1. For larger values of m, the transformations give similar results for well-behaved data, like the New York and United States data (Figure 4.1). For poor-quality data, such as for Kansas, the two transformations are more distinct. The windowed average approach can more effectively remove massive spikes, but is less effective at overall suppression of the resulting oscillations (see for example m = 6).

A reservoir computer consists of three layers: the input layer, the reservoir, and the output layer. The reservoir consists of a directed network of D nodes, where D is large, with adjacency matrix A.
In our implementation, we generate the reservoir as a sparse Erdős–Rényi network with mean degree d. The non-zero values of the adjacency matrix are initially drawn from a standard normal distribution, but then all values of A are uniformly scaled so that the spectral radius is ρ. The state of the reservoir can be represented by the D-dimensional vector r(t), where r_i(t) is the scalar state of node i at time t. An N-dimensional input vector u(t) is coupled to the reservoir through a D × N matrix W_in and the update equation

r(t+1) = \tanh\left[ A r(t) + W_{\mathrm{in}} u(t) + b \right], (4.8)

where tanh[·] is taken element-wise and b is a D-dimensional bias vector. (In the case of one-dimensional input, we will use the vector u(t) and the scalar u(t) interchangeably.) The weights of W_in are randomly generated from a uniform distribution over [−σ, σ], and the weights of b are taken from a uniform distribution over [−δ, δ]. The reservoir is then mapped to an output vector, which in the case of time series forecasting is the input vector at the next time step, u(t+1). The map from the reservoir to the output is given by

u(t+1) = W_{\mathrm{out}} r(t+1), (4.9)

for some N × D matrix W_out. After initialization, only W_out is altered, which occurs during the training phase.

Figure 4.3: The structure of a standard reservoir computer. During "open loop" training, the switch on the left takes in the input u(t). The input is coupled to the reservoir through the input matrix W_in and the update Eq. (4.8). The output is then generated by multiplying the reservoir state r(t+1) by the output matrix W_out. After training, the switch on the left is rotated to accept the output of the reservoir computer as the input for the next time step, creating an autonomous "closed loop".

In the training phase, data are fed into the reservoir computer while it is kept in "open loop" mode, in which the reservoir states are tracked, but the outputs are not looped back into the reservoir as input.
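A minimal sketch of this architecture follows, with a much smaller reservoir than used in our experiments and a synthetic input signal (the variable names, toy signal, and scale are our own illustrative choices; Table 4.1 lists the hyperparameter values actually used). The output weights are fit with the exponentially discounted least squares motivated in Section 4.2.3, implemented here as a square-root-weighted batch fit rather than a true recursive update:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy scale for illustration; Table 4.1 lists the values used in practice.
D, d, rho = 100, 3, 0.7
sigma, delta, lam = 5.75, 0.1, 0.96

# Sparse Erdos-Renyi reservoir with mean degree d; non-zero weights are
# standard normal, then A is rescaled so its spectral radius equals rho.
A = np.where(rng.random((D, D)) < d / D, rng.normal(size=(D, D)), 0.0)
A *= rho / np.max(np.abs(np.linalg.eigvals(A)))

W_in = rng.uniform(-sigma, sigma, size=D)   # input coupling (N = 1 here)
b = rng.uniform(-delta, delta, size=D)      # bias vector

def step(r, u):
    """Open-loop update, Eq. (4.8), for scalar input u."""
    return np.tanh(A @ r + W_in * u + b)

# Drive the reservoir with a stand-in training signal, collecting states.
u_train = np.sin(0.1 * np.arange(60))
r = np.zeros(D)
states = []
for u in u_train[:-1]:
    r = step(r, u)
    states.append(r)
R = np.array(states[10:])        # discard a 10-step initial transient
targets = u_train[11:]           # next-step values the output should produce

# Batch version of the exponentially discounted fit (Sec. 4.2.3): the
# residual at age a is weighted by lam**a, via square-root row weights.
w = np.sqrt(lam ** np.arange(len(R) - 1, -1, -1))
W_out = np.linalg.lstsq(R * w[:, None], targets * w, rcond=None)[0]

# One closed-loop step: the output is fed back as the next input, Eq. (4.9).
u_pred = W_out @ step(r, u_train[-1])
```

After training, repeatedly applying the last two lines (feeding each `u_pred` back into `step`) yields the fully autonomous "closed loop" forecast.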
After discarding an initial transient, the reservoir states are used to fit W_out so that the output mapping Eq. (4.9) best approximates the true values of the training data, typically determined through a least squares fit. Once trained, the reservoir can then be switched to "closed loop" prediction mode, in which the outputs of the reservoir are used as the subsequent inputs, creating a fully autonomous system. A full schematic of a reservoir computer is presented in Figure 4.3.

4.2.3 Recursive least squares

To fit W_out to the training data, we employ the recursive least squares method. In the recursive least squares method, W_out is chosen to minimize the cost function

\min_{W_{\mathrm{out}}} \left\{ \sum_{j=0}^{k-1} \lambda^{k-j} \left\| W_{\mathrm{out}} r(t_j) - u(t_j) \right\|^2 \right\}, (4.10)

where t_0 < t_1 < \cdots < t_{k-1} and 0 < λ ≤ 1 is the "forgetting factor". Traditionally, recursive least squares is used for online training, where W_out is continuously updated after each time step as new data come in. For our purposes, though, we are interested in the exponential discounting by λ. For non-stationary data, such as the COVID-19 case data, the underlying hidden dynamics are constantly changing. We will assume, however, that the data are locally stationary, meaning that any underlying parameters are approximately constant and short-term predictions can accurately be based on recent data, but not necessarily on data far into the past. We do not want to completely ignore the older data, especially since we are limited to very few available data points. The solution, then, is to exponentially weight the data, as in the recursive least squares method, so that the most recent data points are more heavily considered in determining W_out without totally losing old values.

4.2.4 Baseline predictors

For comparison to the reservoir computing approach, we will use a polynomial predictor applied directly to the daily case data x(t).
For small windows, the mean behavior of the daily case data is approximately linear, suggesting a straight-line fit may be sufficient for short-term predictions of the mean behavior. We also consider a quadratic fit as a slightly higher-order alternative.

4.2.5 Library-augmented reservoir computing

As an attempt to augment the limited data set, we will also consider the library-augmented reservoir computing (LARC) process [91]. As a "metalearning" approach, LARC involves using information from a "library" of trained reservoir computers to construct a new reservoir computer that will predict a time series signal not yet seen by the library. The library consists of identical reservoir computers, with the same reservoir topology and randomized system variables. However, each library member is trained on different training data, resulting in a different sequence of reservoir states r(t) and a different output mapping W_out. Let r_0 denote the state of the reservoir at the end of training. The set of pairs {r_0^i, W_out^i}, for each library member i, is used to train an autoencoder, which embeds these high-dimensional reservoir state and output matrix pairs in a low-dimensional latent space. The latent space embedding can then be "decoded" to produce a close approximation of the original input. When presented with new time series data, instead of directly training a reservoir computer, we can search the latent space for a state e which, when decoded, gives a pair (W̃_out, r̃_0) that produces a reservoir computer that best predicts the given time series. Note that this reservoir computer is otherwise identical to those used by library members. All else equal, the defining characteristics of a reservoir computer are the initial state r_0 before prediction and the output mapping W_out. A schematic of the full LARC process is shown in Figure 4.4.
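As a concrete illustration of the baseline predictors of Section 4.2.4, both the linear and quadratic extrapolations amount to a short polynomial fit to the most recent case counts. A sketch (the function name and its arguments are our own, not from the text):

```python
import numpy as np

def polynomial_forecast(x, n_points, degree, horizon):
    """Fit a polynomial of the given degree to the last n_points daily
    case values and extrapolate it horizon days ahead (Sec. 4.2.4)."""
    t = np.arange(n_points)
    coeffs = np.polyfit(t, np.asarray(x, dtype=float)[-n_points:], degree)
    t_future = np.arange(n_points, n_points + horizon)
    return np.polyval(coeffs, t_future)

# Synthetic example: exactly linear "cases" with slope 5 per day.
x = np.arange(100.0, 150.0, 5.0)   # 10 points: 100, 105, ..., 145
pred = polynomial_forecast(x, n_points=10, degree=1, horizon=7)
# pred continues the line: 150, 155, ..., 180
```

Setting degree=2 gives the quadratic baseline; as noted later, a bad quadratic fit diverges from the truth at a quadratic rate, which makes it the more fragile of the two.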
This library approach is particularly useful when a new time series is too short to reliably train a reservoir computer directly, such as at the onset of a pandemic, where available data are limited but one may have access to past data with similar dynamics that can be used to construct a library (e.g., a past pandemic).

4.2.6 Predictor evaluation

We consider two measures of success for our different predictors. The first measure, the cumulative error, gives the mean prediction accuracy, while the second measure, the normalized mean absolute error, tracks errors within individual daily predictions.

4.2.6.1 Cumulative error

In the context of epidemic response, using the cumulative case count is a reasonable choice, since we primarily care about the total number of new cases that appear over a short period of time, rather than the specific number of cases that occur on each day. Therefore, we first consider the percent cumulative error,

\mathrm{err}_{\mathrm{cumul}} = \frac{\left| \sum_{t=t_0}^{t_0+T-1} x_{\mathrm{truth}}(t) - \sum_{t=t_0}^{t_0+T-1} x_{\mathrm{pred}}(t) \right|}{\sum_{t=t_0}^{t_0+T-1} x_{\mathrm{truth}}(t)}, (4.11)

where t_0 is the starting day of the prediction, T is the prediction length, x_truth(t) is the true number of cases on day t, and x_pred(t) is the predicted number of cases on day t.

Figure 4.4: The two stages of the library-augmented reservoir computing process. In stage one (top), a library of trained reservoir computers is used to train an autoencoder to predict the pair (W_out, r_0), consisting of an output mapping W_out generated from training the RC and an associated reservoir state r_0 at the end of training. In stage two (bottom), a state from the embedded space of the autoencoder is decoded to generate a pair (W̃_out, r̃_0). This pair is used to construct a new reservoir computer that is otherwise identical to those in the library. The sample reservoir computer generates a signal that is compared to a new (short) signal that we wish to predict.
Using an optimization scheme, information is sent back to the decoder to pick a new pair (W̃_out, r̃_0) that better matches the new data, until some stopping criterion is reached.

We will primarily focus on the cumulative error after one week (T = 7), to match the perceived one-week period of oscillation in the daily case numbers. Using the cumulative case count allows small errors in the specific daily predictions to even out, particularly when they are alternating over- and under-estimates from the weekly oscillations being off by one day. Furthermore, using cumulative cases allows us to compare to a linear predictor, which will be unable to closely capture the oscillations in the data, but can potentially match the mean behavior.

4.2.6.2 Normalized mean absolute error

Although cumulative counts are preferable, we still desire a measure of the daily prediction accuracy. To this end, we use the mean absolute error between the true daily cases x_truth(t) and the predicted daily cases x_pred(t), normalized by the mean of the true data measured over the same period,

\mathrm{NMAE} = \frac{\frac{1}{T} \sum_{t=t_0}^{t_0+T-1} \left| x_{\mathrm{truth}}(t) - x_{\mathrm{pred}}(t) \right|}{\frac{1}{T} \sum_{t=t_0}^{t_0+T-1} x_{\mathrm{truth}}(t)}. (4.12)

We again choose to use a one-week window (T = 7) to match the one-week period in the daily case oscillation.

4.3 Results

In the following section, we present our main results. Unless otherwise specified, all reservoir computers use the hyperparameters listed in Table 4.1. Note that the "initialization length" refers to the number of points u(t) prior to the training data that are used to synchronize the reservoir state, but which are not used for fitting W_out.

reservoir size (D): 500
mean degree (d): 3
spectral radius (ρ): 0.7
input scaling distribution hyperparameter (σ): 5.75
bias distribution hyperparameter (δ): 0.1
forgetting factor (λ): 0.96
initialization length: 10
training length: 21

Table 4.1: List of hyperparameter values used in training the reservoir computers, unless otherwise noted.

The "training length"
refers to the number of data points u(t) in the training data used to fit W_out. It should be noted that the hyperparameter choices were optimized "by hand", by manually adjusting each hyperparameter until no noticeable improvement could be gained. The results could potentially be further improved by an automated optimization scheme for the hyperparameters.

4.3.1 Recursive least squares and forgetting factor selection

We begin by determining an appropriate forgetting factor λ to be used for reservoir computer training. For each starting value of t_0 ∈ [50, 150], we construct a reservoir computer trained on the entire history {u(0), u(1), ..., u(t_0 − 1)}, with the first ten points used to initialize the reservoir, resulting in 101 sample predictions. Data are transformed using the daily percent change Eq. (4.7) to calculate each u(t). We repeat this task for different values of λ. Smaller values of λ correspond to faster forgetting rates, meaning older data are less influential for smaller λ, at the cost of a less accurate overall fit of W_out.

Figure 4.5: Reservoir computer predictor results for different choices of forgetting factor λ. Left: median one-week cumulative error. Right: median one-week NMAE. For each value of λ shown, we increment through all prediction start times t_0 in [50, 150]. For each value of t_0, a reservoir computer was trained on the entire history (i.e., all data from t = 0 to t = t_0 − 1), with data transformed according to the percent daily change Eq. (4.7).

In Figure 4.5, we present the median values of the one-week cumulative error and one-week NMAE, as described in Section 4.2.6, for 0.9 ≤ λ ≤ 1 in steps of 0.001. Note that λ = 1 corresponds to no forgetting, i.e., a standard linear regression. While the values of our measures fluctuate wildly, there is a noticeable dip in the NMAE near λ
= 0.96, indicating that a small amount of forgetting is potentially beneficial.

4.3.2 Predictor results

We now present the main results of our four prediction methods. To begin, we consider sample predictions at specific points in time using the U.S. case data, shown in Figure 4.6, and all fifty states plus the District of Columbia, shown in Figures 4.7 and 4.8. The main distinction of the reservoir computer approaches is the ability to learn the oscillations in the data, leading to perceived better accuracy in the daily predictions. Noticeably, the amplitude of oscillations in the predictions for the RC trained on the windowed average transformed data tends to grow in magnitude over time. Since we are mainly interested in short-term predictions (up to seven days), the growing oscillations are not a major concern. All four methods tend to follow the general mean behavior of the data. Even when considering the state data (Figures 4.7 and 4.8), we again see that the reservoir computer predictor is able to match the oscillations in the data quite effectively, which is an impossible task for a linear extrapolation. We do note that several states have failed reservoir computer predictions that "blow up" almost immediately. It is not immediately clear what the cause of this blow-up is, but one possible explanation is a poor choice of RC hyperparameter values (Table 4.1). Ideally, the hyperparameters would be individually selected for each state to best match the data; however, doing so by hand for every state would be time-consuming. The hyperparameter choices we use work well for the majority of the states, but the results would be aided by automated hyperparameter optimization. To validate these initial observations, we turn to the distributions of error presented in Figure 4.9 for predictions generated on the U.S. data set, Figure 4.10 for the New York data set, and Figure 4.11 for the Kansas data set.
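Both error measures from Section 4.2.6 are simple to compute; the sketch below (array values are synthetic, chosen only for illustration) also shows how out-of-phase oscillations inflate the NMAE while largely cancelling in the cumulative error:

```python
import numpy as np

def cumulative_error(x_truth, x_pred):
    """Percent cumulative error over the prediction window, Eq. (4.11)."""
    x_truth = np.asarray(x_truth, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    return abs(x_truth.sum() - x_pred.sum()) / x_truth.sum()

def nmae(x_truth, x_pred):
    """Normalized mean absolute error, Eq. (4.12)."""
    x_truth = np.asarray(x_truth, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    return np.abs(x_truth - x_pred).mean() / x_truth.mean()

# A prediction whose weekly oscillation is exactly one day out of phase:
truth = np.array([90., 110., 90., 110., 90., 110., 90.])
pred = np.array([110., 90., 110., 90., 110., 90., 110.])

cum = cumulative_error(truth, pred)   # small: daily errors mostly cancel
daily = nmae(truth, pred)             # large: every day is off by 20 cases
```

This mirrors the behavior discussed below, where the RC predictions match the shape of the oscillations but can score worse on the NMAE than a linear extrapolation.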
We generate these results by producing a prediction starting at each time t_0 ∈ [50, 250], for a total of 201 sample predictions. We consider New York and Kansas as prime examples of the two extremes of data quality: the New York case data are similar in quality to the United States total data, while the Kansas data consist of irregular reporting that creates additional challenges for forecasting. As observed in the example predictions, all four methods perform fairly well in predicting the one-week cumulative case numbers, indicating a good fit to the mean behavior, with the reservoir computer methods performing slightly better. The worst predictor

Figure 4.6: The true daily U.S. COVID-19 case data (black) and example predictions for the four prediction methods: reservoir computer with the windowed average transformation applied to daily cases with m = 6 (magenta), reservoir computer with the local linear fit transformation applied to daily cases with m = 6 (cyan), a linear fit applied to the daily case data using 10 points (orange), and a quadratic fit applied to the daily case data using 12 points. Predictions start at times t = 70 (top) and t = 110 (bottom). Vertical dashed lines mark the start and end of data points "visible" to the corresponding predictors.
Figure 4.7: State-level COVID-19 case data (black) compared to a reservoir computer prediction using the local linear fit transformation (cyan) with m = 6 and a linear extrapolation (orange) using 10 points. All predictions start on day 70, the first day after the vertical dashed black line. The vertical orange dashed line marks the start of points used by the linear extrapolation and the vertical cyan dashed line marks the start of points used to train the reservoir computer. Here we show the first twenty-six states alphabetically and the District of Columbia. The remaining states are shown in Figure 4.8.
Figure 4.8: The continuation of Figure 4.7, showing state-level COVID-19 case data (black) compared to a reservoir computer prediction using the local linear fit transformation (cyan) with m = 6 and a linear extrapolation (orange) using 10 points. All predictions start on day 70, the first day after the vertical dashed black line. The vertical orange dashed line marks the start of points used by the linear extrapolation and the vertical cyan dashed line marks the start of points used to train the reservoir computer.

appears to be the quadratic fit, which is more likely to quickly diverge from the true data in the case of a bad fit, and diverges at a quadratic rate. It is difficult to make a direct comparison between the RC predictors and the polynomial fit predictors, since the values of m used in the data transformations do not correspond well to the number of points used in the polynomial fits. Furthermore, the polynomial fits are directly applied to the raw case data, while the reservoir computer predictions include an additional data transformation step.
While it is true that the value of m relates to the number of points used in the smoothing of the data, it does not include the 21 days of training advantage of the reservoir computer. Despite this advantage, the reservoir computer predictions in general do not appear to be significantly better than a simple linear extrapolation. When presented with a small amount of data, it may then be reasonable to assume a linear growth rate over the short term. While we may be unable to directly compare the predictors for a fixed number of points, we can compare the best-case results. From our national, New York, and Kansas results, we estimate that the best reservoir computer predictor is the local linear fit transformation with m = 6, and the best polynomial fit predictor is the linear fit with 10 points. In Figures 4.9(c,f), 4.10(c,f), and 4.11(c,f), we include a direct comparison between the distributions of these best performers. In addition, in Figure 4.12, we produce a scatter plot comparison of the prediction errors for these two methods when applied to each state, the District of Columbia, and the United States case data. We find that most points lie near the line y = x, demonstrating no significant improvement in predictive power from one method or the other. We do note that the reservoir computer prediction does perform slightly better at predicting the cumulative number of cases for almost every
Figure 4.9: Predictor statistics for the four different predictor methods with different choices of m for the reservoir computer transformations (a,d) and number of points for the polynomial fits (b,e) applied to the U.S. COVID-19 daily case data. Statistics are taken by generating a prediction for each starting time t_0 in [50, 250]. In (c,f), we compare the distributions for the errors of the reservoir computer predictions using the local linear fit transformation with m = 6 and the linear extrapolation using 10 points.

Figure 4.10: Predictor statistics for the four different predictor methods with different choices of m for the reservoir computer transformations (a,d) and number of points for the polynomial fits (b,e) applied to the New York COVID-19 daily case data. Statistics are taken by generating a prediction for each starting time t_0 in [50, 250]. In (c,f), we compare the distributions for the errors of the reservoir computer predictions using the local linear fit transformation with m = 6 and the linear extrapolation using 10 points.
[Figure: "Results for Kansas COVID-19 case predictions" — box plots of one-week NMAE (top row) and one-week cumulative error (bottom row).]
Figure 4.11: Predictor statistics for the four different predictor methods with different choices of m for the reservoir computer transformations (a,d) and number of points for the polynomial fits (b,e), applied to the Kansas COVID-19 daily case data. Statistics are taken by generating a prediction for each starting time t0 in [50, 250]. In (c,f), we compare the distributions of the errors of the reservoir computer predictions using the local linear fit transformation with m = 6 and the linear extrapolation using 10 points. Note that the box plots for the reservoir computer predictors (a,d) with low values of m extend well beyond the vertical scale used. A zoomed-out view is provided in Figure 4.13.

[Figure: scatter plots of the median one-week cumulative error (left) and median one-week NMAE (right), with state abbreviations labeling each point; both axes compare the error from the RC with local linear fit transformation (m = 6) against the error from linear extrapolation (10 points).]
Figure 4.12: Scatter plots comparing the one-week cumulative error (left) and one-week NMAE (right) for the reservoir computer prediction using the local linear fit transformation with m = 6 and the linear extrapolation using 10 points.
The coordinates of each point are the median values of the error metrics for the labeled locations. The red line marks y = x. The plot includes forty-seven states, the District of Columbia, and the United States as a whole. Not included are Connecticut, Rhode Island, and Vermont, each of which had significantly higher median errors from both predictors.

state, indicating better prediction of the mean behavior. Perhaps counter-intuitively, the reservoir computer does worse at predicting the individual daily case numbers, as measured through the NMAE, despite apparently matching the daily oscillations better than a linear extrapolation. A probable cause is that the oscillations in the predictions are slightly "out-of-phase" with the true oscillations, which increases the daily errors but has no impact on the overall cumulative count.

Deviating from the comparison of predictors for a moment, we turn our attention to the results of the Kansas data set, shown in Figures 4.11 and 4.13. The Kansas data is quite poor in quality when considered for a prediction task: the irregular reporting frequency leads to several days of zero cases reported, followed by a large spike in cases. The smoothing induced by our transformations is able to salvage the data. With a small amount of smoothing (m = 4), the reservoir computer prediction accuracy on the Kansas data is of similar order to that on the well-behaved data sets of New York and the whole U.S. Returning to Figure 4.2, we can see that m = 4 is when the majority of spikes in the data set are no longer present over the timespan of [50, 250].

[Figure: "Results for Kansas COVID-19 case predictions" — zoomed-out box plots of the one-week cumulative error and one-week NMAE for the uavg and uloc transformations.]
Figure 4.13: Zoom out of Figure 4.11(a,d), showing the full distributions for low values of m.
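As an illustration of how the window size m trades responsiveness for smoothness, the following is a sketch of a local-linear-fit style transformation. The exact form used in the chapter is defined earlier; this version, including the +1 shift discussed in Appendix B.3, is an assumption, and the function name is our own.

```python
import numpy as np

def local_linear_transform(x, m):
    """Hypothetical sketch of a 'local linear fit' transformation:
    at each day t, fit a line to the m most recent (shifted) case
    counts and take the fitted slope divided by the fitted value as
    the relative growth rate u(t). Larger m smooths reporting spikes
    at the cost of responsiveness."""
    x = np.asarray(x, dtype=float) + 1.0   # shift to avoid zeros (Appendix B.3)
    u = np.full(len(x), np.nan)            # undefined until m points are seen
    for t in range(m - 1, len(x)):
        days = np.arange(t - m + 1, t + 1)
        slope, intercept = np.polyfit(days, x[days], 1)
        u[t] = slope / (slope * t + intercept)
    return u
```

On a flat series of repeated counts the fitted slope is zero, so u(t) is zero wherever it is defined; a several-day spike is spread across every window that contains it, which is the smoothing effect exploited for the Kansas data.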
While proper data transformation is key to successful RC prediction, we again note that the predictions are not markedly better than a direct linear fit.

4.3.3 Library-augmented reservoir computing applied to U.S. COVID-19 case data

An alternative to a simple reservoir computer predictor is the metalearning approach of library-augmented reservoir computing (LARC). To apply LARC, we first construct a library of reservoir computers that will be used to train an autoencoder. For our investigation, each library member consists of a single RC trained on a 50-day window of data, with the first 10 days of each window used for reservoir initialization. Each library member uses a different window, but these windows may overlap. Data input into the reservoir computers are transformed according to the daily percent change, Eq. (4.7) (equivalent to either transformation method with m = 1). Note that all library members use identical reservoir computers (in terms of network topology and other "randomized" variables). All hyperparameters, other than the training length, are identical to those specified in Table 4.1.

Before using LARC to generate predictions, we first validate the assumption that the latent embedding of the library clusters similarly-behaved systems, i.e., that two library members close in the latent space correspond to similar time series. We test this assumption using the U.S. data set, creating a library member for each window of data starting at t ∈ [30, 300]. We embed the library in a three-dimensional latent space for visualization purposes. The resulting embedding, shown in Figure 4.14 with points colored according to the training window's start time, demonstrates clear clustering by this value.

For evaluation of LARC-generated predictions, we make a few modifications.
First, we increase the dimension of the latent space to twelve, and second, we restrict our library members to windows starting in [30, 125]. We then generate predictions with starting times t0 ≥ 180 using the LARC method with new signals containing only five data points. Note that a new signal containing any points before day 175 would otherwise overlap with at least one library member's window.

In Figure 4.15, we present several sample predictions generated by LARC and compare them to a single reservoir computer prediction generated from 21 days of training data, as well as a linear extrapolation using only the five days of information. Qualitatively, the reservoir computer predictions generated from the five-day signals (aided by LARC) are similar to those generated from a 21-day training signal.

[Figure: "Embedding of U.S. COVID-19 library (time)" — three coordinate-plane projections (a–c) and a three-dimensional view (d) of the latent-space embedding.]
Figure 4.14: Embedding of the U.S. COVID-19 library into a three-dimensional latent space using an autoencoder. The library consists of reservoir computers trained over 50-day overlapping windows of data from t = 30 to t = 300. Daily case data were transformed using the percent daily change Eq. (4.7) before input into the RCs. Points are colored according to the first day of the training window, with green points corresponding to starts in [30, 100], blue points to starts in (100, 200], and red points to starts in (200, 300]. Panels (a–c) are projections of the points onto the three coordinate planes. Panel (d) shows all points in three dimensions.
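The windowed library construction described above can be sketched as follows. This is a simplified sketch: the RC training on each window and the autoencoder encoding of each member are assumed to follow as in the text, and the helper name and return format are our own.

```python
import numpy as np

def make_library_windows(u, window=50, init=10, starts=range(30, 301)):
    """Sketch of LARC library construction: one member per 50-day
    window of transformed data u, with the first `init` days of each
    window reserved for reservoir initialization. Returns a list of
    (start_day, init_segment, train_segment) triples; training an RC
    on each train_segment is assumed to happen downstream."""
    members = []
    for t0 in starts:
        if t0 + window > len(u):
            break                      # window would run past the data
        seg = np.asarray(u[t0:t0 + window], dtype=float)
        members.append((t0, seg[:init], seg[init:]))
    return members
```

For a 400-day transformed signal this produces one member per start day from t = 30 to t = 300, each with a 10-day initialization segment and a 40-day training segment, matching the overlapping-window scheme used for Figure 4.14.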
When viewing the seven-day zoom-ins, the predictions from LARC applied to a five-day signal and from a single RC with a 21-day signal are nearly identical in several cases. The linear extrapolation based on five points appears to have difficulty matching the mean behavior, since it sees only approximately half of the weekly oscillation. For instance, if the case numbers are mainly increasing towards the peak of the week over the five-day period, then the linear extrapolation will predict a continuation of this increase, as seen in the first prediction in Figure 4.15.

The power of LARC comes from being able to replace a large training data set with a small data set supplemented by a large library. This advantage would be particularly useful in the early stages of a pandemic, when only a few days of data are available. In this case, one would not be able to generate a library out of data from the current pandemic, but could instead rely on data from past pandemics. A signal that is better represented by the library will produce a higher-quality prediction.

4.3.4 Autoencoder embedding of state-level COVID-19 data

We can apply the same treatment to state-level COVID-19 data as we did for the U.S. data set, to validate the assumption that time series with similar behavior are close in latent space, thereby indicating LARC as a suitable forecasting approach. We construct a full library in the same way as before, taking reservoir computers trained on overlapping 50-day windows with start times in [30, 300]. We complete this process with
[Figure: "LARC applied to U.S. COVID-19 case data" — six sample predictions of daily new cases, each with a one-week zoom-in panel beneath it.]
Figure 4.15: Predictions using the library-augmented reservoir computing method (red lines) using five points of training data, compared to a single reservoir computer (blue lines) using 21 days of training data and a linear extrapolation (orange lines) using five days of data. Vertical red dashed lines indicate the start of the LARC training data and the data used for the linear extrapolation; vertical blue dashed lines mark the start of the single RC training data. Predictions start on the first day after the green dots. Each bottom panel is a one-week zoom-in of the predictions shown in its corresponding top panel.
The library used for LARC consists of RCs trained on 50-day overlapping windows starting in [30, 125].

the New York data set and train an autoencoder with a twelve-dimensional latent space using the first stage of the LARC procedure. As shown in Figure 4.16, there is clear clustering of library members by time.

We are not restricted to constructing a library from a single region. In fact, we may construct a library consisting of RCs trained on data from every state or from multiple countries, multiplying the size of the library and increasing the number of examples against which a small training signal can be compared. In Figure 4.17, we demonstrate the latent-space embedding of the combined libraries built from the U.S. and New York data sets. In this case, not only do we once again observe clustering by time, but there is also clear clustering by region: library members from the U.S. data set are near each other in latent space, and library members from the New York data set are near each other in latent space.

We note that clustering is not immediately apparent in every realization of autoencoder training, and introducing more regions makes clustering more difficult without further increasing the embedding dimension. Furthermore, smaller geographical regions tend to have poorer-quality data (due to smaller sample sizes or irregular reporting), which can introduce examples that negatively impact the overall prediction quality of the library. Such examples can potentially be salvaged by determining a proper transformation, as shown in the Kansas data predictions, or by applying a "filter" that prevents poor examples from being included in the library.
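Such a filter could be as simple as rejecting windows that contain long runs of zero-case days or extreme single-day spikes. The sketch below is purely illustrative: the function name, the criteria, and the thresholds are our own, not taken from the text.

```python
def passes_quality_filter(cases, max_zero_run=3, max_spike_ratio=10.0):
    """Hypothetical data-quality filter for candidate library windows.

    Rejects a window of daily case counts if it has a run of zero-case
    days longer than `max_zero_run`, or a single-day count more than
    `max_spike_ratio` times the median nonzero count. Thresholds are
    illustrative choices."""
    run = longest = 0
    for c in cases:
        run = run + 1 if c == 0 else 0
        longest = max(longest, run)
    nonzero = sorted(c for c in cases if c > 0)
    if not nonzero:
        return False                      # all-zero window carries no signal
    median = nonzero[len(nonzero) // 2]
    return longest <= max_zero_run and max(cases) <= max_spike_ratio * median
```

A well-behaved window passes, while a Kansas-style window of zeros punctuated by a large batch report is rejected before it can degrade the library.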
[Figure: "Embedding of NY COVID-19 library (time)" — pairwise coordinate-plane projections of the latent-space embedding.]
Figure 4.16: Embedding of the New York COVID-19 library into a twelve-dimensional latent space using an autoencoder. The library consists of reservoir computers trained over 50-day overlapping windows of data from t = 30 to t = 300. Daily case data were transformed using the percent daily change Eq. (4.7) before input into the RCs. Each panel is a projection onto a plane spanned by a pair of coordinate axes. Points are colored according to the first day of the training window, with green points corresponding to starts in [30, 100], blue points to starts in (100, 200], and red points to starts in (200, 300].

[Figure: "Embedding of NY with U.S. COVID-19 library" — pairwise coordinate-plane projections, colored by time (top) and by region (bottom).]
Figure 4.17: Embedding of the combined U.S. and New York COVID-19 library into a twelve-dimensional latent space using an autoencoder. The library consists of reservoir computers trained over 50-day overlapping windows of data from t = 30 to t = 300. Daily case data were transformed using the percent daily change Eq. (4.7) before input into the RCs. Each panel is a projection onto a plane spanned by a pair of coordinate axes.
Top: points are colored according to the first day of the training window, with green points corresponding to starts in [30, 100], blue points to starts in (100, 200], and red points to starts in (200, 300]. Bottom: points are colored according to region, with red indicating New York and black indicating the U.S.

4.4 Discussion

We have presented an application of reservoir computing to predict U.S. COVID-19 case data at both the national level and individual state levels. Such a task presents multiple challenges, notably small quantities of low-quality data. Typical reservoir computing tasks, with data generated from a model, can obtain any number of data points (usually thousands) at arbitrary resolution and quality, limited only by computational resources or by artificial limitations imposed by the investigation.

Despite these hurdles, we have demonstrated that a proper choice of smoothing through data preprocessing can salvage poor-quality data, as demonstrated in the Kansas data set. Furthermore, despite access to limited data, we find that 31 data points is sufficient to generate seven-day predictions with reasonable accuracy, occasionally (though not always) within 20% of the true cumulative case number. Unfortunately, a reservoir computing method does not seem to perform significantly better than a simple linear regression estimate that uses fewer data points.

An alternative to a direct reservoir computing approach is the metalearning method of library-augmented reservoir computing. In this method, trained reservoir computers are encoded using an autoencoder, producing a low-dimensional latent space that can be searched to generate a reservoir computer that predicts from a short training signal. Such an approach has the potential to further overcome the challenge of limited data access by replacing a large number of training points with a large number of library member examples.
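The search step of this library approach can be illustrated with a minimal latent-space lookup. This is a sketch under assumptions: how a short new signal is embedded, and how the matched member is decoded into a predictor, follow the LARC procedure described in the text, and the nearest-neighbor rule shown here is an illustrative choice rather than the method's stated definition.

```python
import numpy as np

def nearest_library_member(new_embedding, library_embeddings):
    """Sketch of the library search: return the index of the library
    member whose latent-space embedding is closest (in Euclidean
    distance) to the embedding of a short new signal."""
    d = np.linalg.norm(library_embeddings - new_embedding, axis=1)
    return int(np.argmin(d))
```

With a well-clustered embedding (as in Figures 4.14 and 4.16), a five-day signal lands near library members trained on similarly-behaved windows, which is what makes the matched member a plausible surrogate predictor.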
Ideally, library members would consist of RCs trained on time series exhibiting properties similar to the signals we wish to forecast, but they are not required to come from the exact same process. For example, library members can be trained on data from a different pandemic than the one we wish to forecast, as long as underlying properties, such as the spreading rate, are similar.

It is clear that a direct reservoir computer predictor is not yet able to significantly outperform a simpler approach, such as a linear fit. Because the biggest issue is lack of data, we would need to augment our data set, such as through a library-based approach, as discussed, in order to produce more reliable results. Another straightforward improvement would be to introduce multiple input signals beyond just the case data, such as death counts, hospitalization numbers, or mobility data. While increasing the number of inputs provides access to more data, we note that each additional input would also need to be predicted by the reservoir computer.

Other than supplementing with additional data, we can consider augmenting the basic reservoir computing architecture. For instance, we can introduce a network of reservoir computers connected to represent mobility between different subregions (i.e., states). In a similar manner, the basic reservoir computer can be replaced with a hybrid system, in which data is input in parallel with model predictions, such as from a basic SIR model or any successful variation already applied to COVID-19 data. While no longer truly "model-free," the hybrid's model component is able to guide the reservoir computer predictions, despite potential inaccuracies in the model itself.

Appendix A: Supplementary material for Chapter 2

A.1 Locally treelike features

Here we demonstrate the locally treelike features of our simulated networks as they are grown through evolutionary time.
In the semi-annealed approximation described in the main text, which allows us to map the stability of synchronous Boolean dynamics to an appropriately defined percolation problem, we make the assumption that pairs of nodes are rarely connected by multiple short paths of the same length (in the same direction). This is a weaker version of the more widely used locally treelike assumption. For our simulated networks, we consider all pairs of nodes connected by a path of length two and calculate the fraction that are connected by at least one other path of length two (in the same direction). This bi-parallel motif (i.e., A → B → C and A → D → C) is the smallest network motif that is relevant for the case of synchronous update of nodes in a Boolean network. The smaller feedforward motif (i.e., A → B → C and A → C) does not cause an issue when applying the semi-annealed approximation because signal information from A to C that goes through B does not arrive in the same time step as the information that goes directly from A to C. Results shown in the figure below are for a network of size N = 10^5. We see that the prevalence of the bi-parallel motif is very small for the full duration of the link addition processes considered in the main text.

Figure A.1: The fraction of node pairs connected by a directed path of length two that are connected by at least one other directed path of length two.

A.2 Selection with m = N

In this section, we consider the structure-based rule that relies on full global information at each time step, i.e., m = N, with ties broken randomly. Recall that N is the number of genes and m is the number of candidate genes considered for both the source and target node each time a new link is added to the system. Results are shown below for a network of size N = 1000, averaged over 100 realizations. Due to the significant increase in runtime for m = N compared with m = 2 or m = 3 reported in the main text, we used a smaller value of N for this study.
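The bi-parallel motif fraction of Appendix A.1 — the fraction of ordered node pairs joined by a directed length-two path that are joined by at least one additional such path — can be computed directly from an edge list. A sketch (the function name and the edge-list representation are our own):

```python
from collections import defaultdict

def biparallel_fraction(edges):
    """Fraction of ordered pairs (A, C) connected by a directed path of
    length two that are connected by at least one additional length-two
    path, i.e., the bi-parallel motif A->B->C and A->D->C."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    counts = defaultdict(int)              # (A, C) -> number of length-two paths
    for a, targets in list(succ.items()):
        for b in targets:
            for c in succ.get(b, ()):      # .get avoids mutating the defaultdict
                counts[(a, c)] += 1
    pairs = len(counts)
    multi = sum(1 for n in counts.values() if n >= 2)
    return multi / pairs if pairs else 0.0
```

A simple chain 1 → 2 → 3 has one length-two pair with a single path (fraction 0), while adding the parallel route 1 → 4 → 3 creates the bi-parallel motif (fraction 1).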
Even at this smaller value of N, we can see that selection with m = N significantly outperforms all others in suppressing the growth of connected components.

Figure A.2: Growth of the connected components on the full network (left) and the damaged subnetwork (right). Results shown are the averages of 100 runs on networks of size N = 1000. We omit GIN and GIN* due to their symmetry with GOUT and GOUT*, respectively.

Appendix B: Complications with data preprocessing of COVID-19 case data

In this appendix, we consider several issues that need addressing when developing a proper transformation for the COVID-19 case data.

B.1 Negative number of cases reported

To account for overestimates in the number of cases, several regions have single days with a "negative" number of cases. Without taking reporting errors into account, the underlying model represented by the reservoir computer would require the number of daily cases to always be non-negative. For simplicity, we focus our results in the main text only on regions and times in which all reported case counts are non-negative. An alternative, simple solution would be to zero out all days with negative cases. In most cases, the number of negative cases reported is small, so the overall totals are minimally impacted. For more extreme cases, one could divide the negative cases equally across all previous days of data.

B.2 Irregular reporting

While most regions report daily cases, certain locations report the number of new cases accumulated every few (e.g., three) days. For transformations of the form

u(t) = \frac{\dot{\tilde{x}}(t)}{\tilde{x}(t)},    (B.1)

several days of zero cases followed by a single day with a large non-zero number of cases will cause a "spike" in u(t) that becomes challenging to predict. A form of smoothing or averaging is required to avoid spikes in the data, dividing their contribution over several days.
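One simple way to divide a reporting spike's contribution over the preceding zero-case days, as an alternative to the smoothing used in the main text, can be sketched as follows (illustrative only; the function name and the even-split rule are our own):

```python
def redistribute_spikes(cases):
    """Sketch of a fix for batch reporting: when a run of zero-case
    days is followed by a non-zero total, spread that total evenly
    over the run and the reporting day, preserving the cumulative
    count. Trailing zeros with no subsequent report are left as-is."""
    out = [float(c) for c in cases]
    i = 0
    while i < len(out):
        if out[i] == 0:
            j = i
            while j < len(out) and out[j] == 0:
                j += 1                      # scan to the end of the zero run
            if j < len(out):                # out[j] is the batch report
                share = out[j] / (j - i + 1)
                for k in range(i, j + 1):
                    out[k] = share
            i = j + 1
        else:
            i += 1
    return out
```

For example, a series [5, 0, 0, 9, 5] becomes [5, 3, 3, 3, 5]: the three-day batch of 9 cases is spread over the two zero days and the reporting day, removing the spike from u(t) while leaving the total unchanged.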
Even in less extreme examples, where cases are reported every day, there is a clear oscillation in reporting frequency that follows the days of the week. These well-behaved oscillations are potentially useful in generating realistic-looking predictions that also match the true oscillations. We have attempted to incorporate the 7-day oscillation period into the reservoir computer input, but such methods have shown negligible improvement in prediction quality.

B.3 Zero cases reported

Unlike a negative number of cases, a day with zero cases is valid within the framework of the underlying model. However, the transformation (B.1) is invalid for a denominator of zero. Smoothing has the potential to prevent this issue, but fails when a trail of zeros is longer than the smoothing window. As a straightforward solution, we replace x(t) with \tilde{x}(t) = x(t) + 1. In general, the data can be shifted by any value c, which introduces an additional hyperparameter to the system. During phases of epidemic spread, the number of daily cases is on the order of hundreds (at the state level) to several thousands (at the national level), so shifting all values by one has negligible impact on the total case counts over this period. Furthermore, after training the reservoir computer to predict \tilde{x}(t), we can easily recover x(t) by shifting back. Note that in cases where \dot{\tilde{x}}(t) is calculated by differencing or by the slope of a linear fit, the vertical shift effectively cancels out; our main concern is with guaranteeing that \tilde{x}(t) never becomes zero.

B.4 Invertibility

Even if a transformation of the data is well-defined, we run into issues if the inverse has a limited domain. Specifically, restrictions on u(t) resulting from the forward transformation may not be preserved when future values are generated by a reservoir computer. For instance, the transformation

u(t) = \tanh\left(\frac{\tilde{x}(t) - \tilde{x}(t-1)}{\tilde{x}(t-1)}\right),    (B.2)

has been shown to be effective in producing predictions in initial tests.
The main feature of the hyperbolic tangent is to suppress large intermittent spikes. However, the inverse transformation requires a domain of (−1, 1), which is not enforced by the reservoir computer. If a prediction extends beyond this domain, the prediction is lost through lack of invertibility. Even worse, all future predicted values are lost, since each subsequent prediction depends multiplicatively on the value of the last. In the specific case of the hyperbolic tangent, values near 1 (even without passing it) can also lead to blow-up of the inverse transformation, which is similarly detrimental despite the inverse transformation existing. Consequently, it is prudent to restrict ourselves to transformations u(t) that have well-defined inverses for any possible predicted value.
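A minimal sketch of the forward transformation (B.2) and a guarded inverse of the kind this argument calls for (the clipping margin eps is an illustrative choice, not taken from the text):

```python
import math

def tanh_transform(x_shift):
    """Forward transformation of Eq. (B.2): u(t) = tanh of the relative
    one-day change of the shifted series x~(t)."""
    return [math.tanh((x_shift[t] - x_shift[t - 1]) / x_shift[t - 1])
            for t in range(1, len(x_shift))]

def invert_step(u, x_prev, eps=1e-6):
    """Invert one step of Eq. (B.2), recovering x~(t) from u(t) and
    x~(t-1). Since artanh is only defined on (-1, 1) and the RC does
    not enforce that range, predicted values are first clipped to the
    open interval; eps is an illustrative guard margin."""
    u = max(-1.0 + eps, min(1.0 - eps, u))
    return x_prev * (1.0 + math.atanh(u))
```

Without the clipping guard, a single predicted value at or beyond ±1 makes artanh undefined (or enormous), and because each inverted step feeds the next multiplicatively, the entire remainder of the forecast is lost, which is exactly the failure mode described above.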
1453?1455, 2009. (doi: 10.1126/sci- ence.1167782). [7] A. Chicoli and D. A. Paley, ?Probabilistic information transmission in a network of coupled oscillators reveals speed-accuracy trade-off in responding to threats,? Chaos, vol. 26, no. 11, p. 116311, 2016. (doi: 10.1063/1.4966682). [8] R. Sole? and S. Valverde, ?Evolving complexity: how tinkering shapes cells, software and ecological networks,? Philos T R Soc B, vol. 375, no. 1796, p. 20190325, 2020. (doi: 10.1098/rstb.2019.0325). [9] T. Mora and W. Bialek, ?Are biological systems poised at criticality?,? J Stat Phys, vol. 144, pp. 268?302, Jul 2011. (doi: 10.1007/s10955-011-0229-4). [10] B. C. Daniels, H. Kim, D. Moore, S. Zhou, H. B. Smith, B. Karas, S. A. Kauffman, and S. I. Walker, ?Criticality distinguishes the ensemble of biological regulatory 109 networks,? Phys Rev Lett, vol. 121, p. 138102, Sep 2018. (doi: 10.1103/Phys- RevLett.121.138102). [11] E. Mart??nez, M. Villani, L. La Rocca, S. A. Kauffman, and R. Serra, ?Dynamical criticality in gene regulatory networks,? Complexity, vol. 2018, p. 5980636, 2018. (doi: 10.1155/2018/5980636). [12] B. Derrida and Y. Pomeau, ?Random networks of automata: A simple annealed approximation,? Europhys Lett, vol. 1, no. 2, p. 45, 1986. (doi: 10.1209/0295- 5075/1/2/001). [13] M. Aldana and P. Cluzel, ?A natural class of robust networks,? P Natl Acad Sci USA, vol. 100, no. 15, pp. 8710?8714, 2003. (doi: 10.1073/pnas.1536783100). [14] S. Squires, K. Sytwu, D. Alcala, T. M. Antonsen, E. Ott, and M. Girvan, ?Weakly explosive percolation in directed networks,? Phys Rev E, vol. 87, p. 052127, May 2013. (doi: 10.1103/PhysRevE.87.052127). [15] S. Kauffman, C. Peterson, B. Samuelsson, and C. Troein, ?Genetic networks with canalyzing boolean rules are always stable,? P Natl Acad Sci USA, vol. 101, no. 49, pp. 17102?17107, 2004. (doi: 10.1073/pnas.0407783101). [16] A. Pomerance, M. Girvan, and E. Ott, ?Stability of boolean networks with gen- eralized canalizing rules,? 
Phys Rev E, vol. 85, no. 4, p. 046106, 2012. (doi: 10.1103/PhysRevE.85.046106). [17] C. Seshadhri, A. M. Smith, Y. Vorobeychik, J. R. Mayo, and R. C. Armstrong, ?Characterizing short-term stability for boolean networks over any distribution of transfer functions,? Phys Rev E, vol. 94, no. 1, p. 012301, 2016. (doi: 10.1103/Phys- RevE.94.012301). [18] S. Squires, A. Pomerance, M. Girvan, and E. Ott, ?Stability of boolean networks: The joint effects of topology and update rules,? Phys Rev E, vol. 90, no. 2, p. 022814, 2014. (doi: 10.1103/PhysRevE.90.022814). [19] S. Bornholdt and T. Rohlf, ?Topological evolution of dynamical networks: Global criticality from local dynamics,? Phys Rev Lett, vol. 84, no. 26, p. 6114, 2000. (doi: 10.1103/PhysRevLett.84.6114). [20] P. Oikonomou and P. Cluzel, ?Effects of topology on network evolution,? Nat Phys, vol. 2, no. 8, p. 532, 2006. (doi: 10.1038/nphys359). [21] M. Aldana, E. Balleza, S. Kauffman, and O. Resendiz, ?Robustness and evolvability in genetic regulatory networks,? J Theor Biol, vol. 245, no. 3, pp. 433?448, 2007. (doi: 10.1016/j.jtbi.2006.10.027). [22] P. Grassberger, C. Christensen, G. Bizhani, S.-W. Son, and M. Paczuski, ?Explo- sive percolation is continuous, but with unusual finite size behavior,? Phys Rev Lett, vol. 106, no. 22, p. 225701, 2011. (doi: 10.1103/PhysRevLett.106.225701). 110 [23] O. Riordan and L. Warnke, ?Explosive percolation is continuous,? Science, vol. 333, no. 6040, pp. 322?324, 2011. (doi: 10.1126/science.1206241). [24] R. M. D?Souza and J. Nagler, ?Anomalous critical and supercritical phenom- ena in explosive percolation,? Nat Phys, vol. 11, no. 7, p. 531, 2015. (doi: 10.1038/nphys3378). [25] W. Viles, C. E. Ginestet, A. Tang, M. A. Kramer, and E. D. Kolaczyk, ?Percolation under noise: Detecting explosive percolation using the second-largest component,? Phys Rev E, vol. 93, no. 5, p. 052301, 2016. (doi: 10.1103/PhysRevE.93.052301). [26] A. Waagen, R. M. D?Souza, and T.-C. 
Lu, ?Explosive percolation on directed net- works due to monotonic flow of activity,? Phys Rev E, vol. 96, no. 1, p. 012317, 2017. (doi: 10.1103/PhysRevE.96.012317). [27] R. M. D?Souza, J. Go?mez-Garden?es, J. Nagler, and A. Arenas, ?Explosive phe- nomena in complex networks,? Adv Phys, vol. 68, no. 3, pp. 123?223, 2019. (doi: 10.1080/00018732.2019.1650450). [28] J. M. Carlson and J. Doyle, ?Highly optimized tolerance: A mechanism for power laws in designed systems,? Phys Rev E, vol. 60, pp. 1412?1427, Aug 1999. (doi: 10.1103/PhysRevE.60.1412). [29] J. M. Carlson and J. Doyle, ?Highly optimized tolerance: Robustness and design in complex systems,? Phys Rev Lett, vol. 84, pp. 2529?2532, Mar 2000. (doi: 10.1103/PhysRevLett.84.2529). [30] V. van Noort, B. Snel, and M. A. Huynen, ?Predicting gene function by con- served co-expression,? Trends Genet, vol. 19, no. 5, pp. 238?242, 2003. (doi: 10.1016/S0168-9525(03)00056-8). [31] S. A. Teichmann and M. M. Babu, ?Gene regulatory network growth by duplica- tion,? Nat Genet, vol. 36, no. 5, p. 492, 2004. (doi: 10.1038/ng1340). [32] P. D. Kuo, W. Banzhaf, and A. Leier, ?Network topology and the evolution of dy- namics in an artificial genetic regulatory network model created by whole genome duplication and divergence,? Biosystems, vol. 85, no. 3, pp. 177?200, 2006. (doi: 10.1016/j.biosystems.2006.01.004). [33] S. Manrubia and J. A. Cuesta, ?Evolution on neutral networks accelerates the ticking rate of the molecular clock,? J R Soc Interface, vol. 12, no. 102, p. 20141010, 2015. (doi: 10.1098/rsif.2014.1010). [34] L. Demetrius and T. Manke, ?Robustness and network evolution?an en- tropic principle,? Physica A, vol. 346, pp. 682?696, Feb. 2005. (doi: 10.1016/j.physa.2004.07.011). 111 [35] J. Demongeot, A. Elena, and S. Sene?, ?Robustness in regulatory networks: A multi-disciplinary approach,? Acta Biotheor, vol. 56, no. 1, pp. 27?49, 2008. (doi: 10.1007/s10441-008-9029-x). [36] S. Melnik, A. Hackett, M. A. Porter, P. J. 
Mucha, and J. P. Gleeson, "The unreasonable effectiveness of tree-based theory for networks with clustering," Phys Rev E, vol. 83, p. 036112, Mar 2011. (doi: 10.1103/PhysRevE.83.036112).
[37] S. Chandra, E. Ott, and M. Girvan, "Critical network cascades with re-excitable nodes: Why treelike approximations usually work, when they break down, and how to correct them," Phys Rev E, vol. 101, p. 062304, Jun 2020. (doi: 10.1103/PhysRevE.101.062304).
[38] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, "Random graphs with arbitrary degree distributions and their applications," Phys Rev E, vol. 64, p. 026118, Jul 2001. (doi: 10.1103/PhysRevE.64.026118).
[39] P. Erdős and A. Rényi, "On the evolution of random graphs," Publ Math Inst Hung Acad Sci, vol. 5, pp. 17–61, 1960.
[40] R. A. da Costa, S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes, "Explosive percolation transition is actually continuous," Phys Rev Lett, vol. 105, no. 25, p. 255701, 2010. (doi: 10.1103/PhysRevLett.105.255701).
[41] A. Santos-Zavaleta, H. Salgado, S. Gama-Castro, M. Sánchez-Pérez, L. Gómez-Romero, D. Ledezma-Tejeida, J. S. García-Sotelo, K. Alquicira-Hernández, L. J. Muñiz-Rascado, P. Peña-Loredo, C. Ishida-Gutiérrez, D. A. Velázquez-Ramírez, V. Del Moral-Chávez, C. Bonavides-Martínez, C.-F. Méndez-Cruz, J. Galagan, and J. Collado-Vides, "RegulonDB v 10.5: tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli K-12," Nucleic Acids Res, vol. 47, pp. D212–D220, 2018. (doi: 10.1093/nar/gky1077).
[42] C. Chen, D. Zhang, T. R. Hazbun, and M. Zhang, "Inferring gene regulatory networks from a population of yeast segregants," Sci Rep, vol. 9, no. 1, p. 1197, 2019. (doi: 10.1038/s41598-018-37667-4).
[43] D. Y. Rhee, D.-Y. Cho, B. Zhai, M. Slattery, L. Ma, J. Mintseris, C. Y. Wong, K. P. White, S. E. Celniker, T. M. Przytycka, S. P. Gygi, R. A. Obar, and S.
Artavanis-Tsakonas, "Transcription factor networks in Drosophila melanogaster," Cell Rep, vol. 8, no. 6, pp. 2031–2043, 2014. (doi: 10.1016/j.celrep.2014.08.038).
[44] H. Han, J.-W. Cho, S. Lee, A. Yun, H. Kim, D. Bae, S. Yang, C. Y. Kim, M. Lee, E. Kim, S. Lee, B. Kang, D. Jeong, Y. Kim, H.-N. Jeon, H. Jung, S. Nam, M. Chung, J.-H. Kim, and I. Lee, "TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions," Nucleic Acids Res, vol. 46, pp. D380–D386, 2017. (doi: 10.1093/nar/gkx1013).
[45] M. B. Gerstein, A. Kundaje, M. Hariharan, S. G. Landt, K.-K. Yan, C. Cheng, X. J. Mu, E. Khurana, J. Rozowsky, R. Alexander, et al., "Architecture of the human regulatory network derived from ENCODE data," Nature, vol. 489, no. 7414, p. 91, 2012. (doi: 10.1038/nature11245).
[46] T. Vicsek, A. Czirók, E. Ben-Jacob, I. Cohen, and O. Shochet, "Novel type of phase transition in a system of self-driven particles," Phys Rev Lett, vol. 75, pp. 1226–1229, Aug 1995. (doi: 10.1103/PhysRevLett.75.1226).
[47] K. P. O'Keeffe, H. Hong, and S. H. Strogatz, "Oscillators that sync and swarm," Nat Commun, vol. 8, p. 1504, 2017. (doi: 10.1038/s41467-017-01190-3).
[48] D. Levis, A. Diaz-Guilera, I. Pagonabarraga, and M. Starnini, "Flocking-enhanced social contagion," Phys Rev Res, vol. 2, p. 032056, Sep 2020. (doi: 10.1103/PhysRevResearch.2.032056).
[49] M. E. J. Newman, "Spread of epidemic disease on networks," Phys Rev E, vol. 66, July 2002. (doi: 10.1103/physreve.66.016128).
[50] J. R. C. Piqueira, B. F. Navarro, and L. H. A. Monteiro, "Epidemiological models applied to viruses in computer networks," J Comput Sci, vol. 1, pp. 31–34, Jan. 2005. (doi: 10.3844/jcssp.2005.31.34).
[51] F. Jin, E. Dougherty, P. Saraf, Y. Cao, and N. Ramakrishnan, "Epidemiological modeling of news and rumors on Twitter," in Proceedings of the 7th Workshop on Social Network Mining and Analysis - SNAKDD '13, ACM Press, 2013. (doi: 10.1145/2501025.2501027).
[52] K.
Zhu and L. Ying, "Information source detection in the SIR model: A sample-path-based approach," IEEE ACM T Network, vol. 24, pp. 408–421, Feb. 2016. (doi: 10.1109/tnet.2014.2364972).
[53] D. Sumpter, Collective Animal Behavior. Princeton, NJ: Princeton University Press, 2010.
[54] I. Aihara, H. Kitahata, K. Yoshikawa, and K. Aihara, "Mathematical modeling of frogs' calling behavior and its possible application to artificial life and robotics," Artif Life Robotics, vol. 12, pp. 29–32, Mar. 2008. (doi: 10.1007/s10015-007-0436-x).
[55] D. Helbing, I. Farkas, and T. Vicsek, "Simulating dynamical features of escape panic," Nature, vol. 407, pp. 487–490, Sept. 2000. (doi: 10.1038/35035023).
[56] D. J. Low, "Following the crowd," Nature, vol. 407, pp. 465–466, Sept. 2000. (doi: 10.1038/35035192).
[57] A. E. Motter, S. A. Myers, M. Anghel, and T. Nishikawa, "Spontaneous synchrony in power-grid networks," Nat Phys, vol. 9, pp. 191–197, Feb. 2013. (doi: 10.1038/nphys2535).
[58] Y. Kuramoto, "Self-entrainment of a population of coupled non-linear oscillators," in International Symposium on Mathematical Problems in Theoretical Physics, pp. 420–422, Springer-Verlag, 1974. (doi: 10.1007/bfb0013365).
[59] Y. Kuramoto, Chemical Oscillations, Waves, and Turbulence. Springer Berlin Heidelberg, 1984. (doi: 10.1007/978-3-642-69689-3).
[60] A. Pluchino, V. Latora, and A. Rapisarda, "Compromise and synchronization in opinion dynamics," Eur Phys J B, vol. 50, pp. 169–176, Mar. 2006. (doi: 10.1140/epjb/e2006-00131-0).
[61] H. Hong and S. H. Strogatz, "Conformists and contrarians in a Kuramoto model with identical natural frequencies," Phys Rev E, vol. 84, Oct. 2011. (doi: 10.1103/physreve.84.046202).
[62] K. Kaneko, "Globally coupled circle maps," Physica D, vol. 54, no. 1, pp. 5–19, 1991. (doi: 10.1016/0167-2789(91)90103-G).
[63] G. Barlev, M. Girvan, and E. Ott, "Map model for synchronization of systems of many coupled oscillators," Chaos, vol. 20, no. 2, p. 023109, 2010.
(doi: 10.1063/1.3357983).
[64] W. O. Kermack, A. G. McKendrick, and G. T. Walker, "A contribution to the mathematical theory of epidemics," P R Soc A, vol. 115, no. 772, pp. 700–721, 1927. (doi: 10.1098/rspa.1927.0118).
[65] L. J. Allen, "Some discrete-time SI, SIR, and SIS epidemic models," Math Biosci, vol. 124, no. 1, pp. 83–105, 1994. (doi: 10.1016/0025-5564(94)90025-6).
[66] J. Kephart and S. White, "Directed-graph epidemiological models of computer viruses," in Proceedings of the 1991 IEEE Computer Society Symposium on Research in Security and Privacy, pp. 343–359, IEEE Comput. Soc. Press, 1991. (doi: 10.1109/risp.1991.130801).
[67] D. J. Daley and D. G. Kendall, "Epidemics and rumours," Nature, vol. 204, p. 1118, Dec. 1964. (doi: 10.1038/2041118a0).
[68] H. Daido, "Discrete-time population dynamics of interacting self-oscillators," Prog Theor Phys, vol. 75, pp. 1460–1463, 1986. (doi: 10.1143/PTP.75.1460).
[69] J. Arino and S. Portet, "A simple model for COVID-19," Infect Dis Model, vol. 5, pp. 309–315, 2020. (doi: 10.1016/j.idm.2020.04.002).
[70] I. Korolev, "Identification and estimation of the SEIRD epidemic model for COVID-19," J Econometrics, vol. 220, pp. 63–85, Jan. 2021. (doi: 10.1016/j.jeconom.2020.07.038).
[71] A. Karaivanov, "A social network model of COVID-19," PLOS ONE, vol. 15, p. e0240878, Oct. 2020. (doi: 10.1371/journal.pone.0240878).
[72] K.-j. Kim, "Financial time series forecasting using support vector machines," Neurocomputing, vol. 55, no. 1, pp. 307–319, 2003. (doi: 10.1016/S0925-2312(03)00372-2).
[73] H. Yan and H. Ouyang, "Financial time series prediction based on deep learning," Wireless Pers Commun, vol. 102, pp. 683–700, Dec. 2017. (doi: 10.1007/s11277-017-5086-2).
[74] Z. Han, J. Zhao, H. Leung, K. F. Ma, and W. Wang, "A review of deep learning models for time series prediction," IEEE Sens J, vol. 21, pp. 7833–7848, Mar. 2021. (doi: 10.1109/jsen.2019.2923982).
[75] M. Han, J. Xi, S. Xu, and F.-L.
Yin, "Prediction of chaotic time series based on the recurrent predictor neural network," IEEE T Signal Proces, vol. 52, pp. 3409–3416, Dec. 2004. (doi: 10.1109/tsp.2004.837418).
[76] S. Pappas, L. Ekonomou, D. Karamousantas, G. Chatzarakis, S. Katsikas, and P. Liatsis, "Electricity demand loads modeling using AutoRegressive Moving Average (ARMA) models," Energy, vol. 33, no. 9, pp. 1353–1360, 2008. (doi: 10.1016/j.energy.2008.05.008).
[77] D. Ömer Faruk, "A hybrid neural network and ARIMA model for water quality time series prediction," Eng App Artif Intel, vol. 23, no. 4, pp. 586–594, 2010. (doi: 10.1016/j.engappai.2009.09.015).
[78] S. H. Holan, R. Lund, and G. Davis, "The ARMA alphabet soup: A tour of ARMA model variants," Stat Surveys, vol. 4, pp. 232–274, 2010. (doi: 10.1214/09-SS060).
[79] R. Gupta, G. Pandey, P. Chaudhary, and S. Pal, "SEIR and regression model based COVID-19 outbreak predictions in India," medRxiv, Apr. 2020. Preprint. (doi: 10.1101/2020.04.01.20049825).
[80] B. F. Maier and D. Brockmann, "Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China," Science, vol. 368, pp. 742–746, Apr. 2020. (doi: 10.1126/science.abb4557).
[81] R. Gupta and S. K. Pal, "Trend analysis and forecasting of COVID-19 outbreak in India," medRxiv, Mar. 2020. Preprint. (doi: 10.1101/2020.03.26.20044511).
[82] Z. Hu, Q. Ge, S. Li, L. Jin, and M. Xiong, "Artificial intelligence forecasting of COVID-19 in China," 2020. Preprint.
[83] Z. Yang, Z. Zeng, K. Wang, S.-S. Wong, W. Liang, M. Zanin, P. Liu, X. Cao, Z. Gao, Z. Mai, J. Liang, X. Liu, S. Li, Y. Li, F. Ye, W. Guan, Y. Yang, F. Li, S. Luo, Y. Xie, B. Liu, Z. Wang, S. Zhang, Y. Wang, N. Zhong, and J. He, "Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions," J Thorac Dis, vol. 12, pp. 165–174, Mar. 2020. (doi: 10.21037/jtd.2020.02.64).
[84] A. Tomar and N.
Gupta, "Prediction for the spread of COVID-19 in India and effectiveness of preventive measures," Sci Total Environ, vol. 728, p. 138762, Aug. 2020. (doi: 10.1016/j.scitotenv.2020.138762).
[85] M. Lukoševičius, "A practical guide to applying echo state networks," in Lecture Notes in Computer Science, pp. 659–686, Springer Berlin Heidelberg, 2012. (doi: 10.1007/978-3-642-35289-8_36).
[86] Z. Lu, J. Pathak, B. Hunt, M. Girvan, R. Brockett, and E. Ott, "Reservoir observers: Model-free inference of unmeasured variables in chaotic systems," Chaos, vol. 27, p. 041102, Apr. 2017. (doi: 10.1063/1.4979665).
[87] J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott, "Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach," Phys Rev Lett, vol. 120, Jan. 2018. (doi: 10.1103/physrevlett.120.024102).
[88] J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, and E. Ott, "Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data," Chaos, vol. 27, p. 121102, Dec. 2017. (doi: 10.1063/1.5010300).
[89] D. Patel, D. Canaday, M. Girvan, A. Pomerance, and E. Ott, "Using machine learning to predict statistical properties of non-stationary dynamical processes: System climate, regime transitions, and the effect of stochasticity," Chaos, vol. 31, p. 033149, Mar. 2021. (doi: 10.1063/5.0042598).
[90] J. Pathak, A. Wikner, R. Fussell, S. Chandra, B. R. Hunt, M. Girvan, and E. Ott, "Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a knowledge-based model," Chaos, vol. 28, p. 041101, Apr. 2018. (doi: 10.1063/1.5028373).
[91] D. Canaday and A. Pomerance, "Library-augmented reservoir computing: A meta-learning approach to time series prediction with limited data." Preprint.
[92] D. C. Farrow, L. C. Brooks, A. Rumack, R. J. Tibshirani, and R. Rosenfeld, "Delphi Epidata API," 2015. https://github.com/cmu-delphi/delphi-epidata.