ABSTRACT

Title of Dissertation: RESERVOIR COMPUTING WITH BOOLEAN LOGIC NETWORK CIRCUITS
Heidi Baumgartner Komkov, Doctor of Philosophy, 2021
Dissertation Directed by: Professor Daniel P. Lathrop, Department of Physics

To push the frontiers of machine learning, completely new computing architectures must be explored which efficiently use hardware resources. We test an unconventional use of digital logic gate circuits for reservoir computing, a machine learning algorithm that is used for rapid time series processing. In our approach, logic gates are configured into networks that can exhibit complex dynamics. Rather than the gates explicitly computing pre-programmed instructions, they are used collectively as a dynamical system that transforms input data into a higher dimensional representation. We probe the dynamics of such circuits using discrete components on a circuit board as well as an FPGA implementation. We show favorable machine learning performance, including radiofrequency classification accuracy comparable to a state of the art convolutional neural network with a fraction of the trainable parameters. Finally, we discuss the design and fabrication of a reservoir computing ASIC for high-speed time series processing.

Reservoir Computing with Boolean Logic Network Circuits

by Heidi Baumgartner Komkov

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2021

Dissertation Committee:
Dr. Daniel P. Lathrop, Chair/Advisor
Dr. Timothy Horiuchi
Dr. Thomas Antonsen
Dr. Timothy Koeth
Dr. Brian Beaudoin
Dr. Vedran Lekic

© Copyright by Heidi Baumgartner Komkov 2021

Acknowledgments

I would like to express my deep appreciation to all current and former team members in the Lathrop group: Daniel Lathrop, Alessandro Restelli, Brian Hunt, Liam Pocher, Liam Shaughnessy, Itamar Shani, Anthony Mautino, and John Rzasa. Thanks as well to Brian Beaudoin, Timothy Koeth, and Irving Haber, from the University of Maryland Electron Ring (UMER), where I spent my first years of graduate school, as well as my fellow UMER graduate students Kiersten Ruisard and Levon Dovlatyan. Thank you to Advait Madhavan and Jabez McClelland in the Alternative Computing Group at the Nanoscale Device Characterization Division of the Physical Measurement Laboratory at the National Institute of Standards and Technology (NIST) for providing me with no-cost wafer space for the ASIC described in Chapter 6. I am grateful for the help and advice of the staff at the Institute for Research in Electronics and Applied Physics (IREAP) as well as the UMD NanoCenter. And most of all, I thank my family for supporting me in more ways than I can count.

This material is based in part upon work supported by the National Science Foundation (NSF) (Nos. EAR 1417148 and 1909055), as well as the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE 1840340. Our research is partially supported through a DoD contract under the Laboratory of Telecommunication Sciences Partnership with the University of Maryland.

Table of Contents

Acknowledgements
Table of Contents
Chapter 1: Introduction
  1.1 Trends in Computing
    1.1.1 More Moore
    1.1.2 More than Moore
  1.2 The Machine Learning Explosion
    1.2.1 Machine Learning Opportunities
    1.2.2 Our Approach
    1.2.3 Thesis Overview
Chapter 2: Machine Learning and Reservoir Computing Theory and Literature
  2.1 Neural Network Theory
    2.1.1 The Artificial Neuron
    2.1.2 The Feed-forward Neural Network
    2.1.3 Using a Neural Network
    2.1.4 Training a Neural Network
    2.1.5 Backpropagation and Gradient Descent
    2.1.6 Recurrent Neural Networks
  2.2 Reservoir Computing
    2.2.1 Comparison to Other RNN Methods
    2.2.2 Reservoir Computing Applications
    2.2.3 Reservoir Computing Variants
  2.3 Reservoir Computing Using Physical Systems
    2.3.1 Physical Implementation of Reservoirs
    2.3.2 Mechanical Reservoirs
    2.3.3 Optical Reservoirs
    2.3.4 Electronic Reservoirs
Chapter 3: Boolean Network Theory
  3.1 Boolean Network Dynamics and Circuits
    3.1.1 Boolean Networks Implemented in Digital Circuits
    3.1.2 Unclocked Recurrent Boolean Circuits for Reservoir Computing
    3.1.3 A Simple Unclocked Boolean Circuit: the Ring Oscillator
    3.1.4 Boolean Networks for Reservoir Computing
  3.2 Boolean Sensitivity
Chapter 4: Boolean Reservoir Computing on an FPGA
  4.1 Introduction
  4.2 Prior Art
  4.3 Experimental Setup
    4.3.1 Data Flow
    4.3.2 Node Details
    4.3.3 Network Configuration
  4.4 Adjustable Network Dynamics
    4.4.1 Network Strategy
  4.5 Network Studies
    4.5.1 Self-Activity
    4.5.2 Transient Time
    4.5.3 Selection of Desirable Networks
    4.5.4 Output Distributions and Entropy
  4.6 Radio Frequency (RF) Classification
    4.6.1 Preprocessing
    4.6.2 Subsampling in Time
    4.6.3 Machine Learning Training Procedure
    4.6.4 Machine Learning Results
  4.7 Conclusion
Chapter 5: Boolean Reservoir Computing Using Discrete Digital Logic Gates on a Printed Circuit Board
  5.1 Introduction
  5.2 CMOS Reservoir Circuit Design
    5.2.1 Why Discrete Digital Logic Chips?
    5.2.2 Circuit Details
    5.2.3 Waveforms
    5.2.4 Network activity with no external input
    5.2.5 Pulse to Pulse Variation
    5.2.6 Transient Times
  5.3 Machine Learning Tests
  5.4 Conclusion
Chapter 6: Design of a 180nm ASIC for Reservoir Computing
  6.1 Overview
  6.2 Process Details
  6.3 Design Overview
  6.4 Node Designs
  6.5 Chip Control
  6.6 Speed and Power Estimates
  6.7 Inference chip details
  6.8 Automated Design Workflow
  6.9 Current Progress
  6.10 Looking Ahead
  6.11 Conclusion
Chapter 7: Reservoir Computing Applied to Particle Accelerator Beam Trajectory Prediction
  7.1 Overview
  7.2 Accelerator Description and Simulation Setup
  7.3 Results
  7.4 Conclusion
Chapter 8: Conclusion
  8.1 Summary
  8.2 Future Vision
Appendix A: ASIC Details
Bibliography

Chapter 1: Introduction

1.1 Trends in Computing

The development of powerful, energy-efficient, and low-cost computers in the last half century brought about the information age, profoundly changing the world we live in. In 2019, 93% of people on Earth lived within reach of a mobile broadband service, and 53.6% were internet users, limited mainly by the affordability of computing devices in low-income countries [1]. As the affordability gap continues to shrink and a larger fraction of the global population gains access to the internet, demands for connectivity, computing power, and data storage will continue to rise dramatically [2].

Moore's law, the observation that the number of transistors per unit area doubles roughly every two years [3], has long held true, partly because the semiconductor industry has used it as a performance target.
However, fundamental physics limitations to the size of transistors are rapidly approaching, and Gordon Moore himself predicts that his eponymous rule of thumb will stop being valid in 2025 [4]. A related principle, Dennard scaling, observed that as transistors shrink, their voltage and current scale linearly, keeping power per unit area roughly con- stant. Dennard scaling came to an end in the mid-2000s as transistors became 1 small enough for leakage current to become non-negligible, and voltages became low enough to run into threshold voltage limitations. As a result, processor speeds have not exceeded about 4GHz since the mid-2000s. To compensate for stagna- tion in clock frequency, chips have become multi-cored; however not all tasks take advantage of being parallelized [5]. In a rapidly growing digital world, new fundamental computing paradigms are needed to meet increasing global demands for connectivity and number-crunching power. There are several general approaches that development can follow [6], from making the current state of the art more efficient, to changing computing paradigms completely. 1.1.1 More Moore While bottom-up improvements to processors have slowed down, top-down improvements to the computing stack may make computer applications faster in the post-Moore era. Approaches include software performance engineering, hardware streamlining, and new algorithm development. However, as Leiserson et al. argue, these piecemeal improvements are unlikely to create the broad gains in computer performance like bottom-up approaches have accomplished in the last half century [7]. 2 1.1.2 More than Moore Novel devices, processes, and materials may exceed CMOS performance and unlock new regimes of computation. Emerging devices include tunneling FETs, spintronic devices, graphene and carbon nanotube transistors [8]. Nonvolatile mem- ories showing promise include phase change memory, spin-torque transfer memory, and resistive memory [9]. Photonic and quantum computing also fall under the beyond-CMOS umbrella. Most beyond-CMOS approaches are still firmly within the realm of experi- mentation and are far from practical implementation. The beyond CMOS roadmap depends upon the development of new devices and materials, whose wide-scale man- ufacturability remains unproven. 1.2 The Machine Learning Explosion Machine learning (ML), a term popularized in a 1959 paper by computing pioneer Arthur Samuel [10], describes algorithms that give computers the ability to learn patterns and rules in data without being explicitly programmed with those rules. Today, it encompasses many sub-fields including computer vision, natural language processing, and deep learning. It is nearly impossible to escape the in- fluence of machine learning in our technological society ?- it is everywhere from voice-activated virtual assistants, to ads internet users are served, to increasingly popular self-driving vehicles. While there are many algorithms under the machine learning umbrella, neural 3 networks form the basis of deep learning and are arguably the most influential concept. They are a family of algorithms inspired by neurons in the brain. Biological neurons in the brain receive signals from other neurons. When a cumulative signal threshold is surpassed, a spike-like signal is passed to subsequent neurons. The synaptic connections between neurons change as the brain learns, strengthening or weakening based on the relative timing of signals. 
1.2.1 Machine Learning Opportunities While machine learning has been studied since the dawn of computers, only in the last decade has it exploded in popularity due to several interrelated factors. The first is the decrease in cost of processors, memory, and network hardware which has enabled the generation and storage of vast quantities of data. Secondly, algorithmic developments allowed for more efficient training of deep neural networks. Finally, the availability of graphics processing units (GPUs), while originally for media pro- cessing and games, has allowed for efficient processing of neural networks as they parallelize matrix multiplication operations. Deep learning has in many ways enabled computers to surpass human abilities. Today, machines can translate language in real time [11], win against humans at games such as Go [12] and Starcraft [13], and autonomously drive vehicles that could make roads safer [14]. Computers have been outperforming human accuracy at the Imagenet image classification challenge since 2015 [15]. It is easy to take for granted the breathtaking progress that machine learning 4 has made, but the current pace cannot continue for much longer. Deep learning comes at a tremendous energy cost: between 2012 and 2018 there was a 300,000 times increase in the amount of computing power state-of-the-art networks required [15]. Deep learning is also nearing its computational limits on present-day hardware: Thompson et al. [16] studied the amount of processing incremental improvements to major ML models will require, and the point of diminishing returns is fast ap- proaching. As the need to improve the efficiency of deep learning has become clear, neural network compression techniques have been developed. For example, quantizing neu- ral networks, representing their internal states with fewer bits to reduce the compu- tational overhead. Also, pruning them, meaning removing insignificant connections to avoid needing to compute them. In fact, entirely binarized neural networks, in which weights are only one or zero, have been developed without much of an accu- racy reduction in some applications [17, 18]. As a major energy cost comes from repeated DRAM accesses, re-using terms in memory as much as possible is another strategy [5]. Specialized neural network hardware is also gaining in popularity. This is es- pecially significant as neural network demands move increasingly to edge devices like smartphones, which have stringent power limitations but abundant machine learn- ing needs [19]. Major companies are investing in dedicated hardware to accelerate their machine learning models, including Google which designed Tensor Processing Units (TPUs) in-house to accelerate the multiply-accumulate operations that many of Google?s services need [20]. Many neural network accelerator startups are ap- 5 pearing to address the needs of industry, including Cerebras 1, Mythic2, Fathom Computing3, Lightmatter4, Lightelligence5, Luminous Computing6, and LightOn7, to name a few. 1.2.2 Our Approach To push the frontiers of machine learning at a time when conventional hard- ware is nearing its limits, completely new computing architectures must be explored which efficiently use hardware resources. We propose an unconventional use of dig- ital logic gates which holds the promise to more densely encode information than conventional digital computation. In our approach, digital logic gate circuits are configured into large random graphs that can exhibit complex dynamics. 
Rather than the gates explicitly computing pre-programmed instructions, they are used collectively as a dynamical system that becomes generally synchronized to input data. The system evolves through time, exhibiting behavior that is dependent upon the external inputs. The state of the system is captured and then applied as an input to a simple readout map. Effectively, the network replaces many layers of a deep neural network, performing all of its operations in parallel. This method is a type of reservoir computing, in which our reservoir is a logic gate network. It can be used for classification, regression, or time series prediction. In contrast to neural networks with many layers, it uses only a single trained output layer, which allows it to be very fast.

Footnotes: 1 https://cerebras.net/  2 https://www.mythic-ai.com/  3 https://www.fathomcomputing.com/  4 https://lightmatter.co/  5 https://www.lightelligence.ai/  6 https://luminous.co/  7 https://lighton.ai/

1.2.3 Thesis Overview

In this thesis, Chapter 2 explores machine learning and reservoir computing theory and background literature. Chapter 3 discusses background information about Boolean network theory and circuit implementations. Chapter 4 covers our results from studies on Boolean networks implemented on a field-programmable gate array (FPGA), based on designs published in Komkov et al. [21] and experiments in Shani et al. [22] and Komkov et al. [23]. Chapter 5 studies a network implemented using discrete logic gates on a printed circuit board, which allows the full analog behavior of the network to be probed [24]. Chapter 6 describes the design of a monolithic chip implementing the same network architecture, based on designs in patents [25, 26]. Chapter 7 describes the application of software-based reservoir computing to the prediction of a nonlinear particle accelerator time series from the University of Maryland Electron Ring, which was published in Komkov et al. [27].

Chapter 2: Machine Learning and Reservoir Computing Theory and Literature

2.1 Neural Network Theory

2.1.1 The Artificial Neuron

Neural networks are machine learning architectures based on the artificial neuron. While biological neurons transmit voltage pulses and adjust their connection strengths to other neurons based on relative pulse timing, the artificial neuron is a static concept amenable to computation with combinational logic. A diagram of the artificial neuron is shown in figure 2.1, and it is described by equation 2.1.

Figure 2.1: The artificial neuron. Inputs a_j are weighted by w_jk and summed, then passed through a nonlinear activation function to arrive at output a_k.

z_k = \sum_{j=1}^{n} a_j w_{jk}, \qquad a_k = \sigma(z_k)    (2.1)

The inputs, or input activations of the artificial neuron, denoted a_j, are multiplied by weights w_jk. All the weighted values are summed to arrive at the intermediate value z_k. This value is then passed through a typically nonlinear function σ, called the activation function, to compute the neuron's output activation a_k. The activation function is commonly a hyperbolic tangent (tanh), but it can also be a step function (a hard threshold), sigmoidal, linear, a computationally friendly piecewise linear function called a rectified linear unit (ReLU), or a number of other options. A nonlinear activation function is needed to solve nonlinear problems; otherwise, the function of the artificial neurons simplifies to matrix multiplication.
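To make equation 2.1 concrete, the short sketch below evaluates a single artificial neuron in Python with NumPy. It is an illustrative example only; the input values and weights are made up, and this is not code from this thesis.

```python
import numpy as np

def neuron(a, w, activation=np.tanh):
    """Single artificial neuron (equation 2.1).

    a: input activations a_j, shape (n,)
    w: weights w_jk feeding one output neuron k, shape (n,)
    """
    z = np.dot(a, w)          # weighted sum z_k = sum_j a_j * w_jk
    return activation(z)      # output activation a_k = sigma(z_k)

# Example with made-up numbers: three inputs into one tanh neuron.
a = np.array([0.5, -1.0, 0.25])
w = np.array([0.8, 0.1, -0.4])
print(neuron(a, w))                                            # tanh activation
print(neuron(a, w, activation=lambda z: np.maximum(z, 0.0)))   # ReLU variant
```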
2.1.2 The Feed-forward Neural Network

Artificial neurons configured into sequential layers, such as in figure 2.2, form a feed-forward neural network (FFNN), a structure sometimes called a multi-layer perceptron (MLP) (though historically, perceptron refers to a neuron specifically with a step activation function). The first layer of neurons is called the input layer, the last layer is called the output layer, and any layers in between are called hidden layers. Deep learning encompasses architectures based in part on feed-forward neural networks, often with many layers, hence "deep".

In figure 2.2, a feed-forward neural network with one hidden layer is shown. In this diagram, activations have the subscript i in the input layer, j in the hidden layer, and k in the output layer, and weights are denoted w_ij and w_jk between input and hidden, and hidden and output layers, respectively. Equation 2.2 is an output activation written explicitly in terms of all of the other parameters.

Figure 2.2: A feed-forward neural network with one hidden layer. Neuron activations are denoted a_i, a_j, and a_k for the input, hidden, and output layers, respectively. Weights between them are w_ij and w_jk. The output dependence upon all other parameters is in 2.2.

a_k = \sigma_k\left( \sum_j \sigma_j\left( \sum_i a_i w_{ij} \right) w_{jk} \right)    (2.2)

2.1.3 Using a Neural Network

Artificial neural networks can be used for classification or regression problems. Inputs to the neural network are numerical features of data. Outputs represent what we desire to learn from the data. A classifier has as many output neurons as there are classes, and the output activations represent the probability of the input data belonging to a particular category. The outputs of a regressor are continuous values.

Forward propagation is the computation of all of the hidden and output neuron activations within the network given some input. For a network with one hidden layer, it would be the evaluation of equation 2.2 for all outputs. Training is the process of adjusting the weights within the neural network to arrive at a desired outcome.

2.1.4 Training a Neural Network

Neural networks are trained by example rather than by programming explicit rules. To train the network, first forward propagate training data. Choose a loss function (also called error function or cost function) E, which quantifies how far the output is from a training goal. Then, determine how the loss function depends upon each of the trainable parameters, the weights in the network, by computing the gradient of the loss function with respect to each of the weights. Finally, take a step to decrease the loss function by updating each of the weights in the direction of the negative gradient.

A popular loss function choice for regression tasks is mean squared error (MSE), the average of the squared errors:

E = \frac{1}{2N} \sum_{k=1}^{N} (t_k - a_k)^2    (2.3)

In this equation, a_k are output activations, t_k are the target values, and there are N output neurons. A different loss function often used for classification problems is cross-entropy loss, which can be used when output activations and targets are between zero and one, and heavily penalizes outliers:

E = -\frac{1}{N} \sum_{k=1}^{N} t_k \log(a_k)    (2.4)

To compute the gradient of the loss function with respect to the weights, apply the chain rule to expand the partial derivatives. In an FFNN with one hidden layer, the gradient of the output layer can be written:

\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial a_k} \frac{\partial a_k}{\partial z_k} \frac{\partial z_k}{\partial w_{jk}}    (2.5)

and the gradient of the hidden layer weights can be written

\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial a_k} \frac{\partial a_k}{\partial z_k} \frac{\partial z_k}{\partial a_j} \frac{\partial a_j}{\partial z_j} \frac{\partial z_j}{\partial w_{ij}}    (2.6)

Each of the intermediate values in these expressions can be computed from equation 2.2 and the definition of the loss function. In some cases the result is very simple: for example, in a neural network with linear activations and no hidden layers, using an MSE loss function, the gradient (equation 2.5) reduces to a straightforward expression:

\frac{\partial E}{\partial w_{jk}} = (a_k - t_k)\, \sigma_k'(z_k)\, a_j = (a_k - t_k)\, a_j    (2.7)

2.1.5 Backpropagation and Gradient Descent

In a process called gradient descent, weights are updated in the negative direction of the gradient, which is multiplied by a learning rate η that determines the size of the step:

\Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}}    (2.8)

The weight updates are computed layer by layer, starting from the output and moving backwards. This algorithm is called backpropagation and was introduced in the 1980s [28].

As the backpropagation routine is computationally intensive, it is advantageous to forward propagate batches of data and perform weight updates based on averaged results, rather than doing so one sample at a time. This is called batch gradient descent. On the other hand, batch size is limited by memory, and the whole dataset usually cannot be forward propagated all at once. Using sub-samples of the dataset to update the network is called stochastic gradient descent (SGD), as the trajectory through the error landscape is stochastic, depending upon the random subsamples of data in the batches.

To solve a machine learning problem, the choice of algorithm and its related hyperparameters require careful consideration. Hyperparameters are variables affecting a machine learning model's structure, and are usually not learned but rather set by the designer. In a feedforward neural network, they would include the number of layers, the size of the layers, the choice of activation function, the choice of error function, and the choice of optimization algorithm and its configurable parameters such as learning rate. There are many pitfalls that one can encounter while iteratively optimizing a high dimensional objective function. For example, using too small a learning rate, it is possible to get stuck in a local error minimum or a plateau; too large of a learning rate, and the steps can completely skip over or climb away from the global minimum. Modern optimizers use adaptive learning rates for this reason. In fact, pure stochastic gradient descent as described in this section is rarely used in practice in such a simple form, but it is the basis for the more sophisticated optimization techniques commonly found in machine learning software packages today, such as Adam [29].

Another issue that deep neural networks encounter is the vanishing gradient problem, where the gradient becomes smaller and smaller as it passes backwards through the layers, and weights at the front of the network are practically no longer updated. A related issue is the exploding gradient problem, when the gradients are too large and result in an untrainable network. These are significant issues in neural networks with many hidden layers, and also in recurrent neural networks.
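As an illustration of equations 2.5 through 2.8, the sketch below trains a single linear output layer by batch gradient descent. It is a minimal example on made-up data, not the training code used elsewhere in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up regression problem: 200 samples, 3 input features, 1 linear output.
A = rng.normal(size=(200, 3))                        # input activations a_j
w_true = np.array([[1.5], [-2.0], [0.3]])
T = A @ w_true + 0.05 * rng.normal(size=(200, 1))    # targets t_k

w = np.zeros((3, 1))      # trainable weights w_jk
eta = 0.1                 # learning rate

for epoch in range(500):
    Y = A @ w                          # forward propagation (linear activation)
    grad = A.T @ (Y - T) / len(A)      # dE/dw for MSE, equation 2.7 averaged over the batch
    w -= eta * grad                    # gradient descent step, equation 2.8

print(w.ravel())   # approaches w_true
```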
2.1.6 Recurrent Neural Networks

Recurrent neural networks (RNNs) are used for processing temporal or sequential information, where information in each time step has dependencies on activity of the past. There are many recurrent neural network architectures, all having in common neurons, or groups of neurons, with feedback connections, in contrast to the strictly feed-forward designs discussed in the last sections. Recurrence gives the neural network memory, as information can circulate in a loop and persist.

The predominant approach to training recurrent neural networks is backpropagation through time (BPTT), in which the state of the system at every timestep is unfurled into a feed-forward configuration as shown in figure 2.3. Then, conventional gradient descent and backpropagation is used to train the weights. BPTT is computationally intensive and notoriously suffers from exploding and vanishing gradients that make training unstable.

Figure 2.3: Unfolding of a recurrent neural network, with state x_t and input u_t, for backpropagation through time.

Today's most commonly used RNN architectures, long short-term memory (LSTM) and the simpler gated recurrent unit (GRU), overcome the instability of BPTT by using much more complicated base units than the plain artificial neuron. Both of their recurrent computational units have the ability to keep or forget information, allowing them to retain long-term temporal associations in a way that a simple RNN cannot. These architectures have made great strides in fields reliant upon sequential data such as natural language processing. However, the LSTM and its variants are cumbersome to use and train, and are very computationally demanding.

2.2 Reservoir Computing

A specialized neural network architecture that has shown promise in rapid time-series prediction is called the reservoir computer (RC), shown in figure 2.4. It is a machine learning model independently proposed around the same time as an echo state network (ESN) by Jaeger [30] using conventional artificial neurons, and as a liquid state machine (LSM) by Maass et al. [31] using more biologically plausible spiking neurons. Reservoir computing is a promising architecture for a variety of time series prediction tasks that rely on short-term associations.

Figure 2.4: The reservoir computer. The reservoir is the fixed, randomly initialized recurrent neural network whose internal states at time n are \vec{X}(n). The input layer W_in is also fixed and randomly set. The output layer W_out is trained using a simple rule such as linear regression.

Reservoir is the term for the RNN inside of the RC, which has random connectivity and random weights. To avoid the computationally intensive step of training the weights W within the RNN, the reservoir is left fixed, and its state is put into an output layer whose weights W_out are trained. Like many machine learning methods, reservoir computing is loosely brain-inspired; there is evidence that neuron groups in the prefrontal cortex [29] and the cerebellum [30] are reservoir-like.

The internal reservoir states at timestep n can be described as follows:

\vec{X}(n) = f\left( W_{in} \vec{U}(n) + W \vec{X}(n-1) \right)    (2.9)

where f is the transfer function of each node, often chosen to be a tanh function, which constrains the node output to within [-1, 1]. W_in is uniformly sampled from [-γ, γ], where γ, one of the hyperparameters used in the designs, determines the strength of the input(s); W_in also determines which nodes the inputs are coupled to, which can be all of the nodes or a subset of them. The external input \vec{U}(n) can be univariate or multivariate. The adjacency matrix W is populated by random numbers in the range [-1, 1] and has both weights and connectivity information.
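The following sketch implements the state update of equation 2.9 for a small echo state network in Python. The reservoir size, scaling constants, and random input series are made-up illustrative choices, not parameters used anywhere in this thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

N, n_in, T = 100, 1, 200           # reservoir size, input dimension, timesteps (illustrative)
gamma = 0.5                        # input scaling
p = 0.05                           # connection probability

# Fixed random input weights and sparse adjacency matrix.
W_in = rng.uniform(-gamma, gamma, size=(N, n_in))
W = rng.uniform(-1, 1, size=(N, N)) * (rng.random((N, N)) < p)
W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # rescale spectral radius to 0.9 (see hyperparameters below)

U = rng.normal(size=(T, n_in))     # made-up input time series
X = np.zeros((T, N))               # stored reservoir states, one row per timestep
x = np.zeros(N)
for n in range(T):
    x = np.tanh(W_in @ U[n] + W @ x)   # equation 2.9
    X[n] = x
```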
Some implementations also vary the timescale of internal reservoir activity using a leakage parameter α, which determines how much of the reservoir state is carried over to the subsequent timestep:

\vec{X}(n) = (1 - \alpha)\vec{X}(n-1) + \alpha f\left( W_{in} \vec{U}(n) + W \vec{X}(n-1) \right)    (2.10)

The reservoir output \vec{Y} is simply the reservoir state multiplied by the output layer weights:

\vec{Y}(n) = W_{out} \vec{X}(n)    (2.11)

Any learning rule can be applied to calculate the weights of W_out in training. The simplest is linear regression, which minimizes the mean squared error and has a closed-form solution:

W_{out} = (X^T X)^{-1} X^T Y    (2.12)

Alternatively, the output weights may be updated iteratively using gradient descent with equation 2.7. To avoid overfitting, Tikhonov regularization may be used, in which the L2 norm of the coefficients is added to the cost function, and the closed-form solution becomes:

W_{out} = (X^T X + \lambda I)^{-1} X^T Y    (2.13)

where λ is the Tikhonov parameter, which determines the magnitude of the smoothing effect, and I is the identity matrix. As an alternative to adding a penalty to the cost function to prevent overfitting, noise may be added to the training data.
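Continuing the sketch above, the readout of equations 2.12 and 2.13 can be computed in a single linear solve. This is again an illustrative fragment with assumed variable names (X, the collected reservoir states, and Y, the training targets), not code from this thesis; it uses the row-per-timestep convention of the earlier sketch.

```python
import numpy as np

def train_readout(X, Y, lam=1e-6):
    """Closed-form Tikhonov-regularized readout, equation 2.13.

    X: reservoir states, shape (timesteps, N)
    Y: training targets, shape (timesteps, n_out)
    lam: Tikhonov regularization parameter (lam = 0 recovers equation 2.12)
    """
    N = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ Y)

# Usage, assuming X was collected as in the previous sketch and Y holds the targets:
# W_out = train_readout(X, Y, lam=1e-6)
# Y_hat = X @ W_out      # equation 2.11 applied to every stored state (row convention)
```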
The reservoir computer is often used for time series prediction, but can also be used for classification tasks. For time series prediction, the training target is the value of the series one step ahead, \vec{X}(n+1). After sufficient training steps, when the reservoir has adequately learned the transfer map from one time step to the next and the training loss is low enough, the output \vec{Y}(n) is fed back into the reservoir input for a free-running prediction, as shown in figure 2.5.

Figure 2.5: Reservoir configured for free-running prediction.

2.2.0.1 Reservoir Hyperparameters

In a software reservoir computer, the main hyperparameters are:

- Reservoir size, N: the number of neurons in the recurrent neural network. N should be chosen so that it is much larger than the number of inputs.
- Connection probability, p: the probability of two nodes being connected.
- Input scaling coefficient, γ: the amount by which inputs are scaled.
- Spectral radius, ρ: the magnitude of the largest eigenvalue of the adjacency matrix. This determines the overall magnitude of activity within the network. After random initialization of W, the entire matrix is scaled so that the spectral radius is the desired value, typically slightly below 1.
- Memory leakage parameter, α: a scalar determining how much of the previous state of the network is retained in every step.
- Tikhonov regularization parameter, λ: a penalty added to the loss function to prevent the regression algorithm from achieving the global minimum, which would cause overfitting.

The reservoir is a complicated nonlinear structure for which there is little in the way of mathematical understanding to systematically guide choices of hyperparameters. Why a particular setup yields a good or poor result on a particular machine learning problem remains an active research area. For example, a study of the effect of spectral radius on the prediction of several complex nonlinear systems found that there was an optimal range for ρ, but admitted no analytical insight as to why those values were found [32].

To illustrate the interrelated effects of hyperparameters, the following figures (2.6, 2.7, and 2.8) show the states of a 100-node reservoir as it evolves through 100 timesteps with no external forcing, meaning that there is no external input data. In each of the figures, connectivity is fixed, connection probability p is fixed, and the memory leakage parameter α and spectral radius ρ are set to three different values. The left-hand column of each figure represents the initial states of each of the 100 nodes in the network, and in each figure this starting point is the same. The network evolution through time can be seen in each figure from left to right.

General patterns emerge from examining the networks' self-excited behavior. Reservoirs with a greater leakage parameter α retain less memory of prior network states, thereby having faster temporal dynamics than networks with smaller α. Smaller α means a longer settling time in networks that settle into a fixed state, or lower frequency in networks that settle into a periodic pattern. A larger spectral radius ρ of the adjacency matrix results in greater overall network activity. A larger connection probability p typically results in steady states of larger magnitude, while small values of p generally ensure that the network activity dies down to zero. However, all of these observations are only guiding principles, and networks with different adjacency matrices can exhibit very different behavior. In practice, a particular choice of parameters must be empirically tested to determine whether a reservoir will perform well at a certain task.

Figure 2.6: Self-activity of a 100-node software echo state network with connection probability p=0.01 with various spectral radius ρ and memory parameter α values. The vertical axis is node number, and the horizontal axis is time.

Figure 2.7: Self-activity of a 100-node software echo state network with connection probability p=0.05 with various spectral radius ρ and memory parameter α values. The vertical axis is node number, and the horizontal axis is time.

Figure 2.8: Self-activity of a 100-node software echo state network with connection probability p=0.5 with various spectral radius ρ and memory parameter α values. The vertical axis is node number, and the horizontal axis is time.

From the perspective of data processing, there are several complementary interpretations of the reservoir's function:

- As short-term memory. Due to recurrence within the reservoir and the scaling of the spectral radius to be slightly below 1, information circulates for some time before being reduced asymptotically to zero. This is the reason for the "echo state" name: the network has fading "echoes" of the previous inputs. Jaeger et al. [33, 34] subjected ESNs to various benchmarking tasks and characterized the networks' memory capacity, finding that the number of timesteps of memory generally scaled with N. The reservoir's function goes beyond simple memory, however: in a comparative study, the ESN outperformed a tapped delay line with no recurrence [35].
- As a kernel expansion of features into higher-dimensional state space. Kernel methods are often used in combination with support vector machines or principal component analysis, to transform data into a space in which it becomes linearly separable. These methods use a mathematical trick that avoids explicit computation in the higher-dimensional space. Popular kernels include Gaussians, radial basis functions, or polynomials. Choosing which kernel to apply to data is often a matter of educated or blind guessing. The reservoir effectively acts as a random temporal kernel, nonlinearly casting the input and its recent history to a much higher-dimensional representation.
  The linear readout then chooses which parts of the transformation produce a useful result. Maass et al. [36, 37] explore the kernel properties of LSMs. They propose that the separation of inputs that a neuronal circuit creates could be an empirical measure for its quality as a kernel; in other words, the complexity and diversity of nonlinear operations carried out by a neuronal circuit on its input stream in order to boost the classification power of a subsequent linear classifier.

- As a universal function approximator. The universal approximation theorem is often cited as the reason for the wide-ranging utility of neural networks: it states that any continuous real-valued function can be approximated by feed-forward neural networks with nonlinear activation functions. In 1989, Cybenko showed that this holds true for networks of arbitrary width with only a single hidden layer, with sigmoidal activations [38]. In 1991, Hornik generalized the result to networks with any smooth nonlinear activation function [39]. Recently, it was shown that any fading memory system in discrete time can be realized as a simple finite dimensional neural network-type state-space model with a static linear readout map, and that echo state networks are also universal uniform approximants [40].

2.2.1 Comparison to Other RNN Methods

As mentioned previously, notable competing methods to reservoir computing are long short-term memory (LSTM) and gated recurrent units (GRUs), both of which are trained with the backpropagation through time (BPTT) algorithm. While LSTMs and GRUs work very well for some tasks, reservoirs outperform them in dynamical system prediction. Gallicchio et al. found that ESNs and a sequence of serial ESNs (which they call DeepESN, also described in section 2.2.3.3) provided accuracy comparable to LSTMs and GRUs on a polyphonic music prediction task, while requiring much less computation time on the same hardware [41]. Figure 2.9, reproduced from [42], compares RC, GRU, and LSTM networks for dynamical system prediction tasks, and shows that reservoir computing resulted in longer valid predictions after vastly less training time and less RAM used. The short training time in particular makes reservoir computing shine in adaptive control scenarios when the reservoir needs to be quickly re-trained to learn the dynamics of new data.

Figure 2.9: Comparison of RC, LSTM and GRU networks for dynamical system prediction tasks. (a) shows the amount of time for which the network makes valid predictions after being trained. (b) shows the training time, and (c) shows the average RAM requirement to run the networks. Reproduced from [42].

2.2.2 Reservoir Computing Applications

The area in which reservoir computing has yielded the most impressive results is the modeling of dynamical systems. It indeed makes sense to use a dynamical system to simulate a dynamical system. In particular, modeling chaotic systems with reservoirs has shown notable success. Chaotic systems can often have simple underlying equations, but exhibit extraordinarily complex behavior. They are characterized by sensitivity to initial conditions and having behavior that can appear to be random, but is in fact deterministic. Complex behavior comes about from thorough topological mixing and dense periodic orbits in phase space.

Reservoirs have been used with much success for state of the art chaotic time series prediction [43], replication of chaotic attractors [44], and chaotic time series separation [45].
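A minimal sketch of the free-running (closed-loop) prediction used in these studies is shown below, assuming the reservoir update and readout from the earlier sketches (W_in, W, a trained W_out, and a final driven state x are assumed to already exist). It is illustrative only, not the code behind the cited results.

```python
import numpy as np

def free_run(x, W_in, W, W_out, n_steps):
    """Closed-loop prediction: feed the readout back in as the next input.

    Assumes the readout dimension equals the input dimension, so the
    prediction can drive the reservoir in place of new data (figure 2.5).
    """
    preds = []
    for _ in range(n_steps):
        y = W_out.T @ x                   # readout, equation 2.11
        x = np.tanh(W_in @ y + W @ x)     # reservoir update, equation 2.9, driven by its own output
        preds.append(y)
    return np.array(preds)

# Usage with the objects from the previous sketches (one-step-ahead targets):
# W_out = train_readout(X[:-1], U[1:])
# future = free_run(x, W_in, W, W_out, n_steps=500)
```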
Notable chaotic time series prediction results are shown in figure 2.10. That is a prediction test of the Kuramoto-Sivashinsky system, described by the following partial differential equation (which has an additional spatial inhomogeneity term):

y_t = -y y_x - y_{xx} - y_{xxxx} + \mu \cos\left( \frac{2\pi x}{\lambda} \right)    (2.14)

Twenty-four time series of the spatiotemporal system are plotted in figure 2.10, as well as the free-running reservoir prediction of the system and the difference between the prediction and the solution. The horizontal axis is in units of Lyapunov time, which is a measure of the exponential divergence of the trajectories.

In addition to making accurate chaotic time series predictions for more Lyapunov times than any other existing method, the reservoir can capture the overall behavior of the system, producing a time series that looks realistic even after it departs from the true solution (contrast this to the solution diverging or going to zero). This accurate capture of the climate of the system enables reconstructing the spectrum of Lyapunov exponents and understanding the attractors of the system; these are straightforward operations when the equations of motion are accessible, but extremely difficult from finite time series for systems such as Kuramoto-Sivashinsky.

Figure 2.10: Spatiotemporal plots of the Kuramoto-Sivashinsky (KS) system and reservoir prediction. (a) is the KS system, (b) is the reservoir prediction, and (c) shows the difference between them. The x-axis is in units of Lyapunov times. The Lyapunov time is a measure of the departure of the system from initial conditions by a factor of e. Reproduced from [43].

Reservoir computing has also been used for the following non-exhaustive list of applications:

- Wireless signal recovery [46]
- Seizure detection from EEG signals [47]
- Pattern and signal generation [48]
- Phoneme recognition [49]
- Million words per second spoken digit classification [50]
- Control systems [51, 52]
- Human action recognition [53]
- Image recognition and radio transmitter classification from time series [22]
- Particle accelerator beam trajectory prediction [27], also described in this thesis.

2.2.3 Reservoir Computing Variants

2.2.3.1 Single-node time-delay reservoirs

A single processing element with time-delayed feedback forms a delay-dynamical system that can be used as a reservoir. First proposed by Appeltant et al. [54], the setup relies on time-multiplexing with a single neuron, rather than the use of multiple neurons. The neuron's output is fed through a delay, and then back into the input to be combined with new data. In that way, the neuron's state at any time contains traces of the entire history of the signal. Neuron states throughout time are saved and then put through an output layer. This differs from a tapped delay line because the data is put through the nonlinear processing element multiple times, and allowed to mix in time. Photonic reservoirs are usually of this type, as electron-photon conversion is inefficient, so processing element reuse makes sense. Additionally, optical processing often happens faster than I/O, making time multiplexing practical. Hart et al. [55] and Brunner et al. [56] analyze the dynamics of some time-delay optical reservoirs.

2.2.3.2 Parallel reservoirs

Some spatiotemporal systems, including the Kuramoto-Sivashinsky system, have strong local interactions. Figure 2.11 shows parallel reservoirs that take inputs from a sliding window of the input time series.
Taking inspiration from the slid- 30 ing window of a convolutional neural network, this configuration emphasizes local interactions between points in space or time. Figure 2.11: Parallel reservoirs taking inputs from a sliding window over the time series. This configuration emphasizes local interactions between points in space or time. Reproduced from [42]. 2.2.3.3 Deep reservoirs Figure 2.12: Serial reservoirs, called DeepESN, Reproduced from [41]. Reservoirs arranged in a serial configuration, with outputs of one going into inputs of another, are called deep reservoirs or deepESNs. The outputs of all reser- voirs are used to compute the result. Gallicchio et al. [41] explored the properties of several different deepESN configurations, finding some advantages in accuracy 31 and better processing at different time scales, at the expense of introducing more complexity to the model. 2.2.3.4 Extreme Learning Machine A related architecture is the extreme learning machine (ELM), which is not a reservoir as it lacks recurrence, but it is rather a shallow feed-forward network also exploiting random initialization [57]. The ELM has inputs connected to a single layer of hidden neurons via random weights. The weights between the hidden neurons and output neurons are then trained. Linear output neurons enable linear regression to be used, which makes the training process very fast. Despite their simplicity, ELMs and their variants perform comparably to deep neural networks while requiring much less training time [57, 58]. Figure 2.13: The extreme learning machine (ELM). Input-to-hidden connections are randomly chosen, and hidden-to-output weights are learned. Linear output neurons make training very fast. 32 2.3 Reservoir Computing Using Physical Systems The behavior of real-world systems can be used for information processing. Reservoir computing is arguably an ideal use case for leveraging the properties of physical nonlinear dynamical systems to perform computation. The reservoir?s function is to achieve a higher-dimensional representation of input data, mixed in time, and how exactly it achieves that goal does not necessarily matter. In software it is done by setting up a recurrent neural network with artificial neurons and explicitly computing each node?s value, step by step. Nonlinear physical systems, subject to nonlinear differential equations governing their behavior, have been shown to also effectively act as reservoirs, with the promise of exceeding software models in speed and power. A number of physical systems have been studied for reservoir computing, in- cluding memristor, spintronic, optical, and soft body reservoirs [59]. However, com- paratively evaluating the performance of physical substrates is an active area of study. In this section we give an overview of the current research. 2.3.1 Physical Implementation of Reservoirs Theory predicts that any physical system that contains the following properties can function as a reservoir [59]: ? high dimensionality, ? separation of inputs, 33 ? fading memory, ? nonlinearity, ? deterministic dynamics. 2.3.1.1 High Dimensionality As discussed previously, the reservoir nonlinearly casts a spatiotemporal signal to a higher-dimensional state space to enhance the function of a simple classifier. A physical dynamical system can perform this function as long as the dynamical dimensionality of the system is sufficiently large and repeatable to yield a useful kernel transformation. 
2.3.1.2 Separation of Inputs Separation of inputs means that slightly different inputs result in distinguish- able network states. For this reason, reservoirs are designed to operate at the edge of chaos or near a phase transition: inputs push the network over the edge into a state of heightened activity whose state is very sensitive to inputs. The term edge of chaos emerged from the study of cellular automata, where it was found that conditions for information transmission, storage, and modification, are achieved in the vicinity of a phase transition between high and low activity [60]. In fact there is evidence of brain similarly operating at a point of dynamic instability, which in neuroscience is called the critical brain hypothesis [61]. 34 2.3.1.3 Nonlinearity Finally, nonlinearity in the system is necessary for the approximation of non- linear functions. In a physical system, the nonlinearity can come about from effects such as higher order terms in a spring constant, an irregularly shaped optical cavity, or chaotic mixing in the time domain. 35 2.3.2 Mechanical Reservoirs Figure 2.14: Example of a soft body being used as a reservoir. An actuator at the top moves the silicone tentacle, and the readout map is applied to bend sensors along its length. Reproduced from Nakajima et al. [62]. Mechanical systems are attractive candidates for morphological computation: the outsourcing of processing to inherent functions of the body [63]. The theory 36 of how mass-spring systems could be used for reservoir computing was developed by Hauser et al. [64]. Dion et al. demonstrated spoken word recognition with a micro-electromechanical cantilever exhibiting nonlinearities [65]. Tensegrity struc- tures have been controlled by using their own actuator readouts as reservoir node states [66]. Perhaps the least practical, but most amusing example is the paper by Fernando et al. entitled ?Pattern Recognition in a Bucket?, in which wave patterns in a bucket of water were used as a reservoir [67]. Soft robotics takes inspiration from the highly compliant materials that abound in biology to create flexible and adaptable robots. Controlling the many degrees of freedom of a soft body is a challenge for locomotion and control. However this can be repurposed as a computational resource. Figure 2.14 shows a silicone tentacle as a reservoir computer, which was able to perform a nonlinear auto-regressive moving average (NARMA) time series benchmarking task [62]. A motor at the top of the body provided the input stimulus, and bend sensors along the body were used as readouts of the reservoir state. 2.3.3 Optical Reservoirs Optical computers have been considered since the invention of the laser in the 1960s. Substantial challenges to optical computing must be overcome if it will ever be a serious contender to electronic processors: electron to photon conversion (and vice versa) is quite inefficient; an optical transistor that doesn?t distort data remains elusive; optical memory is nonexistent. However, optical processing is attractive for 37 Figure 2.15: A schematic of an optoelectronic time-delay reservoir implementation, reproduced from Hart et al. [55]. neural network computation as operations on a light beam can be highly parallel, parts of the beam can travel without crosstalk, and linear operations can be per- formed with minimal energy cost [68]. 
It can be argued that the only way machines will ever approach the cognitive capacity of the human brain is via photonic interconnects which efficiently fan out neuron outputs to thousands of other neurons [69]. Photonic neural network accelerator chips have been explored for decades [70].

Optical RCs have been made using networks of waveguides, splitters, and combiners; signal-mixing cavities [71]; diffractive resonators mixing laser signals [72]; systems of nonlinearly coupled lasers [73]; and single-node time-delay feedback loops [50, 55, 65]. Figure 2.15, reproduced from [55], shows an optoelectronic time-delay reservoir implementation using a Mach-Zehnder modulator, which performs amplitude modulation on an optical signal. Laser light enters the modulator, whose gain is controlled by the voltage x(t). The light is then passed through an optical fiber delay line with delay τ_D. The photodiode converts the light back to an electronic signal, which is then amplified. The amplifier output is used to modulate the Mach-Zehnder interferometer, completing the feedback loop.

2.3.4 Electronic Reservoirs

Electronic reservoirs arguably have the greatest potential to be easily adopted into existing hardware workflows as machine learning accelerators if they do not rely on exotic materials or novel manufacturing methods. Some prior work has focused on more efficiently implementing either static or spiking artificial neurons in dedicated hardware: for example, Schrauwen et al. [74] used an FPGA to implement a liquid state machine with spiking neurons, and Canaday et al. [51] used an ESN-variety reservoir computer implemented on an FPGA. Penkovsky et al. [75] performed a genetic algorithm optimization of a spiking reservoir before transferring the configuration to an FPGA for fast processing.

Other electronic approaches instead used single nonlinear elements in a time-delay reservoir approach. An example in hardware was performed by Soriano et al. [76], who simulated the Mackey-Glass system using a single BJT as a nonlinear element, with a DAC to convert the input to analog, then an ADC at the end to convert back to digital for output layer computation on a computer. Jensen and Tufte [77] used a Chua oscillator, a simple chaotic circuit with only four components, and applied a driven input. To make a system with very low hardware resource needs, Alomar et al. [78] took a probabilistic computing approach and implemented a stochastic reservoir computer on an FPGA.

Yet other implementations used the behavior of electronic networks for reservoir computing, without specifically using artificial neurons. Haynes et al. [79] used a single XOR gate in a feedback configuration in a time-delay reservoir. Canaday et al. and Shani et al. have both used FPGA-based dynamical networks as reservoirs, demonstrating success in time series classification and prediction [22, 80]. Recently, Komkov et al. [23] demonstrated radio frequency classification with about one-tenth of the trainable parameters of a state of the art convolutional neural network using careful preprocessing and an FPGA-based Boolean reservoir.

Chapter 3: Boolean Network Theory

3.1 Boolean Network Dynamics and Circuits

The field of network science studies networks of all kinds, such as social networks, neural networks, communication networks, and biological networks, to name a few.
Boolean networks were originally proposed in the 1960s by Kauffman as a very simple dynamical model of gene expression, in which genes can be either on or off, and are affected by neighboring genes. Kauffman investigated the properties of randomly connected Boolean networks, such as the length of cycles within the network, and drew parallels with cell replication cycles [81]. More sophisticated mathematical formalism to describe Boolean networks was developed later [82, 83]. Boolean networks can evolve in discrete steps, or be continuously updated, which has implications for their behavior and the quality of the analogy they make to real-world systems.

3.1.1 Boolean Networks Implemented in Digital Circuits

While ideal Boolean networks can exhibit complex and emergent behavior, they obey the well-behaved rules of Boolean algebra. The same networks implemented in circuitry are subject only to the laws of physics, which can make their behavior much less Boolean. A digital gate has a finite rise and fall time, a possibly nonlinear transfer function, and will cause a pulse to incur delay and distortion as it travels from input to output. If the duration of the pulse is comparable to the gate's delay time, the pulse may spend most of its time in the intermediate voltage range, an ambiguous logical region.

Figure 3.1: Nonidealities of real Boolean logic gates. Physical devices have rise and fall times, and large but finite amplitude gains. They also have a low-pass filtering effect, and pulses that are too short will be rejected.

For computer processors to have high reliability and noise immunity, they use the digital logic abstraction, which discretizes the available voltage range into only high and low values. A voltage below a certain threshold represents a zero, a voltage above another threshold represents a one, and a voltage in the "forbidden region" in between is considered ambiguous. To ensure robustness to variations in delay times of logical blocks, digital computing usually uses synchronous logic, reading the outputs of combinational logic blocks upon transitions of a global clock. While clocking ensures robustness to timing variations and prevents race conditions, it slows down overall computation.

3.1.2 Unclocked Recurrent Boolean Circuits for Reservoir Computing

Because machine learning algorithms are trained from real-world examples rather than by using explicit rules, these algorithms are good candidates to be accelerated in hardware in which results are not always repeatable. In machine learning, training data is often inherently noisy or distorted when it comes from physical measurements (in fact, a method of preventing overfitting in a machine learning model is to add noise to its training data). Because they are robust to noise, machine learning algorithms can take advantage of the denser information encoding that comes from quantizing the available voltage range more finely.

Beyond the dynamics of pure Boolean networks, circuit implementations introduce further useful time delays and nonlinearities. As is shown in figure 3.1, a pulse traveling through a digital logic gate will incur a time delay, which gives the network some useful memory. The signals may also become distorted, depending upon the gate's voltage transfer function and transition times. These effects strongly depend upon the supply voltage and upon the network topology and sensitivity. A Boolean logic gate reservoir is made by connecting digital logic devices into recurrent configurations and letting them run freely.
Signals in the reservoir have frequencies determined by the device delay time, and the voltage spends much of its time in the intermediate region between 0 and the supply voltage. Unclocked recurrent digital logic networks are of particular interest for reservoir computing because their continuous-time dynamics result in an infinite number of available states. Thus, they may exhibit chaos. Clocked recurrent digital logic networks have only a finite number of available states, and therefore must be either stationary or periodic. A chaotic system has a strong dependence upon its initial conditions, meaning that two slightly different inputs will result in a large difference in the system's state as it evolves through time. This can be exploited for machine learning classification problems, in which the output map applied to the reservoir state (or a subset thereof) must distinguish between classes.

3.1.3 A Simple Unclocked Boolean Circuit: the Ring Oscillator

The simplest example of an unclocked recurrent digital logic gate circuit is a single inverter (logical NOT) with feedback. Connecting the input to the output results in a logical contradiction (TRUE cannot equal FALSE); however, when implemented as a circuit, a TRUE value at the input appears as a FALSE value at the output only after the high-to-low transition time. Likewise, FALSE at the input appears as TRUE on the output only after a low-to-high transition time. As a result, a single inverter with feedback creates a ring oscillator whose period is set by the sum of the two transition times; in the parlance of nonlinear dynamics, this is a limit cycle attractor. In practice, a single inverter with feedback may also get stuck at half the supply voltage, a fixed point attractor. Chaos has even been exhibited in simple ring circuits [84]. Boolean network circuits in the chaotic regime have been studied as sources of random numbers [85, 86]. A ring oscillator may be built out of any odd number of inverters; a ring oscillator with three inverters is shown in figure 3.2.

Figure 3.2: A ring oscillator, the simplest unclocked Boolean network, can be made with a chain of an odd number of inverters.

A ring oscillator exhibits regular periodic behavior, but circuits that are only slightly more complicated can yield rich dynamical behavior including chaos. Zhang et al. constructed the circuit in figure 3.3 using just three two-input Boolean logic gates. Node 1 is an XOR gate taking inputs from nodes 2 and 3. Nodes 2 and 3 are XOR and XNOR, respectively, taking one input from node 1's output, with the second input being a feedback connection. The uneven switching times of the logic elements, and the presence of feedback connections, cause thorough topological mixing of the circuit states, resulting in chaotic behavior. As the transfer function and delay times of the gates depend on the power supply voltage, the supply voltage is a parameter that can be used to vary the circuit behavior.

Figure 3.3: 3-element Boolean chaos circuit. (a) State diagram showing transition times τij between each node. (b) Schematic representation. (c) Photo of the circuit. Reproduced from Rosin [87].

Figure 3.4: Chaotic waveform and broad frequency characteristics produced by the circuit in the preceding figure. Reproduced from Zhang et al. [88].

Unclocked Boolean networks implemented on FPGAs have been used to study chaos [89], oscillator synchronization [90], and chimera states (coexistence of synchronized and desynchronized regions) [91].
They have also been used for ultrafast generation of random numbers [92]. Because the behavior of Boolean networks at the edge of criticality is heavily dependent upon device-specific gains and delays, as well as temperature and hysteresis, circuit simulation often does not yield an accurate result.

3.1.4 Boolean Networks for Reservoir Computing

The rich dynamics of Boolean networks make them an attractive candidate for reservoir computing. Boolean networks stand out from other hardware reservoir implementations because of their potential for seamless integration into existing manufacturing workflows for digital electronics. Whether the reservoir is a block of Verilog that is loaded into an FPGA, a chip placed on a motherboard, or a design directly on silicon, Boolean networks in CMOS fit naturally into today's technology, and have the potential for quick adoption if proven to be successful machine learning accelerators.

3.2 Boolean Sensitivity

Boolean sensitivity describes how excitable a Boolean logic gate is by its inputs. The Boolean sensitivity of a single gate (Sg) is defined as the proportion of 1-bit transitions in the input that result in a change in the output. This may be most easily understood by looking at the transition graphs, such as those in figures 3.5, 3.6, and 3.7 for two-, three-, and four-input gates, respectively. Taking 2-input gates as an example, the change of inputs from (0,0) to (0,1) results in the transition of a 2-input OR gate's output from 0 to 1, while the transition from (0,1) to (1,1) does not result in a change in output state. Two out of four possible 1-bit changes in 2-bit input states result in a change in output state, and therefore the 2-input OR gate's sensitivity is 0.5. The average Boolean sensitivity (S) is a parameter describing the excitability of a Boolean network, and is the average of the individual gate sensitivities.

Figure 3.5: Transition graphs for 2-input NAND, NOR, and XOR gates. The sensitivities of these 2-input gates are SNAND = 0.5, SNOR = 0.5, SXOR = 1.

Figure 3.6: Transition graphs for 3-input NAND, NOR, and XOR gates. The sensitivities of these 3-input gates are SNAND = 0.25, SNOR = 0.25, SXOR = 1.
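The gate sensitivities quoted in figures 3.5-3.7 can be checked directly by enumerating all single-bit input flips. The short sketch below computes Sg from a gate's truth function; the helper names are illustrative and not part of the dissertation's tooling.

```python
from itertools import product

def boolean_sensitivity(gate, n_inputs):
    """Sensitivity S_g of an n-input Boolean gate: the fraction of single-bit
    input flips that change the output."""
    changed = total = 0
    for bits in product((0, 1), repeat=n_inputs):
        out = gate(bits)
        for i in range(n_inputs):
            flipped = list(bits)
            flipped[i] ^= 1                 # flip one input bit
            total += 1
            changed += out != gate(tuple(flipped))
    return changed / total

NAND = lambda b: int(not all(b))
NOR  = lambda b: int(not any(b))
XOR  = lambda b: sum(b) % 2

for name, gate in [("NAND", NAND), ("NOR", NOR), ("XOR", XOR)]:
    print(name, [round(boolean_sensitivity(gate, n), 3) for n in (2, 3, 4)])
# NAND and NOR give 0.5, 0.25, 0.125; XOR gives 1.0 for any number of inputs.
```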
Figure 3.7: Transition graphs for 4-input NAND, NOR, and XOR gates. The sensitivities of these 4-input gates are SNAND = 0.125, SNOR = 0.125, SXOR = 1.

Chapter 4: Boolean Reservoir Computing on an FPGA

4.1 Introduction

Field-programmable gate arrays (FPGAs) are flexible, reconfigurable platforms in which digital logic circuits can be implemented using a hardware description language (HDL). They are often used for applications in which a device needs to be reconfigured in the field, for production parts in quantities too small to warrant manufacturing a custom chip, or for prototyping. At the core of the FPGA are look-up tables (LUTs), which are configurable truth tables whose logical function can be changed with a setting loaded into an SRAM bank. Between the LUTs are programmable interconnects. A system on a chip (SoC) contains both a microprocessor and programmable logic on the same chip, and a multiprocessor SoC (MPSoC) contains more than one processor, combining the functionality of a dedicated processor with the flexibility of FPGA programmable logic. Figure 4.1 shows a high-level overview of how LUTs and interconnects are used to create a particular reservoir configuration.

In Chapter 2, we described reservoir computing: the idea of using a recurrent neural network with fixed weights and fixed connectivity followed by a simple trained readout layer. In Chapter 3, we introduced the concept of using recurrent Boolean logic circuits for reservoir computing. In this chapter, we test both clocked and unclocked recurrent Boolean circuits on an FPGA for machine learning applications1. It is worth repeating: our implementation is not a traditional recurrent neural network, as one would find in a software reservoir, streamlined in the FPGA's programmable logic. Rather, we are configuring the digital logic gates to function as a dynamical system exhibiting complex behavior, and exploiting the dynamics to perform information processing.

4.2 Prior Art

The collective properties of autonomous (unclocked) Boolean networks on a field-programmable gate array (FPGA) were studied and applied to time series classification using machine learning in Shani et al. [22]. We extend that work with a more detailed examination of the parameter space of similar networks, both unclocked and clocked, and their implementation on faster hardware, the Xilinx Zynq UltraScale+ MPSoC. Previous works [22, 80] have both studied autonomous Boolean networks on FPGAs for reservoir computing, demonstrating RF classification and time series prediction, respectively.
Haynes et al. [79] studied time-delay reservoirs using a single XOR gate as a node. Other FPGA implementations of reservoirs have focused on efficiently implementing a traditional software-like reservoir in streamlined hardware [75].

1Portions of this chapter appear in Komkov et al. [23]

Figure 4.1: LUTs and interconnects in the FPGA implementing a reservoir, with internal states x, external inputs u, and global reset signal r. Reproduced from Shani et al. [22].

4.3 Experimental Setup

Our experiments were performed on the Xilinx UltraScale+ MPSoC evaluation board2 shown in figure 4.2. HDL describing the Boolean network was efficiently written using Python scripts that took a directed graph description and a node type list as inputs, and generated Verilog as output. The network was then synthesized in Xilinx Vivado and loaded into the FPGA over an Ethernet connection. Our application is an unconventional use of FPGAs. Design rule check (DRC) procedures in Vivado throw errors at the detection of loops and logical contradictions, but this is precisely what is needed to create complex network behavior for reservoir computing. Synthesis of a network in Vivado takes about an hour on an average desktop computer and sometimes results in crashes. For this reason, we designed networks whose functionality is adjustable by loading in a control setting, rather than reconfiguring the entire logic fabric with new Verilog descriptions for each configuration under test. A block diagram of our design is shown in figure 4.3, along with a detailed diagram of the internals of a node. We used a fixed-connectivity network with 4096 nodes in all FPGA experiments.

2https://www.xilinx.com/products/boards-and-kits/zcu104.html

Figure 4.2: Xilinx ZCU104 evaluation board containing the Zynq UltraScale+ MPSoC.

Figure 4.3: FPGA setup and reservoir node detail, showing the reset RAM (1 × 1024), input RAM (1024 × 1024), output RAM (2048 × 1024), and config RAM (4096 × 128), along with the 4-bit control word, the optional delay line (DL), and the flip-flop within each node. Each network contains 4096 nodes, evolves 1024 inputs for 1024 time steps, and returns 2048 outputs for the same 1024 time steps. Either the delay line or the flip-flop is used at a time.

4.3.1 Data Flow

In our experiment, the reservoir inputs, a reset pattern, and the node control settings are pre-loaded into RAM. Data flow in and out of the FPGA is shown in figure 4.4. The input RAM is of size 1024 × 1024 bits. Each of the 1024 columns of 1024 bits is presented to the reservoir sequentially, in 5 nanosecond steps. Upon every timestep, 2048 outputs, which are arbitrarily chosen out of the 4096 reservoir nodes, are saved to an output RAM, which has size 1024 × 2048. If a time series is shorter than 1024 bits, multiple series are tiled to fit into the input RAM. In the example in figure 4.4, a 128-bit time series was binarized and encoded, and then tiled so that seven series could fit into the input RAM, with a few time steps in between each new series. Between subsequent inputs, the reset signal is applied. The reset RAM contains the reset signal pattern, which is loaded in only once for a particular data series with a consistent pattern.

Figure 4.4: Data flow into the FPGA.

The complete output RAM is sent back to the host computer for further processing, where individual data samples are sliced out of the output RAM block.
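The tiling of short encoded time series into the input RAM, and the slicing of the corresponding responses back out of the output RAM, can be illustrated with a short sketch. The RAM dimensions follow the text above; the function names, the fixed 10-timestep gap between samples, and the use of NumPy arrays as stand-ins for the RAM contents are assumptions made for illustration.

```python
import numpy as np

RAM_T, RAM_W = 1024, 1024     # input RAM: timesteps x input bits
OUT_W = 2048                  # output RAM width (monitored nodes)
SAMPLE_T = 128                # encoded sample length in timesteps
GAP = 10                      # idle timesteps between samples (reset applied here)

def tile_samples(samples):
    """Pack binarized samples (each SAMPLE_T x RAM_W) into one input-RAM image,
    recording where each one starts so its response can be sliced back out."""
    ram = np.zeros((RAM_T, RAM_W), dtype=np.uint8)
    starts, t = [], 0
    for s in samples:
        if t + SAMPLE_T > RAM_T:
            break                      # RAM full; remaining samples go in the next batch
        ram[t:t + SAMPLE_T] = s
        starts.append(t)
        t += SAMPLE_T + GAP
    return ram, starts

def slice_outputs(output_ram, starts):
    """Cut the per-sample reservoir responses out of the RAM_T x OUT_W output block."""
    return [output_ram[t:t + SAMPLE_T] for t in starts]

# Seven 128-step samples fit with 10-step gaps: 7 * (128 + 10) = 966 <= 1024 timesteps.
samples = [np.random.randint(0, 2, (SAMPLE_T, RAM_W), dtype=np.uint8) for _ in range(7)]
ram, starts = tile_samples(samples)
print(starts)   # [0, 138, 276, 414, 552, 690, 828]
```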
If our system is eventually transitioned to a practical implementation, pre-processing and post-processing functions could be performed on the FPGA itself, as the peripheral resources around the programmable logic are vastly underutilized in our experiment. In fact, the UltraScale+ MPSoC has two ARM processors in addition to programmable logic, and the evaluation board supports many digital signal processing (DSP) and input/output (I/O) features that we do not use here.

4.3.2 Node Details

As shown in the lower half of figure 4.3, each node of the network receives two output lines Wx and Wy from two other nodes. Im and Oi are external inputs and outputs present at some nodes. A reset line connected to all nodes in the network zeros each node's output; this is done every time before the introduction of new data to ensure a consistent network starting state. The 4-bit control word selects which of the 16 logic functions described in table 4.2 is performed by the node on its inputs Wx and Wy. An optional inverter-based delay line (DL in figure 4.3), where each delay element is a pair of inverters, can be added to all nodes to delay their output. The gate transition times on the UltraScale+ are around 70 picoseconds, faster than the 200 MHz clock can sample. However, the interconnects between LUTs also add delay time.

4.3.3 Network Configuration

While FPGA logic fabric is reconfigurable, the process of synthesizing a Boolean network from Verilog and transferring it to the FPGA takes too long to efficiently test hundreds of configurations. To speed up the testing of hundreds of networks, the control setting stored directly on the FPGA holds 128 network configurations that can be selected with a command. These configurations are stored in the config RAM, which has size 4096 × 128. The 16 different logical functions that a node can be configured to perform are described in table 4.2. The strategy for selecting configurations is described in the next section.

4.4 Adjustable Network Dynamics

We previously discussed the conditions that a physical system needs to meet to be useful for reservoir computing: high dimensionality, separation of inputs, fading memory, and nonlinearity. We also discussed the effect of various hyperparameters on network behavior and showed examples of the evolution of a software reservoir's state with no input. However, there is no well-established theory for which network configurations perform better than others. For this reason, it is critical to build in ways to adjust the network to find an optimal configuration. In chapter 3 we examined the definition of Boolean sensitivity (the fraction of 1-bit transitions between inputs that results in a change in output state), which is a way of quantifying the sensitivity of a digital logic gate to its inputs. To be able to examine network behavior as a function of the average Boolean sensitivity of the nodes, we use mixtures of the gate types described in table 4.1. CHOICE refers to a 2-input gate where one of the two inputs is transmitted to the output. Refer to table 4.2 for greater detail on gate types. In this work, we used all two-input gates, and we did not systematically explore the influence of the in-degree of gates.

Sg     Gate types
1      XOR, XNOR
0.5    AND, OR, CHOICE
0      YES, NO

Table 4.1: 2-input gate sensitivity.

4.4.1 Network Strategy

The proportion of gates in the network with sensitivity Sg is denoted rS. The overall sensitivity S of the network is the sensitivity averaged across all of the gates.
These two constraints can be written:

r1 + r0.5 + r0 = 1    (4.1)

r1 + 0.5 r0.5 = S    (4.2)

Two networks with the same sensitivity can have varying dynamics due to the different composition of gates within them. For example, a sensitivity-0.5 network can have all Sg = 0.5 gates, or half Sg = 1 and half Sg = 0 gates. In our study, we choose an equal proportion of each gate type for every sensitivity. For example, an S = 0.5 network in the configuration with the largest r1 proportion would have ¼ XOR, ¼ XNOR, ¼ YES, and ¼ NO gates. A symmetrical distribution of gate types is used to mitigate network imbalance toward either 1 or 0. In the configuration with the smallest r1 proportion, AND, OR, NAND, NOR, and CHOICE would be in an equal ratio. To interrogate the space of available configurations, we parameterize the solution to the equations in terms of r1, sweeping it in 128 even steps within the allowable range for each sensitivity:

S > r1 > max(0, 2S − 1)    (4.3)

Each configuration is then given by:

(r1, r0.5, r0) = (r1, 2(S − r1), r1 − 2S + 1)    (4.4)

The variable S and the value of r1 within its allowable range can therefore be varied independently. In all subsequent plots, config n refers to the n-th configuration out of 128 that is generated using the routine just described. As an alternative to (or in addition to) the use of delay lines, the output state can also be sampled by a flip-flop operating at the sampling frequency. This solution quantizes the propagation time of signals between the nodes to the sampling rate and turns the network into a clocked state machine. We perform tests on networks with nodes whose outputs are sampled by a flip-flop, and networks with nodes having 20 delays and no flip-flop.

Figure 4.5: Planes of constraints described in equation 4.4. The range of parameters to be probed, at an arbitrary operating point, is the red intersection line. Sweeping S between 0 and 1 moves the relative position of these planes, and traces the red and blue lines in figure 4.6.

4.5 Network Studies

The realizable parameter space of gate configurations is extremely large (Nstates ~ O(16^4096)) due to the 4096 different gates within the FPGA-based reservoir, each of which can take 16 different configurations, described in table 4.2. Even in the case of software reservoirs, which have been studied extensively [30], finding reservoir hyperparameters that work well for particular tasks remains difficult [93]. The extremely large parameter space of the Boolean networks in our experiments necessitates an efficient searching method for potentially useful configurations. For this purpose, we implement tests of network behavior that are dataset-agnostic, measuring self-activity C and transient time T.

Figure 4.6: Relationship between gate ratios and average sensitivity. For a particular sensitivity S, the allowable range of r1 is evenly divided into 128 steps and the other ratios are computed. Each of these is an independent configuration, labeled config in subsequent figures.

4.5.1 Self-Activity

Research stemming from the study of cellular automata has found that the maximum capacity of a network for memory and information processing is achieved in the vicinity of a phase transition, a threshold between regions of highly ordered and highly disordered network behavior [94, 95]. A Boolean reservoir should therefore be near a region of transition between low activity and high activity.
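The configuration sweep defined by equations 4.1-4.4 reduces to a few lines of arithmetic; a minimal sketch follows. The endpoint handling of the 128-step linear sweep is an assumption for the example.

```python
import numpy as np

def config_sweep(S, n_configs=128):
    """Sweep r1 through its allowable range for a given average sensitivity S
    (equations 4.1-4.4) and return the gate-ratio triples (r1, r0.5, r0)."""
    r1_min = max(0.0, 2 * S - 1)     # lower bound from equation 4.3
    r1_max = S                       # upper bound from equation 4.3
    ratios = []
    for r1 in np.linspace(r1_min, r1_max, n_configs):
        r_half = 2 * (S - r1)        # equation 4.4
        r0 = r1 - 2 * S + 1          # equation 4.4
        ratios.append((r1, r_half, r0))
    return ratios

for r1, r_half, r0 in config_sweep(0.5, n_configs=5):
    # Each triple sums to 1 and reproduces S = r1 + 0.5 * r0.5.
    print(f"r1={r1:.3f}  r0.5={r_half:.3f}  r0={r0:.3f}  "
          f"sum={r1 + r_half + r0:.3f}  S={r1 + 0.5 * r_half:.3f}")
```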
Figure 4.7: Behavior of individual reservoir nodes after a network reset. Activity of 60 of the 2048 output nodes for 50 timesteps at a variety of average network sensitivities (S = 0.6, 0.5, 0.4) and gate configurations (config 0, 64, 127), for unclocked networks. The calculated self-activity C is shown next to each network snapshot.

Figure 4.9: Behavior of individual reservoir nodes after a network reset. Activity of 60 of the 2048 output nodes for 50 timesteps at a variety of average network sensitivities (S = 0.6, 0.5, 0.4) and gate configurations (config 0, 64, 127), for clocked networks. The calculated self-activity C is shown next to each network snapshot.

Table 4.2: Gate type details. The 16 selectable node functions comprise the constant gates TRUE and FALSE (proportion r0/2 each), four XOR-type gates (proportion r1/4 each), and ten sensitivity-0.5 gates, including the AND-type and OR-type gates and the CHOOSE A and CHOOSE B gates (proportion r0.5/10 each).

Network self-activity, C, is a way of quantifying overall network activity without any input, after a reset which forces the output of every node to be zero. As in [22], it is defined as the fraction of nodes changing state, averaged over the network's outputs and over timesteps t:

C = ⟨|xi(t) − xi(t − 1)|⟩i,t    (4.5)

where xi(t) is the Boolean state at node i at time t. It is informative to look directly at the network state to understand what various self-activities look like. Figure 4.9 shows the state of 60 out of the 2048 output network nodes for 50 timesteps after a reset, across various sensitivities and configurations. Config 0 corresponds to the least amount of r1 in the allowable range, and config 127 to the most r1 allowable for a given S (see equation 4.4).

Both clocked and unclocked networks were tested. A variety of behavior appears in both responses. As sensitivity is increased, the overall network activity increases. Clocked networks appear to settle into a fixed state in fewer time steps, in part due to the fast transition times of the gates and the under-sampling of the activity due to the limited clock speed within the FPGA. In both cases, some networks exhibit only transient behavior and settle into a static fixed state. Some networks settle into limit cycles. Networks with higher excitation do not appear to settle into any repeatable pattern. A sweep across sensitivities from S = 0.3 to S = 0.7, spanning all gate configurations, is shown in figure 4.10. This figure shows results only for a clocked network, but unclocked network results look very similar. However, unclocked networks caused the FPGA to crash at around S = 0.68 due to high activity, which terminated the scan.
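Equation 4.5 amounts to one line of array arithmetic on a recorded output block; a minimal sketch (with placeholder data standing in for real reservoir outputs) follows.

```python
import numpy as np

def self_activity(x):
    """Self-activity C (equation 4.5): the fraction of monitored nodes that change
    state, averaged over output nodes i and timesteps t.
    x is a binary array of shape (timesteps, nodes) recorded after a reset."""
    return np.abs(np.diff(x.astype(np.int8), axis=0)).mean()

# Example: 50 timesteps of 2048 output nodes (random placeholder data here).
x = np.random.randint(0, 2, size=(50, 2048))
print(f"C = {self_activity(x):.3f}")
```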
Furthermore, the sweep was repeated using multiple random seeds, which determine the placement of gates, and similar results were observed. As is evident from figure 4.10, the dependence of self-activity C on average sensitivity S shows significant structure. As we have kept the node graph topology constant in these tests, the features likely indicate the presence of dynamical loops in the directed graph. Evident in that figure are tails below S = 0.5 that are likely limit cycles (ring oscillators) in the setup that persist among the gradually varying configurations. Above S = 0.5, the cone-shaped structures are suggestive of a joining of various oscillators present in the FPGA structure that cooperate in more complicated dynamics. Higher activity is seen in the lower-numbered configurations, which have a smaller relative fraction of r1 gates.

Figure 4.10: Self-activity C as a function of S and configuration in a clocked network. Structures in the image indicate the presence of interacting dynamical loops in the reservoir's directed graph.

Figure 4.11: Response of unclocked networks to a perturbation at average network sensitivities S = 0.6, 0.5, and 0.4 and configurations 0, 64, and 127, with the transient time T annotated.

4.5.2 Transient Time

An important property of a reservoir is fading memory: the ability to retain past states while weighting the recent ones the most highly. To provide insight into a network's memory, we perturb the network with an input of all ones for five timesteps, and observe the network state afterwards. Activity after a perturbation that decays with time is called transient activity. To quantify the duration of activity after a perturbation, we calculate an average network value X̄ over 30 timesteps before an input is presented:

X̄ = ⟨X⟩i,t∈(0,30)    (4.6)

We then calculate the departure from this calculated average value:

dt = ⟨X⟩t − X̄    (4.7)

We define the transient time T as the time it takes for the network to return to a value close to its pre-perturbation average state, X̄ + 0.01. As can be seen from the network images in figure 4.9 at higher sensitivities, the concept of a network's average value becomes nebulous for networks with a large amount of self-activity, as there is no steady state. In fact, some of these highly excited networks never return to X̄ + 0.01, in which case they are discarded from consideration for machine learning and appear as white pixels in figure 4.16.

Figure 4.13: Response of individual reservoir nodes to a perturbation in clocked networks. An input of all ones, on 1024 input nodes, is introduced between timesteps 5 and 20. The corresponding transient time T is annotated in red.

Figures 4.16 and 4.17 show a scan of the transient times across the average sensitivities and configurations for unclocked and clocked networks, respectively.
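A sketch of the transient-time measurement of equations 4.6 and 4.7 is given below. The quiet-window length, the timestep at which the perturbation ends, and the handling of networks that never settle are assumptions chosen to mirror the description above.

```python
import numpy as np

def transient_time(x, quiet=slice(0, 30), perturb_end=20, tol=0.01):
    """Transient time T: timesteps after the perturbation ends until the mean
    network value returns to within tol of its pre-perturbation average X-bar
    (equations 4.6 and 4.7). x has shape (timesteps, nodes); the perturbation
    is assumed to end at timestep perturb_end. Returns None if it never settles."""
    x_bar = x[quiet].mean()            # X-bar, averaged over nodes and quiet timesteps
    d = x.mean(axis=1) - x_bar         # d_t, departure of the network mean from X-bar
    for t in range(perturb_end, len(d)):
        if d[t] <= tol:
            return t - perturb_end
    return None                        # never returns close to X-bar; discard this network
```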
The unclocked networks crash the FPGA around S = 0.68, and as a result the values are all zero above this S in the plot. There is a large distinction between the clocked and unclocked network versions. This is in large part due to the effective sampling frequency relative to the network dynamics. The unclocked networks have each node's activity slowed down by the delay line, but are still under-sampled by the 200 MHz clock. Clocked networks only advance upon a clock pulse, and every state of the network is measured.

Figure 4.14: Response of unclocked networks to a perturbation. The corresponding transient time T is annotated in red.

4.5.3 Selection of Desirable Networks

A network suitable for machine learning will satisfy the following properties:

• Quiescent with no input. Activity within the network when there is no input consumes power but does not do useful computation. To satisfy this requirement we choose only networks with self-activity C < 0.0175, a number chosen qualitatively. The self-activities of only the unclocked networks fitting this criterion are shown in figure 4.18.

• Close to a phase transition. Networks should be close to S = 0.5, at which there is an experimentally observed inflection point in network activity, as can be seen in figure 4.10. Networks near the "edge of chaos" have been shown to outperform networks in other regimes [95, 96, 97].

• Fading memory. For this we select transient times greater than 3 timesteps. The transient times of only those networks meeting all criteria are shown in figure 4.19.

Figure 4.15: Response of clocked networks to a perturbation. The corresponding transient time T is annotated in red.

Figure 4.16: Transient times of unclocked networks.

Figure 4.17: Transient times of clocked networks.

The resulting configurations in the parameter space of S and configuration number form a band below S = 0.5 containing between 1000 and 3000 configurations, depending upon the network type and random seed. Of these, 100 were randomly chosen for application to a machine learning classification test.

4.5.4 Output Distributions and Entropy

Looking at the frequency at which reservoir nodes are on or off, given a uniform input, may provide insight into the networks' expressive power. Distributions with the greatest entropy may have the greatest expressive power, as entropy is a measure of information content. Due to the presence of Sg = 0 gates in many configurations, as well as the random connectivity within the networks, not all of the reservoir outputs contribute usefully to the result. The Sg = 0 gates have fixed high or low outputs. Other gates may get two inputs from fixed-output nodes, thereby shutting off that gate.
Other gates simply rarely change their state due to the particular network configuration. These scenarios are not checked before network synthesis. 100 random, uniformly distributed input vectors were introduced to the network, with each bit in the input having equal probability of being on or off: P(Xi = 1) = P(Xi = 0) = 0.5. The average value of each output bit was then computed over the 100 input vectors k: P(Xi) = (1/100) Σk Xi,k. If the average value was below 0.1 or greater than 0.9, meaning that it changed less than 10% of the time, the bit was discarded, as large tails at 0 and 1 skewed the distributions. Histograms of these distributions across various average sensitivities and configurations are shown in figure 4.20. From each of these histograms, the Shannon entropy was computed over the 1024 output bits:

H(X) = − Σi P(Xi) log2 P(Xi)    (4.8)

Each configuration had a different number of output gates after being trimmed. Configurations with fewer r1 gates have broader output distributions, and more nodes remaining after the trimming procedure, than configurations with more r1 gates. From left to right in figure 4.20, the proportion of Sg = 0 YES and NO gates increases; these are gates that shut off the output from a node completely. It can be seen that after pruning inactive outputs, there are fewer nodes left in the networks with higher r0 proportion. As for the remaining distribution, the networks with higher r0 proportion have a sharper peak around P = 0.5, while networks on the left-hand side, with more varied gate types, have a broader distribution. How to use these distributions to predict the information processing utility of a Boolean network remains an area of research.

4.6 Radio Frequency (RF) Classification

To benchmark the machine learning performance of various network configurations, the FPGA-based reservoir was used for classification of the DeepSig 2016 dataset [98]. This dataset contains 1000 example waveforms per signal-to-noise ratio (SNR) for each of 11 radiofrequency modulation types, across SNRs from −20 dB to 18 dB in steps of 2 dB. As the overall amount of data is very large after processing by the reservoir, we use only SNR 18 to efficiently test many configurations. 65% of the samples were used for training, 15% for validation, and 20% for testing.

4.6.1 Preprocessing

Each sample in the dataset contains 128 timesteps of in-phase (I) and quadrature (Q) waveforms. Recall that the raw I and Q data may be modeled as a complex number z = I + jQ = re^{jφ}, where j is √−1. Eight different time series were computed from these raw I and Q waveforms (see figure 4.21):

• Raw I
• Raw Q
• Modulus of signal: |z|
• Modulus difference ratio: zr = |zi+1| / |zi|
• Modulus forward difference: Δz = |zi+1| − |zi|
• Phase of signal: φi = arg(zi)
• Phase forward difference: Δφ = arg(zi+1 / zi)
• Phase difference ratio: φr = arg(zi+1) / arg(zi)

Each of these waveforms was then thermometrically encoded. Each signal was mapped to the bit inputs [i·128, (i+1)·128] for i ∈ (0, 7), for the 8 different encodings. The amplitude encodings, the first five, were scaled by the sample minimum and maximum, so that [sample minimum, sample maximum] mapped to [i·128, (i+1)·128]. The phase encodings were mapped from [−π, π]. This encoding is necessary because the reservoir accepts 1024 binary inputs. The inputs are then "tiled" so that they fit into the 1024-bit input RAM, with 10-timestep pauses between them, and a network reset right before the introduction of a new input.
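For concreteness, the sketch below strings together the preprocessing and thermometric encoding just described into a 1024-bit reservoir input. The division-based quantities implement the ratio definitions as reconstructed above, and details such as the zero-division guard, the front-padding of the difference and ratio series, and the bit ordering of the encoder are assumptions for the example.

```python
import numpy as np

def preprocess(i_sig, q_sig):
    """The eight time-series variations of section 4.6.1, derived from raw I/Q."""
    z = i_sig + 1j * q_sig
    mod, phase = np.abs(z), np.angle(z)
    eps = 1e-12                                    # guard against division by zero (assumption)
    series = [
        i_sig, q_sig, mod,
        mod[1:] / (mod[:-1] + eps),                # modulus difference ratio
        np.diff(mod),                              # modulus forward difference
        phase,
        np.angle(z[1:] / (z[:-1] + eps)),          # phase forward difference
        phase[1:] / (phase[:-1] + eps),            # phase difference ratio
    ]
    # Pad the shorter difference/ratio series back to 128 samples.
    return [np.pad(s, (128 - len(s), 0)) for s in series]

def thermometer(x, lo, hi, levels=128):
    """Thermometric encoding: each value turns on the lowest k of `levels` bits."""
    idx = np.clip(((x - lo) / (hi - lo + 1e-12) * levels).astype(int), 0, levels)
    return (np.arange(levels)[None, :] < idx[:, None]).astype(np.uint8)

# Encode one 128-step sample into the 1024-bit reservoir input (8 series x 128 bits).
i_sig, q_sig = np.random.randn(128), np.random.randn(128)
blocks = []
for k, s in enumerate(preprocess(i_sig, q_sig)):
    lo, hi = (-np.pi, np.pi) if k >= 5 else (s.min(), s.max())   # phase vs amplitude scaling
    blocks.append(thermometer(s, lo, hi))
encoded = np.hstack(blocks)        # shape (128 timesteps, 1024 input bits)
print(encoded.shape)
```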
4.6.2 Subsampling in Time

To improve regression speed, we sub-sample the outputs into 16 samples, spaced apart by 8 timesteps. This was tested to reduce machine learning accuracy by only a few percent while greatly reducing the number of trainable parameters and improving regression speed.

4.6.3 Machine Learning Training Procedure

While a typical attraction of reservoir computing is rapidly evaluating the output with a closed-form solution of linear regression, logistic regression is more suitable for a classification problem which maps the data to multiple discrete classes. While for the purposes of this experiment the reservoir state is transmitted to a supervisory computer for post-processing and training, these functions can be implemented directly on the FPGA for high speed in the future. The size of the DeepSig 2016 dataset and the amount of data generated by the FPGA-based reservoir necessitate an iterative approach to training. In PyTorch, we use a single neural network with linear neurons, and use stochastic gradient descent to minimize cross-entropy loss. To optimize the learning rate and L2 penalty, Hyperopt was used, with trials scheduled and early stopping implemented using the asynchronous hyperband optimizer in Ray Tune [99]. Each input into the regressor was allowed 20 Bayesian optimization trials, optimizing the learning rate within the range (1e-5, 1e-1) and the L2 penalty within the range (1e-9, 1e-1), and could train for up to 50 epochs. Convergence was observed in nearly all trials. While Bayesian optimization is not necessary for only two parameters, we developed this infrastructure using Ray Tune so that reservoir hyperparameters can be wrapped into the optimization loop in the future. Additionally, Ray Tune can distribute a task among multiple CPUs or GPUs for faster processing. All reservoir configurations were evaluated with the validation set. Test set results were consistently a few percent lower.

Variation  Preprocessing                                                          Trainable parameters  Accuracy
0          Eight time series variations                                           11,176                14%
1          I and Q series                                                         2,816                 22%
2          (1) thermometrically encoded                                           1,441,792             44%
3          (5) sub-sampled to 16 timesteps                                        180,224               66%
-          Convolutional neural network [98]                                      2,214,475             76%
4          (5) passed through best clocked RPU configuration and sub-sampled      259,952               76%
5          (0) thermometrically encoded                                           1,430,528             78%
6          (5) passed through best unclocked RPU configuration and sub-sampled    161,744               82%

Table 4.3: DeepSig 2016 SNR 18 time series classification results using logistic regression, with various preprocessing methods.

4.6.4 Machine Learning Results

Accuracy results of logistic regression applied to raw data, preprocessed data, encoded data, and Boolean reservoir-processed data are shown in table 4.3. While preprocessing by itself degrades classification accuracy, after thermometric encoding it provides a surprising boost to the results. The encoded time series then processed by the reservoir show a further improvement in accuracy: even when sub-sampled to 16 timesteps, the accuracy is competitive with that of the state-of-the-art convolutional neural network, with less than one-tenth of the trainable parameters. The corresponding confusion matrix for the test set on variation (6) from table 4.3 is shown in figure 4.24. Machine learning accuracy on unclocked network-processed data, on the same axes of S and config as the prior figures, is shown in figure 4.23. We found no discernible dependence of accuracy on C or T within the region selected by the criteria described in a prior section.
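The trained readout described in section 4.6.3 is a single linear layer with a cross-entropy loss; a minimal PyTorch sketch is shown below. The feature count, learning rate, and L2 penalty are placeholders (the text above tunes the latter two with Hyperopt/Ray Tune), and the data loader is assumed to yield flattened reservoir-output features with integer class labels.

```python
import torch
from torch import nn

n_features = 16 * 2048       # placeholder: retained timesteps x monitored outputs
n_classes = 11               # modulation classes in the DeepSig 2016 dataset

readout = nn.Linear(n_features, n_classes)     # single layer of linear neurons
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(readout.parameters(),
                            lr=1e-3,            # placeholder; tuned in the text
                            weight_decay=1e-5)  # L2 penalty; tuned in the text

def train_epoch(loader):
    """One pass of stochastic gradient descent over (features, label) batches."""
    for x, y in loader:           # x: (batch, n_features) float, y: (batch,) int labels
        optimizer.zero_grad()
        loss = loss_fn(readout(x), y)
        loss.backward()
        optimizer.step()
```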
A similar scatter plot was generated for clocked networks, and the overall accuracy was a few percent lower.

4.7 Conclusion

We explored the dynamics of Boolean networks for reservoir computing, narrowing down the extremely large parameter space of configurations using dataset-agnostic methods to identify networks that would function well as reservoirs. We validated these chosen networks on the DeepSig 2016 dataset, achieving state-of-the-art accuracy with less than one-tenth of the trainable parameters with an unclocked network on a Xilinx UltraScale+ MPSoC. We also tested clocked networks, which yielded slightly lower accuracy. A notable observation is that pre-processing the RF time series by computing ratios and first differences of the complex modulus and angle of the I and Q signals, and thermometrically encoding this data, yields accuracy comparable to a convolutional neural network using only simple logistic regression. Even when taking only one-eighth of the thermometrically encoded data by slicing in time to reduce the number of trainable parameters, the accuracy remains high.

Figure 4.18: Selected configurations based on self-activity C (C < 0.0175), of unclocked networks.

Figure 4.19: Selected configurations based on self-activity C and transient time T (C < 0.0175 and T > 3), of unclocked networks.

Figure 4.20: Output frequency distributions with a uniform random input. Nodes that are on or off more than 90% of the time are discarded for this computation. The number of remaining nodes, and the entropy of the distribution, is noted in red.

Figure 4.21: All time series variations of a single example (8PSK) from the DeepSig 2016 dataset: the raw I and Q signals, and all applied preprocessing methods.

Figure 4.22: Thermometrically encoded time series variations. For each sample in the dataset, all eight time series variations shown in figure 4.21 are scaled to fill the range [0, 127] and thermometrically encoded. The encoded waveforms are stacked into a vector of 1024 data bits by 128 timesteps, which is presented to the reservoir one step at a time.

Figure 4.23: ML accuracy on DeepSig 2016 SNR 18 data processed using an unclocked network. Color represents validation accuracy, which was up to 82%.

Figure 4.24: Confusion matrix for the test set applied to the best configuration.

Figure 4.25: Confusion matrix for the convolutional neural network on a different randomly selected subset of the dataset.

Chapter 5: Boolean Reservoir Computing Using Discrete Digital Logic Gates on a Printed Circuit Board

5.1 Introduction

In this chapter1 we describe the design and test of a printed circuit board for reservoir computing, using discrete logic chips to implement a randomly connected Boolean network.
If these networks are adopted for practical machine learning problems, then lessons learned from tests of the networks on a PCB can be applied to a miniaturized version implemented in a more modern technology, such as an ASIC.

5.2 CMOS Reservoir Circuit Design

5.2.1 Why Discrete Digital Logic Chips?

In this experiment we designed a 64-node reservoir on a printed circuit board using 4000-series digital logic chips. One might very reasonably ask why we would use a decades-old technology when there are more modern tools at our disposal. The reasons are:

• In our application, slow speed is beneficial, as we want to study the dynamics of free-running networks and sample the waveforms with high density. Being one of the oldest digital logic families, 4000-series CMOS is relatively slow. In fact, we would have preferred to use a logic family with completely unbuffered outputs, for transfer functions with the smallest gains, but to keep the component count reasonable we used a buffered version, as not all Boolean functions (namely, XOR) are available as discrete chips in an unbuffered series.

• We are interested in probing the effect of finer voltage quantization than digital tools allow. How much more information can be extracted out of an analog signal versus a digital one in reservoir computing is an open research question. In Komkov et al. [23], Boolean networks similar to the ones presented here were studied. However, on an FPGA, the inputs and outputs are binarized by design. Digitization can unnecessarily add processing time in some applications when analog inputs and/or outputs are desired.

• Our networks of interest are cumbersome to simulate using SPICE tools. They are large and have lots of switching activity. The presence of interacting loops in the circuit makes high time resolution necessary. As it is not clear a priori what circuit topology will make the best reservoir, we want to efficiently test many configurations, which can be impractical if every simulation is slow.

• Finally, the components and assembly are economical. Each chip retails for around $0.50 USD in small quantities, and assembly was done by hand in under four hours.

1Portions of this chapter are reproduced from Komkov et al. [24].

5.2.2 Circuit Details

The printed circuit board is shown in figure 5.1. The circuit consists of 64 blocks, whose design is shown in figure 5.2. In the middle of each node is a three-input XOR, comprised of two XORs with two inputs each (only two-input XORs are available in the logic family used; see table 5.1 for part numbers). The XOR was chosen because it is equally sensitive to changes of ones and zeros in its input. The inputs to the XOR can selectively be shut off by the three NAND gates, controlled by control bits C1, C2, and C3. The output of the XORs is AND'ed with a RESET, a global control line that forces the output of every block to be low and is used to ensure the network has a consistent starting point.

Logic gate        Part number
XOR               MC14070B
NAND              MC14011B
AND               MC14081B
Shift register    MC14094B
Level shifter     MC14504B

Table 5.1: Part numbers with hyperlinked datasheets.

There are 192 total control bits C, which are stored in a shift register. The shift register is loaded with a control word serially, prior to the testing of any network configuration. The board's power supply voltage (Vdd) can be varied to probe the dynamics of the reservoir at various voltages.
A Teensy2 3.2 is used as a serial interface with a control PC to set the digital RESET and shift register lines on the board. The Teensy's 3.3V outputs are converted to the board's voltage using a level shifter. Three external analog inputs each fan out to three reservoir nodes. MMCX coaxial connectors, chosen for their small footprint, were connected to every node.

2https://www.pjrc.com/teensy/

Figure 5.1: Printed circuit board reservoir using discrete logic gates.

Figure 5.2: Design of a single node in the reservoir.

To design the board, the randomly populated adjacency matrix describing the network configuration was first generated in MATLAB and checked to contain no loops of length one. The code generated a simplified netlist description that was used to design the circuit schematic by hand in KiCAD3, open-source EDA software chosen because it supports Python scripting, in case we ever want to generate more circuit layouts in an automated manner. Components were placed in a printed circuit board layout by hand in a grid pattern, and the circuit was then auto-routed using the software FreeRouting4. The printed circuit board has 6 layers with alternating ground and signal layers.

3https://www.kicad.org/
4https://freerouting.org/

5.2.3 Waveforms

Waveforms of nodes in the reservoir circuit exhibit a variety of behaviors, including DC steady states, periodic oscillations, and turbulent behavior, any of which may be transient or persistent. The behavior of a node is highly dependent upon the network configuration, supply voltage, and input signal. Figure 5.3 shows four arbitrarily selected channels in a network configuration that is very excitable. In the frequency domain, these waveforms exhibit broadband frequency behavior without any amplitude peaks, an indication of turbulence. While the most frequent voltages are 0 or Vdd, these waveforms also contain many incomplete transitions and intermediate voltage values.

Figure 5.3: Node waveforms in an undriven network configuration with high self-activity.

The 4000-series chips are rated for operation with a supply voltage in the range 3V-18V. The internal frequencies in the network are highly dependent upon the supply voltage, particularly in the lower part of the voltage range. The dependence of gate transition times (which influence the internal frequencies) on supply voltage can be found in the component datasheets. Except where otherwise noted, all tests were performed at 3.3V.

5.2.4 Network Activity with No External Input

We used circuit current draw as a simple measure of overall network activity. Figure 5.4 shows the current readout from the power supply for 100 network configurations at each of 64 steps of S between 0 and 1. For each value of S, random seeds 1 through 100 were used to randomize the placement of ones in the 192-bit control word. The RESET signal was applied prior to letting the network run freely, to ensure a consistent starting condition. The resulting characteristic S-shape is consistent with prior FPGA-based measurements of Boolean network activity as a function of average sensitivity [22, 23]. If the behavior of this circuit is similar to FPGA-based Boolean networks, the optimal machine learning performance can be expected in configurations at the base of the curve, at the threshold between low and high excitation.

Figure 5.4: Self-excited network current draw at Vdd = 3.3V as a function of network Boolean sensitivity.
Current draw as a function of supply voltage Vdd is a quadratic curve whose magnitude is proportional to the network's sensitivity. Figure 5.5 shows the relationship between steady-state current draw and supply voltage Vdd for 64 self-excited networks with S evenly spaced from 0 to 1, with a single random seed. Intermittently excited behavior can be seen at some very low values of S, where the current jumps from close to zero to a higher value following the quadratic curve.

Figure 5.5: Circuit current draw (I-V curves) at 64 values of S at varying power supply voltages Vdd.

5.2.5 Pulse to Pulse Variation

As we are limited to the sixteen high-resolution analog channels available on our lab's test equipment, all tests were performed only on outputs 1 through 61, in intervals of 4. The network can be sensitive to the capacitance of a cable, so the connections were not moved. In all measurements, 1000 data points were captured in 100 ns increments. A condition for reservoir computing is that a network's behavior must be repeatable. For that reason we want to identify networks whose response varies between repetitions of the same stimulus. We apply one period of a 100 kHz sine pulse, whose minimum and maximum are 0 and Vdd. To quantify the repeatability, we compute a pulse to pulse variation of the network response to five repetitions of an input stimulus, using the voltage V(t, c, p), which is a function of time t, oscilloscope channel c, and pulse number p. The pulse to pulse variation is the standard deviation of the signal between pulses, averaged over time and between channels: ⟨σp(V(t, c, p))⟩t,c. We focus on the middle of the sensitivity range, as we expect that very inactive and very active networks will be of minimal utility for information processing. The pulse to pulse variation follows the same general trend as the current draw, with an inflection point around S = 0.3; see figure 5.6. Networks that pass this critical point amplify noise, which pushes them into divergent states, thereby making the network response to a stimulus not repeatable.

Figure 5.6: Pulse to pulse variation as a function of sensitivity.

5.2.6 Transient Times

Quantifying the amount of time that a network departs from a steady-state value after a perturbation can give insight into the amount of memory the network has. To compute the average transient time of the network before it settles into a quiescent steady state, we first compute the first difference of each waveform in the data to make steady-state DC values equivalent. Then we take the absolute value, to equalize the influence of rising and falling edges. Finally, we compute a moving average with a window of 10 microseconds to smooth out the signal. We define the transient time as the time it takes for the signal to pass an arbitrarily defined threshold of 0.1. The shortcoming of this method is that periodic steady states would appear no different from chaotic activity. Networks around the critical point exhibit the longest transient times. At low S, networks lack sufficient connection density for information to propagate far, and many outputs never change their state. At high S, networks are so excited that they never reach a steady state (for the purposes of this plot, those states are shown as having zero transient time rather than infinite).
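Both circuit-level metrics just described reduce to short array computations on the recorded oscilloscope data; a minimal sketch with placeholder data follows. The interpretation of "pass the threshold" as the activity falling below 0.1 and staying there is an assumption.

```python
import numpy as np

DT_US = 0.1                      # 100 ns sampling interval, in microseconds

def pulse_to_pulse_variation(v):
    """<sigma_p(V(t,c,p))>_{t,c}: standard deviation over repeated pulses p,
    averaged over time t and channels c. v has shape (pulses, channels, timesteps)."""
    return v.std(axis=0).mean()

def transient_time_us(sig, window_us=10.0, threshold=0.1):
    """Transient time of one waveform (section 5.2.6): first difference, absolute
    value, moving average, then the last time the smoothed activity exceeds the
    threshold."""
    activity = np.abs(np.diff(sig))
    w = max(1, int(window_us / DT_US))
    smoothed = np.convolve(activity, np.ones(w) / w, mode="same")
    above = np.nonzero(smoothed > threshold)[0]
    return 0.0 if len(above) == 0 else above[-1] * DT_US

# Placeholder data: 5 pulses, 16 channels, 1000 points at 100 ns spacing.
v = np.random.rand(5, 16, 1000)
print(pulse_to_pulse_variation(v))
print(transient_time_us(v[0, 0]))
```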
5.3 Machine Learning Tests

We test the performance of the reservoir circuit using a time series classification task. As inputs, we use 50 each of sine, triangle, and square pulse waveforms with added noise. The signal amplitude is 2.1V peak to peak, centered at Vdd/2, and the noise peak-to-peak amplitudes are varied from 0.3 to 5.5V. The voltage ratios in all plots are ratios of peak-to-peak voltages. Figure 5.8 shows a sample noisy sine waveform, where one sine pulse is present from 0 to 10 µs, and the noise persists after that. The states of 15 reservoir nodes (outputs 4 through 60 in even steps) are also recorded with a resolution of 0.1 µs; these are shown on the same axes in the figure.

Figure 5.7: Transient times as a function of sensitivity.

Figure 5.8: Noisy sine wave pulse input, and 15 reservoir outputs on the same axes.

We use scikit-learn's LogisticRegressionCV5, a logistic classifier with cross-validation, to compute the reservoir output layer on a supervisory computer. First we compute the accuracy of classification using only the raw input waveform, which we call the baseline accuracy. Then we compare it to the classification accuracy using both the raw input waveform and the 15 reservoir outputs combined as input to the classifier. There is a 50% train/test split, and 200 samples per waveform are used. Based upon the results in the prior section, we selected two networks at S = 0.36 with different random seeds that are repeatable, with a small pulse to pulse variation (0.025 and 0.02) and a reasonable transient time (3 µs and 5 µs). Machine learning results from these two networks are shown in figure 5.9. The reservoir improves classification accuracy significantly at high noise levels, an improvement of up to nearly 40% above classification done on the input alone. The accuracy improvement increases at higher noise levels only because the baseline classifier performs so well at lower noise levels.

5https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html

Figure 5.9: Classification accuracy improvement from the reservoir at various noise levels (Vnoise/Vsignal), comparing the input waveform alone, the input waveform plus reservoir outputs, and their difference.

We also probed the effect of S on classification accuracy. The accuracy improvement at several noise-to-signal ratios over a range of S from 0 to 1 is shown in figure 5.10. Networks throughout the whole sensitivity range show strong accuracy improvement. Networks with low excitation contribute appreciably to machine learning accuracy, while overexcited networks seem to help less.

5.4 Conclusion

Using discrete logic chips on a printed circuit board, we implemented a free-running Boolean logic gate network for reservoir computing. We explored the parameter space of configurations using a control setting that selectively shuts off the inputs to a node in the reservoir, and probed the overall network dynamics that result. Using the reservoir circuit, we demonstrated a significant improvement in waveform classification accuracy, compared with using the input waveform on its own.

Figure 5.10: Machine learning accuracy improvement over baseline as a function of overall network sensitivity.
While the benchmark task that we used is simplistic, the results are a promising demonstration of the time series processing capabilities of random Boolean networks. How to quantify the information processing capability of a reservoir remains a challenge. In future work we hope to achieve a better understanding of the added information content from a reservoir within analog waveforms, compared to the information encoded temporally in the timing of digital pulses.

Chapter 6: Design of a 180nm ASIC for Reservoir Computing

6.1 Overview

An application-specific integrated circuit (ASIC) is a microchip designed to implement a specific electronic design and application. We designed seven ASICs for reservoir computing in 180nm CMOS, with the aim of demonstrating rapid machine learning using Boolean logic gate networks. The chip, shown in figure 6.1, has been manufactured and awaits post-processing and testing after this dissertation is complete. The designs are based on those in the patent application by Lathrop, Restelli, and Komkov [25].

6.2 Process Details

The wafer was manufactured using the SilTerra C18G 180nm CMOS process1, which has a 1.8V core voltage, a 3.3V pad voltage, and up to six metal layers. Our design was sized to fit one-sixth of a multi-project reticle2. We were provided a standard cell library from the foundry, as well as an ARM I/O pad library. All of the standard cell and I/O pad designs, timing specifications, and power specifications are proprietary, so the discussion in this chapter intentionally omits proprietary details.

1https://www.silterra.com/c18g-180nm-cmos-logic-1-8v-3-3v
2We were generously provided wafer space at no cost by Advait Madhavan and Jabez McClelland from the National Institute of Standards and Technology (NIST).

Figure 6.1: Our group of seven reservoir ASICs is shown in the middle of this multi-project wafer.

6.3 Design Overview

We chose to use the provided standard cell library, despite its proprietary nature not being in line with the spirit of open research, because it would allow us to immediately use automation tools in Cadence to aid in the layout and routing of the cells. The standard cells contain the designs of various logic blocks down to the silicon level. The original intention was to create large networks with thousands of nodes, but upon further discussion it was decided to instead create relatively small, 64-node reservoirs so that every node state could be probed. Probing only a subset of the nodes would unevenly load the reservoir, as routing a connection from the chip core to outside test equipment adds significant capacitance which slows down that node. Additional digital buffering is antithetical to the aim of using the analog dynamics of the circuit for computation. To connect the chip's core, in which the standard cells are laid out, to the outside world, input/output (I/O) pads are used. The I/O pads are rectangular elements of a standard size that contain a connection to the core on one side and a metal pad on the other side, to which a wire bond is made. The two categories of I/O pads available are analog and digital. The digital pads are buffered, while the analog pads are straight-through pads with a few hundred ohms of resistance between the core and the metal pad. Each analog pad has about 2 pF of capacitance. As a result of reading out each of the 64 nodes, the design is pad-limited, having a large pad ring but very low utilization within the core.
Reservoir performance can be highly dependent upon the graph connectivity in ways that are not fully understood. To mitigate the risk of any one particular design not working well, seven separate 64-node reservoir chips were designed, and various combinations of adjacency matrices, logic gate mixtures, and I/O configurations were manufactured across the seven chips. We call each of the chips a chiplet. The chiplets can later be diced apart, or they can be wire-bonded together directly on the wafer to implement one of the multi-reservoir configurations described in Chapter 2.

The mixture of configurable logic blocks in each of the seven chiplets is given in table 6.1. Each chiplet has a mixture of blocks that allows a range of sensitivities to be interrogated. Four different randomly generated adjacency matrices were used.

6.4 Node Designs

The configurable node designs used are shown in figure 6.2 and are described below. Each chiplet was designed with a combination of gates to fully sweep the parameter space from S = 0 to S = 1; a short sketch showing how these S values can be computed appears at the end of this section.

• Figure 6.2(a) shows a 3-input NAND block with a control bit (thereby using a 4-input gate). The control input C has the ability to effectively turn off the node by forcing the output high when C=0. When C=1, the output is equivalent to a 3-input NAND with X1, X2, X3 as inputs. When C=0, S=0. When C=1, S=0.25.

• Figure 6.2(b) shows a 3-input NOR block with a control bit (thereby using a 4-input gate). As in the NAND case, C has the ability to turn off the gate, here by forcing the output low when C=1. When C=0, the block acts like a 3-input NOR with X1, X2 and X3 as inputs. When C=1, OUT=0 and S=0. When C=0, OUT=NOR(X1,X2,X3) and S=0.25. Graphs of 3-input NAND, NOR and XOR are shown in figure 3.6.

• Figure 6.2(c) shows a 2-NAND-3XOR block. When C=1, the inputs X1 and X2 are inverted and passed on to the 3-input XOR. When C=0, the inputs X1 and X2 are blocked, and the three-input XOR effectively only has one input. When C=0, S=0.33. When C=1, S=1. The graph when C=0 is shown in figure 6.4.

• Figure 6.2(d) shows a 3-NAND-3XOR block. The NANDs selectively turn off the inputs to a 3-input XOR gate. When all C's are 0, S=0. When one C is 1 and two are 0, S=0.33. When two C's are 1 and the other is 0, S=0.66. When all C's are 1, S=1. Graphs with various control settings are shown in figure 6.3.

Figure 6.2: Configurable node designs.

Figure 6.3: Graphs of configurable XOR gates with various control settings.
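The S values quoted above can be reproduced by computing the average Boolean sensitivity of each configured node: the fraction of single-input flips that change the output, averaged over all input patterns. The sketch below does this for the 3-NAND-3XOR block under our reading of figure 6.2(d), in which a control bit of 0 pins the corresponding input NAND high; the function names are ours, not the cell library's.

```python
from itertools import product

def sensitivity(f, n=3):
    """Average Boolean sensitivity: the fraction of single-input flips that
    change the output, averaged over all 2**n input patterns."""
    changes = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            flipped = list(x)
            flipped[i] ^= 1
            changes += f(*x) != f(*flipped)
    return changes / (n * 2**n)

def nand(a, b):
    return 0 if (a and b) else 1

def make_3nand_3xor(c1, c2, c3):
    # Each data input Xi is gated by NAND(Xi, Ci); Ci = 0 pins that NAND
    # output high, effectively removing Xi from the 3-input XOR.
    def node(x1, x2, x3):
        return nand(x1, c1) ^ nand(x2, c2) ^ nand(x3, c3)
    return node

for controls in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(controls, round(sensitivity(make_3nand_3xor(*controls)), 2))
```

Running this prints S = 0, 0.33, 0.67, and 1 for zero, one, two, and three enabled inputs, matching the values listed for figure 6.2(d); the same routine applied to plain 3-input NAND and NOR functions gives the S = 0.25 quoted above.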
6.5 Chip Control

Figure 6.5 shows two reservoir blocks along with the supporting circuitry. We will discuss the elements of the diagram from left to right.

Control shift register: SR C CLK and SR C DATA are clock and data lines that feed a shift register containing the 64-bit control word, C, which determines the node configurations. The control vector needs to be loaded in serially before the reservoir is run.

Reservoir blocks: Blocks have three data inputs (X1, X2, X3), which are either recurrent connections from other blocks or external inputs. In each chiplet, there are a total of 18 external inputs to blocks, from three analog pads that connect to 6 blocks each.

Reset line: To ensure that the reservoir always has a repeatable starting condition before evolving autonomously, the RESET line is AND'ed with the output from each block. When RESET is zero, the output from every block is zero.

Multiplexer select line: MUX SEL selects whether recurrent connections back to the reservoir blocks are taken from before or after the output buffer (BUF). The buffer helps to drive the pad capacitance, but it introduces on the order of 1ns of additional delay. If the feedback connection before the buffer is selected, the reservoir will run faster than if the feedback connection after the buffer is selected. However, the output pads may not be able to transmit the full range of the reservoir's activity if its constituent frequencies are too fast for the pads' slew rates. Details vary by chip, as different chips use different output pads. More information is in table 6.1.

Figure 6.4: Graph of the 2-NAND-3XOR gate with C=0.

Figure 6.5: Reservoir block diagram.

6.6 Speed and Power Estimates

While details are left somewhat vague by necessity due to the proprietary nature of the standard cell designs and speeds, chiplets with analog pads (Andy, Billy and Denny) will exhibit frequencies of about 150MHz, while those with digital pads (Charlie) will exhibit frequencies of about 3GHz, depending upon the configuration. Exact timing and power consumption information for the standard cells is proprietary; the numbers below are therefore gross estimates.

Chiplet | Adjacency matrix | Gate mixture | Output pads
Andy | A | 25% NAND, 25% NOR, 50% 2-NAND-3XOR | 64 analog (PANALOG)
Billy | A | 100% 3-NAND-3XOR | 64 analog
Charlie | A | 100% 3-NAND-3XOR | 19 digital (highest slew rate, high-noise pad) and 45 digital (slowest slew rate, low-noise pad)
Denny | B | 25% NAND, 25% NOR, 50% 2-NAND-3XOR | 64 analog
Ernie (inference) | A | 100% 3-NAND-3XOR | slowest slew rate, low-noise digital pads; 16-bit binary output
Freddy (inference) | C | 100% 3-NAND-3XOR | slowest slew rate, low-noise digital pads; 16-bit binary output
Gary (inference) | D | 100% 3-NAND-3XOR | slowest slew rate, low-noise digital pads; 16-bit binary output

Table 6.1: Chiplet details.

Figure 6.6: The multiplexer selects the source of the recurrent connections to other reservoir blocks. Recurrent connections from before the buffer result in a faster configuration, but the slew rate of the output pad may be too low to keep up, resulting in significant signal degradation. Recurrent connections from after the buffer will be significantly slower in speed due to the load capacitance.

                 | 1pF loading     | 10pF loading
Analog I/O pads  | 150MHz, 0.57mW  | 45MHz, 0.15mW
Digital I/O pads | 3GHz, 11.5mW    | 3GHz, 11.5mW

Table 6.2: Slow configuration estimated frequency and maximum steady-state power draw.

                 | 1pF loading     | 10pF loading
Analog I/O pads  | 3GHz, 10mW      | 3GHz, 10mW
Digital I/O pads | 3GHz, 11.5mW    | 3GHz, 11.5mW

Table 6.3: Fast configuration estimated frequency and maximum steady-state power draw.

6.7 Inference chip details

While the output layers for four chips (Andy, Billy, Charlie and Denny) must be computed externally to the chip, three chips (Ernie, Freddy and Gary) are capable of multiplying the 64 Boolean reservoir states by 10-bit weights, performing the output calculation on chip.
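As a functional sketch of that readout datapath (not the chip's actual Verilog), the snippet below mimics the one-bit-by-10-bit multiply and the final sum. The weights are treated as unsigned integers, which is an assumption, and the random values are placeholders.

```python
import random

# Placeholder values; weights are treated as unsigned 10-bit integers here.
random.seed(1)
states = [random.randint(0, 1) for _ in range(64)]           # one reservoir snapshot
weights = [random.randint(0, 2**10 - 1) for _ in range(64)]  # 64 stored weights

# In hardware the one-bit multiply is an AND of each weight bit with the state bit.
products = [w if s else 0 for s, w in zip(states, weights)]
output_word = sum(products)

# Worst case is 64 * (2**10 - 1) = 65,472, which fits in the 16-bit output bus.
assert output_word < 2**16
print(f"16-bit output word: {output_word} (0x{output_word:04x})")
```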
While this scheme does not take advantage of the analog nature of the reservoir, it performs full inference once trained, avoiding the latency of communicating with an external controller. Figure 6.7 shows a block diagram of the inference chips. Each of the blocks is a separate Verilog module which was tested independently for correct digital functionality. The chip was then synthesized entirely from Verilog. Figure 27 shows a picture of Ernie, a much more crowded layout than the reservoir chips. The majority of the utilization in the chip is taken up by the shift registers that store the weights (the only memory type available for this project) and the adder that sums the weighted states.

Control shift register: SR rescontrol is the shift register storing the reservoir control word (two of the constituent D-flip flops are visible in Figure 23). It has one readout for debugging purposes, SR C191.

Reservoir: As before, the 64-node reservoir has three analog inputs fanning out to 6 internal blocks each. The reservoir has a reset signal. In all of the inference chips, the reservoir will have a frequency of about 4 GHz, as it is not loaded down by output pins.

Reservoir buffer: Upon the rising edge of RB clk, the reservoir state is copied to a PIPO buffer.

Reservoir buffer readout: Using a collection of 4-to-1 multiplexers, the four buffer select lines, Buf sel[0:3], can be used to read out the reservoir buffer four states at a time. This is necessary in order to compute the weights outside of the chip.

Weight shift register: 64 weights of 10 bits each can be serially fed in using the SR W data and SR W clk lines. The last D-flip flop at the end of the shift register has a readout, SR W639, for debugging.

Mixed precision multiplier: The 64 one-bit states of the reservoir are multiplied by the 10-bit weights. One-bit multiplication is performed quite simply by an AND gate.

Adder: The 64 outputs from the multiplier are then summed to produce a 16-bit binary output which goes to 16 pins on the chip.

Figure 6.7: Block diagram of the inference chips (Ernie, Freddy, and Gary). Each box is a separately tested module. All of the logic was placed automatically using Cadence tools, which made layout easy.

6.8 Automated Design Workflow

The design flow starts in MATLAB, where a 64 × 64 binary adjacency matrix is randomly populated, describing how the logic blocks inside of the reservoir connect to each other. A simplified 3-block example is shown in figure 6.8. The configuration in the matrix shown in figure 6.8(a) is used to auto-generate the Verilog code in figure 6.8(c) describing the network in figure 6.8(b).

Figure 6.8: Network synthesis steps. (a) is a randomly populated adjacency matrix. (b) is a diagram representing the same information as in matrix A. (c) is the equivalent Verilog code.

Cadence Genus was used for logic synthesis, translating the structural Verilog into synthesized Verilog describing the configuration using the available standard cells. As network configurations like this are very different from the synchronous logic designs that these tools are typically used for, care was taken so that no logical simplification would happen in this process (for example, synthesis tools may simplify three inverters in a ring oscillator to only one inverter, as they are logically equivalent). Cadence Innovus was used for floorplanning the chip, and for placing and routing the cells. Finally, the design was loaded into Cadence Virtuoso for manual error-checking and DRC.
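To make the adjacency-matrix-to-Verilog step at the start of this section concrete, the sketch below generates a structural description for a small random network. The real flow was written in MATLAB and targeted the foundry's proprietary cells; the module and signal names here are invented, and only a plain reset-gated 3-input NAND node type is emitted rather than the full mixture of configurable blocks.

```python
import random

# Invented module/signal names and a single plain node type (reset-gated
# 3-input NAND); the fabricated chips mix the configurable blocks of figure 6.2.
N, FAN_IN = 8, 3   # small example; the chips use N = 64
random.seed(2)

# adj[i] lists the FAN_IN source nodes driving node i (one row of the matrix).
adj = [random.sample([j for j in range(N) if j != i], FAN_IN) for i in range(N)]

lines = [f"module reservoir(input wire reset, output wire [{N-1}:0] node);"]
for i, srcs in enumerate(adj):
    terms = " & ".join(f"node[{j}]" for j in srcs)
    # Output is AND'ed with RESET so the network starts from a known state.
    lines.append(f"  assign node[{i}] = reset & ~({terms});")
lines.append("endmodule")
print("\n".join(lines))
```

Because every assignment feeds back into the same node vector, a synthesis tool sees combinational loops; as noted above, care must be taken that the tool does not "optimize" them away.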
Screenshots from Innovus and Virtuoso, respectively, are shown in figure 6.9.

Figure 6.9: (a) Cadence Innovus takes synthesized Verilog files, auto-places standard cells, and auto-routes tracks between them. (b) The final design is imported into Cadence Virtuoso for DRC.

6.9 Current Progress

ASIC samples were received in late 2020. Due to the needs of other groups with designs on the wafer, the top layer manufactured was a layer of vias which are intended to connect the fifth and sixth metal layers. The sixth metal layer was not manufactured. For this reason, post-processing of the wafer in the UMD NanoCenter was necessary to deposit metal over the vias to create pads for wire bonding.

An effort in early 2021 to metallize the chips was only partially successful. Several iterations of photomasks were made to expose only the pad areas; the final photomask, which allowed for easy alignment, was made using iron oxide, which is transparent in the visible part of the spectrum but opaque to ultraviolet light. A 50nm titanium adhesion layer and 100nm of gold were deposited using the e-beam evaporator. However, the deposited metal lifted off the chip easily during an attempt at wire bonding, likely due to the presence of organic contaminants on the surface of the wafer which were not adequately removed. The existing metal will need to be etched off and the process attempted anew.

A printed circuit board for testing the chip, shown in figure 6.10, was designed and manufactured. Each chip will be placed into a pin grid array (PGA) package which will be inserted into a zero insertion force (ZIF) socket. The test board routes each chip output to a coaxial connector. The chip's digital controls will come from a Teensy microcontroller (https://www.pjrc.com/teensy), which will be inserted into a socket directly on the board. A workflow similar to the one developed for the PCB described in Chapter 5 will be applied to testing the ASIC. A group of chiplets positioned in a PGA package in the wire bonder is shown in figure 6.11, and the same group of chiplets viewed through the wire bonder's microscope is shown in figure 6.12.

Figure 6.10: Printed circuit board for testing the ASIC.

Figure 6.11: Chip inside of a PGA package situated in the wire bonder.

6.10 Looking Ahead

For future work, it would be advantageous to use an open standard cell library, or even to design custom standard cells at the transistor level, so that waveforms would be measurable and publishable without concern. At this stage of research, an older process node and slower chips would only be advantageous, allowing for higher-density sampling of waveforms and/or the use of less specialized lab equipment. See Appendix A for all chip images, pinouts, and suggested connectivity in a PGA package.

6.11 Conclusion

We designed and manufactured seven 180nm ASIC reservoir designs, which await testing. Three of the seven can perform inference, while the other four are only reservoirs with direct readouts from every node. While the reservoir topology in the ASIC is fixed, using control settings to change the logical function of each node in the reservoir allows for some amount of reconfigurability. To mitigate the risk of any one particular design not working well, a variety of designs with different adjacency matrices, logic gate mixtures, and I/O configurations were manufactured across the seven chips.

A highly efficient workflow was developed in MATLAB and Cadence tools to generate a complete chip layout from an adjacency matrix.
If initial tests with these chips are promising, many more chip configurations could be laid out in a mostly automated manner for further testing.

Figure 6.12: A view of the chip through the microscope of the wire bonder in the UMD NanoCenter.

Chapter 7: Reservoir Computing Applied to Particle Accelerator Beam Trajectory Prediction

7.1 Overview

As part of this dissertation, we considered applications in which rapid time series processing hardware could be used. (Portions of this chapter are reproduced from Komkov et al. [27].) While tasks such as real-time phoneme recognition may only require processing at the rate of human speech, other applications such as active control of dynamical systems need much higher prediction rates.

Particle accelerators are used in a wide variety of research applications as well as in industry and medicine. A challenge common to all accelerators is tracking and measuring beam parameters as the beam evolves from its source, through a long distance of beam pipe, to its final target. Diagnostics along this path are only present in a limited number of fixed locations. Machine learning techniques are gaining popularity for inference tasks such as "virtual diagnostics," or estimation of what instruments would read in locations where they cannot be placed [100].

In a particle accelerator, a particle bunch is confined in the beam pipe by a periodic focusing system with linear restoring forces. However, nonlinearities arise from a number of sources, including nonlinear magnets, nonidealities of physical components, beam-beam effects, and Coulomb repulsion between the particles (an effect known as space charge). These effects on the particle beam combine to create a dynamical system for which no complete explicit model exists. Prediction of the evolution of a particle beam can therefore be a challenge. In this chapter we use a software-based reservoir computer to predict the transverse beam evolution of an electron beam in the University of Maryland Electron Ring.

7.2 Accelerator Description and Simulation Setup

The University of Maryland Electron Ring (UMER) is a 10 keV storage ring designed for the study of the physics of high-current electron beams. UMER consists of 18 nearly identical sections 32 cm in length, each containing the same arrangement of steering and focusing magnets as shown in figure 7.1. Each section consists of dipole magnets, which steer the beam, and quadrupole magnets, which confine the beam within the pipe. Diagnostics include imaging screens at regular locations around the ring. The electron bunch current can be varied from 150 μA to 80 mA, with larger currents corresponding to stronger nonlinear forces due to space charge.

While imaging screens placed in the beam's path are a destructive diagnostic, they provide a wealth of information about the beam's transverse characteristics. UMER has 11 phosphor screens which can be inserted into the ring to intercept the beam on its first turn. In 3 locations, an electrostatic deflector can kick the beam into a screen located to the side, allowing any turn to be visualized. More details about UMER can be found in Kishek et al. [101] and Dovlatyan et al. [102].

The size of the electron beam is governed by the envelope equations:

x''(s) + k^2 x(s) - \frac{2K_{sp}}{x(s) + y(s)} - \frac{\epsilon_x^2}{x^3(s)} = 0        (7.1)

y''(s) + k^2 y(s) - \frac{2K_{sp}}{x(s) + y(s)} - \frac{\epsilon_y^2}{y^3(s)} = 0        (7.2)
where k is the linear restoring force provided by the quadrupoles, K_{sp} is the beam perveance (space charge) parameter, and ε_{x,y} is the emittance, a pressure-like term [103]. These are all constants set by the initial conditions of the beam and the settings of the magnets in the accelerator lattice. The beam perveance, a dimensionless parameter proportional to the beam current, determines the amount of nonlinear Coulomb repulsion force in the system, where K_{sp} = 0 is a purely linear system. For the simulations in this chapter, a 0.6 mA beam with K_{sp} = 9.029 × 10^{-6} is used.

The results here are from a physics-based simulation of UMER made in WARP, a 3D particle-in-cell code for modeling plasmas and high-current particle beams [104]. Simulations are initialized with 10,000 particles in a semi-Gaussian initial beam distribution, and are checked to converge with increasing particle number.

Images of the transverse particle distribution at 288 evenly spaced locations around the ring are extracted from WARP. While WARP tracks every particle individually, the images are purposely downsampled to 15x15 pixels, as shown in figure 7.2, for the sake of the reservoir speed. Each pixel represents a physical size of about 400 microns. Pixel values are normalized to be between 0 and 1. Images are generated at a close enough spacing to resolve centroid and envelope oscillations around the ring.

Figure 7.1: A diagram of the University of Maryland Electron Ring (UMER), showing the dipoles, quadrupoles, imaging screens, electron gun, and pipe flange.

In some accelerators, only the first turn can be visualized with a phosphor screen. For that reason, we use only the first turn for training. A 4000-node reservoir in MATLAB is trained on the first 288 images, which is the full first turn around the ring, and generates predicted images with equal spacing over the next two turns, as shown in figure 7.3. To prevent overfitting, 2% noise is added to the training data. Results are generally observed to degrade with fewer nodes and not to improve with more nodes.

Figure 7.2: (left) A high resolution simulated beam profile from WARP. (right) A downsampled image used for predictions.

The value of each pixel is presented to a subset of nodes in the reservoir determined by the randomly initialized matrix W_in. For each image in the series, the output of the reservoir is trained to match the subsequent image, in order to learn the dynamics of the particle beam. The reservoir is then allowed to run freely for prediction. The output layer is trained using least squares optimization.

Figure 7.3: Training is done on the first turn, and the subsequent two turns are predicted.

The reservoir is sensitive to a variety of hyperparameters, including the input strength (the magnitude of values in W_in), the spectral radius (the magnitude of the largest eigenvalue of A), the memory parameter, the regularization parameter (which determines the amount of error allowed during training), the reservoir size, and the particular choice of adjacency matrix A. The reservoir size must be large in comparison to the dimensionality of the input data, but there is no known way to optimize A other than through trial and error. After A is fixed, the other hyperparameters are manually tuned until a satisfactory result is obtained. The reservoir is generated and the result is computed in under a minute on a PC.
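A minimal echo-state-style sketch of this training and free-running prediction loop is shown below. The reservoir size, leak rate, spectral radius, input scaling, and regularization values are placeholders rather than the tuned values used for the UMER study, the input is random stand-in data rather than WARP images, and details such as the 2% training noise are omitted.

```python
import numpy as np

# Placeholder hyperparameters and random stand-in data, not the tuned values
# or WARP images from the study; the reservoir is also smaller (1000 nodes
# instead of 4000) so the example runs quickly.
rng = np.random.default_rng(0)
n_in, n_res = 225, 1000            # 15x15-pixel images, reservoir size
leak, rho, in_scale, ridge = 0.5, 0.9, 0.5, 1e-6

W_in = in_scale * rng.uniform(-1, 1, (n_res, n_in))
A = rng.uniform(-1, 1, (n_res, n_res))
A *= rho / np.max(np.abs(np.linalg.eigvals(A)))   # set the spectral radius

def step(r, u):
    # Leaky-tanh reservoir update driven by input image u.
    return (1 - leak) * r + leak * np.tanh(A @ r + W_in @ u)

U = rng.uniform(0, 1, (288, n_in))   # stand-in for one turn of normalized images
r, states = np.zeros(n_res), []
for u in U[:-1]:                     # drive with image k; the target is image k+1
    r = step(r, u)
    states.append(r.copy())
R, Y = np.array(states), U[1:]

# Ridge-regularized least squares readout: W_out maps reservoir state -> image.
W_out = np.linalg.solve(R.T @ R + ridge * np.eye(n_res), R.T @ Y).T

# Free-running prediction: feed each predicted image back in as the next input.
u, preds = U[-1], []
for _ in range(2 * 288):             # two further turns around the ring
    r = step(r, u)
    u = W_out @ r
    preds.append(u)
preds = np.array(preds)              # predicted image series, shape (576, 225)
```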
To flatten the data from the image series, four parameters are extracted from the predicted images: the centroid motion and the beam size in both the vertical and horizontal planes. These are the first and second central moments of the images, (μ01, μ10, μ20, μ02), and are calculated using the following equation:

\mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q \, I(x, y)        (7.3)

where I(x, y) is the pixel intensity, x, y are the pixel locations, and p, q are the 2D moment orders.

Centroid motion evolves in a semi-periodic fashion dictated by the magnetic focusing lattice. The beam envelope, which evolves according to the envelope equations, is subject to substantial nonlinearities and is aperiodic. Cross moments such as μ11 can appear if there are rotational offsets in the magnets, which cause the x and y motion to couple, but for these initial studies there is no skew in the magnets.

Figure 7.4: Plots of four beam parameters, horizontal and vertical centroid position (μ10, μ01) and beam size (μ20, μ02), from simulation and from reservoir prediction. The error is measured as the absolute difference between the simulation and prediction values.

Error | Turn 1 | Turn 2
μ10   | 1.6%   | 5.0%
μ01   | 5.4%   | 14.3%
μ20   | 21.1%  | 37.7%
μ02   | 37.5%  | 45.0%

Table 7.1: Average errors calculated between moments from simulations and moments from reservoir predictions, in the first and second predicted turns.

7.3 Results

The results, summarized in figure 7.4, show good agreement between prediction and simulation, particularly in tracking the relatively regular centroid motion. The beam envelope is subject to more nonlinear forces, and the reservoir's error is relatively higher, yet the general character of the time series is captured, with the prediction closely matching the periodicity of the data from the simulation. Error percentages for each turn are summarized in table 7.1. We plan to use measured data as the initial conditions in a reservoir computer trained with simulated data, and to compare the reservoir computer's forecast to real measurements at the available beam imaging locations.

7.4 Conclusion

We have shown promising initial results for the prediction of transverse electron beam parameters using a reservoir computer in MATLAB, allowing beam properties to be inferred in locations where they cannot be physically measured. Time series forecasting using reservoir computing could be utilized in active control scenarios, in particle accelerators or in other systems.

Chapter 8: Conclusion

8.1 Summary

Reservoir computing in hardware leverages the computational properties of dynamical systems at the edge of criticality for computation. A CMOS implementation is of interest because it has the potential for seamless integration into modern electronics manufacturing. In this thesis, we have explored Boolean network circuits for reservoir computing using various hardware platforms, and we designed an ASIC that awaits testing.

Using the FPGA-based Boolean reservoir described in Chapter 4, we demonstrated RF waveform classification accuracy competitive with a state of the art convolutional neural network with an order of magnitude fewer trainable parameters.
We extensively probed the parameter space of network configurations using a control word to set node functions, and developed dataset-agnostic methods for rapidly discarding unsuitable network configurations. This work is based upon designs in [21] and published in [23].

In Chapter 5, we showed the design of a printed circuit board reservoir made with discrete logic chips. While far fewer channels could be measured simultaneously than the 2048 outputs of the FPGA, we were able to sample the analog waveforms with high density and measure a machine learning improvement in a waveform classification task. Across Boolean network implementations on different platforms, we found consistent measurements of network behavior, including self-activity and transient time as a function of Boolean sensitivity. This experiment awaits publication in [24].

The design of a reservoir computing ASIC is shown in Chapter 6, based on the designs in patent application [25]. An efficient workflow for rapid chip layout using Cadence tools was developed, and seven different chip designs await post-processing and testing.

Finally, we showed a potential application of reservoir computing in rapid time series prediction using data from the University of Maryland Electron Ring. Training on time series from a simulation of a high-current electron beam subject to nonlinear forces, we predicted the shape and trajectory of this beam as it traveled around the storage ring, an experiment published in [27]. While this was a software study, trajectory prediction is an example of an application that could be accelerated in hardware with a Boolean network co-processor.

8.2 Future Vision

A free-running, unclocked Boolean reservoir circuit can, within a time scale on the order of a few transition times of the logic gates, perform massively parallel recurrent neural network computations that would take a conventional CPU or GPU substantially longer. There is potential to accelerate time series regression or classification tasks considerably. However, an unclocked Boolean circuit reservoir would not be a general-purpose neural network accelerator, nor would it be able to implement pre-trained models. Due to the high sensitivity of these edge-of-chaos circuits to device variations, each circuit would have to be separately trained.

On the other hand, use cases where rapid re-training is necessary are precisely the niche that unclocked Boolean circuits fill. These applications include rapid time series prediction, in which the reservoir computer must quickly train on the first portion of a time series and then perform a free-running prediction. Another application is time series classification, including in rapidly changing environments. Finally, the reservoir's high speed and parallelism can also be used for static applications, although this does not fully take advantage of its temporal capabilities. A specific example of a potential use case is the active control of tokamaks and particle accelerators, both of which need to abort their operation in advance of destructive instabilities to avoid machine damage.

While we have demonstrated a few variants of unclocked Boolean circuits for reservoir computing, reservoir optimization is still an active area of research, both in hardware and in software. In the Boolean reservoir case, quantifying how much information can be extracted from pulse timing versus analog voltages is an open question.
Overall, the work done in this thesis is an encouraging step towards obtaining more processing power from limited chip resources in a reservoir co-processor. 132 Appendix A: ASIC Details 133 134 Figure A.1: Chiplet 1, Andy. 135 Figure A.2: Chiplet 2, Billy. 136 Figure A.3: Chiplet 3, Charlie. 137 Figure A.4: Chiplet 4, Denny. 138 Figure A.5: Chiplet 5, Ernie. 139 Figure A.6: Chiplet 6, Freddy. 140 Figure A.7: Chiplet 7, Gary. Figure A.8: Overall placement of the chiplets on the wafer. 141 142 Figure A.9: Pinout of chiplets 1, 2 and 4. 143 Figure A.10: Pinout of chiplet 3. 144 Figure A.11: Pinout of chiplets 5, 6, and 7. 145 Figure A.12: PGA package internal connections for chiplets 1, 2, and 4 146 Figure A.13: PGA package internal connections for chiplet 3. 147 Figure A.14: PGA package internal connections for chiplets 5, 6 and 7 Bibliography [1] ITU, ?Measuring digital development: Facts and figures,? tech. rep., Interna- tional Telecommunications Union, 2019. [2] ITU, ?Measuring Digital Development: ICT Price Trends,? tech. rep., Inter- national Telecommunications Union, 2019. [3] G. E. Moore, ?Cramming more components onto integrated circuits,? Pro- ceedings of the IEEE, vol. 86, pp. 82?85, apr 1998. [4] T. Cross, ?Technology Quarterly: After Moore?s law,? The Economist, mar 2016. [5] M. Horowitz, ?Computing?s Energy Problem (and what we can do about it),? in ISSCC, pp. 10?14, IEEE, 2014. [6] W. Arden, M. Brilloue?t, P. Cogez, M. Graef, B. Huizing, and R. Mahnkopf, ??More-than-Moore? White Paper,? tech. rep., International Technology Roadmap for Semiconductors (ITRS), 2010. [7] C. E. Leiserson, N. C. Thompson, J. S. Emer, B. C. Kuszmaul, B. W. Lamp- son, D. Sanchez, and T. B. Schardl, ?There?s plenty of room at the top: What will drive computer performance after Moore?s law?,? Science, vol. 368, jun 2020. [8] ?Beyond CMOS,? tech. rep., International Technology Roadmap for Semicon- ductors (ITRS), 2015. [9] R. O. Topaloglu and H. S. Wong, eds., Beyond-CMOS technologies for next generation computer design. Springer, 2019. [10] A. Samuel, ?Some Studies in Machine Learning Using the Game of Checkers,? IBM Journal of Research and Development, 1959. [11] Y. Jia, R. J. Weiss, F. Biadsy, W. Macherey, M. Johnson, Z. Chen, and Y. Wu, ?Direct speech-to-speech translation with a sequence-to-sequence model,? Pro- ceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 1123?1127, 2019. 148 [12] D. Silver, A. Huang, and C. J. Maddison, ?Mastering the game of Go with deep neural networks and tree search,? Nature, vol. 529, no. 7587, 2016. [13] O. Vinyals, I. Babuschkin, and W. M. Czarnecki, ?Grandmaster level in Star- Craft II using multi-agent reinforcement learning,? Nature, vol. 575, no. 7782, pp. 350?354, 2019. [14] M. M. Morando, Q. Tian, L. T. Truong, and H. L. Vu, ?Studying the Safety Impact of Autonomous Vehicles Using Simulation-Based Surrogate Safety Measures,? Journal of Advanced Transportation, 2018. [15] J. Dean, ?The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design,? Digest of Technical Papers - IEEE Interna- tional Solid-State Circuits Conference, pp. 8?14, 2020. [16] N. C. Thompson, K. Greenewald, K. Lee, and G. F. Manso, ?The Computa- tional Limits of Deep Learning,? tech. rep., 2020. [17] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, ?Xnor-net: Imagenet classification using binary convolutional neural networks,? in European con- ference on computer vision, pp. 525?542, Springer, 2016. 
[18] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, ?Binarized Neural Networks: Training Deep Neural Networks with Weights and Activa- tions Constrained to +1 or -1,? in Neural Information Processing Systems, 2016. [19] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, ?Edge Computing: Vision and Challenges,? IEEE Internet of Things Journal, vol. 3, no. 5, pp. 637?646, 2016. [20] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P.-l. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T. V. Ghaem- maghami, R. Gottipati, W. Gulland, R. Hagmann, C. R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khai- tan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Ma- hony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Sama- diani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, and D. H. Yoon, ?In-Datacenter Performance Analy- sis of a Tensor Processing Unit,? in Proceedings of the 44th Annual Inter- national Symposium on Computer Architecture, ISCA ?17, (New York, NY, USA), pp. 1?12, Association for Computing Machinery, 2017. 149 [21] H. Komkov, A. Restelli, B. Hunt, L. Shaughnessy, I. Shani, and D. Lathrop, ?The recurrent processing unit: Hardware for high speed machine learning,? arXiv preprint arXiv:1912.07363, 2019. [22] I. Shani, L. Shaughnessy, J. Rzasa, A. Restelli, B. R. Hunt, H. Komkov, and D. P. Lathrop, ?Dynamics of analog logic-gate networks for machine learning,? Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 29, no. 12, p. 123130, 2019. [23] H. Komkov, L. Pocher, A. Restelli, D. P. Lathrop, and B. Hunt, ?RF Signal Classification using Boolean Reservoir Computing on an FPGA,? in IJCNN Int Jt Conf Neural Network, 2021. [24] H. Komkov, L. Pocher, A. Restelli, B. Hunt, and D. P. Lathrop, ?Reservoir Computing Using Networks of CMOS Logic Gates,? in International Confer- ence on Neuromorphic Systems, 2021. [25] A. Restelli, D. Lathrop, and H. Komkov, ?United States Patent Application 17/208,399: Variable Sensitivity Node,? April 2021. [26] D. Lathrop, I. Shani, P. Megson, A. Restelli, and A. R. Mautino, ?United States Patent WO2018213399A1: Integrated circuit designs for reservoir com- puting and machine learning,? 2018. [27] H. Komkov, A. Perevalov, L. Dovlatyan, D. D. P. Lathrop, A. Perevalov, and D. D. P. Lathrop, ?Reservoir Computing for Prediction of Beam Evolution in Particle Accelerators,? in NeurIPS ML4PS Workshop, 2019. [28] D. E. Rumelhart, G. E. Hinton, R. J. Williams, and Rumelhart, ?Learning Representations by Back-Propagating Errors,? in Neurocomputing: Founda- tions of Research, pp. 696?699, Cambridge, MA, USA: MIT Press, 1988. [29] D. P. Kingma and J. Ba, ?Adam: A method for stochastic optimization,? arXiv preprint arXiv:1412.6980, 2014. [30] H. Jaeger, ?The ?echo state? approach to analysing and training recurrent neural networks-with an Erratum note,? Bonn, Germany: German National Research Center for Information Technology GMD Technical Report, vol. 148, no. 34, p. 13, 2001. [31] W. Maass, T. Natschla?ger, and H. 
Markram, ?Real-time computing without stable states: A new framework for neural computation based on perturba- tions,? Neural Computation, vol. 14, no. 11, pp. 2531?2560, 2002. [32] J. Jiang and Y.-C. Lai, ?Model-free prediction of spatiotemporal dynamical systems with recurrent neural networks: Role of network spectral radius,? Phys. Rev. Research, vol. 1, p. 33056, oct 2019. 150 [33] H. Jaeger, ?Short term memory in echo state networks,? GMD Report 152, no. December, p. 60, 2002. [34] H. Jaeger, ?Long Short-Term Memory in Echo State Networks: Details of a Simulation Study,? tech. rep., School of Engineering and Science, Jacobs University Bremen, 2012. [35] A. Goudarzi, M. R. Lakin, and D. Stefanovic, ?DNA Reservoir Computing: A Novel Molecular Computing Approach,? Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8141 LNCS, no. 1, pp. 76?89, 2013. [36] W. Maass, T. Natschla?ger, and H. Markram, ?Fading memory and kernel prop- erties of generic cortical microcircuit models,? Journal of Physiology Paris, vol. 98, no. 4-6 SPEC. ISS., pp. 315?330, 2004. [37] W. Maass, R. Legenstein, and N. Bertschinger, ?Methods for estimating the computational power and generalization capability of neural microcircuits,? Advances in Neural Information Processing Systems, 2005. [38] G. Cybenko, ?Mathematics of Control, Signals, and Systems Approximation by Superpositions of a Sigmoidal Function*,? Math. Control Signals Systems, vol. 2, pp. 303?314, 1989. [39] K. Hornik, ?Approximation capabilities of multilayer feedforward networks,? Neural Networks, vol. 4, no. 2, pp. 251?257, 1991. [40] L. Grigoryeva and J.-p. Ortega, ?Echo State Networks are Universal,? Neural Networks, vol. 108, pp. 495?508, 2018. [41] C. Gallicchio, A. Micheli, and L. Pedrelli, ?Comparison between DeepESNs and gated RNNs on multivariate time-series prediction,? in 27th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, no. April, pp. 619?624, 2019. [42] P. R. Vlachas, J. Pathak, B. R. Hunt, T. P. Sapsis, M. Girvan, E. Ott, and P. Koumoutsakos, ?Backpropagation algorithms and Reservoir Computing in Recurrent Neural Networks for the forecasting of complex spatiotemporal dy- namics,? Neural Networks, vol. 126, pp. 191?217, 2020. [43] J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott, ?Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach,? Physical Review Letters, vol. 120, no. 2, 2018. [44] J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, and E. Ott, ?Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data,? Chaos, vol. 27, no. 12, 2017. 151 [45] S. Krishnagopal, M. Girvan, E. Ott, and B. R. Hunt, ?Separation of chaotic signals by reservoir computing,? Chaos, vol. 30, no. 2, p. 23123, 2020. [46] H. Haas and H. Jaeger, ?Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication,? Science, vol. 304, no. 5667, pp. 78?80, 2004. [47] P. Buteneers, B. Schrauwen, D. Verstraeten, and D. Stroobandt, ?Epileptic seizure detection using Reservoir Computing,? Proceedings of the 19th Annual Workshop on Circuits, Systems and Signal Processing, pp. 1?4 (CD?ROM), 2008. [48] P. Antonik, M. Hermans, M. Haelterman, and S. Massar, ?Random Pattern and Frequency Generation Using a Photonic Reservoir Computer with Output Feedback,? Neural Processing Letters, vol. 47, no. 3, pp. 1041?1054, 2018. [49] F. 
Triefenbach, A. Jalalvand, B. Schrauwen, and J.-P. Martens, ?Phoneme Recognition with Large Hierarchical Reservoirs,? in Advances in neural infor- mation processing systems, pp. 2307 ? 2317, 2010. [50] L. Larger, A. Baylo?n-Fuentes, R. Martinenghi, V. S. Udaltsov, Y. K. Chembo, and M. Jacquot, ?High-speed photonic reservoir computing using a time-delay- based architecture: Million words per second classification,? Physical Review X, vol. 7, no. 1, 2017. [51] D. M. Canaday, Modeling and Control of Dynamical Systems with Reservoir Computing. Doctoral thesis, Ohio State University, 2019. [52] A. Vandesompele, G. Urbain, F. Wyffels, and J. Dambre, ?Populations of spiking neurons for reservoir computing: Closed loop control of a compliant quadruped,? Cognitive Systems Research, vol. 58, pp. 317?323, dec 2019. [53] H. Soh and Y. Demiris, ?Iterative temporal learning and prediction with the sparse online echo state gaussian process,? Proceedings of the International Joint Conference on Neural Networks, pp. 10?15, 2012. [54] L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, et al., ?Information processing using a single dynamical node as complex system,? Nature Communications, vol. 2, p. 468, Sept. 2011. [55] J. D. Hart, L. Larger, T. E. Murphy, and R. Roy, ?Delayed dynamical sys- tems: Networks, chimeras and reservoir computing,? Philosophical Transac- tions of the Royal Society A: Mathematical, Physical and Engineering Sci- ences, vol. 377, no. 2153, 2019. [56] D. Brunner, B. Penkovsky, B. A. Marquez, M. Jacquot, I. Fischer, and L. Larger, ?Tutorial: Photonic neural networks in delay systems,? Journal of Applied Physics, vol. 124, no. 15, p. 152004, 2018. 152 [57] X. Huang, T. H. Klinge, J. I. L. B, X. Li, and J. H. Lutz, Unconventional Computation and Natural Computation, vol. 7445. 2012. [58] J. Tang, C. Deng, and G. B. Huang, ?Extreme Learning Machine for Multi- layer Perceptron,? IEEE Transactions on Neural Networks and Learning Sys- tems, vol. 27, no. 4, pp. 809?821, 2016. [59] G. Tanaka, T. Yamane, J. B. He?roux, R. Nakane, N. Kanazawa, S. Takeda, H. Numata, D. Nakano, and A. Hirose, ?Recent advances in physical reservoir computing: A review,? Neural Networks, vol. 115, pp. 100?123, jul 2019. [60] C. G. Langton, ?Computation at the edge of chaos: Phase transitions and emergent computation,? Physica D: Nonlinear Phenomena, vol. 42, no. 1, pp. 12?37, 1990. [61] L. Cocchi, L. L. Gollo, A. Zalesky, and M. Breakspear, ?Criticality in the brain: A synthesis of neurobiology, models and cognition,? 2017. [62] K. Nakajima, H. Hauser, T. Li, and R. Pfeifer, ?Information processing via physical soft body,? Scientific Reports, vol. 5, pp. 1?11, 2015. [63] V. C. Mu?ller and M. Hoffmann, ?What Is Morphological Computation? On How the Body Contributes to Cognition and Control,? Artificial Life, vol. 23, no. 1, pp. 1?24, 2017. [64] H. Hauser, A. J. Ijspeert, R. M. Fu?chslin, R. Pfeifer, and W. Maass, ?To- wards a theoretical foundation for morphological computation with compliant bodies,? Biological Cybernetics, vol. 105, no. 5-6, pp. 355?370, 2011. [65] G. Dion, S. Mejaouri, and J. Sylvestre, ?Reservoir computing with a single delay-coupled non-linear mechanical oscillator,? Journal of Applied Physics, vol. 124, p. 152132, oct 2018. [66] K. Caluwaerts, M. D?Haene, D. Verstraeten, and B. Schrauwen, ?Locomo- tion without a brain: Physical reservoir computing in Tensegrity structures,? Artificial Life, vol. 19, pp. 35?66, jan 2013. [67] C. Fernando and S. 
Sojakka, ?Pattern Recognition in a Bucket,? in Advances in Artificial Life (W. Banzhaf, J. Ziegler, T. Christaller, P. Dittrich, and J. T. Kim, eds.), vol. 2801, (Berlin, Heidelberg), pp. 588?597, Springer Berlin Heidelberg, 2003. [68] X. Sui, Q. Wu, J. Liu, Q. Chen, and G. Gu, ?A review of optical neural networks,? IEEE Access, vol. 8, pp. 70773?70783, 2020. [69] J. M. Shainline, ?The Largest Cognitive Systems Will be Optoelectronic,? 2018 IEEE International Conference on Rebooting Computing, ICRC 2018, pp. 1?10, 2019. 153 [70] H. J. Caulfield, J. Kinser, and S. K. Rogers, ?Optical Neural Networks,? Proceedings of the IEEE, vol. 77, no. 10, pp. 1573?1583, 1989. [71] F. Laporte, J. Dambre, and P. Bienstman, ?Neuromorphic Computing with Signal-Mixing Cavities,? 2018 IEEE International Conference on Rebooting Computing (ICRC), pp. 1?4, 2019. [72] D. Brunner and I. Fischer, ?Reconfigurable semiconductor laser networks based on diffractive coupling,? Optics Letters, vol. 40, no. 16, p. 3854, 2015. [73] G. Van Der Sande, D. Brunner, and M. C. Soriano, ?Advances in photonic reservoir computing,? Nanophotonics, vol. 6, no. 3, pp. 561?576, 2017. [74] B. Schrauwen, M. D?Haene, D. Verstraeten, J. Van Campenhout, and J. V. Campenhout, ?Compact hardware liquid state machines on FPGA for real- time speech recognition,? Neural Networks, vol. 21, no. 2-3, pp. 511?523, 2008. [75] B. Penkovsky, L. Larger, and D. Brunner, ?Efficient design of hardware- enabled reservoir computing in FPGAs,? Journal of Applied Physics, vol. 124, no. 16, p. 162101, 2018. [76] M. C. Soriano, S. Ortin, L. Keuninckx, L. Appeltant, J. Danckaert, L. Pes- quera, G. der Sande, S. Ort??n, L. Keuninckx, L. Appeltant, J. Danckaert, L. Pesquera, and G. V. D. Sande, ?Delay-Based Reservoir Computing: Noise Effects in a Combined Analog and Digital Implementation,? IEEE transac- tions on neural networks and learning systems, vol. 26, no. 2, pp. 388?393, 2015. [77] J. H. Jensen and G. Tufte, ?Reservoir Computing with a Chaotic Circuit,? in Artificial Life, no. September, pp. 222?229, MIT Press, 2017. [78] M. L. Alomar, V. Canals, N. Perez-Mora, V. Mart??nez-Moll, and J. L. Rossello?, ?FPGA-Based Stochastic Echo State Networks for Time-Series Forecasting,? Computational Intelligence and Neuroscience, p. 3917892, 2016. [79] N. D. Haynes, M. C. Soriano, D. P. Rosin, I. Fischer, and D. J. Gauthier, ?Reservoir computing with a single time-delay autonomous Boolean node,? Physical Review E, vol. 91, no. 2, p. 20801, 2015. [80] D. Canaday, A. Griffith, and D. J. Gauthier, ?Rapid Time Series Prediction with a Hardware-Based Reservoir Computer,? Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 28, no. 12, p. 123119, 2018. [81] S. A. Kauffman, ?Metabolic Stability and Epigenesis in Randomly Con- structed Genetic Nets,? Journal of Theoretical Biology, vol. 22, no. 3, pp. 437? 467, 1969. 154 [82] D. Dee and M. Ghil, ?Boolean Difference Equations: Formulations and Dy- namic Behavior,? SIAM Journal on Applied Mathematics, vol. 44, no. 1, pp. 111?126, 1984. [83] M. Ghil and A. Mullhaupt, ?Boolean Delay Equations II. Periodic and Ape- riodic Solutions,? Journal of Statistical Physics, vol. 41, no. 1, pp. 125?173, 1985. [84] E. Farcot, S. Best, R. Edwards, I. Belgacem, X. Xu, and P. Gill, ?Chaos in a ring circuit,? Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 29, no. 4, p. 43103, 2019. [85] D. P. Rosin, D. Rontani, D. J. Gauthier, and E. Scho?ll, ?Experiments on autonomous Boolean networks,? 
Chaos: An Interdisciplinary Journal of Non- linear Science, vol. 23, no. 2, p. 25102, 2013. [86] D. P. Rosin, ?Ultra-fast physical generation of random numbers using hybrid boolean networks,? in Dynamics of Complex Autonomous Boolean Networks, pp. 57?79, Springer, 2015. [87] D. P. Rosin, Dynamics of Complex Autonomous Boolean Networks. Springer, 2014. [88] R. Zhang, H. L. D. d. S. Cavalcante, Z. Gao, D. J. Gauthier, J. E. S. Socolar, M. M. Adams, D. P. Lathrop, H. L. Hugo, Z. Gao, D. J. Gauthier, J. E. S. Socolar, M. M. Adams, and D. P. Lathrop, ?Boolean chaos,? Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 4, pp. 1?4, 2009. [89] H. L. De Cavalcante, D. J. Gauthier, J. E. Socolar, and R. Zhang, ?On the origin of chaos in autonomous Boolean networks,? Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 368, no. 1911, pp. 495?513, 2010. [90] D. P. Rosin, D. Rontani, and D. J. Gauthier, ?Synchronization of coupled Boolean phase oscillators,? Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, vol. 89, no. 4, pp. 1?7, 2014. [91] D. P. Rosin, D. Rontani, N. D. Haynes, E. Scho?ll, and D. J. Gauthier, ?Tran- sient scaling and resurgence of chimera states in networks of Boolean phase os- cillators,? Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 2014. [92] D. P. Rosin, D. Rontani, and D. J. Gauthier, ?Ultrafast physical generation of random numbers using hybrid Boolean networks,? Physical Review E, vol. 87, no. 4, p. 040902, 2013. [93] M. Lukos?evic?ius, ?A Practical Guide to Applying Echo State Networks,? in Neural Networks: Tricks of the Trade, pp. 659?686, Springer, 2012. 155 [94] C. G. Langton, ?Computation at the edge of chaos: Phase transitions and emergent computationhase transitions and emergent computation,? Physica D: Nonlinear Phenomena, vol. 42, no. 1-3, pp. 12?37, 1990. [95] T. Natschla?ger, N. Bertschinger, and R. Legenstein, ?At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks,? tech. rep., 2005. [96] P. Baranc?ok and I. Farkas?, ?Memory Capacity of Input-Driven Echo State Networks at the Edge of Chaos,? in International Conference on Artificial Neural Networks, pp. 41?48, Springer, 2014. [97] D. Snyder, A. Goudarzi, and C. Teuscher, ?Computational capabilities of random automata networks for reservoir computing,? Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, vol. 87, no. 4, pp. 1?8, 2013. [98] T. J. O?Shea and N. West, ?Radio Machine Learning Dataset Generation with GNU Radio,? in Proceedings of the GNU Radio Conference, vol. 1, 2016. [99] R. Liaw, E. Liang, R. Nishihara, P. Moritz, J. E. Gonzalez, and I. Stoica, ?Tune: A Research Platform for Distributed Model Selection and Training,? arXiv preprint arXiv:1807.05118, 2018. [100] A. Edelen, C. Mayes, D. Bowring, D. Ratner, A. Adelmann, R. Ischebeck, J. Snuverink, I. Agapov, R. Kammering, J. Edelen, I. Bazarov, G. Valentino, and J. Wenninger, ?Opportunities in machine learning for particle accelera- tors,? 2018. [101] R. A. Kishek, B. L. Beaudoin, S. Bernal, M. Cornacchia, D. Feldman, et al., ?The University of Maryland Electron Ring Program,? Nuclear Instruments and Methods in Physics Research, Section A: Accelerators, Spectrometers, De- tectors and Associated Equipment, vol. 733, pp. 233?237, Jan. 2014. [102] L. Dovlatyan, D. Matthew, I. Haber, B. Beaudoin, T. Antonsen, and K. 
Ruis- ard, ?Preliminary Lattice Studies for the Single-Invariant Optics Experiment at the University of Maryland,? 2019. [103] M. Reiser, Theory and Design of Charged Particle Beams. Wiley, Mar. 2008. [104] A. Friedman, R. H. Cohen, D. P. Grote, S. M. Lund, W. M. Sharp, et al., ?Computational methods in the warp code framework for kinetic simula- tions of particle beams and plasmas,? IEEE Transactions on Plasma Science, vol. 42, no. 5, pp. 1321?1334, 2014. 156