ABSTRACT

Title of Dissertation: DEVELOPING MACHINE LEARNING TECHNIQUES FOR NETWORK CONNECTIVITY INFERENCE FROM TIME-SERIES DATA

Amitava Banerjee, Doctor of Philosophy, 2022

Dissertation Directed by: Professor Edward Ott, Department of Physics, Department of Electrical and Computer Engineering, Institute for Research in Electronics and Applied Physics

Inference of the connectivity structure of a network from the observed dynamics of the states of its nodes is a key issue in science, with wide-ranging applications such as determination of the synapses in nervous systems, mapping of interactions between genes and proteins in biochemical networks, distinguishing ecological relationships between different species in their habitats, etc. In this thesis, we show that certain machine learning models, trained for the forecasting of experimental and synthetic time-series data from complex systems, can automatically learn the causal networks underlying such complex systems. Based on this observation, we develop new machine learning techniques for inference of the causal interaction network connectivity structures underlying large, networked, noisy, complex dynamical systems, solely from the time-series of their nodal states. In particular, our approach is to first train a type of machine learning architecture, known as the ‘reservoir computer’, to mimic the measured dynamics of an unknown network. We then use the trained reservoir computer as an in silico computational model of the unknown network to estimate how small changes in nodal states propagate in time across that network. Since small perturbations of network nodal states are expected to spread along the links of the network, the estimated propagation of nodal state perturbations reveals the connections of the unknown network. Our technique is noninvasive, but is motivated by the widely used invasive network inference method whereby the temporal propagation of active perturbations applied to the network nodes is observed and employed to infer the network links (e.g., tracing the effects of knocking down multiple genes, one at a time, can be used to infer gene regulatory networks). We discuss how we can further apply this methodology to infer causal network structures underlying different time-series datasets and compare the inferred network with the ground truth whenever available. We demonstrate three practical applications of this network inference procedure: (1) inference of network link strengths from time-series data of coupled, noisy Lorenz oscillators, (2) inference of time-delayed feedback couplings in opto-electronic oscillator circuit networks designed in the laboratory, and (3) inference of the synaptic network from publicly available calcium fluorescence time-series data of C. elegans neurons. In all examples, we also explain how experimental factors like noise level, sampling time, and measurement duration systematically affect causal inference from experimental data. The results show that synchronization and strong correlation among the dynamics of different nodal states are, in general, detrimental for causal network inference. Features that break synchrony among the nodal states, e.g., coupling strength, network topology, dynamical noise, and heterogeneity of the parameters of individual nodes, help network inference.
In fact, we show in this thesis that, for parameter regimes where the network nodal states are not synchronized, we can often achieve perfect causal network inference from simulated and experimental time-series data, using machine learning techniques, in a wide variety of physical systems. In cases where effects like observational noise, large sampling time, or small sampling duration hinder such perfect network inference, we show that it is possible to utilize specially-designed surrogate time-series data for assigning statistical confidence to individual inferred network links. Given the general applicability of our machine learning methodology in time-series prediction and network inference, we anticipate that such techniques can be used for better model-building, forecasting, and control of complex systems in nature and in the lab.

DEVELOPING MACHINE LEARNING TECHNIQUES FOR NETWORK CONNECTIVITY INFERENCE FROM TIME-SERIES DATA

by Amitava Banerjee

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2022

Advisory Committee:
Professor Edward Ott, Chair/Advisor
Professor Rajarshi Roy
Professor Michelle Girvan
Professor Yanne K. Chembo
Professor Pratyush Tiwary

© Copyright by Amitava Banerjee 2022

Preface

Causality in complex systems

This thesis is about inference of cause and effect relations from data. In complex systems where there are many interacting agents, such causal interactions form networks that ultimately dictate how the system as a whole behaves over time. Examples of such systems include neurons in our brain affecting the firing patterns of one another, predators and prey in an ecosystem causing growth and decline of each other’s populations, genes in our cells turning each other on and off, and so on. A general theme of the study of networked dynamical systems has been identifying how these individual causal interactions between parts of such complex systems result in the collective dynamics of these systems as a whole (e.g., how neurons in the visual cortex fire together when we see an image). An artist’s impression of how the functioning of a multi-component system could depend on the cause and effect interactions among its parts is depicted in the drawing of an example of a ‘Rube Goldberg machine’ in Fig. 1. Since the functioning of a complex system depends crucially on the causal interactions among its parts, it is important to identify such interactions to construct mathematical models of the system, or for the purpose of predicting or controlling such systems. Regarding this, a natural question to ask is: given only the observed behavior of different parts of a complex system over time, is it possible to construct the network of cause and effect interactions that connects them? For example, looking at how neurons in the visual cortex fire over time, can we reconstruct how they are coupled via synaptic interactions? This brings us to the question of causal inference: connecting a set of observed events or dynamical agents with causal interactions.

Figure 1: An example of a Rube Goldberg machine to light up the electric bulb shown on the right with the boxing glove and spring shown on the left, from the Scout Life website (https://scoutlife.org/hobbies-projects/projects/159359/how-to-make-a-rube-goldberg-machine/).
In some cases, such causal interactions among different agents can be simple, propagating linearly from one agent to another without any feedback, with each agent causally affected by only one other agent and causally affecting only one (different) agent, like falling dominoes. An example is given by the machine in Fig. 1. There, cause and effect interactions flow along the following sequence, from left to right, as the extension of the spring-mounted boxing glove on the left eventually lights up the bulb on the right: boxing glove and spring, bowling ball, bowling pin, left string, birdcage, small falling ball, dominoes, axe, right string, hammer, weighing scale, model of human hand, light bulb. But, in more general cases, if there are many agents in a complex system, the cause and effect interactions among them constitute a directed network. An example is shown in Fig. 2, where the causal predator-prey interactions form a network among the Chesapeake Bay waterbirds. In such cases, identifying causal interactions amounts to inferring the directed network of cause and effect relationships among the agents of a complex system. Moreover, as we shall see later in this thesis, unlike the example network shown in Fig. 2, there could be bi-directional causal effects between different agents, adding a further layer of complication to causal inference. In this thesis, we shall devise and test methods to infer these causal networks in various complex systems.

Figure 2: An example of a directed network of predator-prey interactions, from Wikipedia (https://en.wikipedia.org/wiki/Food_chain).

The ladder of causation

Since inference of causality in complex systems is of utmost importance in understanding their inner workings, it is imperative to think about the different levels at which the task of causal inference can be achieved. A formalization of this is found in ‘The Book of Why’ by Judea Pearl and Dana Mackenzie [1], where the authors classify causal inference into three levels - association (‘seeing’), intervention (‘doing’), and counterfactual (‘imagining’) - forming the ‘ladder of causation’ shown in Fig. 3, in which each level builds on the ones below it. To consider this in more detail, let us use the example of baking butter cookies. Suppose we wish to understand whether adding butter causes the cookie to have a certain texture. How do we go about it?

Figure 3: The ladder of causation showing three levels of abilities of a causal learner, for the example of baking a butter cookie.

To begin with, we need data. Suppose we have baked many butter cookies with different proportions of the baking ingredients and recorded the outcomes each time. The data relates the amount of butter in the cookies to different textures of the cookies, but this relationship is merely an association, or ‘correlation’, in the sense of statistics. In the language of probability, it means being able to calculate only the probability that a cookie has a certain texture, given a certain amount of butter in it (Fig. 3). This does not tell us for sure if the amount of butter is causing an attribute of the texture of the cookie. So, this is the most basic information that we can extract from the situation, if we only have the data available.
At this stage, which is the first rung of the ladder of causation, we can predict the texture of the cookie from the amount of butter in it - but we do not know for sure whether it was the amount of butter that caused this texture. This is because we cannot answer questions like ‘what would the texture be like if the amount of butter was halved?’ simply by looking at the data, since the amounts of other ingredients also vary in the data. So, the change in texture when the amount of butter is changed may not actually be due to the butter but, say, to the differing amounts of flour and sugar. Relationships based on simple associations can thus be misleading. How can we know for sure that the butter is causing a particular aspect of the texture?

The way to deduce whether it was really the amount of butter that caused a particular texture of the cookie is to do skillfully designed controlled experiments - baking many cookies with different amounts of butter, with the amounts of all other ingredients and the baking conditions held fixed. This implies directly interacting with the system, using tools to actively intervene in it. Such controlled experiments enable us to calculate the probability that a cookie has a certain texture if the butter in it is changed by a certain amount (Fig. 3). This is the second rung of the ladder of causation, and it involves doing experiments, like the ones that generated the data of rung one, in a controlled way. As such, it assumes that we have access to the system we wish to model.

The final rung of the ladder is imagining what would have happened if we changed the conditions in a certain baking experiment, but without doing any more experiments. So, suppose we have done some experiments with different amounts of butter as in rung two, and maybe others involving changing other ingredients as well. Then, we ask ‘what would have happened if we changed ingredient X in experiment number Y?’ Unlike in rung two, we cannot simply do the experiment again. So this requires an additional level of sophistication - the intuition of an experienced baker.

What does this imply for causal inference in complex systems? If we observe different parts of a complex system over a period of time, we can record the changes of the measured states of these different parts (e.g., membrane potentials or calcium fluorescence from individual neurons in the brain). The first thing to do with this data is to find correlations between states of different parts of the complex system. While we just argued that such correlations are the most primitive forms of causal modeling and may not necessarily imply causal interactions, we do obtain a wealth of information relating different parts of a complex system, which is useful particularly if we cannot make interventions in the system and hence cannot go up the ladder of causation easily. For example, correlations between activities of different parts of the human brain can be used to calculate functional connectivity, a measure that yields useful information, e.g., in studies on schizophrenia [2] and learning [3]. However, simple associations between the composition of a complex system and its characteristics, without a mechanistic understanding of the underlying processes, have been shown to be dangerous in science. An example is linking a person’s genes to their behavioral traits.
‘Some companies see a market in reading DNA like a fortune-teller reads tea leaves’, remarks a Nature commentary [4], warning us against the consequences of misinterpreting these associations as causal effects.

Indeed, examples of misuse of associations abound in modern machine learning. It is challenging for artificial intelligence (AI) to work by identifying causal relations in the data, going beyond the probabilistic associations that most AI systems are trained on [5]. As a consequence, mainstream AI still struggles to transfer learned skills to new environments, requires data-heavy training, and often produces opaque machines reproducing and reinforcing non-causal associations and biases in training data. ‘Widespread, mysterious, and destructive’ [6] is how Cathy O’Neil labels the algorithms affecting our lives, for example, by performing facial recognition in surveillance and by calculating everything from credit scores and college rankings to the lengths of prison sentences. Opaque AI algorithms learning by identifying non-causal associations have been shown to, e.g., misclassify the gender of a face depending on race [7], produce language models showing stereotypical associations and negative sentiments towards specific groups [8], misclassify objects to be identified for steering angle prediction in self-driving cars [9], and produce false-positive melanoma recognition depending on spurious surgical skin markings in medical images [10]. Thus, a general goal in machine learning is to produce AI algorithms that learn not by identifying statistical associations, but by identifying causal relationships in the training data. In order to do so, we need a way to lift AI algorithms one rung up the ladder of causation. This thesis is about machine learning architectures that are shown to be able to learn by identifying causal interactions in complex systems.

Taking machines up the ladder of causation

What does it require for a causal learner to go one level up the ladder of causation? It is the ability to do specifically designed controlled experiments. In the case of genetics experiments, the approach of ‘perturbation biology’ [11] involves knocking down genes and observing the downstream effects of such knockdowns on other genes to infer causal gene interaction networks. However, while this example allows experimental access to the system under study, this might not be possible in more general cases. For example, in the case of ecological predator-prey relationships, it is not desirable, in general, to have any human interventions in the population dynamics of different species. Considering such cases, trained machine learning algorithms should be able to accurately predict the effects of interventions, if they are to be placed on the second rung of the ladder of causation. Is there a way that this can be achieved?

In the case of complex dynamical systems with many components connected by a network of causal interactions, we show in this thesis that it is possible to have machine learning architectures, trained only on the time-series data from the components of a complex system, that not only fit the training data (i.e., learn to model associations), but also predict the causal interactions among the components. In particular, our trained machines will be shown to accurately answer questions like “would component X of a complex system change at time t′ as a result of a change in component Y at time t, with t′ > t?”
In many cases, a correct answer to these questions would mean inference of the causal network underlying the complex system. In our examples in the following chapters, we shall test the goodness of causal network inference by comparing the inferred causal network to the ground truth causal network, with the latter being known in all of our examples. The results will show that there is hope of lifting today’s mostly associational AI algorithms one step up the ladder of causation, where they would be able to predict the effects of interventions and different controlled experiments on the systems they are trained on.

Dedication

This thesis is dedicated to my partner, Ms. Promita Ghosh, doctoral candidate at the Rosalind and Morris Goodman Cancer Institute, McGill University, for her constant support, advice, and inspiration.

Acknowledgments

I shall start with the Land Acknowledgement Statement. The following is copied from the UMD Office of Diversity and Inclusion website (https://diversity.umd.edu/resources/land-acknowledgement).

Every community owes its existence and strength to the generations before them, around the world, who contributed their hopes, dreams, and energy into making the history that led to this moment. Truth and acknowledgement are critical in building mutual respect and connections across all barriers of heritage and difference. So, we acknowledge the truth that is often buried: We are on the ancestral lands of the Piscataway People, who are the ancestral stewards of this sacred land. It is their historical responsibility to advocate for the four-legged, the winged, those that crawl and those that swim. They remind us that clean air and pristine waterways are essential to all life. This Land Acknowledgement is a vocal reminder for each of us as two-leggeds to ensure our physical environment is in better condition than what we inherited, for the health and prosperity of future generations.

রবীন্দ্রনাথ ঠাকুর

Where my God dwells
do you only see the precious jewels?
For I see the dirt weep there in abundance.
Where my Gurus take the seat
how few good ones they care to greet?
For the naive get a home there,
I am a follower hence.

Rabindranath Tagore (English translation mine)

The graduate school life at UMD Physics taught me that gratefulness is something to be practiced mindfully. Looking back four years into the past, I realize how difficult it is for me to acknowledge everyone who helped me survive, thrive, and learn during my grad school. So, I am beginning by acknowledging my own limitations and apologizing for my ungratefulness to anyone not mentioned here. I plan to host an updated version of this section on my personal website, https://banerjeeamitava.github.io/, under the section called ‘Writings’.

These years have not been easy at all. The only easy part was probably running the computer codes to produce the results that make up the bulk of this thesis. I was never able to go back to my country, India, during my entire grad school, even when my friends died before the COVID vaccine could be distributed. As if the pandemic was not enough, within four years of setting foot in this country, I saw political violence, murderous police officers, free and cheap racialized hatred, discrimination, anti-LGBTQ legislation, malfunctioning courts, mass shootings, and the decay and deliberate dismantling of human rights and dignity.
If that was not enough, we had to see a full-scale war happening in some of my friends’ homeland, and countless other deadly conflicts killing people all over the world. Amongst all of this, I have been able to survive, thanks to some of the most wonderful people I have been privileged to have met, who made me feel loved and blessed.

Since a lot of time during my PhD was spent at home due to the pandemic, I spent a considerable amount of time with my housemates, and they are partly the reason I was able to get through. So I am thankful to my current and former housemates: Abu Saleh Musa Patoary, for cooking N number of dishes with (/for) me and traveling, drinking, and watching Netflix with me, among many other things, including getting the defense champagne; Gautam Nambiar, for his always-positive presence, numerous Malayali treats, getting the defense cake, and for the drive to the COVID vaccine center; Elizabeth Joy Paul, Arthur Lin, and Wonseok Hwang for numerous drives to the grocery stores, helping with moving, and for teaching me about etiquette; and Wrick Sengupta, Sagar Airen, Amit Vikram Anand, and Aritra Das for great conversations, among many other memories that are making me smile now. Speaking of my friends, I must also thank Sourav Das and Sohitri Ghosh for numerous Bengali treats and all the great conversations; Subhayan Sahu and Anshuman Swain for being my constant and robust support; and Naren Manjunath, Prakhar Gupta, and Xiaozhen Fu for numerous dinner invites to their place.

In the summer of 2019 I went for a hike to Old Rag Mountain in the Shenandoah National Park. This difficult hike was the first hike ever in my life, and I realized that it symbolized my PhD journey. I am thankful to Joseph Hart, who literally lifted me up and carried my burden during the Old Rag hike and accompanied me to make sure I did not get stuck. I am also thankful to have been in the presence of Ian Abel, William Dorland and Sarah Penniston-Dorland, Adelle Wright, Arinjoy De, and Rupak Mukherjee on the hikes.

I am grateful to have served in and been served by several DEI and planning committees, and I got to know and learn from some of the best human beings that life has given me: Allison Carter, Mika Chmielewski, Kathleen Hamilton-Campos, Natalia Pankratova, and others from the Women in Physics; Daniel Serrano, Thomas Murphy, Kevin Daniels, Roxanne Defendini, Marc Swisdak, Heidi Myers, Taylor Prendargast, Sai Kanth Dacha, Carlos Rios Ocampo, and Meredith Pettit from the IREAP-ROLE; Milena Crnogorčević, Charlotte Ward, Andrew Guo, Katya Leidig, Arjun Savel, Stuart Vogel, and others from the GRAD-MAP; Madison Anderson, Ananya Sitaram, Landry Horimbere, Edgar Perez, Donna Hammer, and others from the Physics Climate Committee; Jennifer Enriquez, Justine Suegay, Shabab Ahmed Mirza, and others from MICA; Luanjiao Aggie Hu, Fatima Ali, and others from the GSL; Brian Medina from the BISS; Amanda Vu, Jackie Liu, Alythia Vo and many others from the APIDA Heritage Month Planning Committee; and many more. These people have been the best teachers in my grad school. They have convinced me that I am worthy of love and care and also capable of receiving them. I am also thankful to be served by the Physics GRAD-COMM and Mental Health Task Force, the Office of Diversity and Inclusion, and the LGBTQ+ Equity Centre.
I am grateful to Physics, IREAP, and IPST staff and coordinators Josiland Chambers, Jessica Crosby, Pauline Rirksopa, Paulina Alejandro, Judy Gorski, Dorothea Brosius, Catha Stewart, and many others. They have been lifesavers and made me feel at home here. Also thanks to Leslie Miller from the CMNS Office, who made me feel special by writing a feature story on my research. I am also indebted to the cleaning staff, the nurses who gave me COVID vaccines, and the numerous Uber drivers who have delivered food to me and taken me to places. I am particularly grateful to the Uber driver Michael, who was exceptionally kind to drive me through the Six Flags vaccination site when I had no idea beforehand that the site was drive-thru-only.

I am thankful to the people with whom I travelled to the International Physics of Living Systems conference in Montpellier this summer. These include Vishavdeep Vashisht, Spandan Pathak, Nathan Zimmerberg, Elissa Moller, Frank Fazekas, and others and, for leading and planning the trip, Arpita Upadhyaya.

I am grateful to my mentees, Neil Shah, Waley Wang, and Paulson Obiniyi Jr., who have shown me that I am capable of giving useful research and life advice. I am also thankful to all the amazing GRAD-MAP mentees whom I was lucky to have served, including Autumn Jackson Bartholomew, who was very kind to come to my thesis defense and help me with the arrangements.

I am thankful to my research collaborators: Jaideep Pathak, Sarthak Chandra, Juan Restrepo, William Fagan, Alireza Seif, Mohammad Hafezi, Alexey Gorshkov, Ravi Chepuri, and others, for everything I learned from them. I am thankful to my dissertation committee members Michelle Girvan, Pratyush Tiwary, and Yanne Chembo for the great feedback and encouragement. I am grateful to all the people who came to my defense talk, congratulated me on my graduation, and celebrated with me.

I am grateful to my partner Promita Ghosh for being a constant, loving, and supporting figure who was kind enough to be a safe haven whenever my mental health faltered. She proves that brilliant scientists can also be good human beings.

I am extremely lucky to have Rajarshi Roy as my major collaborator and advisor. They say you need a lot of mentors for different things in grad school: one mentor to guide your research, one to help you network, one to help you look for jobs, one to go walk with, one who helps you when you get into bad troubles, one to vent out to during your sad days, one to shed tears to, one who prays for you silently, and so on. I had Raj.

Finally, I am thankful to my doctoral thesis advisor Edward Ott for being so kind as to accept me as his student and providing me with his experienced guidance throughout my PhD journey. Needless to say, the thesis would not have been done had he not abundantly supported me and my work, in every way possible, through everything that happened.

Table of Contents

Preface
Dedication
Acknowledgements
Table of Contents
List of Figures
List of Abbreviations and Symbols

Chapter 1: Introduction
1.1 A network science view of complex dynamical systems
1.2 Network inference techniques and applications
1.2.1 Model-free network inference techniques
1.2.2 Model-based network inference techniques
1.3 Network inference with reservoir computing
1.3.1 Properties of computational models for network inference
1.3.2 Basic structure of a reservoir computer
1.3.3 Reservoir computer models of time-series data and network inference
1.4 Organization of this thesis
1.4.1 RC inference of short-term causal dependence
1.4.2 RC inference of time-delayed causal dependence
1.4.3 Effects of sampling conditions on RC causal inference and statistical analysis

Chapter 2: Inference of short term causal interactions: applications in coupled Lorenz oscillators
2.1 Overview
2.2 Introduction
2.3 Short Term Causal Dependence (STCD)
2.4 Using Reservoir Computer to Determine Short Term Causal Dependence
2.5 Tests of Machine Learning Inference of STCD
2.6 Discussion
2.7 Acknowledgements

Chapter 3: Inference of time-delayed causal interactions: applications in optoelectronic oscillator networks
3.1 Overview
3.2 Introduction
3.3 Reservoir Computing Methodology for Network Inference
3.3.1 The General Delay-coupled Network
3.3.2 Time Series Prediction with a Reservoir Computer
3.3.3 Our Network Inference Procedure
3.4 Opto-electronic Oscillator Networks
3.4.1 Description of the experiment
3.4.2 Mathematical Model and Numerical Simulations of the Opto-Electronic Network
3.5 Results of Link Inference Tests
3.5.1 Performance on Simulated Data - Homogeneous Delays
3.5.2 Performance on Experimental Data
3.5.3 Performance on Simulated Data - Heterogeneous Delays
3.6 Discussion
3.7 Acknowledgements
3.8 Appendices
3.8.1 Determination of the time-delay from cross-correlation
3.8.2 Derivation of the discrete-time equation for simulating the opto-electronic system

Chapter 4: Network inference application for C. elegans neuronal calcium fluorescence dynamics
4.1 Overview
4.2 Introduction
4.3 Physical systems and datasets used
4.3.1 Calcium imaging time-series data from C. elegans
4.3.2 Coupled Lorenz oscillator network simulations
4.4 Link-score distributions for the coupled Lorenz Oscillator Network
4.5 Statistical analysis of candidate link scores for link inference
4.5.1 Brief review of surrogate data generation techniques for network link inference
4.5.2 Method of surrogate time-series generation and statistical significance analysis of inferred connections
4.6 Tests on Example Systems
4.6.1 Application to Lorenz network model
4.6.2 Applications to C. elegans data
4.6.3 Transfer entropy and Granger causality
4.6.4 Another Surrogate Data Generation Technique
4.7 Discussion
4.8 Methods
4.8.1 Reservoir-computer-based connection inference technique
4.8.2 Transfer Entropy
4.8.3 Granger Causality
4.9 Acknowledgements

Chapter 5: Summary

Bibliography

Curriculum vitae

List of Figures

1 Schematic of a Rube Goldberg machine
2 The food web of waterbirds from Chesapeake Bay
3 The ladder of causation
1.1 Schematics of a reservoir computer
1.2 Properties of time-series prediction by reservoir computers
2.1 Schematic of the reservoir computing architecture
2.2 Performance of link inference for noiseless Lorenz oscillator networks
2.3 Effect of dynamical and observational noise on network link inference
2.4 Network link strength inference for arbitrary networks
3.1 Schematics of a reservoir computer for time-delayed prediction
3.2 Illustration of opto-electronic oscillator and coupling scheme
3.3 Examples of experimental and simulated time series from two opto-electronic oscillator networks
3.4 List of possible connected directed 4-node networks
3.5 Synchronization error for simulated time series of 4-node networks
3.6 Network inference performance with varying noise strength
3.7 Network inference performance with varying coupling strength
3.8 Distribution of individual link scores
3.9 Performance of network inference method on experimental data
3.10 Inference of networks with heterogeneous link delays
3.11 Time delay determination by cross-correlation of measured time series
4.1 C. elegans neural connectome inference from calcium fluorescence time-series
4.2 Separable and non-separable network link inference score distributions
4.3 Dependence of network link inference score distributions on sampling limitations
4.4 Score distributions from different link inference methods
4.5 Score distribution and surrogate data for AAFT surrogate
4.6 Schematic of the reservoir computer

List of Abbreviations and Symbols

AAFT Amplitude-Adjusted Fourier Transform
AI Artificial Intelligence
CDSD Causality-Destroyed Surrogate Data
ML Machine Learning
OEO Opto-Electronic Oscillator
RC Reservoir Computer
STCD Short Term Causal Dependence
Win Input-to-Reservoir Coupling Matrix
Wout Reservoir-to-Output Coupling Matrix

Chapter 1: Introduction

In this thesis, we infer causal interactions in complex dynamical systems from machine learning models of their time-series data. This chapter serves as background material for the rest of the thesis. It introduces both the research questions that we shall address in the rest of this thesis and the main framework within which we shall address them. We start by explaining with examples why an understanding of the network of causal interactions underlying a complex system is essential for the modeling and prediction of complex systems. Then we discuss several traditional and contemporary methods developed for this purpose. Finally, we introduce a new machine-learning-based technique for inference of the causal interaction network structure underlying complex dynamical systems, which we shall employ in the next chapters.

1.1 A network science view of complex dynamical systems

The science of agents, objects, or systems connected by graphs - network science - has flourished in the last two decades and provided us with a wealth of useful analytical and computational techniques [12, 13] to describe and study complex systems in nature and society. Examples of networked systems range from neuronal networks [14] in the brains of humans and other organisms, and gene regulation and protein interaction networks inside cells [15], to the internet and telecommunication systems. Network science has provided us with a ‘systems’ viewpoint of natural and social processes that has been instrumental in their understanding and control. A particularly important subfield of network science deals with the collective properties of complex dynamical systems connected via networks. Applications of such studies include epidemic spreading [16], synchronization of dynamics [17], and growth of political polarization [18] on complex networks, to mention a few. As such, networked complex systems are platforms for studying a wide variety of dynamical phenomena arising from the interactions between network topology and the time evolution of agents, acting as the network nodes, connected by the network links.

1.2 Network inference techniques and applications

While studying the collective properties and dynamics of networked agents is interesting in its own right, the inverse problem, i.e., inference of network links from the observed dynamics of the network nodal states, is also an important and challenging problem in itself.
Applications of network inference techniques include determination of the synaptic connectivity in nervous systems [2, 19, 20], mapping of interactions between genes [21] and proteins in biochemical networks [22], distinguishing relationships between species in ecological networks [23, 24], and understanding the causal dependencies between elements of the global climate [25]. In all such cases, a basic network structure among the agents of these complex systems (e.g., individual neurons, genes, proteins, and species) can be found by measuring correlations among the time-dependent states of the agents. Examples include functional connectivity in neuroscience, which measures the correlation of neural activity among different parts of the brain [2, 14], gene co-expression networks that link genes which are statistically expressed together [26], co-occurrence/co-exclusion networks of microbes [27] that map which microbes are statistically present or absent together in the human microbiome, etc. Such networks based on correlation, co-occurrence, or associations among different agents have been useful in, for example, relating neural activity to neurodegenerative diseases [28, 29] and other conditions [2, 14] and relating changes in gut microbial co-occurrence patterns to inflammatory bowel disease [30].

However, despite the usefulness of correlation-based network models of complex systems, in the absence of causal models they are generally unable to predict how the complex systems would respond to external perturbations. For example, it is impossible to predict how knocking down a specific gene would affect the expression of other genes, or how stimulating a specific neuron would affect the firing pattern of other neurons, if the causal, directed interaction networks among the genes and neurons are not known. So, therapeutic interventions in these systems will not be possible without a thorough causal understanding of these systems. Thus, a broad goal of this thesis is to extract the causal network connecting different components of a complex system, solely from the time evolution of such components.

Given the usefulness of causal network inference, a number of methods have been proposed for this purpose. We shall now briefly review some prominent network inference methods. It is helpful to classify them into two types: (1) model-free, i.e., network inference techniques that do not assume any particular mathematical model of the observed dynamics, and (2) model-based, i.e., network inference techniques which assume the dynamics of the system to be governed by a particular class of mathematical models (e.g., a set of linear coupled ordinary differential equations) with parameters which are fitted using the available time-series data. We now describe some examples of the two types of network inference methods. For connectivity inference from neuronal recordings, a review of different network inference methods and their comparisons can be found in Ref. [19].

1.2.1 Model-free network inference techniques

1.2.1.1 Transfer entropy

Transfer entropy is an information-theoretic causal network inference technique which measures the rate of transfer of information from one time-series to another [31]. For a pair of agents (i, j) with dynamical states in a complex system, transfer entropy from i to j is defined by the conditional mutual information between the future states of j and the past states of i, conditioned on the past states of j.
It represents the amount by which the uncertainty in the future dynamics of j is reduced by knowing the past dynamics of i, conditioned on the past dynamics of j. Transfer entropy is usually calculated pairwise, and can be used to construct a weighted, directed network connecting the agents of a complex system based on the directional information flow between them. Several generalizations and variations of transfer entropy also exist [19], to account for experimental limitations in the acquisition of time-series data.

1.2.1.2 Supervised learning of causal interactions

Model-free techniques of estimating causal interaction networks are rare, because the effects of causal interactions between dynamical agents can potentially affect their dynamics in multiple, unknown ways. In cases where there are available examples of dynamics with known ground truth causal connectivity, one can use such examples to train a machine learning algorithm to detect the presence of causal interactions from dynamics [19, 32]. However, such techniques require data-heavy training and the availability of ground truth causal interaction networks, which are often unknown. Furthermore, in many cases, such models lack interpretability if it is unclear what features of a complex system’s dynamics are being used to predict the causal interactions underlying the dynamics.

1.2.2 Model-based network inference techniques

In model-based techniques the causal connectivity is estimated by mathematically modeling the processes that generate the sampled time-series data, and fitting the model parameters to the observed data. Some examples of such models are the following.

1.2.2.1 Granger causality

For a pair of agents (i, j) with dynamical states in a complex system, we say that j does not Granger cause i if and only if i, conditional on its own past dynamics, is independent of the past dynamics of j [33–36]. Conversely, if the past dynamics of j provides additional information about the future dynamics of i, then j Granger causes i. The typical way to test this type of dependency between two time-series involves fitting a vector autoregressive model to the time-series of i, and measuring whether inclusion of the time-series of j in that model makes the fitting error significantly lower (see the sketch below). In such generative time-series models applied to test the predictive powers of different time-series, one can simultaneously consider more than two time-series at once, and can simultaneously include multiple time-delays in the generative model. However, these models are linear and stationary in nature, whereas the original dynamics might be nonlinear and non-stationary [37, 38]. Furthermore, the presence of unmeasured external inputs to the dynamics, noise, and a finite sampling rate are also known to affect Granger causality [37, 38].

1.2.2.2 Bayesian network model

Bayesian networks model the conditional probability distributions of time-series from different agents with a directed, acyclic graph [19, 24, 39], with parameters corresponding to, e.g., causal connectivity strengths. These parameters of the model are optimized to fit the observed data, assuming a particular form of the underlying probability distribution (e.g., Gaussian) of the time-series variables from different agents [39].
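To make the vector-autoregressive test of Sec. 1.2.2.1 concrete, the following is a minimal pairwise sketch in Python (numpy). It is illustrative only: the function name, the fixed lag order p, and the log variance-ratio score are choices made here for brevity, and a real analysis would add lag-order selection and a formal significance test (e.g., an F-test, as implemented in standard statistics packages). The estimators actually used for comparison in this thesis are described in Secs. 4.8.2 and 4.8.3.

```python
import numpy as np

def granger_score(x, y, p=5):
    """Score for 'y Granger-causes x' using p past lags.

    Fits two least-squares autoregressions for x(t): a restricted one on the
    past of x alone, and a full one on the pasts of both x and y.  Returns the
    log ratio of the residual variances; a larger score means the past of y
    lowers the fitting error for x, i.e., evidence that y Granger-causes x.
    """
    T = len(x)
    target = x[p:]                     # x(t) for t = p, ..., T-1

    def lagged(z):
        # Columns are z(t-1), z(t-2), ..., z(t-p), aligned with target.
        return np.column_stack([z[p - k - 1:T - k - 1] for k in range(p)])

    def residuals(design):
        coeffs = np.linalg.lstsq(design, target, rcond=None)[0]
        return target - design @ coeffs

    ones = np.ones((T - p, 1))
    restricted = np.hstack([ones, lagged(x)])            # past of x only
    full = np.hstack([ones, lagged(x), lagged(y)])       # pasts of x and y
    return np.log(np.var(residuals(restricted)) / np.var(residuals(full)))
```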
1.3 Network inference with reservoir computing

1.3.1 Properties of computational models for network inference

In the preceding section, we saw some examples of popular techniques for causal connectivity inference from the time-series of complex systems. While model-free methods like transfer entropy can capture causal connectivity by measuring information flow between different dynamical agents of a complex system, they do not provide any generative model for the sampled time-series data. In this thesis, we wish to extract causal interaction network connections from a model of the observed time-series that can accurately fit and generalize the measured data. Thus, we aim for a mathematical or computational model of the observed time-series from the complex system that has the following three properties.

(i) Accurate fit to the sampled data: The model is sufficiently complex, potentially containing a large number of adjustable parameters, so that it can fit the sampled time-series data accurately.

(ii) Prediction or generalization of the sampled data: The model is able to predict the time-series of the complex system for some time duration into the future, or at least be able to generate time-series data which is statistically similar (e.g., has a similar power spectrum) to the original sampled time-series data.

(iii) Causal inference extraction from observed data: The model should be able to predict the effect of small changes to the states of individual agents on the states of other agents at a later time in the complex system. That is, it should have the capability to extract the causal connectivity structure among the dynamical states of different agents, using only the observed time-series data that it has been fitted on.

In order to faithfully infer causal network connectivity from a model of time-series data, all three of the above properties are essential. Since causal interactions can affect the dynamics of individual agents in many ways, we wish to make sure that our model of the dynamics is sufficiently complex to capture all of them, so we wish our model to fit accurately to the observed time-series data. However, goodness of this fit alone does not guarantee that the model is indeed capturing the ground truth dynamical processes of the actual system, as the model could overfit to the available data. In order to avoid this overfitting, we ensure that the model is indeed able to autonomously predict the future time evolution of the actual system, or generate time-series samples which are statistically similar to the observed time-series of the complex system. Furthermore, to ensure that the model is able to fit and generalize the observed time-series not by identifying simple statistical associations between the time-series of different agents, we further require that the model must be able to predict the effect of interventions in the time-series of the different agents. In this thesis, we shall show that it is possible to obtain a model that satisfies the above three properties, using a specialized machine learning architecture called the ‘reservoir computer’ (RC). In the next section, we describe the structure of an RC in more detail.

1.3.2 Basic structure of a reservoir computer

In its most basic form, the reservoir computer is a machine learning architecture that takes in multiple time-series at its input and produces multiple time-series at its output (Fig. 1.1).
We denote the time-series that has been sampled from the complex system by the time-dependent vector u(t), of dimension Du, with its components denoting the time-series corresponding to the states of individual agents of the system (note that, if the states of the agents are themselves vectors, then multiple components of the time-series vector u(t) may be sampled from the same agent). If the sampling time is δt, we have this time-series vector measured at times t = 0, δt, 2δt, ..., T δt. Based on the properties of causal models from the previous section, we wish to obtain a computational model of the observed time-series yielding a corresponding time-series ũ(t) of the dynamics that (i) fits the observed time-series data u(t) (which we shall call the ‘training data’), (ii) can generate time-series data ũ(t) with statistical properties similar to those of u(t), and (iii) can predict the input data for a time duration (say, τ) into the future, i.e., given u(t), the model can predict ũ(t + τ) at its output. Fig. 1.1 shows how this is done with an RC.

Figure 1.1: Schematics of a reservoir computer predicting a five-variable time-series u(t) a time τ ahead into the future.

As depicted in Fig. 1.1, the RC is a network consisting of a large number of nodes. The number of nodes, Dr, is chosen such that Dr >> Du. Each of the nodes has a time-dependent scalar state, all of which are collected in a time-dependent column vector R(t) of length Dr. These reservoir nodes receive measured inputs of the system state vector u(t). At every time point t, this system state vector u(t) is fed into the reservoir via a Dr × Du input-to-reservoir coupling matrix Win (Fig. 1.1). Furthermore, the reservoir nodes are coupled among themselves with a Dr × Dr adjacency matrix W. The time evolution of the reservoir node states R over a small time step ∆t is given by the equation

R(t + ∆t) = σ(Win u(t), W R(t)),   (1.1)

which maps the reservoir state at time t to the reservoir state at the next time step, t + ∆t, where σ is a general nonlinear ‘activation’ function of the input and reservoir states, acting componentwise on its vector argument (and whose output must have the same dimension, Dr, as the reservoir state R). For simplicity, in this instance we can assume that δt = ∆t, even though in more general cases it suffices that the reservoir time-evolution iteration time step ∆t is an integer multiple of the sampling time step δt of the system. Based on the available time-series from the system u(t) for T time steps, Eq. 1.1 can be iterated to generate a corresponding time-series for the reservoir state, R(t). Since the goal is to predict the future values of the sampled time-series u(t + τ) from their current observed values u(t), we may utilize the reservoir state vectors R(t) for this purpose. In our case, this is done by using a suitable linear combination of the reservoir node states with a Du × Dr reservoir-to-output coupling matrix Wout (Fig. 1.1) according to the equation

ũ(t) = Wout R(t),   (1.2)

where the tilde in ũ indicates that the vector is a prediction from the RC, as opposed to being sampled from the actual system. Given the form of Eq. 1.2, the matrix Wout can be found by a linear regression between the vectors u(t) and R(t).
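As an illustration, the following is a minimal numpy sketch of Eqs. 1.1 and 1.2 and of this regression, assuming the common concrete choice σ(Win u, W R) = tanh(Win u + W R). The random matrix statistics, the sparsity and spectral radius of W, and the ridge (Tikhonov) regularization parameter are illustrative placeholders, not the exact settings used in later chapters.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
D_u, D_r = 5, 1000                          # input dimension D_u << D_r

# Fixed random matrices, chosen once and never trained (cf. Sec. 1.3.2).
W_in = rng.uniform(-0.5, 0.5, size=(D_r, D_u))
W = rng.uniform(-1.0, 1.0, size=(D_r, D_r)) * (rng.random((D_r, D_r)) < 0.02)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # rescale spectral radius

def step(R, u):
    """One reservoir update, Eq. 1.1, with sigma = tanh acting componentwise."""
    return np.tanh(W_in @ u + W @ R)

def train_W_out(U, beta=1e-6):
    """Ridge regression for Eq. 1.2, set up for one-step-ahead prediction.

    U has shape (T, D_u), its rows being the sampled vectors u(0), u(dt), ....
    Returns W_out such that W_out @ R(t) approximates u(t + dt), plus the
    final reservoir state (reused below for closed-loop prediction).
    """
    T = U.shape[0]
    R = np.zeros(D_r)
    states = np.empty((T - 1, D_r))
    for t in range(T - 1):
        R = step(R, U[t])              # reservoir state after seeing u(t)
        states[t] = R
    targets = U[1:]                    # the values to be predicted
    # Tikhonov-regularized least squares; in practice an initial transient
    # of the reservoir states is also discarded before this step.
    W_out = np.linalg.solve(states.T @ states + beta * np.eye(D_r),
                            states.T @ targets).T
    return W_out, R
```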
This linear regression constitutes the training process of the RC, and the matrix Wout is the only element of the RC with adjustable parameters; the other matrices, Win and W, are chosen randomly before the iteration of Eq. 1.1 begins and are kept fixed throughout. Since the training consists of a linear regression, RCs can be trained much faster than conventional deep learning neural networks, which contain many adjustable parameters that have to be trained by gradient descent. A comparison of the training times of RCs and other machine learning architectures is given in Ref. [40].

While the subsequent chapters in this thesis will contain more details about the form of the nonlinear function σ, as well as the structure of the fixed matrices Win and W, we note here the generality of RCs. Indeed, Eq. 1.1 shows that any dynamical system with a sufficiently large number of ‘nodes’ or computing units, having some nonlinearity, and a mechanism for receiving inputs and providing readable outputs can serve as a potential RC. Indeed, as reviews of RC realizations with physical systems demonstrate [41], RCs can be experimentally realized with many different systems, irrespective of the microscopic details of their dynamics. Such physical implementations of the reservoir computer [41–43] include photonic [44–46], electronic and opto-electronic [47, 48], neuronal [49], molecular [50], chemical [51], and spin [52] systems, or even a tank of water [53]. This flexibility of physical realization makes RCs a prime candidate for ‘neuromorphic computing’ [43], an unconventional computing paradigm that seeks inspiration from neural systems to create novel computing architectures.

1.3.3 Reservoir computer models of time-series data and network inference

RCs have been used for a variety of applications in time-series analysis, including forecasting of spatially extended [54] and networked [55] chaotic systems, data-based reconstruction of long-term statistical properties of the dynamics like the Lyapunov spectrum [56, 57], separation of mixed signals from two sources [53, 58], and fast classification of spoken words [44], to mention a few. Among these applications, the first one is the most relevant for our purpose of network inference. In terms of properties (i) and (ii) from Sec. 1.3.1, RCs of moderate size (Dr ≈ 1000) are known to fit training data of dimension Du ≈ 10 accurately. Furthermore, as Fig. 1.2(a) shows, it is possible to use trained RCs for autonomous long-term time-series forecasting by running them in a closed-loop configuration, where the output time-series from the RC, ũ(t), at a given time step t is fed back as the input to the RC, thus eliminating the need for any inputs from the original time-series data. As shown in the example of Fig. 1.2(b), such predictions remain accurate for a long time, even for chaotic systems. It is to be noted here that, for chaotic systems, time-series starting from two different initial conditions separated by a small amount δu are expected to separate by an amount e^(Λmax t) δu after a time t, where Λmax > 0 is the maximum Lyapunov exponent of the system. Forecasting of such chaotic trajectories several Lyapunov times into the future thus shows that the closed-loop RC can serve as an accurate model of the observed dynamics. Fig. 1.2(b) further shows that, even when the accuracy of the long-term RC prediction breaks down (say, after a time t such that Λmax t > 6), the RC-predicted time-series qualitatively looks similar to the original time-series.
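In code, this closed-loop configuration of Fig. 1.2(a) amounts to only a few extra lines. The following sketch continues the one above, reusing step and the outputs of train_W_out (same illustrative assumptions, with τ equal to one sampling step):

```python
import numpy as np

def predict_autonomously(W_out, R, n_steps):
    """Run the trained RC in the closed loop of Fig. 1.2(a).

    Starting from the reservoir state R reached at the end of training, the
    RC output is fed back as its own input, so the machine generates a
    continuation of the training time-series with no further data.
    """
    predictions = []
    for _ in range(n_steps):
        u_tilde = W_out @ R        # read out the prediction, Eq. 1.2
        predictions.append(u_tilde)
        R = step(R, u_tilde)       # feed it back in as the input, Eq. 1.1
    return np.array(predictions)
```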
This qualitative similarity is confirmed by comparing long-term statistical properties of the dynamics, like the power spectrum (Fig. 1.2(c)) and the Lyapunov spectrum (the set of Lyapunov exponents of a chaotic system, ordered according to their numerical values, shown in Fig. 1.2(d)), for the original time-series and the closed-loop RC-predicted time-series. Figs. 1.2(c,d) show that the long-term statistical properties of the RC prediction indeed match the corresponding properties of the original time-series data, thus fulfilling the ‘time-series prediction and generalization’ condition (ii) of Sec. 1.3.1.

Figure 1.2: Properties of time-series prediction by reservoir computers: (a) a schematic of the closed-loop configuration of a trained RC used for autonomous time-series prediction, (b) original and RC-predicted time-series from a large, spatiotemporally chaotic system (Kuramoto-Sivashinsky model) and the prediction error over time, with time measured in terms of the maximum Lyapunov exponent Λmax, (c) comparison of the power spectrum, and (d) the Lyapunov spectrum of the original time-series data and the RC prediction for the Kuramoto-Sivashinsky model. (a) and (b) are adapted from Ref. [54]; (c) and (d) are adapted from Ref. [56].

Since the RC has been demonstrated in the literature to fulfill conditions (i) and (ii) of Sec. 1.3.1, we wish to know if it can capture the causal interactions between the dynamics of different agents in the complex system (condition (iii) of Sec. 1.3.1). In our formalism, this is represented by causal interaction among the components of the vector u(t). Mathematically, we suppose that, if a component i of the vector u(t) is causally affecting another component j of u(t), then the derivative ∂uj(t + τ)/∂ui(t) would be non-zero for some τ > 0. For example, if the system does not have any time-delay, then over a very short period of time τ = δt, the derivative would acquire a non-zero value if the dynamics of component i is causally affecting the dynamics of component j. For time-delayed systems, we can set τ to be the time-delay of the causal interactions to have a similar effect. Note that our notion of causality, defined via the partial derivative of a component of the state vector u(t) with respect to another component at a past time, is close to the concept of causality in terms of interventions. That is, for example, if we changed a component ui at time t by applying an external perturbation, that would cause a corresponding change to a component uj a time τ later if and only if the component j is causally connected to the component i over this time. As such, our task of causal inference in complex systems amounts to estimating the derivatives ∂uj(t + τ)/∂ui(t) from the sampled observations of the time-series of u(t).

Partial derivatives like ∂uj(t + τ)/∂ui(t) cannot be calculated from the time-series of u(t) in a straightforward way. This is because, when we test causal dependence between a pair of variables, we wish to have the other variables unchanged, so that their effects do not interfere with those of the pair of variables being tested. In the partial derivative ∂uj(t + τ)/∂ui(t), all components of the vector u(t) other than the i-th and j-th components are assumed to be held fixed. In the original time-series, this rarely happens, since at every sampling time point there are generally changes to all components of the state vector u(t) together.
Thus, the original time series of u(t) itself cannot be used to calculate the partial derivatives ∂uj(t+τ)/∂ui(t) indicating causal interactions. However, if we have a model for the observed dynamics of the state u(t), of the form u(t+τ) = M(u(t)), then we can use that model to calculate the partial derivatives, ∂uj(t+τ)/∂ui(t) = ∂Mj/∂ui, and test the causal dependence among the components of u(t). Since RCs are able to give us an excellent model of an observed time-series, one that not only fits the sampled data accurately but can also both predict and generalize it, we use the RC model of an observed time-series to calculate the partial derivatives. The model is obtained by combining Eqs. 1.1 and 1.2 into a single equation, assuming a perfect fit to the training data by the RC (i.e., u(t) ≈ ũ(t)),

u(t+τ) = Wout σ(Win u(t) + W r(t)), (1.3)

which we use as the model of the observed time-series to calculate the partial derivatives ∂uj(t+τ)/∂ui(t) = ∂Mj/∂ui. If the RC model of the time-series, given by Eq. 1.3, indeed captures the true causal dependencies among the components of the time-series, the resulting derivatives should reveal them. Throughout the rest of this thesis, we shall use equations like Eq. 1.3 to test whether RC models are able to actually predict the true causal dependencies among the components of a complex system, which will be known in all cases that we shall consider. We shall also investigate how properties of the time-series, like the amount of correlation among their components, and sampling limitations, like finite sampling time and observational and dynamical noise, systematically affect the causal learning performance of the RC.

1.4 Organization of this thesis

The next three chapters of the thesis describe three applications of causal inference by the RC.

1.4.1 RC inference of short-term causal dependence

In chapter 2, based on Ref. [59], we test whether RCs can capture short-term causal dependencies in the data, i.e., predict how one component of a time-series affects another over a very short time. To test this, we use simulated data from complex networks of coupled chaotic Lorenz oscillators. Varying the size of the network, as well as the observational and dynamical noise strengths in the time-series data, we show that the RC is able to correctly infer the causal interaction networks if the degree of correlation among the time-series from different Lorenz oscillators and the observational noise strength are sufficiently small. Interestingly, in contrast to what we have generally described so far in this chapter, the presence of dynamical noise is shown to aid RCs in identifying causal interactions. We hypothesize that the reason for this is twofold: (i) dynamical noise breaks down correlations and synchrony among the time-series from different Lorenz oscillators, and (ii) over a short time period, dynamical noise at one Lorenz oscillator only propagates to those particular oscillators which are causally connected to that oscillator. As such, dynamical noise in the network generates perturbations of the Lorenz oscillator nodal states, which propagate in time only to causally connected nodal states, revealing the causal connectivity structure in the process. Even for relatively small dynamical noise strengths, when these changes are infinitesimal, the RC is shown to be able to successfully use their information to correctly predict short-term causal connections among the networked Lorenz oscillators.
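Before moving on, the following is a minimal sketch of how derivatives like ∂uj(t+τ)/∂ui(t) can be read off a trained model of the form of Eq. 1.3. It reuses the hypothetical W_in, W, W_out, u_train, R, and T of the earlier sketch and, as a simplification, holds the reservoir state r(t) fixed while differentiating with respect to u(t); the full procedure, which eliminates r(t) altogether, is described in chapter 2.

def rc_jacobian(u, r):
    # d u(t+tau) / d u(t) for the model of Eq. 1.3 at the operating point
    # (u, r): the chain rule gives W_out diag(1 - tanh^2(a)) W_in,
    # with a = W_in u + W r (r held fixed, a simplifying assumption).
    a = W_in @ u + W @ r
    return (W_out * (1.0 - np.tanh(a) ** 2)) @ W_in   # shape (D_u, D_u)

# Time-averaged |Jacobian| over the recorded orbit; a large (j, i) entry
# suggests that component i causally drives component j.
J_avg = np.mean(
    [np.abs(rc_jacobian(u_train[n + 1], R[n])) for n in range(0, T - 2, 50)],
    axis=0,
)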
1.4.2 RC inference of time-delayed causal dependence

In contrast to the short-term causal inference of chapter 2, chapter 3, based on Ref. [60], discusses inference of time-delayed causal interactions by the RC. In this chapter, we use experimental and simulated voltage data from networks of delayed-feedback-coupled nonlinear optoelectronic oscillators (OEOs), an excellent testbed for studying the nonlinear dynamics of delay-coupled networks. The results are similar to those of chapter 2 in that the RC is shown to be able to capture the causal interaction network correctly, provided the amount of synchronization among the dynamical variables is small, something that can be achieved by tuning the coupling strengths, increasing the dynamical noise strength, having different time-delays along different feedback network links, or rewiring the OEO networks. Moreover, the RCs identifying causal interactions are here shown to perform equally well on experimental data. This observation inspires us to apply the RC causal inference technique to more real-world time-series datasets from less well-controlled experiments. This is the subject of chapter 4.

1.4.3 Effects of sampling conditions on RC causal inference and statistical analysis

In chapter 4, we apply the RC technique for short-term causal dependence, but to publicly available experimental data on calcium fluorescence dynamics from the neural system of live C. elegans. Such data suffer from low temporal sampling resolution, a high noise level, a large amount of correlation among different variables, and only a partial measurement of the full biophysical neural state of the organism. However, the C. elegans neuronal connectome is completely mapped and thus serves as the ground truth for benchmarking the network inference performance of the RC. To investigate the effect of data limitations on network inference in this system, we also use time-series data from a synthetic model of coupled Lorenz oscillators on the same network and systematically change the different sampling parameters. In all such cases, when the data limitations prevent complete identification of the causal network, we apply a previously-developed statistical technique with surrogate time-series data to assign a statistical confidence to each inferred causal link. The RC network inference technique, together with this statistical methodology, is shown to yield useful information about the C. elegans neural connectome when compared with the ground truth connectome. Finally, the next section summarizes how the RC can be used for inference of causal connections in complex systems.

Chapter 2: Inference of short term causal interactions: applications in coupled Lorenz oscillators

This chapter is based on the publication "Using machine learning to assess short term causal dependence and infer network links" by Amitava Banerjee, Jaideep Pathak, Rajarshi Roy, Juan G. Restrepo, and Edward Ott, published as Chaos 29, 121104 (2019). The code to generate some of the results can be found in the GitHub repository banerjeeamitava/Reservoir-Computer-Network-Inference.

2.1 Overview

This chapter demonstrates the first concrete application of our network inference procedure, using simulated time-series data from networks of coupled chaotic Lorenz oscillators. The problem of determining causal dependencies among various agents in an unknown time-evolving system, from its time series observations, has wide-ranging and diverse applications.
These include inferring neuronal connections from spiking data, deducing causal dependencies between genes from expression data, discovering long spatial range influences in climate variations, etc. Previous work has often addressed such problems by considering correlations, prediction impact, or information transfer metrics. In this chapter we propose a new method for causal inference in complex systems that leverages the potential ability of a specific type of machine learning, called 'reservoir computing' (RC), to perform predictive and generative tasks involving complex time-series. Using RCs, we introduce and test a general computational technique for the inference of short term causal dependence among the measured time-dependent state variables of an unknown dynamical system. The basic method involves training the RC with the observed time-series data of the state variables of the unknown dynamical system to construct a computer model of the unknown dynamics, and, subsequently, estimating the causal dependencies of different state variables on each other in that computer model. We test our method on model complex systems consisting of networks of many interconnected chaotic Lorenz oscillators. Comparing the causal networks inferred by the RC to the ground truth, we find that dynamical noise can greatly enhance the effectiveness of our technique, while observational noise and the presence of synchronization degrade it. We believe that the competition between these two opposing types of noise, as well as the degree of synchrony in the system, will be the key factor determining the success of causal inference in many of the most important application situations. These tests show that machine learning offers a unique and potentially highly effective approach to the general problem of causal inference.

2.2 Introduction

The core goal of science is often described as generalization from observations to understanding [61], commonly embodied in predictive theories. Related to this is the desire to use measured data to infer necessary properties and structure of any description consistent with a given class of observations. On the other hand, it has recently emerged that machine learning (ML) is capable of effectively performing a wide range of interpretive and predictive tasks on data [62]. Thus it is natural to ask whether machine learning might be useful for the common scientific goal of discovering structural properties of a system from data generated by that system. In this chapter we consider an important, widely applicable class of such tasks. Specifically, we consider the use of machine learning to address two goals. Goal (i): Determine whether or not a state variable of a time evolving system causally influences another state variable. Goal (ii): Determine the 'strength' of such causal influences. In the terminology of ML, Goal (i) is referred to as "classification ML," and Goal (ii) is referred to as "regression ML." These goals have previously been of great interest in many applications (e.g., economics [33], neuroscience [63], genomics [21], climate [64], etc.). Many past approaches have, for example, been based upon the concepts of prediction impact [33, 63], correlation [65–68], information transfer [31, 69], and direct physical perturbations [11, 70].
Other previous works have investigated the inference of network links from time series of node states by assuming some prior knowledge of the form of the network system and using that knowledge in a fitting procedure to determine links [66, 71–74]. In addition, some recent papers address network link inference from data via techniques based on delay coordinate embedding [72], random forest methods [75], network embedding algorithms [76], and feature ranking [77]. In this chapter, we introduce a technique that makes use of an ML training process in performing predictive and interpretive tasks and attempts to use it to extract information about causal dependencies. In particular, we use a type of machine learning called reservoir computing, an efficient method of time series analysis which has previously been successfully used for different tasks, e.g., prediction of chaotic dynamics [54, 78, 79] and speech recognition [44, 47], to mention a few. In our case, a "reservoir" dynamical system is trained such that it becomes synchronized to a training time series data set from the unknown system of interest. The trained reservoir system is then able to provide an estimate of the response to perturbations in different parts of the original system, thus yielding information about causal dependencies in the actual system. We will show that this ML-based technique offers a unique and potentially highly effective approach to determining causal dependencies. Furthermore, the presence of dynamical noise (either naturally present or intentionally injected) can very greatly improve the ability to infer causality [71, 72], while, in contrast, observational noise degrades inference. The rest of this chapter is organized as follows: Sec. 2.3 mathematically defines our notion of short term causal dependence (STCD) in the context of dynamical systems and argues how it captures the effects of small external interventions into the system, which are often employed in experiments to identify causal connections. Section 2.4 introduces the method of inferring STCD using reservoir computer models of the dynamics. The next section, Sec. 2.5, illustrates the performance of the RC-based STCD inference technique on a system of coupled Lorenz oscillators, and identifies conditions which are helpful and detrimental for STCD inference.

2.3 Short Term Causal Dependence (STCD)

We begin by considering the very general case of an evolving, deterministic, dynamical system whose state at time t is represented by the M-dimensional vector z(t) = [z1(t), z2(t), . . . , zM(t)]^T, where z(t) evolves via a system of M differential equations, dz(t)/dt = F(z(t)), and has reached a statistically steady dynamical state (perhaps chaotic). In this context, we frame the issue of causality as follows: Will a perturbation at time t applied to a component zi of the state vector z(t) (i.e., zi(t) → zi(t) + δzi(t)) lead to a subsequent change at a slightly later time, t+τ, of another scalar component zj (i.e., zj(t+τ) → zj(t+τ) + δzj(t+τ)); and how can we quantify the strength of this dependence? This formulation might suggest comparison of the evolutions of z(t) that result from two identical systems, one with, and the other without, application of the perturbation. However, we will be interested in the typical situation in which such a comparison is not possible, and one can only passively observe (measure) the state z(t) of the (single) system of interest.
Aside from knowing that the dynamics of interest evolves according to a system of the form dz/dt = F(z), we assume little or no additional knowledge of the system, and that the available information is a limited-duration past time series of the state evolution z(t). Nevertheless, we still desire to deduce causal dependencies, where the meaning of causal is in terms of responses to perturbations as defined above. Since, as we will see, accomplishment of this task is, in principle, not always possible, our approach will be to first propose a heuristic solution, and then numerically test its validity. The main message of this chapter is that our proposed procedure can be extremely effective for a very large class of important problems. We will also delineate situations where our procedure is expected to fail. We emphasize that, as our method is conceptually based on consideration of responses to perturbations, in our opinion, it provides a more direct test of what is commonly of interest when determining causality than do tests based on prediction impact, correlation, or entropy metrics. Furthermore, although the setting motivating our procedure is that of deterministic systems, dz/dt = F(z), we will also investigate the performance of our procedure in the presence of both dynamical noise (i.e., noise added to the state evolution equation, dz/dt = F(z)) and observational noise (i.e., noise added to the observations of z(t) used as training data for the machine learning). Both types of noise are, in practice, invariably present. An important result from our study is that the presence of dynamical noise can very greatly enhance the accuracy and applicability of our method (a similar point has been made in Refs. [71] and [72]), while observational noise degrades the ability to infer causal dependence.

To more precisely define causal dependence, we consider the effect of a perturbation of one variable on the other variables as follows. Taking the jth component of dz/dt = F(z), we have dzj(t)/dt = Fj(z1(t), z2(t), . . . , zM(t)), for j = 1, 2, . . . , M. Perturbing zi(t) by δzi(t), we obtain, for small τ, that the component of the orbit perturbation of zj at time (t+τ) due to δzi is

δzj(t+τ) = τ [∂Fj(z)/∂zi]|_{z=z(t)} δzi(t) + O(τ²).

We define the Short Term Causal Dependence (STCD) metric, fji, of zj on zi by

fji = ⟨G(∂Fj(z)/∂zi)⟩, (2.1)

where ⟨(...)⟩ denotes a long time average of the quantity (...) over an orbit, and the function G is to be chosen in a situation-dependent manner. For example, later in this chapter, we consider examples addressing Goal (i) (where we want to distinguish whether or not ∂Fj(z)/∂zi is always zero), for which we use G(q) = |q|, while, when we consider an example addressing Goal (ii) and are concerned with the time-averaged signed value of the interaction strength, we use G(q) = q. In either case, we view fji as quantifying the causal dependence of zj on zi, and the key goal of this chapter will be to obtain and test a machine learning procedure for estimating fji from observations of the state evolution z(t). For future reference, we will henceforth denote our machine learning estimate of fji by f̂ji.
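To make the definition of Eq. (2.1) concrete, the following is a minimal sketch computing fji directly from a known vector field F, here a single classical Lorenz system (an illustrative choice; in our actual tests F is unknown to the inference procedure, and fji computed this way serves only as the ground truth against which f̂ji is compared).

# Toy illustration of the STCD metric of Eq. (2.1) for a known F
# (a single classical Lorenz system); crude Euler integration, and,
# for simplicity, the initial transient is not discarded.
import numpy as np

def F(s):
    x, y, z = s
    return np.array([10.0 * (y - x),
                     28.0 * x - y - x * z,
                     -(8.0 / 3.0) * z + x * y])

def jacobian(s):                        # exact dF_j/dz_i at the state s
    x, y, z = s
    return np.array([[-10.0, 10.0, 0.0],
                     [28.0 - z, -1.0, -x],
                     [y, x, -8.0 / 3.0]])

dt, s = 0.001, np.array([1.0, 1.0, 1.0])
n_steps = 200000
f = np.zeros((3, 3))
for _ in range(n_steps):                # long time average along the orbit
    f += np.abs(jacobian(s)) / n_steps  # G(q) = |q|, as for Goal (i)
    s = s + dt * F(s)
# f[0, 2] = 0 reflects that z does not directly drive x; all other
# entries are positive, matching the structure of F.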
In the case of our Goal (i) experiments, where G(q) = |q|, we note that fji defined by Eq. (2.1) is an average of a non-negative quantity, and thus fji ≥ 0, as will be our estimate, f̂ji ≥ 0. Furthermore, for that case we will define STCD of zj on zi by the condition fji > 0, and, when using our machine learning estimate f̂ji, we shall judge STCD to likely apply when f̂ji > ϵ, where we call ϵ > 0 the discrimination threshold. In the ideal case (f̂ji = fji), the discrimination threshold ϵ can be set to zero, but, in practice, due to error in our estimate, we take ϵ to be a suitably chosen positive number. We note that, in the ideal case, ϵ = 0 can be regarded as a test for whether or not Fj(z) is independent of zi.

As a demonstration of a situation for which the determination of STCD from observations of the motion of z(t) on its attractor is not possible, we note the case where the attractor is a fixed point (a zero-dimensional attractor). Here, the measured available information is the M numbers that are the coordinates of the fixed point, and this information is clearly insufficient for determining STCD. As another problematic example, we note that in certain cases one is interested in a dynamical system that is a connected network of identical dynamical subsystems, and that such a network system can exhibit exact synchronization of its component subsystems [80] (including cases where the subsystem orbits are chaotic). In the case where such a synchronized state is stable, observations of the individual subsystems are indistinguishable, and it is then impossible, in principle, to infer causal relationships between state variables belonging to different subsystems. More generally, in addition to the above fixed point and synchronization examples, we note that the dimension of the tangent space at a given point z* on the attractor is, at most, the smallest embedding dimension of the part of the attractor in a small neighborhood of z*. Thus the full M×M Jacobian of F(z) at z* cannot be precisely determined from data on the attractor when the local attractor embedding dimension at z* is less than M, which is commonly the case. These examples thus motivate the conjecture that, to efficiently and accurately infer STCD, the orbital complexity of the dynamics should be large enough to encode the information that we seek. Note that these considerations of cases where inference of STCD is problematic do not apply to situations with dynamical noise, e.g., dz/dt = F(z) + (noise), as the addition of noise may be roughly thought of as introducing an infinite amount of orbital complexity. Alternatively, the addition of noise increases the embedding dimension of the data to that of the full state space, i.e., M.

2.4 Using a Reservoir Computer to Determine Short Term Causal Dependence

We base our considerations on a type of machine learning called reservoir computing, originally put forward in Refs. [81] and [82] (for a review, see Ref. [83]). We assume that we can sample the time-series data z(t) from our system at regular time intervals of length τ, so that we have a discrete set of observations {z(0), z(τ), z(2τ), ...}. To begin, we first describe a reservoir-computer-based machine learning procedure in which the reservoir computer is trained to give an output ẑ(t+τ) in response to an M-dimensional input z(t), as illustrated in Fig. 2.1. For our numerical tests we consider a specific reservoir computer implementation (Fig. 2.1) in which the reservoir consists of a network of R ≫ M nodes whose scalar states, r1(nτ), r2(nτ), . . . , rR(nτ), are the components of the R-dimensional vector r(nτ).
The nodes interact dynamically with each other through an R×R network adjacency matrix A, and their evolution is also influenced by the coupling of the M-dimensional input z(nτ) to the individual nodes of the reservoir network by the R×M input coupling matrix Win, according to the neural-network-type evolution equation (e.g., Refs. [54, 78, 79, 83–85])

r((n+1)τ) = tanh(A r(nτ) + Win z(nτ)), (2.2)

where tanh(v) for a vector v = (v1, v2, v3, . . .)^T is defined as (tanh v1, tanh v2, tanh v3, . . .)^T.

Figure 2.1: Schematic of the reservoir computing architecture used in this work. The input-to-reservoir coupling matrix Win couples the input time series for the vector z to the reservoir state vector r. The reservoir-to-output coupling matrix Wout generates the output vector ẑ from the reservoir. ẑ is found to be a good estimate of z after training.

For proper operation of the reservoir computer, it is important that Eq. (2.2) satisfy the 'echo state property' [78, 81, 83] (in nonlinear dynamics this condition is also known as 'generalized synchronization' [86–88]): given two different initial reservoir states, r*1 and r*2, for the same input time series of z, the difference between the two corresponding reservoir states converges to zero as they evolve in time (that is, |r1(t) − r2(t)| → 0 as t → ∞, implying that, after an initial transient period, r(t) essentially depends only on the past history of z, i.e., z(t′) for t′ ≤ t, and not on the initial condition for r). Using measured input training data over a training interval of length Tτ, which begins after the initial transient period mentioned above, we use Eq. (2.2) to generate r(τ), r(2τ), . . . , r(Tτ). We also record and store these determined values r(nτ), along with the corresponding inputs z(nτ) that created them. The matrices A and Win are regarded as fixed and are typically chosen randomly. In contrast, the M×R output coupling matrix Wout, shown in Fig. 2.1, is regarded as an adjustable linear mapping from the reservoir states r to an M-dimensional output vector ẑ,

ẑ((n+1)τ) = Wout r((n+1)τ). (2.3)

'Training' of the machine learning reservoir computer then consists of choosing the RM adjustable matrix elements ('weights') of Wout so as to make ẑ(nτ) a very good approximation to z(nτ) over the time duration (τ, 2τ, . . . , Tτ) of the training data. This is done by minimization with respect to Wout of the quantity

{Σ_{n=1}^{T} ‖z(nτ) − Wout r(nτ)‖²} + β‖Wout‖².

Here β‖Wout‖², with β small, is a 'ridge' regularization term [89] added to prevent overfitting, and (r(nτ), z(nτ)) are the previously recorded and stored training data. In general, R ≫ M is required in order to obtain a good fit of ẑ to z. For illustrative purposes we now consider the ideal case where ẑ = z (i.e., the training perfectly achieves its goal). For the purpose of estimating STCD, we now wish to eliminate the quantity r from the basic reservoir computer system (Eqs. (2.2) and (2.3)) to obtain an evolution equation solely for the state variable z. To do this, we would like to solve Eq. (2.3) for r in terms of z. However, since R, the dimension of r, is much larger than M, the dimension of z, there are typically an infinite number of solutions of Eq. (2.3) for r. To proceed, we hypothesize that it may be useful to eliminate r by choosing it to be the solution of Eq. (2.3) with the smallest L2 norm.
This condition defines the so-called Moore-Penrose inverse [90] of Wout, which we denote Ŵ⁻¹out; i.e., the minimum-L2-norm solution for r is written r = Ŵ⁻¹out z. We emphasize that Ŵ⁻¹out z is not necessarily expected to give the correct r obtained by solving the system of Eqs. (2.2) and (2.3). However, from the numerical results to follow, our choice will be supported by the fact that it often yields very useful estimates of fji. Now applying Wout to both sides of Eq. (2.2), and employing r = Ŵ⁻¹out z to eliminate r(nτ) from the argument of the tanh function in Eq. (2.2), we obtain a surrogate time-τ map for the evolution of z, z((n+1)τ) = H[z(nτ)], where

H(z) = Wout tanh[(A Ŵ⁻¹out + Win) z].

Here we note that we do not claim that this map in itself can be used for time-series prediction in place of Eqs. (2.2) and (2.3), which were commonly used in previous works (e.g., Refs. [54, 78, 79, 84, 85]). Rather, we use it as a symbolic representation of the result obtained after eliminating the reservoir state vector r from Eqs. (2.2) and (2.3). In particular, the prediction recipe using Eqs. (2.2) and (2.3) is always unique and well-defined, in contrast to the above map, where the choice of inverse for Wout is clearly non-unique. So, we use this map only for causality estimation purposes, as described below. Differentiating H(z) with respect to zi, we have

∂Fj(z)/∂zi = τ⁻¹ [∂Hj(z)/∂zi − δij], (2.4)

where δij is the Kronecker delta, and we propose to use Eqs. (2.1) and (2.4) to determine STCD.

In our numerical experiments, the number of training time steps is T = 6×10^4 for Figs. 2.2 and 2.3, and T = 2×10^4 for Fig. 2.4. In each case, the actual training data is obtained after discarding a transient part of 2×10^4 time steps, and the reservoir system sampling time is τ = 0.02. The elements of the input matrix Win are randomly chosen in the interval [−0.1, 0.1]. The reservoir is a sparse random network of R = 5000 nodes for Figs. 2.2 and 2.3, and of R = 1000 nodes for Fig. 2.4. In each case the average number of incoming links per node is 3. Each nonzero element of the reservoir adjacency matrix A is randomly chosen from the interval [−a, a], and a > 0 is then adjusted so that the maximum magnitude eigenvalue of A is 0.9. The regularization parameter is β = 10^−4. These parameters are adapted from Ref. [79]. The average indicated in Eq. (2.1) is taken over 1000 time steps. The chosen time step τ is sufficiently small compared to the timescale over which z(t) evolves that the discrete time series z(nτ) is a good representation of the continuous variation of z(t). Although we use a specific reservoir computing implementation, we expect that, with suitable modifications, our approach can be adapted to 'deep' types of machine learning [62], as well as to other implementations of reservoir computing [44, 47, 48, 91] (notably implementations involving photonics [44], electronics [48], and field-programmable gate arrays (FPGAs) [47]).
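The following is a minimal sketch of this estimation step. The names are illustrative: A, W_in, and W_out are assumed to be the trained reservoir matrices of Eqs. (2.2) and (2.3), z_train the stored array of training inputs, and the sampling choices are assumptions rather than the exact settings above; this is not the code used for the results below (see the repository referenced in the chapter header for that).

# Sketch of the STCD estimation of Eqs. (2.1) and (2.4): build the
# surrogate map H(z) = W_out tanh[(A W_out^+ + W_in) z] using the
# Moore-Penrose inverse W_out^+, then time-average |dF_j/dz_i|.
import numpy as np

W_out_pinv = np.linalg.pinv(W_out)      # Moore-Penrose inverse of W_out
B = A @ W_out_pinv + W_in               # so that H(z) = W_out tanh(B z)
tau, M = 0.02, W_out.shape[0]

def H_jacobian(z):
    s = 1.0 - np.tanh(B @ z) ** 2       # tanh' at the operating point
    return (W_out * s) @ B              # entries dH_j/dz_i, shape (M, M)

f_hat = np.zeros((M, M))
samples = z_train[::60][:1000]          # ~1000 states sampled along the orbit
for z in samples:                       # Eq. (2.4), with G(q) = |q| (Goal (i))
    f_hat += np.abs((H_jacobian(z) - np.eye(M)) / tau) / len(samples)
# Links are then assigned where f_hat[j, i] exceeds the threshold epsilon.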
2.5 Tests of Machine Learning Inference of STCD

In order to evaluate the effectiveness of our proposed method, we introduce mathematical model test systems that we use as proxies for the unknown system of interest whose state variables we wish to determine STCD for. We next use the test systems to generate simulated training data from which we determine STCD by our ML technique. We then assess the performance of the technique by the correctness of its results, determined from the known properties of the test systems.

We first consider examples addressing our Goal (i) (G(q) = |q| in Eq. (2.1)), and for our simulation test systems, we consider the case of a network of N nodes and L links, where each node is a classical Lorenz system [92] with heterogeneity from node to node, additive dynamical noise, and internode coupling,

dxk/dt = −10[xk − yk + c Σ_{l=1}^{N} a^(x,y)_kl (yl − yk)] + σDyn nkx(t), (2.5)
dyk/dt = 28(1 + hk) xk − yk − xk zk + σDyn nky(t), (2.6)
dzk/dt = −(8/3) zk + xk yk + σDyn nkz(t). (2.7)

The state space dimension of this system is M = 3N. The coupling of the N nodes is taken to be only from the y variable of one node to the x variable of another node, with coupling constant c, and a^(x,y)_kl is either 1 or 0 depending on whether or not there is a link from l to k. The adjacency matrix a^(x,y)_kl of our Lorenz network (not to be confused with the adjacency matrix A of the reservoir) is constructed by placing directed links between L distinct randomly chosen node pairs. For each node k, hk is randomly chosen in the interval [−h, +h], and we call h the heterogeneity parameter. Independent white noise terms of equal variance σ²Dyn are added to the right-hand sides of the equations for dx/dt, dy/dt, and dz/dt, where, for example, ⟨nkx(t) nk′x(t′)⟩ = 2δkk′ δ(t − t′). For σDyn = c = h = 0, each node obeys the classical chaotic Lorenz equation with the parameter values originally studied by Lorenz [92]. Furthermore, denoting the right-hand side of Eq. (2.5) by Fxk, we have |∂Fxk/∂yl| = 10c or 0, depending on whether there is, or is not, a link from yl to xk. Since in this case the derivative ∂Fxk/∂yl is time-independent, ⟨|∂Fxk/∂yl|⟩ is also either 10c or 0, and, adopting the notation f^(x,y)_kl = ⟨|∂Fxk/∂yl|⟩, we denote its machine learning estimate, obtained by our previously described procedure, by f̂^(x,y)_kl. For a reasonably large network, the number N² − N of ordered pairs (k, l) of distinct nodes is large, and we consequently have many values of f̂^(x,y)_kl. Bayesian techniques (see Ref. [93] and references therein) can be applied to such data to obtain an estimate L̂ for the total number of links L, and one can then set the value of ϵ so that there are L̂ values of f̂^(x,y)_kl that are greater than ϵ. Less formally, we find that making a histogram of the values of f̂^(x,y)_kl often reveals a peak at zero and another peak at a higher positive value, with a large gap or discernible minimum in between. One can then estimate ϵ by a value in the gap or by the location of the minimum between the peaks, respectively. For simplicity, in our illustrative numerical simulations to follow, we assume that L is known (approximately equivalent to the case where L is unknown but a very good estimate L̂ has been obtained).
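As an illustration, a minimal sketch of simulating this test system, Eqs. (2.5)-(2.7), follows (Euler-Maruyama integration; the graph construction, step size, and parameter values here are illustrative assumptions, not the exact settings used for the figures below). Its output array z_train is of the form assumed by the estimation sketch above.

# Sketch: N coupled, noisy, heterogeneous Lorenz nodes, Eqs. (2.5)-(2.7),
# on a random directed graph (illustrative parameters).
import numpy as np

rng = np.random.default_rng(1)
N, L, c, h, sigma_dyn = 20, 50, 0.3, 0.06, 0.0
dt, n_steps = 0.002, 60000               # sampling every 10 steps -> tau = 0.02

# Directed adjacency: a[k, l] = 1 if there is a link from node l to node k
offdiag = [(k, l) for k in range(N) for l in range(N) if k != l]
a = np.zeros((N, N))
for idx in rng.choice(len(offdiag), size=L, replace=False):
    a[offdiag[idx]] = 1.0

hk = rng.uniform(-h, h, N)               # node heterogeneity h_k
x, y, z = rng.normal(0.0, 1.0, (3, N))

traj = []
for n in range(n_steps):                 # Euler-Maruyama step of Eqs. (2.5)-(2.7)
    dx = -10.0 * (x - y + c * (a @ y - a.sum(axis=1) * y))
    dy = 28.0 * (1.0 + hk) * x - y - x * z
    dz = -(8.0 / 3.0) * z + x * y
    noise = sigma_dyn * np.sqrt(2.0 * dt) * rng.normal(size=(3, N))
    x = x + dt * dx + noise[0]
    y = y + dt * dy + noise[1]
    z = z + dt * dz + noise[2]
    if n % 10 == 0:
        traj.append(np.concatenate([x, y, z]))
z_train = np.array(traj)                 # M = 3N dimensional training series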
Example 1: A heterogeneous noiseless case. We consider the parameter set c = 0.3, h = 0.06, σDyn = σObs = 0, N = 20, and we vary the number of links L. Figures 2.2(a) (for L = 50) and 2.2(b) (for L = 100) each show an array of 20×20 boxes, where each box represents an ordered node pair (k, l) of the 20-node network, and the boxes have been colored (see Table 2.1) according to whether our procedure predicts a link from l to k ("positive") or not ("negative"), and whether the prediction is correct ("true") or wrong ("false").

Table 2.1: Color-coding scheme for Figs. 2.2 and 2.3.
TP (True Positive): black square
TN (True Negative): white square
FP (False Positive): blue square
FN (False Negative): red square

We see that for a typical case with L = 50 (Fig. 2.2(a)) all the boxes have been correctly labeled, corresponding to all boxes being either black or white. In contrast to this perfect result at L = 50, at L = 100 (Fig. 2.2(b)) the method fails badly, and the fraction of correct inferences is small. In fact, we find excellent performance for L ≤ 50, but, as L increases past 50, the performance of our method degrades markedly. This is shown in Fig. 2.2(c), where we plot the number of false positives (FP) normalized to the expected value of FP that would result if L links were randomly assigned to the N² − N = 380 node pairs (k, l). (We denote this normalization ⟨FP⟩R; it is given by ⟨FP⟩R = L(380 − L)/380.) Note that, with this normalization, for the different heterogeneities plotted in Fig. 2.2(c), the curves are similar, and they all begin increasing at around L = 60, with FP/⟨FP⟩R becoming nearly 1 (i.e., inference no better than random) past L ≈ 100. In our earlier discussion we conjectured that, for inference of STCD to be possible, the orbital complexity should not be too small. To test this conjecture, we have calculated the information dimension DINFO of the network system attractor corresponding to the parameters c = 0.3, h = 0, σDyn = 0, N = 20, as a function of L. We do this by calculating the Lyapunov exponents of the system of Eqs. (2.5)-(2.7), and then applying the Kaplan-Yorke formula for DINFO in terms of the calculated Lyapunov exponents [94, 95]. The result is shown in Fig. 2.2(d), where we see that DINFO decreases with increasing L. Regarding DINFO as a measure of the orbital complexity, this is consistent with our expectation that the ability to infer STCD will be lost if the orbital complexity of the dynamics is too small. As we next show, the above negative result for L increasing past about 60 does not apply when even small dynamical noise is present.

Example 2: The effects of dynamical and observational noise. We first consider the effect of dynamical noise of variance σ²Dyn for the parameters h = 0 (homogeneous), c = 0.3, N = 20, L = 200. Results (similar in style to Figs. 2.2(a) and 2.2(b)) are shown in Figs. 2.3(a), 2.3(b), and 2.3(c). For extremely low dynamical noise variance, σ²Dyn = 10^−9 (Fig. 2.3(a)), the result is essentially the same as for zero noise, and about one quarter of the boxes are classified as each of TP, TN, FP, and FN (since there are 200 links and 400 boxes, this is no better than random assignment). As the noise variance is increased to σ²Dyn = 10^−7.5 (Fig. 2.3(b)), the results become better, with a fraction 0.75 of the boxes either TP or TN (as opposed to 0.52 for Fig. 2.3(a)). Upon further increase of the dynamical noise variance to the still small value of σ²Dyn = 10^−6 (Fig. 2.3(c)), the results typically become perfect or nearly perfect. Furthermore, excellent results, similar to those for σ²Dyn = 10^−6, continue to apply for larger σ²Dyn. This is shown by the red curve in Fig. 2.3(f), which shows FP/⟨FP⟩R versus σ²Dyn (N = 20, L = 200).

Figure 2.2: Results of Example 1 (noiseless case). Panels (a) and (b) show the results of link inference for two noiseless cases, with L = 50 links and L = 100 links, respectively (panel titles L = 50, h = 0.06 and L = 100, h = 0.06; node indices k and l on the axes). The inference is perfect in (a), but is very bad in (b). (c) FP/⟨FP⟩R versus L for h = 0, 0.06, 0.15, averaged over 100 random realizations of the system and the reservoir adjacency matrix. (d) The orbital complexity, as measured by the attractor information dimension DINFO, decreases with increasing L. At each value of L, we compute DINFO for 10 random realizations of a network with L links and h = 0. The Kaplan-Yorke dimension is then averaged over all network realizations, and the resulting plot is further smoothed by applying a moving average filter.
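For reference, a minimal sketch of the Kaplan-Yorke estimate of DINFO from an ordered Lyapunov spectrum, as used for Fig. 2.2(d), follows (the example exponent values quoted for the single classical Lorenz attractor are approximate).

# Kaplan-Yorke (information) dimension from a Lyapunov spectrum:
# D = K + (lambda_1 + ... + lambda_K) / |lambda_{K+1}|, where K is the
# largest index for which the partial sum of exponents is non-negative.
import numpy as np

def kaplan_yorke(lambdas):
    lams = np.sort(np.asarray(lambdas, dtype=float))[::-1]  # decreasing order
    csum = np.cumsum(lams)
    nonneg = np.where(csum >= 0)[0]
    if len(nonneg) == 0:                 # even the largest exponent is negative
        return 0.0
    K = nonneg[-1]
    if K == len(lams) - 1:               # partial sums never go negative
        return float(len(lams))
    return (K + 1) + csum[K] / abs(lams[K + 1])

# Example: for the classical Lorenz attractor, exponents of roughly
# (0.906, 0, -14.57) give D_INFO ~ 2.06.
print(kaplan_yorke([0.906, 0.0, -14.57]))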
Importantly, we also note that our normalization of FP by ⟨FP⟩R makes the red curve essentially L-independent over the range we have tested, 50 ≤ L ≤ 200. Our interpretation of this dynamical-noise-mediated strong enhancement of our ability to correctly infer links is that the dynamical noise allows the orbit to explore the state space dynamics off the orbit's attractor, and that the machine learning is able to make good use of the information it thus gains. We now turn to the effect of observational noise, by replacing the machine learning time series training data formerly used, [xk(nτ), yk(nτ), zk(nτ)], by [xk(nτ) + σObs n̂kx(nτ), yk(nτ) + σObs n̂ky(nτ), zk(nτ) + σObs n̂kz(nτ)], where the parameter σ²Obs is the observational noise variance and the n̂kx, n̂ky, n̂kz are independent Gaussian random variables with, e.g., ⟨n̂kx(nτ) n̂k′x(n′τ)⟩ = 2δkk′ δnn′. The blue curve in Fig. 2.3(f) shows the effect of adding observational noise of variance σ²Obs on top of dynamical noise for the situation σ²Dyn = 10^−5 of Fig. 2.3(c). We see from Figs. 2.3(d)-(f) that, when σ²Obs is below about 10^−5, it is too small to have much effect, but, as σ²Obs is increased above 10^−5, the observational noise has an increasingly deleterious effect on link inference. This negative effect of observational noise is to be expected, since inference of characteristics of the unknown system is necessarily based on the part of the signal that is influenced by the dynamics of the unknown system, which the observational noise tends to obscure.

Figure 2.3: The effect of noise on STCD inference. Panels (a), (b), and (c) show that the effect of increasing the dynamical noise variance σ²Dyn is to greatly enhance the effectiveness of link identification, even at the rather low noise level of σ²Dyn = 10^−6. In contrast, as shown in panels (d), (e), and (f), starting with the situation of (c) and increasing the observational noise variance σ²Obs degrades link identification. L = 200 and h = 0 for all the subfigures here.

Example 3: Inferring Continuous Valued Dependence Strengths. We now wish to address Goal (ii) (for which we take G(q) = q in Eq. (2.1)), and we accordingly consider the case where f^(x,y)_kl for each (k, l) takes on a value in a continuous range (rather than the case of Examples 1 and 2, where f^(x,y)_kl is either 10c or zero for all (k, l)). For this purpose we replace Eq. (2.5) by

dxk/dt = −10(xk − yk) + Σ_l f^(x,y)_kl yl, (2.8)

and consider Eqs. (2.6), (2.7), and (2.8) as our new test system, with h = 0.9, σ²Dyn = σ²Obs = 0, and N = 100 nodes (corresponding to 100×100 = 10^4 possible connection strength values). We choose the connection strength values as follows. A photographic portrait of Edward N. Lorenz is divided into 100×100 = 10^4 boxes and, by using a shading scale from dark (coded as +10) to light (coded as −5), Fig. 2.4(a) is obtained, with the shading scale given to the right of Fig. 2.4(b). Setting f^(x,y)_kl equal to the color scale value of box (k, l), we next numerically solve Eqs. (2.6), (2.7), and (2.8).
We then use this orbit as the training data input to our ML determination of the causal dependence strengths, f̂^(x,y)_kl, and, employing the same shading scale, use the thus-determined values of f̂^(x,y)_kl to reconstruct the original portrait, as shown in Fig. 2.4(b). We see that, although the reproduction is not exact, the overall picture is still clearly recognizable, indicating the effectiveness of the method for Goal (ii). For a more quantitative comparison of the actual and estimated Jacobian elements, we calculate the normalized Frobenius norm of their difference matrix f^(x,y) − f̂^(x,y). We first apply upper and lower cut-offs, equal to 10 and −5.5 respectively, to f̂^(x,y), in order to eliminate some extreme values. Then we calculate the ratio

δ = ‖f^(x,y) − f̂^(x,y)‖_F / ⟨‖f^(x,y) − f̃^(x,y)‖_F⟩, (2.9)

where ‖M‖_F = √(Trace(M†M)) = √(Σ_{i,j} |Mij|²) is the Frobenius norm of the matrix M. Here f̃^(x,y) denotes a matrix constructed by randomly permuting the elements of the matrix f^(x,y), and the angled brackets denote an average over such random permutations. This ratio thus compares the total error in the inferred Jacobian with the variation in the original matrix elements of f^(x,y). For a perfect estimation of f^(x,y), we would have δ = 0; in contrast, δ = 1 means that the prediction error is equal to the average error obtained when the elements of f^(x,y) are randomized. For the example shown in Fig. 2.4, we find that δ is approximately equal to 0.37.

Figure 2.4: Results of Example 3. Panel (a) shows a 100×100 pixelated, shade-coded portrait of Edward N. Lorenz; (b) reconstruction of (a) by our ML link inference technique. Note that, in (b), we plot all values greater than or equal to 10 as black and all values less than or equal to −5.5 as white.

2.6 Discussion

In this chapter, we have formulated and tested a new, highl