ABSTRACT Title of dissertation: QUANTIFYING FLOWS IN TIME-IRREVERSIBLE MARKOV CHAINS Danielle Middlebrooks Doctor of Philosophy, 2020 Dissertation directed by: Professor Maria Cameron Department of Mathematics Stochastic networks a.k.a. Markov chains allow us to model phenomena in systems arising in many applications. The appeal of stochastic networks is that they offer a mathematically tractable and robust model focused on the most important features of the system. Nevertheless, stochastic networks approximating complex systems can be huge and unstructured, and an effective description of their dynamics is a challenging mathematical problem. This dissertation is motivated by our study of two models of a gene regulatory network (GRN), one deterministic [1] and one stochastic [2], which describes the budding yeast cell cycle. A GRN with N nodes can be straightforwardly converted into a Markov chain with 2N states. Our scientific goal is to understand how the stochasticity affects the stability of the cell cycle in the GRN. This gives rise to our mathematical goal: to develop efficient tools for quantifying dynamics of large time-irreversible Markov chains. Our methodological developments are built upon the transition path theory (TPT) [16] which is a general framework for describing transitions in Markov chains between two subsets of states. In TPT, the transition process is described by the so- called effective current. We have realized that the effective current gives a lopsided description of the transition process in the case of time-irreversible networks where elementary cycles of length greater than two are present. Thus, we have introduced the so-called acyclic current that gives a quantitative description of a transition process and proposed an algorithm to compute it. Moreover, we have developed a general recipe to modify the generator matrix of a given Markov chain in order to make the stationary probability current and the invariant distribution in the modified chain coincide with a desired current and a desired invariant distribution in the original chain. Finally, we have applied these tools to the budding yeast cell cycle GRN. Our results show which edges are essential and which ones are redundant. Our computations eloquently demonstrate that stochasticity makes the GRN much more stable with respect to edge removals. This conclusion is consistent with Q. Nie?s statement [26] that stochasticity plays a fundamental role in biological processes. Quantifying Flows in Time-Irreversible Markov Chains by Danielle Middlebrooks Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2020 Advisory Committee: Professor Maria Cameron, Chair/Advisor Professor Kasso Okoudjou Professor Howard Elman Professor Konstantina Trivisa Professor Michelle Girvan ? Copyright by Danielle Middlebrooks 2020 Dedication To my grandparents. Forever and always in my heart. ii Acknowledgments Over the past six years I have received support and encouragement from a number of individuals. First and foremost, I would like to express my deepest gratitude to my advisor, Professor Maria Cameron. She has gone above and beyond in her role as an advisor and not only has made me a better researcher, but a better person. My success in graduate school and the completion of this dissertation would not be possible without her guidance, support, and patience, and I will forever be indebted to her. I am also extremely grateful to my committee members. To Kasso Okoudjou for his unwavering support and belief in my abilities, Howard Elman and Kon- stantina Trivisa for their guidance and helpful advice when I first arrived at the university and throughout years here, and Michelle Girvan for her helpful contribu- tions and suggestions for part of this project. I wish to thank my loving parents, Gwen and Rodney and my older brother Rodney Jr. I am thankful for their love and support throughout my life. Thank you for always believing in me and encouraging me to always reach for the stars. To my aunts, uncles and many cousins, thank you for your moral support and constant words of encouragement. To my many friends near and far, thank you for listening, offering me advice, and supporting me through this entire process. Thank you to my AMSC cohort- Addison, Tengfei, Cara and David for your friendship during our time at UMD. My professors at Spelman College for encouraging me to attend graduate school iii and their words of kindness throughout the years. Thank you to my 2014 EDGE program sisters in mathematics for going through this journey with me and sharing your experiences as well as listening to mine. A special thank you to one of my closest friends Kayla Davie for always being a shoulder to lean on and someone to vent to. I am so glad you came to UMD and I can?t wait for you to be up next! I would like to thank the COMBINE program for their financial support. To the COMBINE staff and students, thank you for listening to my research talks and providing valuable feedback. Above all, I would like to thank God for giving me the strength, knowledge and ability to complete this dissertation. Without His blessings, this achievement would not have been possible. iv Table of Contents Dedication ii Acknowledgements iii List of Tables vii List of Figures viii List of Abbreviations ix 1 Introduction 1 1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 A Brief Summary of Main Results . . . . . . . . . . . . . . . . . . . . 4 2 Background 9 2.1 Discrete-time Markov Chains . . . . . . . . . . . . . . . . . . . . . . 10 2.2 Markov Jump Process (MJP) . . . . . . . . . . . . . . . . . . . . . . 12 2.3 Transition Path Theory (TPT) . . . . . . . . . . . . . . . . . . . . . 13 2.3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3.2 Basic Concepts of the TPT . . . . . . . . . . . . . . . . . . . 16 2.3.3 Sampling Techniques for Reactive Trajectories . . . . . . . . . 18 2.3.4 Analyzing Markov chains using TPT . . . . . . . . . . . . . . 19 3 Designing MJP with Desired Stationary Currents 20 3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.2 Setup and Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.3 Designing Modified MJP . . . . . . . . . . . . . . . . . . . . . . . . . 24 4 Generating Acyclic Current 29 4.1 Depth-First Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4.2 Tarjan?s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 4.3 Cycle Removal Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 31 4.3.1 Non-uniqueness in Cycle Removal Algorithm . . . . . . . . . . 35 4.4 An illustrative example: a discretized Maier-Stein model . . . . . . . 37 v 5 Application to Gene Regulatory Network (GRN) 41 5.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.1.1 Cell Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.1.2 Gene Regulatory Network (GRN) . . . . . . . . . . . . . . . . 43 5.1.3 Dynamical Network . . . . . . . . . . . . . . . . . . . . . . . . 48 5.1.4 Stochastic model of GRN . . . . . . . . . . . . . . . . . . . . 50 5.2 A ?Mutation Analysis? of the Deterministic GRN . . . . . . . . . . . 52 5.2.1 Identifying Redundant Edges . . . . . . . . . . . . . . . . . . 52 5.3 A ?Mutation Analysis? of the Stochastic GRN . . . . . . . . . . . . . 56 5.3.1 Exploring Dynamical Network . . . . . . . . . . . . . . . . . . 56 5.3.2 Addressing the Issue of Nonuniqueness of the Acyclic Current. 60 5.3.3 Identifying Redundant Edges . . . . . . . . . . . . . . . . . . 64 5.4 Comparison of the Deterministic and Stochastic Models . . . . . . . . 69 6 Conclusion 72 6.1 Summary of Methodological Advances . . . . . . . . . . . . . . . . . 72 6.2 Summary of Application to GRN . . . . . . . . . . . . . . . . . . . . 73 6.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 Bibliography 77 vi List of Tables 5.1 The 11 different protein/ protein complexes, node identification and their function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.2 Deterministic evolution of protein states . . . . . . . . . . . . . . . . 47 5.3 Fixed points and basin size determined by deterministic model . . . . 50 5.4 Edge removal analysis for deterministic model (i) . . . . . . . . . . . 54 5.5 Edge removal analysis for deterministic model (ii) . . . . . . . . . . . 55 5.6 Edge removal analysis for stochastic model (i) . . . . . . . . . . . . . 66 5.7 Edge removal analysis for stochastic model (ii) . . . . . . . . . . . . . 68 vii List of Figures 2.1 Discrete-time Markov chain . . . . . . . . . . . . . . . . . . . . . . . 11 2.2 Markov Jump Process . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3 An illustrative example of reactive trajectories . . . . . . . . . . . . . 15 3.1 Effective currents and committor values in a time-irreversible MJP . . 22 4.1 Worst-case scenario cost of Algorithm 3 . . . . . . . . . . . . . . . . . 34 4.2 Non-uniqueness in cycle removal algorithm . . . . . . . . . . . . . . . 36 4.3 An illustrative example of cycle removal algorithm . . . . . . . . . . . 40 5.1 Schematic of budding yeast cell cycle. . . . . . . . . . . . . . . . . . . 42 5.2 Gene regulatory network of budding yeast cell cycle. . . . . . . . . . . 44 5.3 Dynamical network for the deterministic GRN . . . . . . . . . . . . . 49 5.4 Acyclic current for stochastic GRN . . . . . . . . . . . . . . . . . . . 57 5.5 Minimax current for stochastic GRN . . . . . . . . . . . . . . . . . . 58 5.6 Nonunique Analysis 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5.7 Nonunique Analysis 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 5.8 Nonunique Analysis 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.9 Comparison of edge removal algorithms for deterministic and stochas- tic models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 viii List of Abbreviations DFS Depth-first search GRN Gene regulatory network MJP Markov jump process SCC Strongly connected component TPT Transition path theory ix Chapter 1: Introduction 1.1 Overview Over recent years, the use of large graphs or networks has emerged as a popular tool for data representation and organization [3]. Through networks, one is able to identify patterns and structure within the data. The use of networks can be applied in many fields of science, including economics [4], chemical physics [5], social sciences [6, 7] and biology [8, 9]. Data points are mapped onto the set of nodes in the network and edges or links in the network indicate pairwise relationships between the nodes. For instance, webpages on the World Wide Web can be linked according to URLs, banks and businesses across the country can be linked according to investments made, and proteins in a cell can be linked according to their action on other proteins. The size and complexity of such networks exceeds by far those of graphs con- sidered in the classic graph theory [8]. This fact led to the emergence of research in network science. Network science is a relatively new discipline that has only been able to blossom because of computer technologies. With computers, scientists are capable of analyzing large-scale networks such as the World Wide Web which has 1 approximately 500 billion nodes [10]. Network science draws on theories and meth- ods from graph theory, statistical mechanics, inferential modeling and information visualization to solve challenging problems in various real-world applications [3]. While detecting structure in a network is crucial for finding relationships within the data [11, 12], important questions that we would like to answer arise from un- derstanding the dynamics of a network. For instance, in modeling Lennard-Jones particle configurations [5], where nodes correspond to local potential minima and edges represent transition states between them, one can find probabilities from one configuration to another. In this work, we consider dynamical networks arising from gene regulatory networks (GRNs). One way of analyzing the dynamics of these net- works is to represent these positive weighted networks as a Markov chain. Hence, tools for studying Markov chains can be used for understanding the dynamics of GRNs and validating the GRN models. Furthermore, in this work we address the question of robustness of the system modeled by a GRN with respect to edge re- movals. Several approaches for quantifying the dynamics of complex networks have been introduced. Bovier et al. developed the potential theoretical approach and advanced the mathematical spectral theory of metastability [13, 14]. This theory is built upon an analogy with electric circuits. This work derived sharp estimates for low-lying spectra of time-reversible Markov chains. Another approach is the tran- sition path theory (TPT) which was originally proposed by E and Vanden-Eijnden in the context of stochastic differential equations [15] and then transferred and ex- 2 tended to networks in [16, 17]. Like the potential theoretical approach, TPT was inspired by an analogy with electric circuits. However, TPT does not assume time- reversibility and focuses on the statistical analysis of so-called reactive trajectories (see next paragraph). An alternative approach for analyzing reactive trajectories based on a set of recurrence relationships proposed by Manhart and Morozov [18,19]. After considering several options, we found TPT meets our needs to the largest extent. The basic idea of TPT is to single out two specific subsets of nodes, a ?reactant? set A and a ?product? set B, and analyze the statistical properties of the trajectories by which transitions between these sets occur. Those trajectories that travel strictly from A to B without returning to A in between are called reactive. Statistical analysis of reactive trajectories gives a quantitative description of the transition process between the sets A and B. 1.2 Goals In this dissertation, we are mainly interested in applications where the given Markov chain is time-irreversible. Time-irreversible Markov chains often arise in biological models, in particular gene regulatory networks (GRN) [2,20]. In a GRN, the nodes are a collection of molecular regulators that govern gene expression levels which describe a biological process of a given cell. Edges between nodes represent interactions. These interactions can be considered ?activating?, with an increase in the concentration of one leading to an increase in the other, or ?inhibiting?, with an increase in one leading to a decrease in the other. The nodes could also reg- 3 ulate themselves directly, creating feedback loops. Gene regulatory networks have recently attracted a great deal of interest [20?23]. In particular, budding yeast have been under investigation since the underlying molecular machinery regulating this process has been highly studied and mathematical models of how the interacting proteins control each stage of the process already exist [1, 2, 24, 25]. While there are ? 800 genes involved in the process of the budding yeast cell cycle, there are only 11 key regulators that are responsible for the control and regulation of this process. Hence, the budding yeast GRN consists of only 11 nodes [1]. Our goal is to develop computational tools in the framework of TPT that would allow us to analyze the dynamics of a stochastic budding yeast GRN. We wish to understand and quantify the effects of stochasticity in this GRN. In particular, we would like to establish which edges are critical for the proper function of the cell cycle, and which are not, and understand how stochasticity effects these subsets of edges. To bench- mark the effect of stochasticity, we first answer this question for the corresponding deterministic GRN. 1.3 A Brief Summary of Main Results In the context of TPT, the effective current is defined as the net average number of reactive trajectories per unit time making a transition from one node i to another node j on their way from A to B. If a Markov chain is time-reversible, the effective current is acyclic and we can effectively describe the transition process of the induced graph. If a Markov chain is time-irreversible, the effective current 4 may be cyclic. As a result, the effective current along a single edge along the path from A to B might be larger than total reactive current coming out of A which is the same as the total reactive current going into B. This was first observed when applying the tools from TPT to the stochastic model of the gene regulatory network in [2]. This dissertation involves three major parts: (1) we propose a general framework for designing modified Markov chains in order to quantify transitions in time-irreversible Markov chains, (2) we introduce an algorithm for obtaining an acyclic current and generating acyclic reaction pathways, and (3) we propose a so- called ?mutation analysis? for gene regulatory networks allowing one to access their robustness and use the proposed tools to analyze a stochastic budding yeast gene regulatory network. We develop a general framework for modifying the original time-irreversible irreducible Markov chain to make it have a desired stationary current and a desired invariant distribution. We apply our recipe to time-irreversible networks so that the stationary currents in the modified Markov chains are equal to the effective currents and acyclic currents in the original ones. Our construction generalized Propositions 1 and 2 in Ref. [17]. We can then sample acyclic trajectories using an acyclic current. In order to design a modified Markov chain and generate the desired trajec- tories, we propose a cycle removal algorithm which introduces a so-called acyclic current. While an acyclic current may not be unique, we devoted substantial ef- fort to address this issue in our application to the gene regulatory network. Our algorithm is designed for generating a weighted directed acyclic graph from a graph 5 G(S, {f+}) where S is the set of vertices inherited from the original Markov chain, and f+ ? 0 is the effective current used for the weights of the arcs. Our methodology for quantifying transition processes in Markov chains is ap- plied to the stochastic GRN representing the budding yeast cell cycle to determine essential edges in the network. The cell cycle is a process in which one cell grows and then proceeds to divide into two cells. The cell cycle consists of four phases: growth (G1), DNA synthesis (S), gap (G2) and mitosis (M). We consider two models of the gene regulatory network for the budding yeast cell cycle: one deterministic and one stochastic. Li et al. (2004) [1] developed a deterministic model which uses boolean functions as update rules to model the dynamics of the GRN. The GRN consists of only 11 proteins/protein complexes. The dynamics of the gene regulatory network is defined by (i) the influence matrix constructed by the interaction between nodes and (ii) the transition rules defined in [1]. In 2006, Zhang et al. [2] modified the deter- ministic model [1] by making the transition rules stochastic. The stochastic model builds upon the deterministic model by adding certain degrees of unpredictability or randomness that may happen due to the environment and allows for the model to self-organize. Stochasticity plays a fundamental role in biological processes [26?28]. Due to internal noise, genetically identical cells may assume different fates within a consistent environment. This can be caused by the inherently random nature of the biochemical reactions of gene expression which can cause noisy gene expression levels [29]. The models [1,2] allow us to describe the dynamics of the GRN through what 6 we call the dynamical network. Each node can be in one of two states, either active or passive. Each phase of the cell cycle is described by certain protein/protein complexes being active. For any GRN with n nodes, the dynamical network is the graph with 2n vertices representing all possible cell states, and edges representing transitions between the states. This dynamical network produces a discrete-time Markov chain. In particular, one that is time-irreversible. Thus, for the stochastic GRN, we use transition path theory and our cycle removal algorithm to quantify the transitions starting at the excited growth state, through all cell cycle phases, then arriving at the stationary growth state right before the process repeats. Finally, we perform a ?mutation analysis? in order to identify redundant edges in the GRN. For the deterministic GRN, we apply the following steps: We first distort the GRN by removing one edge and adjusting the influence matrix. We then apply the transition rules described in [1] to obtain the dynamical network and explore it using the depth-first search (DFS) algorithm. Finally, we check if there is a pathway corresponding to the cell cycle going from the excited growth phase to the stationary growth phase. If such a pathway exists, we compare it to the one in the original dynamical network and mark changes in it if any. We repeat this process for all edges of the regulatory network in order to determine which edges can be removed without a major effect on the cell cycle. The stochastic transition rules create a dense stochastic matrix. This results in a complete weighted dynamical network. Since DFS ignores the weights of a weighted graph, this method alone is unsuitable here. To identify essential edges in 7 the stochastic GRN, we remove an edge and use the transition path theory (TPT) and our cycle removal algorithm to the respective dynamical network to obtain path- ways between the excited growth phase and the stationary growth phase carrying at least 20% of the normalize acyclic current. We repeat this process for each edge in the GRN to determine essential and non-essential edges. We found that in the deterministic GRN, about 41% of edges can be removed without a major effect on the cell cycle. In the stochastic one, this is true for 88% of edges. This suggests that stochasticity renders the cell cycle much more robust. This is consistent with Q. Nie?s statement that stochasticity plays a fundamental role in biological processes [26]. The rest of this dissertation is organized as follows. Chapter 2 gives an overview of Markov chains and Markov jump processes as well as introduces the key concepts and definitions of transition path theory. Throughout this work we consider Markov jump processes but our results are also applicable to discrete-time Markov chains. In Chapter 3, we introduce our construction for designing modified Markov jump processes to generate desired reactive trajectories. This is accompa- nied with a theorem and proof justifying this construction. In Chapter 4, we propose our algorithm for generating an acyclic current from the original time-irreversible Markov jump processes. Chapter 5 applies this methodology to the stochastic GRN for the Budding yeast cell cycle to analyze the dynamics of the model under muta- tions. Chapter 6 summarizes the project and gives avenues for future research. 8 Chapter 2: Background Markov chains are among the most well known and established probabilistic models. Dating back to the early 20th century, they have long been developed, studied and applied to real world problems. In Sections 2.1 and 2.2, we present definitions and basic properties of both discrete-time and continuous-time Markov chains. An important arsenal of tools for analysis of transitions taking place in Markov chains is offered by the transition path theory (TPT). The TPT was originally in- troduced in 2006 by W.E. and E. Vanden-Eijnden [15] as a framework for analysis of rare reactive events in systems evolving according to stochastic differential equa- tions. It was adopted for continuous-time Markov chains also known as Markov jump processes by Metzner et al. in 2009 [16]. The TPT was motivated by the transition state theory (TST) developed by Eyring in 1935 [30] to quantify chemical reactions. In Section 2.3, we give an overview of TPT and discuss basic definitions and theorems that will be necessary throughout this dissertation. 9 2.1 Discrete-time Markov Chains Let S denote the state space which is finite or countable. We consider discrete moments of time 0, 1, 2, ... and denote the state of the system at time n by Xn. A discrete-time Markov chain is defined as a sequence of random variables, (Xn)n?0 taking on values in S, that is characterized by the Markov property, which means the future state of the process is only dependent on its present state [31]. In other words, given the present state Xn, any future state Xn+k is independent of any past state Xn?m. Thus a Markov chain is characterized by a transition matrix P where Pi,j is the probability that the Markov chain will be in state j in the next time step given that it is currently in state i. Note that the ith row of P is the probability distribution for Xn+1 conditioned on the fact that Xn = i. Therefore, all entries of the matrix P are nonnega??tive, and the row sums are equal to one i.e.????Pi,j ? 0, ?i, j ? S, i =6 j???? (2.1) j?S Pi,j = 1, ?i ? S Any matrix P satisfying these conditions in called stochastic. A formal definition of a Markov chain is: Definition 1. A sequence of random variables (Xn)n?0 is a Markov chain with initial distribution ? and stochastic matrix P if ? X0 has distribution ? = {?i|i ? S} and ? the Markov property holds P(Xn+1 = in+1|Xn = in, ..., X0 = i0) = P(Xn+1 = in+1|Xn = in) = pin,in+1 . 10 1/3 1 2 1/3 1/3 1 0 0 0 1/3 0 1/3 1/3 1/2 1/2 ? = 0 1/2 0 1/2 0 1/2 1/2 0 1/2 3 4 1/2 Figure 2.1: Graph representation of a discrete-time Markov chain with 4 nodes. The probability to move from one state to another is given along each arc. The matrix representation is given by matrix P . A probability distribution in a Markov chain is called equilibrium and denoted by ? if ?P = ?. (2.2) Markov chains can be divided into time-reversible and time-irreversible. A Markov chain is time-reversible if and only if it satisfies the detailed balance condi- tions ?iPi,j = ?jPj,i ?i, j. In other words, for each pair of states i, j, as the length of time interval tends to infinity, the expected numbers of transitions from i to j and from j to i, divided by the length of the time interval, approach the same limit equal to ?iPi,j = ?jPj,i. In this dissertation, we are mainly interested in applications where the given Markov chain is time-irreversible. The stochastic matrix for the time-reversed chain is defines as follows.: Let P? be the transition matrix for the time-reversed process, 11 then P?i,j = P (Xm = j|Xm+1 = i) P (Xm = j,Xm+1 = i) = P (Xm+1 = i) P (Xm = j)P (Xm+1 = i|Xm = j) = P (Xm+1 = i) ?j = Pj,i (2.3) ?i using Bayes? theorem. 2.2 Markov Jump Process (MJP) In discrete-time Markov chains, the transitions can occur only at discrete mo- ments of time. More general models allowing the transitions to happen at any moment of time and enabling faster simulations are continuous-time Markov chains also known as Markov Jump Processes (MJPs). The dynamics of a MJP is charac- terized by the generator m??atrix L where????Li,j ? 0, ?i, j ? S, i 6= j???? (2.4) j?S Li,j = 0, ?i ? S ? The entries of the generator matrix are interpreted as follows: Li = j 6=i Li,j is the escape rate from i. The ratio Li,j/Li is the probability to jump from i to j. For brevity we call Li,j pairwise rates. We assume L is irreducible, meaning it is possible to get to any state from any other state in the system. It has a unique equilibrium distribution satisfying ? ?TL = 0, ?i = 1. (2.5) i?S 12 7 1 2 0 0 0 0 6 2 7 ?15 6 2 5 1 ? = 0 5 ?8 3 0 1 4 ?5 4 3 4 3 Figure 2.2: Graph representation of a Markov jump process with 4 nodes. The pairwise rate to move from one state to another is given along each arc. The matrix representation is given by matrix L. Moreover, independent of the initial distribution, the probability distribution in the system approaches ? as time tends to infinity. The generator for the time-reversed MJP is defined by ?j L?i,j = Lj,i. (2.6) ?i 2.3 Transition Path Theory (TPT) The discrete transition path theory (TPT) offers a framework to give a quan- titative description of the transition process between two given disjoint subsets of states in systems evolving according to MJPs [16]. 2.3.1 Motivation The transition state theory (TST) [30], [32] was introduced in the 1930?s to provide a framework for the description of rare events. The TST was derived in the context of analyzing the rate of chemical reactions R ? P , where R denotes the 13 reactant state and P the product state. The idea is to pick the optimal dividing surface where the number of crossings from R to P is minimal. However, it is not clear where the optimal surface is. Even if the dividing surface is picked optimally, the number of crossings can be significantly larger than the number of transitions from the reactant state to the product state within any interval of time. This hap- pens due to recrossings. While TST was primarily used to understand qualitatively how chemical reactions take place, some applications require a more quantitative description. Another relevant approach for describing rare events is the transition-path sampling (TPS) [33]. Instead of counting the number of crosses of some dividing surface as in the TST, the TPS generates an ensemble of reactive trajectories, i.e., those pieces of trajectories that leave the reactant state and enter the product state. Figure 2.3 depicts a few reactive trajectories between two subsets of states repre- senting the reactive state and the product state respectively. Thus, all the relevant information can then be extracted from the ensemble, such as the reaction mech- anism and the transition states. However, these reactive trajectories may be very complicated and hard to sample. An alternative approach to analyze the reactive trajectories that does not in- volve either of the mentioned issues is the transition path theory (TPT) [15]. The TPT provides a framework for quantifying rare events and enables us to efficiently generate reactive trajectories. The TPT gives precise answers to the following ques- tions: 14 Figure 2.3: Examples of Reactive Trajectories. Given two disjoint subsets denoted A and B, the reactive trajectories (red arrows) are those trajectories that go from A to B without returning to A in-between. ? What is the probability to observe a reactive trajectory at a given state? ? What is the net amount of reactive current going through a given state? ? What is the reaction rate, i.e., the transition rate between two substates, say A (reactant) and B (product)? ? What are the mechanisms of the reactions and how to describe them effec- tively? The TPT relates to Bovier and collaborators? [13, 14] works on quantification of transitions in continuous-time and discrete-time Markov chains and built upon the classic potential theory. Contrary to the work of Bovier, TPT does not rely on the assumption the Markov chain is time-reversible. 15 2.3.2 Basic Concepts of the TPT We are interested in transitions from one subset A ? S to another disjoint subset B ? S. The TPT gives a conceptual apparatus for describing statistical properties of the reactive trajectories, i.e., those trajectories that start at A and go to B without returning to A in-between. We define the key functions of the TPT describing the reactive trajectories on the state space S. The cornerstone functions of the TPT are the forward and backward commit- tors denoted by q+ = (q+i )i?S and q ? = (q?i )i?S respectively. The answers of the aforementioned questions are given in terms of them. The forward committor is the probability that the process starting at a state i ? S will first reach B rather than A. One can readily check that the forward committor function satisfies the following system of linear algebraic?equations:??????????? Li,jq + j = 0, i ? S \ (A ?B) j?S ??????q + i = 0, i ? A (2.7) ????q+i = 1, i ? B The backward committor is the probability that the process arriving at state i last came from A rather than?B. It satisfies:?????????? ? ?? L?i,jqj = 0, i ? S \ (A ?B) j?S ????q?? i = 1, i ? A (2.8) ????q?i = 0, i ? B 16 with L?i,j being the generator for the time-reversed process. Once the committors are computed, one can express some basic statistical properties of the reactive tra- jectories. ? The probability distribution of reactive trajectories is given by [16]: mR = ? q?i i i q + i (2.9) ?m R i is the probability to find a reactive trajectory at state i ? S at time t. mRi is e?qual to the probability for a trajectory to be reactive. Note thati this sum: mRi ? 1. i?S ? The probability current of reactive trajectories is defined as the average num- ber of transitions per unit time from state i to j performed by reactive trajec- tories. It is proven in [16] th?at it can be expressed as????? q?L q+i i i,j j , if i =6 j fi,j = ???? (2.10)0, otherwise ? The effective current gives the net average number of reactive trajectories per time unit making a transition from i to j on their way from A to B. It is defined in [16] as f+i,j = max{fi,j ? fj,i, 0} (2.11) ? The transition rate from A to B is the average number of transitions per unit time performed by an infinite trajectory: NAB(T ) ?AB = lim T?? T 17 where NAB(T ) is the total number of transitions from A to B up to time T . This transition rate is equal to the total reactive current coming out of A which is the same as the total reactive current going into B. It is proven in [16] that: ? ? ? ? = f+AB i,j = fi,j = fi,j (2.12) i?A,j?S i?A,j?S i?S,j?B It is also proven in [16] that the probability current of reactive trajectories is conserved at each reactive state i ? (A?B)c. This theorem is important for deriving our results and is given below. Theorem 1. The probability current is conserved at each reactive state i.e. ? (f ci,j ? fj,i) = 0 for all i ? (A ?B) (2.13) j?S 2.3.3 Sampling Techniques for Reactive Trajectories In 2013, M. Cameron and E. Vanden-Eijnden [17] proposed tools to generate reactive trajectories as well as the special kind of reactive trajectories called no- detour reactive trajectories for time-reversible MJPs. No-detour reactive trajectories for time-reversible networks are those along which the committor function is strictly increasing. Note that, for time-reversible networks, q+ ?i = 1 ? qi and L? = L. The generation of no-detour reactive trajectories relies on the concept of the reactive current given by Fi,j := fi,j ? fj,i. (2.14) 18 In general, reactive trajectories can be very long, but in the time-reversible case the no-detour reactive trajectories are much shorter. The extension of the con- cept of the reactive current to the case of time-irreversible Markov chains is not straightforward. In time-reversible MJPs, the effective current accurately describes the reactive current. However in time-irreversible MJPs, the definition of the ef- fective current given in [16] might lead to undesirable consequences such as the reactive current along some edges of a reactive trajectory may exceed the transition rate ?AB. In this dissertation, we resolve this issue and introduce a counterpart of the effective current and no-detour reactive trajectories for time-irreversible Markov chains. Moreover, we propose an algorithm to generate such trajectories and offer its theoretical justification. 2.3.4 Analyzing Markov chains using TPT One can quantify the transition process in the MJP using the TPT as follows. 1. Solve forward and backward committor equations. 2. Find probability current of reactive trajectories. 3. Find transition rate. In Chapter 3, we propose a methodology for designing MJPs with desired stationary currents from time-irreversible MJPs and provide their theoretical justi- fication. 19 Chapter 3: Designing MJP with Desired Stationary Currents In [17], two modified MJPs were designed for the original time-reversible ir- reducible MJP. The stationary probability current in the first of them coincided with the probability current of reactive trajectories, and the invariant probability distribution matched the one for reactive trajectories. The second MJP was con- structed so that the stationary current in it was equal to the reactive current in the original MJP. These modified MJPs justified the algorithms for generating reactive trajectories and the so-called no-detour reactive trajectories. We remind that the stationary probability current in a MJP is given by Ji,j = ?iLi,j ? ?jLj,i. Note that this current is always zero for time-reversible MJPs and nonzero otherwise. In this chapter, we propose a general framework for designing modified MJPs with desired stationary current and desired invariant distribution and prove a the- orem justifying our construction. We introduce a so-called acyclic current that is a counterpart of the reactive current for time-irreversible MJPs. We apply our recipe to time-irreversible networks so that the stationary currents in the modified MJPs are equal to the effective currents and acyclic currents in the original ones. 20 3.1 Motivation In many applications, it is of interest to obtain an effective description of the transition process between subsets of states. If the underlying Markov process is time-reversible, the transition process between subsets of interest can effectively be described by the reactive current defined by (2.14). However, the concept of reactive current has not been defined for time-irreversible networks for a reason that we discuss below. Note that Metzner et al. [16], where time-reversibility is not assumed, does not contain the term ?reactive current?. It introduces only the probability current of reactive trajectories (2.10) and the effective current (2.11). Thus, we propose an extension of the reactive current. We first consider a simple example that will allow us to highlight some im- portant facts about time-irreversible MJPs and understand how to resolve some issues regarding the definition of the reactive current and reaction pathways. Let us quantify the transitions between the states a and b in the MJP depicted in Fig. 3.1 (top left). We define the time-reversed MJP shown in Fig. 3.1 (top right), and set A = {a} and B = {b}. Then we calculate the forward and backward committors, q+ and q?, and compute the probability current of reactive trajectories, fi,j. For this MJP, the effective current f+ coincides with the reactive current. We see that the directed graph G(S, {f+}) induced by f+ contains a cycle, which would never happen for a time-reversible MJP. Moreover, if we follow a single trajectory of this chain starting at A, the effective current along some of its edges may exceed the total effective current emanating from A, i.e., the transition rate. This happens in 21 Original MJP Time-reversed MJP 1 1 1 2 2 1 1/2 1 2 1 1 1 a b 1/2 1a b 1 3 1 1 13 Forward committor, Backward committor and Effective current 2/3, 2/3 4/27 2/3, 2/3 1 2 1/9 1/9 a 1/27 1/27 b 0, 1 Transition rate: 0 1, 03 0 ?"# = 1/9 1/3, 1/3 Figure 3.1: An illustrative example. Top left: the original MJP. The black numbers next to the arcs are the Li,j?s. Top right: the time-reversed MJP. The black numbers next to the arcs are the L?i,j?s. Bottom: The green numbers next to the arcs are the effective currents. The red and blue numbers next to the states are the forward and backward committor values respectively. our example: the effective current f+1,2 = 4/27 is greater than the total current of 1/9 leaving a and arriving at b (see Fig. 3.1 (bottom)). A similar phenomenon was encountered in the network originating from a budding yeast gene regulatory network (see Chapter 5) and its discovery motivated us to devise a procedure for removing cycles from the graph G(S, {f+}). Moreover, we will propose a frame- work for constructing modified MJPs with desired stationary currents and invariant probability distributions. Propositions 1 and 2 in [17] will be corollaries from our theorem (see Theorem 2 in Section 3.3). 22 3.2 Setup and Assumptions We develop a general framework for modifying the original time-irreversible irreducible MJP to make it have a desired stationary current and a desired invariant distribution. Our construction generalized Propositions 1 and 2 in Ref. [17]. Again, we consider a MJP on a finite state space S with infinitesimal generator L. As in [17], we assume that direct jumps from A to B are impossible, i.e. Li,j = 0 whenever i ? A and j ? B. This allows us to simplify the discussion, however we abandon the other two as- sumptions from [17]. Let A,B ? S, with A ?B = ?. The set of reactive states will be denoted by SR := S \ (A ? B). Suppose we have defined a current e satisfying the following properties: Assumption 1. Non-negativity: ei,j ? 0 ? Assumption 2. The conservation of current: ?i ? SR, (ei,j ? ej,i) = 0. ? ? ? ? j?S Assumption 3. Transition rate: ei,j = ei,j = ?AB where ?AB is the i?A j?S i?S j?B rate from A to B. It should be noted that both the probability current of reactive trajectories, f , and the effective current, f+, satisfy assumptions 1 - 3. We will extract the subset of SR with positive current emanating from them and denote it by R, i.e., R := {i ? SR|?j ? S : ei,j > 0} 23 We will call the states in R the progress states for current e. The rest of the states in SR will be called the left-out states, and their set will be denoted by S0 : S0 = SR \ R. As in [17], we combine all the pieces of non-reactive trajectories in the original MJP into an artificial state we call s. We define the process obtained in this way the transition path process originally introduced in the context of SDEs by Lu and Nolen [34]. This definition ensures all trajectories in the reactive process are reactive trajectories. 3.3 Designing Modified MJP Theorem 2. Suppose that assumptions 1 - 3 hold. Consider the process on the state space S? = R ? {s} defined?by the generator M with off-diagonal entries given by????????? ei,j ?? Mi,j = , i, j ? R?i ????? ? e M i,ji,s = , i ? R (3.1)? ?? i j?B ? ? ? ?M 1 s,j = ? ei,j, j ? R1 ?R i?A where ?R = ?i < 1 and ?i > 0 for all i ? R. Then the desired invariant i?R probability distribution of the transit?ion path process is given by???? ??i = ??i, i ? R??? (3.2)1? ?R, i = s and the stationary current in the network with state space S? and the generator matrix M coincides with the current e in the original network. Proof. 1. We will first show that the invariant distribution in the modified MJP 24 ? is given by ??, that is ??iMij = 0. i?R?{s} Let j ? R. We rewrite the summation as ? ? ??iMij = ??iMij + ??sMsj. i?R?{s} i?R By removing the jth term from the summation we have ? ? ??iMij = ??iMij + ??jMjj + ??sMsj. i?R?{s} i?R i 6=j ? Using the fact that Mjj = ? Mji, we get i 6=j ? ? ? ??iMij = ??iMij ? ??j Mji + ??sMsj. i?R?{s} i?R i=6 j i 6=j Removing the state s from the second summation on the right-hand side gives ? ? ? ??iMij = ??iMij ? ??j Mji ? ??jMjs + ??sMsj. i?R?{s} i?R i=6 j i 6=j i?R Next, we factor out ??j from the middle two terms and rearrange the order. Thus, ? ? ????? ? ? ??iMij = ??iMij + ??sMsj ??j Mji +Mjs? . i?R?{s} i?R i 6=j i=6 j i?R We now rewrite the generator M in terms of the current e and invariant distribution ? using 3.1. Thus, ? ? ? ?eij ? 1 ? ?? e ?ji eji???iMij = ?i + (1 ?R) e? ij ? ?j? + ? .?i 1 ? ? ? i?R?{ R j js} i?R i?A i?R i?B i=6 j i 6=j 25 By simplifying we obtain ? ? ? ? ? ??iMij = eij + eij ? eji ? eji. i?R?{s} i?R i?A i?R i?B i 6=j i=6 j Finally, using assumption 3 for the second and fourth terms and assumption 2 for the first and third terms we have ? ??iMij = ?AB ? ?AB = 0. i?R?{s} Now let j = s. We rewrite the summation as ? ? ??iMij = ??iMis + ??sMss. i?R?{s} i?R? Using the fact that Mss = ? Msi, we get i=6 s ? ? ? ??iMij = ??iMis ? ??s Msi. i?R?{s} i?R i?R Similarly, we rewrite the generator M in terms of the current e and invariant distribution ? using 3.1. Therefore, ? ? ? e ? 1 ?ij ??iMij = ?i ? (1? ?R) eji. ? 1? ? i?R?{s} i?R j? i RB i?R i?A By simplifying we obtain ? ?? ?? ??iMij = eij ? eji. i?R?{s} i?R j?B i?R i?A Finally, using assumption 3 we obtain ? ??iMij = ?AB ? ?AB = 0. i?R?{s} 26 2. Next we show the stationary probability current, Ji,j in the MJP with gener- ator matrix M coincides with the current Ei,j = ei,j ? ej,i for all i, j ? R: Ji,j = ?iMi,j ? ?jMj,i = ei,j ? ej,i = Ei,j Theorem 2 can be used to generate the reactive trajectories which in turn can be analyzed using statistical tools. In Chapter 4, we develop an algorithm for generating a weighted directed acyclic graph G(S, {F+}) where S is the set of vertices inherited from the original MJP, and F+ ? 0 are the weights for the arcs. The current defined by F+ possesses properties 1 - 3 and hence we can design a modified MJP with the desired invariant probability distribution. Next, we show that the transition rate can be computed using an arbitrary cut in the graph G(S, {E+ij}) separating the sets A and B. Here we have used the notation E+ij := max{ei,j ? ej,i, 0}. Cameron and Vanden-Eijnden [17] defined an AB-cut as any partition or cut in the network G(S,E) such that sets A and B are on different sides of the partition. An AB-cut leads to decomposing S such that S = SA ? SB where A ? SA, B ? SB and SA ? SB = ?. As a result, the transition rate, ?AB, can also be expressed in terms of an AB-cut. Theorem 3. The transition rate, ?AB, is given by ? ? ?AB = (eij ? eji) (3.3) i?SA j?SB The proof of this statement is as follows: 27 Proof. For brevity, we will denote Ei,j := ei,j ? ej,i. We will use the fact that for any subset S ? ? S, ? Ei,j = 0. (3.4) i?S?,j?S? Since SA = A ? (SA \ A) and SB = S \ SA we can rewrite the summation as ? ? ? ? Ei,j = Ei,j. i?SA j?SB i?A?(SA\A) j?S\SA By separating the union under the first summation, we obtain ? ? ? ? ? ? Ei,j = Ei,j + Ei,j. i?SA j?SB ?i?A j?S?\SA ? i?SA\A j?S\SA Using the fact that in general = ? , we rewrite the above statement as ? ? ?? ?i?A?\B i?A ?i?B ? ? ? ?? Ei,j = Ei,j? Ei,j+ Ei,j? Ei,j+ Ei,j. i?SA j?SB i?A j?S i?A j?SA i?SA\A j?S i?SA j?SA i?A j?SA We now use the facts that the third summation on the right-hand side above is zero by current conservation and the fourth summation is zero by 3.4. We cancel the second and fifth terms to obtain ? ? ?? Ei,j = Ei,j = ?AB. i?SA j?SB i?A j?S 28 Chapter 4: Generating Acyclic Current In this chapter, we explore the possibility to extend the notion of the reactive current for time-irreversible MJPs. Unfortunately, as it was shown in chapter 3, the signed effective current is not a good candidate because it can be cyclic and some pathways generated according to it can have larger effective current along some of their edges than the transition rate. To reconcile this issue, we propose a cycle removal algorithm and introduce a so-called acyclic current. The acyclic current will enable us to sample no-detour reactive trajectories introduced in [17]. To find the acyclic current, we use two algorithms from graph theory, namely the Depth-First Search algorithm and Tarjan?s algorithm. The details of these algorithms are given in Sections 4.1 and 4.2. Note that we do not call the acyclic current the reactive current because in some cases the acyclic current may be nonunique. We discuss this issue in Section 4.3.1. 4.1 Depth-First Search Depth-first search (DFS) is a recursive algorithm for traversing or searching tree or graph data structures [35]. Depth-first search starts at an undiscovered vertex and recursively proceeds down a path of adjacent vertices. Once the last 29 node in the path no longer has an adjacent vertex or reaches a discovered vertex, the algorithm then backtracks until it finds an unexplored path, and then explores it. This process continues until all the vertices that are reachable from the original source vertex has been discovered. If any undiscovered vertices remain, then depth- first search selects one of them as a new source, and it repeats the search from that source. The algorithm repeats this entire process until it has discovered every vertex. Initialization: Label all vertices in graph G as undiscovered Input: Graph G and a vertex v of G Output: All vertices reachable from v labeled as discovered. The main body: DSF(G, v) Label v as discovered for each vertex u that is adjacent to v do if vertex u is not labeled as discovered then DSF(G, u) end end Algorithm 1: Depth-First Search Algorithm 4.2 Tarjan?s Algorithm Tarjan?s algorithm [36] is designed for finding strongly connected components (SCCs) of a directed graph. A strongly connected component of a directed graph G(V,E) is a subgraph G(V ?, E ?) with maximal set of vertices V ? such that for every pair of vertices u and v in V ?, there is a directed path from u to v and a directed path from v to u. Tarjan?s algorithm implements the depth-first search algorithm (Algorithm 1) to search the graph. Each visited node has two variables associated 30 with it. The first variable, index, indicates the order in which the nodes are visited. Once the index is assigned it will never change. The second variable, lowlink, represents the smallest index reachable from the current node. Each time a visited node is encountered, the lowlink is updated. A stack, S, is used to keep track of the visited nodes. While tracking back from the DFS, if a vertex v?s lowlink equals its index value, it means that none of the vertices v can reach a vertex discovered before v, so all these vertices form a SCC. Thus all vertices up to v need to be popped from the stack. Below is a pseudocode (Algorithm 2) summarizing Tarjan?s algorithm. 4.3 Cycle Removal Algorithm We start with developing an algorithm for generating a weighted directed acyclic graph G(S, {f+}) where S is the set of vertices inherited from the original MJP, and f+ ? 0 is the effective current used for the weights of the arcs. If a MJP is irreversible, the effective current may be cyclic. As a result, the effective current along a single edge along the path from A to B might be larger than the transition rate, ?AB, along some arcs. To ensure that the effective current along a single edge does not exceed the transition rate, we design an algorithm that removes all cycles from the MJP. To find cycles in our graph G(S, {f+}), we apply the DFS algorithm. Once we find a cycle, we subtract the minimum current from every edge in the cycle thus breaking this cycle. We continue this process until no more cycles are found. The resulting current is an acyclic current. 31 Input: Directed graph G(V,E) where V is the set of vertices and E is the edge list. Output: Strongly connected component (SCC) Initialization:?v ? V : index[v] := lowlink[v] := 0 count = 0 S = ? The main body: Tarjan(v) count = count + 1 index[v] := lowlink[v] := count for each (v, w) ? E do if index[w] = 0 then Tarjan(w) lowlink[v] = min(lowlink[v], lowlink[w]) else if w ? S then lowlink[v] = min(lowlink[v], index[w]) end end end if lowlink[v] = index[v] then w := S.pop() add w to SCC while index[w] ? index[v] do w := S.pop() Output the SCC end end Algorithm 2: Tarjan?s Algorithm The computational cost of the cycle removal algorithm depends on the cost of using a depth-first search algorithm. DFS is typically used to traverse an entire graph and takes time O(|V |+|E|) with |V | being the number of vertices in the graph and |E| being the number of edges in the graph. Thus, the worst-case scenario cost of Algorithm 3 is ( ) Cost(Algorithm 3) = O(|E|)Cost(DFS) = O(|V ||E|) +O |E|2 . (4.1) A graph on which the cost of Algorithm 3 will be the worst case scenario case is 32 Input: Weighted directed graph G(S, {f+}) where S is the set of states and f+ is the effective current. Initialization: Set {e} = {f+} The main body: for k = 1, 2, ... do Find cycle, Ck, in G(S, {e}) using DFS algorithm; if Ck is nonempty then (k) emin := min(i?j)?C e ;k i,j for all (i? j) ? Ck do (k) ei,j := ei,j ? emin; end k = k + 1; else Break; end end {F+} := {e}; Output: Weighted directed graph G(S, {F+}) Algorithm 3: Cycle Removal Algorithm: an algorithm for obtaining an acyclic current from the effective current by means of removing cycles. shown in Fig. 4.1. Typically, the cost of Algorithm 3 is much less than Eq. (4.1) since the DFS algorithm terminates as soon as a cycle is found. However, since the number of states in the network under consideration can be large, we are motivated to reduce the cost of this algorithm. To do so, we find the set of strongly connected components (SCCs) of G(S, f+) using Tarjan?s algorithm [36], and then apply Algorithm 3 to each SCC consisting of more than one state separately. Thus, we propose a two-step process for generating an acyclic current. The cost of Tarjan?s algorithm is O(|V |+ |E|). Hence, the cost of our two-step 33 1 2 3 4 . . . N-1 N Figure 4.1: Visual example of worst-case scenario cost of Algorithm 3. The given graph has N nodes and 2(N ? 1) edges. We start Algorithm 3 with vertex 1. With one iteration, Algorithm 3 passes through N nodes and N edges before obtaining a cycle. Algorithm 3 will remove a total of N ? 1 cycles and with each iteration we traverse all N vertices and N edges. Thus, the computational cost for this graph will be (N ? 1)(N +N) = O(|V ||E|) +O(|E|2) 34 Initialization Start with a graph G(S, {f+}) where S is the set of states and f+ is the effective current. The main body Step 1: Find all SCCs. Step 2: Apply cycle removal algorithm to each SCC. Algorithm 4: Two-step process for generating acyclic current: an algorithm for obtaining an acyclic current from the effective current by means of finding the set of SCCs and removing cycles from each of them. process for finding F+ does not exceed ( ( ( { }))) N?SCC + (k) ( ( ( { }))) Cost Tarjan G S, f + Nf+ Cost DFS G S (k), f+,(k) (4.2) k=1 where NSCC is the number of SCCs with more than one state in G(S, {f+}), and G(S(k), {f+,(k)}) is the k-th SCC with more than one state of G(S, {f+}). While either algorithm does give an acyclic current, it may not be unique as we show in Section 4.3.1. Remark. This cycle removal algorithm is closely related to the decomposition of flow vectors to the sum of simple paths (see Bertsekas, 1998 [37]). This algorithm decomposes a graph to a set of simple paths by subtracting the minimum flow found in a given simple cycle and terminates when the graph only contains simple paths. A path is said to be simple if it contains no repeated arcs and no repeated nodes. The outputs of the cycle removal algorithm and the simple path decomposition algorithm are different, but their main bodies resemble. 4.3.1 Non-uniqueness in Cycle Removal Algorithm Once we have introduced the acyclic current as an output of Algorithm 3, a natural question arises: is the acyclic current unique? Unfortunately, the answer is 35 Original MJP Effective current 1 8/165 1 2 1 2 1 1 1 1 8/165 8/165 1 1 1 4/55 4/165 4/55 a 3 4 b a 3 4 b 1 1 8/165 8/165 1 1 5 5 Solution 1 Solution 2 8/165 4/165 1 2 1 2 8/165 8/165 4/165 4/165 4/55 4/55 4/55 4/55 a 3 4 b a 3 4 b 4/165 4/165 8/165 8/165 5 5 Figure 4.2: An example illustrating nonuniqueness of acyclic currents. Top left: The given MJP. The black numbers next to the arcs are the Li,j?s. Top right: The nonzero effective current along each arc is indicated by a blue number. Bottom: The different acyclic currents resulting from the removals of cycles {3, 5, 4} and {1, 2, 4, 3}respectively. no. The acyclic current obtained by Algorithm 3 or 4 may depend on the order in which the cycles have been removed. This is demonstrated by the example in Fig. 4.2. Let us consider a MJP with states a, b, and 5 states in between (Fig. 4.2 (top left)). All nonzero off-diagonal entries of the generator matrix L are written next to the corresponding arcs. Th[e invariant probability dist]ribution in this MJP is 12 1 9 2 8 7 4 ? = , , , , , , . 55 11 55 11 55 55 55 The generator matrix L? of the corresponding time-reversed MJP (defined by Eq. 2.6) is computed. Solving equations 2.7 and 2.8, we find the forward and backward committors: [ ] [ ] + 1 2 1 2 1 4 4 4 1 4q = 0, , , , , , 1 , q? = 1, , , , , , 0 3 3 3 3 3 5 9 5 2 7 36 The probability current of reactive trajectories and the effective current shown in Fig. 4.2 (top right) coincide for this example and are cyclic. The two cycles formed in this example are {1, 2, 4, 3} and {3, 5, 4}. In either case, the minimum current along any arc in either of the cycles is along arc (4, 3). Algorithm 3 finds the acyclic current shown in Fig. 4.2 (bottom left) if it eliminates cycle {3, 5, 4}. It results in the acyclic current in Fig. 4.2 (bottom right) if it eliminates cycle {1, 2, 4, 3}. Thus, applying the cycle removal algorithm to this MJP can give two different currents depending on the enumeration of the vertices. While this may be the case, any acyclic current produced by Algorithm 3 still satisfies the conservation of current property as each state i ? S \ (A ? B). However, due to the non-uniqueness, we only refer to this current as an acyclic current and not the reactive current. 4.4 An illustrative example: a discretized Maier-Stein model We would like to demonstrate the results of our cycle removal methodology on a visual example with a large number of states. To create such an example, we discretize the Maier-Stein SDE [38], ?a nongradie?nt b?istable system, giv?en by: ? ? b 3x(x, y) ? ? x? x ? 10xy2 ? dz = b(z)dt+ dw, b(z) = ?? ?? = ?? ?? , (4.3) by(x, y) ?y(1 + x2) where dw is the standard 2D Brownian motion. We restrict this system to the rectangle ?1.2 ? x ? 1.2, ?0.6 ? y ? 0.6 and assign reflective boundary conditions, i.e. the homogeneous Neumann conditions. The generator matrix L is obtained by 37 discretizing the generator for (4.3) ( ) ? ?  ?2 ?2L := bx(x, y) + by(x, y) + + . (4.4) ?x ?y 2 ?x2 ?y2 Next, we obtain an irreversible MJP by discretizing this generator by finite differ- ences on an N ?N mesh. The nonzero off-diagonal entries of L are 1  L(i,j),(i+1,j) = bx(xi, yj) + , 2 ? i ? N ? 1, 1 ? j ? N, 2hx 2h2x 1  L(1,j),(2+1,j) = bx(xi, yj) + , 1 ? j ? N, 2hx h2x 1  L(i,j),(i?1,j) = ?bx(xi, yj) + , 2 ? i ? N ? 1, 1 ? j ? N, 2h 2h2x x 1  L(N,j),(N?1,j) = ?bx(xi, yj) + , 1 ? j ? N, (4.5) 2h h2x x 1  L(i,j),(i,j+1) = by(xi, yj) + , 1 ? i ? N, 2 ? j ? N ? 1, 2h 2y 2hy 1  L(i,1),(i,2) = by(xi, yj) + , 1 ? i ? N, 2h 2y hy 1  L(i,j),(i,j?1) = ?by(xi, yj) + , 1 ? i ? N, 2 ? j ? N ? 1, 2h 2y 2hy L(i,N),(i,N?1) = ? 1  by(xi, yj) + , 1 ? i ? N. 2h h2y y We have chosen N = 70 and  = 0.3. This combination of parameter values sat- isfies the requirements that (i)  is small enough so that the invariant probability distribution ? is bimodal (Fig. 4.3(a)), (ii), for this , N is large enough so that the off-diagonal entries of L are nonnegative, and (iii) N is small enough to make the mesh visual. The subsets A and B are the mesh points lying within the circles of radius 0.2 centered at (?0.8, 0), respectively. Three reactive trajectories are displayed in Fig. 4.3(b). We see that they tend to wonder around the set of reactive states SR for a long time prior to reaching 38 B. The probability density of reactive trajectories is shown by the contour plots. The graph G (S, {f+}) induced by the effective current contains four SCCs shown in Fig. 4.3(c). These SCCs were found using Tarjan?s algorithm. We have computed an acyclic current F+ using Algorithm 4. The graph G (S, {F+}) is depicted in Fig. 4.3(d), while the intensity of F+ is displayed in Fig. 4.3(e). Ten samples of no-detour reactive trajectories are shown in Fig. 4.3(f). Remark. This example shows that, contrary to the time-reversible case, neither the forward committor increase nor the backward committor decrease along no-detour reactive trajectories. Zoom in Fig. 4.3(d) and observe that (i) directed up blue arcs near x = y = ?0.4 crossing the level set q+ = 0.3 of the forward committor from larger values to smaller values, and (ii) directed left magenta arcs near x = 0.5, y = ?0.6 crossing the level set q? = 0.1 from smaller values to larger values. Therefore, for time-irreversible MJPs, neither committor serves a good reaction coordinate. 39 Invariant probability distribution 10-4 0.6 3.5 0.4 3 0.2 2.5 0 A B 2 1.5 -0.2 1 -0.4 0.5 -0.6 -1 -0.5 0 0.5 1 (a) x (b) Strongly connected components Acyclic jump chain 0.6 0.4 0.4 0.2 0.2 0 A B 0 A B -0.2 -0.2 -0.4 -0.4 -0.6 -0.6 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 (c) (d) x Acyclic current 10-3 No-detour reactive trajectories 3 0.4 2.5 0.4 0.2 2 0.2 0 A B 1.5 0 A B -0.2 1 -0.2 -0.4 0.5 -0.4 -0.6 0 -0.6 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 (e) x (f) x Figure 4.3: (a): The invariant probability distribution ? for the Maier-Stein MJP with the generator defined by (4.5). (b): Three reactive trajectories and the level sets of the probability density of reactive trajectories. (c): The four strongly connected components of the graph G (S, {f+}) are shown by red, blue, purple, and yellow. (d): An acyclic graph G (S, {F+}). The arcs directed right, left, up, and down are shown by red, magenta, blue, and green arrows respectively. The white areas outside A and B contain the left-out states. The red curves are the level sets of the forward committor, while the black ones are those of the backward committor. (e): The intensity of the acyclic current F+. (f): Ten no-detour reactive trajectories sampled in the graph G (S, {F+}). 40 y y y y Chapter 5: Application to Gene Regulatory Network (GRN) The theoretical and algorithmic developments in Chapters 3 and 4 were mo- tivated by our desire to analyze the dynamics of a stochastic budding yeast gene regulatory network (GRN). Now we will apply this machinery to check which edges of the GRN are essential, and which ones are redundant. In particular, to better understand the effect of stochasticity, we first answer this question for the corre- sponding deterministic GRN. We start this chapter with presenting a background on the cell cycle and the GRNs. 5.1 Background 5.1.1 Cell Cycle The cell cycle is a process in which one cell grows leading to the division into two cells. A cell must duplicate all of its components such that the two cells which derive after division each have the information and machinery necessary to repeat the process [25]. The cell cycle consists of four phases. During the first growth phase, known as G1, the cell grows and performs its usual functions. Synthesis or the S phase consists of the DNA in the cell being synthesized and the chromosomes copied. 41 G2 M S G1 Figure 5.1: Schematic of budding yeast cell cycle. This is followed by the gap phase, G2, in which the cell prepares for mitosis. Col- lectively these phases are known as interphase. Finally, during mitosis, or M phase, the chromosomes are separated and the cell divides into two identical cells (see Fig. 5.1). Mitosis can be divided into four subphases: prophase, metaphase, anaphase, and telophase. Chromosomes condense during prophase, align during metaphase, separate during anaphase, and decondense during telophase. After mitosis, each of the new cells enters its own G1 phase and the process repeats. The cell cycle of budding yeast has been under investigation in many various settings [1,2,25,39,40]. Researchers find budding yeast to be particularly interesting because (1) a great deal is known about the molecular machinery regulating the events of the budding yeast cell cycle, (2) comprehensive mathematical models of the budding yeast cell cycle have been built and studied, (3) the genes and pathways for cell cycle control seem to be very similar in all eukaryotes, including mammals, and (4) the cycle of DNA replication, mitosis, and cell division is crucial to all aspects of biological growth, development, and reproduction [41]. The budding yeast cells are also interesting in that they divide asymmetrically creating a mother and daughter cell after division. Shortly after division, the mother cell will begin a new division 42 process. However, the daughter cell must first grow to a critical size [25]. The cell cycle is mainly controlled by a family of proteins called cyclin-dependent kinases (or CDKs). CDKs interact with proteins called cyclins that are oscillatorily expressed during the cell cycle. The main CDK in yeast is coded by the gene Cdc28. Cdc28 can form a complex with two families of cyclins known as CLNs and CLBs. Once a cell reaches a certain size, a CLN called Cln3 is released and activates Cdc28, initiating all of the processes that lead cell cycle through G1 and into S phase. The rising activity of the Clb1,2/Cdc28 complex induces mitosis. The proper completion of mitosis is facilitated by Cdc20 [25,40]. There are several checkpoints during the cell cycle of any organism to ensure that the next phase will not occur until the previous one has completed. For a yeast cell, the cell cycle halts at two major checkpoints: (i) the G1 checkpoint if DNA damage is detected or the cell has not reached the critical size or (ii) the spindle assembly checkpoint if DNA damage is detected, DNA is not replicated completely, or chromosomes are not aligned on the metaphase plate [25, 40]. The models considered here have chosen to include only the G1 checkpoint governed by cell size for simplicity. 5.1.2 Gene Regulatory Network (GRN) Chen et al. (2000) [25] developed a kinetic model for the cell cycle of budding yeast. This model consisted of 10 nonlinear ODEs, three algebraic equations, and a rule for separating cells at division. There are ?800 genes involved in the cell cycle 43 Cell Size Cln3 SBF MBF Sic1 Cln1,2 Clb5,6 Cdh1 Clb1,2 Mcm1/SFF Cdc20/Cdc14 Swi5 Figure 5.2: Gene regulatory network of budding yeast cell cycle. of the budding yeast. However, the number of key regulators that are responsible for the control and regulation of this complex process is much smaller. While this model proposed a realistic mechanism for regulating cell division, it involved approximately 50 parameters that needed to be determined to fit experimental observations. In the model, overall cell growth is exponential and the main phases of the budding yeast division cycle are driven by including activities of cyclin-dependent kinases. The 50 parameters used are estimated by trial and error. As a result, [25] only claim the equations and parameter set for the model are sufficient to account for the many properties of cell cycle control. Testing the model on different genotypes of budding yeast, give enough data to provide meaningful confirmation of the model. However, it is not for certain the parameter set is optimal nor do they quantify the robustness in the system. Li et al. (2004) [1] dramatically simplified the kinetic model proposed by [25] by converting the system of ODEs to a gene regulatory network (GRN). This eliminated the need to estimate the unknown parameters. The GRN consisting of only 11 proteins/ protein complexes was constructed based on literature studies 44 [39, 41, 42]. A visual of this GRN is shown in Figure 5.2. There are three classes of nodes in the regulatory network: ? Cyclins (Cln1,2, Cln3, Clb1,2, Clb5,6, which bind to the kinase Cdc28); ? Inhibitors and competitors of the cyclin complexes (Sic1, Cdh1, Cdc20, Cdc14); ? Transcription factors (SBF, MBF, Mcm1, SFF, Swi5). There are three types of edges in the GRN: ? Activation (represented by green arrows); ? Repression (represented by red arrows); ? Self-degradation (represented by self-loops). There are n = 11 nodes in the budding yeast GRN [1]. Let s ? S denote the state of the GRN with S being the set of all possible states of the GRN. Each node i can be in one of two states, active or passive, denoted by si = 1 or si = 0 respectively. Thus, s is a vector of 11 components and |S| = 211 = 2048. The dynamics of the gene regulatory network is defined by (i) the influence matrix A (5.1) and (ii) the transition rule (5.3). The 11 ? 11 influence matrix A = (ai,j) is defined as follows: ??????????1, if there is green arrow from j to i ai,j = ???????1, if there is red arrow from j to i (5.1) ???0, otherwise 45 Table 5.1: The 11 different protein/ protein complexes, node identification and their function. Protein/ Node# Function Protein complexes Cln3 1 G1-cyclins initiating Start event. SBF 2 Transcription factor for Cln1,2. MBF 3 Transcription factor for Clb5,6. Cln1,2 4 Cyclins involved in budding. Sic1 5 Stoichiometric inhibitor of Cdc28/Clb2 and Cdc28/Clb5. Clb5,6 6 B-type cyclins appearing late in G1, involved in DNA synthesis. Cdh1 7 Activator of the APC; protein involved in Clb2 proteolysis. Clb1,2 8 B-type cyclin essential for mitosis, present in S/G2/M phase. Mcm1/SFF 9 Transcription factor for Clb2, Cdc20 and Swi5. Cdc20/Cdc14 10 Activator of the APC; protein involved in Clb2, Clb5 and proteolysis, and required for exit from mitosis. Phosphatase required for exit of mitosis. Swi5 11 Transcription factor for Sic1. For each state s ? S, we compute the influence vector v := As (5.2) and define transition of the node state at the next time step t + 1 by the following rule. ??? ???????1, if vi > 0 si(t+ 1) = ??????0, if v (5.3) i < 0 ???si(t), if vi = 0 where t is a non-negative integer and time steps are discrete. Throughout this dissertation, this model will be referred to as the deterministic model. The temporal evolution of the protein states, presented in Table 5.2, follows the cell-cycle sequence. Highlighted in each row are the necessary protein/ protein complexes that must be activated in order to be in that particular phase as described 46 Table 5.2: Protein state over time determined by the deterministic model reproduced from Li et al. [1]. Colors denote the different phases of the biological pathway. Row indicates the time step, Phase indicates the cell-cycle phase, and the remaining columns represent the protein state at the given time. Row Cln3 SBF MBF Cln1,2 Sic1 Clb5,6 Cdh1 Clb1,2 Mcm1 Cdc20 Swi5 Phase SFF Cdc14 1 1 0 0 0 1 0 1 0 0 0 0 START 2 0 1 1 0 1 0 1 0 0 0 0 G1 3 0 1 1 1 1 0 1 0 0 0 0 G1 4 0 1 1 1 0 0 0 0 0 0 0 G1 5 0 1 1 1 0 1 0 0 0 0 0 S 6 0 1 1 1 0 1 0 1 1 0 0 G2 7 0 0 0 1 0 1 0 1 1 1 0 M 8 0 0 0 0 0 0 0 1 1 1 1 M 9 0 0 0 0 1 0 0 1 1 1 1 M 10 0 0 0 0 1 0 0 0 1 1 1 M 11 0 0 0 0 1 0 1 0 0 1 1 M 12 0 0 0 0 1 0 1 0 0 0 1 G1 13 0 0 0 0 1 0 1 0 0 0 0 G1* in Section 5.1.1. The cell cycle starts with a signal from ?cell size? that excites the stationary growth state G1 resulting in the state called ?START?. The states of the 11 nodes in the GRN at the START state are listed in row 1 of Table 5.2. The cell cycle transforms back to the stationary G1 state, denoted as G1* in row 13 of Table 5.2, through a sequence of states determined by the deterministic model. We will refer to this temporal evolution as the biological pathway. The phases of the biological pathway are denoted with different colors: purple, the start of the cell cycle; blue, the other G1 states; green, the S state; yellow, the G2 state; and red, the M states. The main result of [1] is that with the use of a simple deterministic model, the GRN representing the budding yeast cell cycle is stable and robust. The biological stationary state G1* is the attractor with the largest basin size. This property is largely preserved with respect to small perturbations to the dynamic trajectories of 47 all 1,764 protein states that flow into the stationary state G1*. Particularly, when either removing edges, adding edges or switching edges from activating to inhibiting (or vice versa), the relative changes of the basin size of the biggest attractor are small (Fig. 4 in [1]). This was also compared to ones obtained from the ensemble of random networks. While Li et. al. looked at the effect of edge deletion to the basin size of the largest attractor, they left unaddressed the exact changes to the biological pathway caused by these perturbations. 5.1.3 Dynamical Network For any GRN with n nodes, the dynamical network is the graph with 2n vertices representing all possible states, and edges representing transitions between the states. Since the evolution of the network over time has the Markov property, edges in the dynamical network define a transition probability of a Markov chain. The dynamical network gives us a way to quantify most likely transitions between cell states which is the key idea behind TPT. In order to understand the dynamical network of the deterministic GRN we have reproduced the results from [1]. Since we are looking at the deterministic model of this GRN, each cell state in our dynamical network has a unique outgoing arc, i.e. there is a unique possible transition from each state. Hence, the dynamical network is a directed graph whose connected components have at most one loop. Its exploration by the DFS algorithm shows that this graph is forest-like consisting of 7 subtrees. 48 Figure 5.3: The dynamical network for the deterministic GRN that matches the network in Fig. 2 in [1]. The pathway corresponding to the cell cycle from the START state to G1* are shown with large dots. Their colors correspond to different phases of the cell cycle: purple, START; blue, other G1 states; green, S state; yellow, G2 state; and red, M states. The results of the DFS algorithm are shown in Fig. 5.3. Each cell state is represented by a black node, with the arrows between them indicating dynamic flows from one state to another. The phases of the biological pathway are denoted with different colors: purple, START state; blue, the other G1 states on the biological pathway; green, the S state; yellow, the G2 state; red, the M states. The discrete-time dynamics of the gene regulatory network defined by the deterministic rules in [1] has seven attractors (stationary states) of the network. Hence, all of the initial states eventually flow into one of the seven stationary states shown in Table 5.3. Dynamic trajectories starting from 1764 out of the 2048 possible states, end up at one particular attractor, G1*. The cell?s stationary state being 49 Table 5.3: Fixed points and basin size determined by deterministic model. Each fixed point is found on a given row. Sink # is the row index, Size is the size of the basin of attraction, and the remaining columns represent the protein state of the fixed point. This table coincides with Table 1 from [1] as it should. Sink # Cln3 SBF MBF Cln1,2 Sic1 Clb5,6 Cdh1 Clb1,2 Mcm1 Cdc20 Swi5 Size SFF Cdc14 1 0 0 0 0 0 0 0 0 0 0 0 7 2 0 1 0 1 0 0 0 0 0 0 0 151 3 0 0 0 0 1 0 0 0 0 0 0 9 4 0 0 1 0 1 0 0 0 0 0 0 7 5 0 0 0 0 0 0 1 0 0 0 0 1 6 0 0 0 0 1 0 1 0 0 0 0 1764 7 0 0 1 0 1 0 1 0 0 0 0 109 the most common attractor of the network insures the stability of the cell state is guaranteed. While this model accurately depicts the cell cycle, it does not capture the intrinsic randomness of this biological process. Thus, a stochastic model was introduced. 5.1.4 Stochastic model of GRN Zhang et al. (2006) [2] modified the deterministic model [1] by making the transition rules stochastic. We will refer to this model as the stochastic model or stochastic GRN. The stochastic model builds upon the deterministic model by tak- ing into account certain degrees of unpredictability or randomness that may happen due to the environment and allows for the model to self-organize. Stochasticity plays a fundamental role in biological processes. At the intracellular level for exam- ple, randomness can be caused by low copy numbers of chemical reactants and an inhomogeneous distribution of the chemical reactants inside the cell [43]. This ran- domness occurring in biological processes justifies the use of a stochastic approach 50 to investigate them. As in the deterministic model, the time steps here are logic steps that depend on the current state of each node, si, rather than depict actual times. We com- pute the influence vector v as defined by Eq.(5.2). Under these assumptions, the transition probability is given by: ?11 P{s1(t+ 1), ..., s11(t+ 1)|s1(t), ..., s11(t)} = P{si(t+ 1)|s1(t), ..., s11(t)} (5.4) i=1 where if vi 6= 0 e?vi P{si(t+ 1) = 1|s1(t), ..., s11(t)} = ? (5.5)e?vi + e ?vi ??vi P{ esi(t+ 1) = 0|s1(t), ..., s11(t)} = (5.6) e?vi + e??vi if vi = 0 P{ 1si(t+ 1) = si(t)|s1(t), ..., s11(t)} = ? (5.7)1 + e ? As opposed to the deterministic model, in this model if the protein i has a self-degradation loop, aii = ?0.1 as in [2]. The positive number ? in the system is an inverse temperature-like parameter which accounts for random factors coming from environment. The positive number ? is used to characterize the stochasticity in the system when the input to a node is zero. This parameter is important in controlling the likelihood for a protein to maintain its state when there is no input to it from surrounding nodes. Note that, when ?, ? ? ?, these transition rules converge to the deterministic rules in [1]. The main result of [2] is that in addition to the dynamical and structural sta- bility of the GRN found by [1], the GRN is also stable against stochastic fluctuations 51 for a wide range of ?temperatures? characterized by the parameter ?. For large ? values, which corresponds to low ?temperatures?, the stationary state G1* is the most probable state of the system. For small ? values or high ?temperatures?, the system behaves randomly and cannot carry out the main biological function. This transition temperature is found at around ? ? 1.03. While they found values for ? for the model to be stable, they did not look at how perturbations in the GRN, particularly through edge deletion, effects the main biological function of the system. 5.2 A ?Mutation Analysis? of the Deterministic GRN 5.2.1 Identifying Redundant Edges We?ve developed the following algorithm in order to identify redundant edges in the GRN and test the robustness of the model. We distort the GRN shown in Fig. 5.2 by removing one edge and adjust the influence matrix A defined by Eq.(5.1). We apply the transition rules of Eq.(5.3) to obtain the dynamical network and explore it using the DFS. We keep track of the number and sizes of the subtrees of the dynamical graph and check if there is a pathway corresponding to the cell cycle going from START to G1*. If such a pathway exists, we compare it to the one in the original dynamical network and mark changes in it if any. We then repeat this process for all edges of the regulatory network in order to determine which edges can be removed without destroying the cell cycle. As shown in Table 5.4, we have established that 11 of the 34 edges of the GRN in Fig. 5.2 can be removed without destroying the cell cycle. The removal of 52 Initialization Start with the influence matrix A for the GRN defined by (5.1). Each nonzero entry in A corresponds to an edge in the GRN. Result: List of pathways from any node in the dynamical network to a stationary state. The main body for each edge in the GRN do Step 1: Make the corresponding entry a 0 in the influence matrix A. Step 2: Update the transition rules defined by (5.3). Step 3: Run DSF algorithm to find all pathways from ?START? to G1* carrying at least 20% of the normalized current. end Algorithm 5: Edge Removal Algorithm for Deterministic GRN. these edges will have little to no effect on the main cycle described in Table 5.2 and may only affect the number of attractors and number of states in their basins. One reason for the little to no effect is due to the fact that the roles of certain proteins tend to overlap. Particularly, ones in the families of cyclins (CLNs and CLBs). Any one of the pairs of cyclins (Cln1,2, Clb1,2, Clb5,6) can do the essential jobs of the other two if the cell is large enough. For example, Clb1,2 can trigger DNA synthesis in the absence of Clb5,6 [25]. This explains why the removal of edges which have no effect all involve at least one member of the family of cyclins. Removal of edge Mcm1/SFF? Clb1,2 and edge Mcm1/SFF? Cdc20/Cdc14 both involve reducing the number of steps in the mitotic phase. Hence the removal of these edges does not destroy the cell cycle but in fact decreases the number of time steps in the cell cycle. The removal of the remaining edges in Table 5.4 all have a slight change in one or two of the rows but all still have the necessary proteins active for each of the cell cycle phases. Thus, the removal of any of these edges does not effect the overall biological pathway. 53 Table 5.4: The edges of the gene regulatory network that can be removed without destroying the cell cycle modeled by the deterministic model. Nsinks represents the number of attractors and NG1? represents the number of states converging to the stationary state G1*. The effect on the cell cycle corresponds to differences in protein state over time shown in Table 5.2 and s refers to the cell state (i.e. si = 1 or 0 if protein i is active or inactive). ?Row? corresponds to the row in Table 5.2 and we show how a row is modified according to the edge removal. Removed edge Effect on the main cycle Nsinks NG1? 1 Clb5,6 ? Sic1 No effect 7 1747 2 Clb5,6 ? Cdh1 No effect 8 1738 3 Clb5,6 ? Mcm1/SFF Row 6: s = [01110101000] 7 1734 4 Cdh1 ? Clb1,2 No effect 7 1819 5 Clb1,2 ? Sic1 No effect 7 1761 6 Clb1,2 ? Cdh1 Row 9: s = [00001011111] 7 1760 Row 10: s = [00001010111] 7 Clb1,2 ? Mcm1/SFF Row 9: s = [00001001011] 7 1635 Row 10: s = [00001000010] Skips Row 11 8 Clb1,2 ? Cdc20/Cdc14 No effect 7 1814 9 Clb1,2 ? Swi5 Row 7: s = [00010101111] 7 1764 10 Mcm1/SFF ? Clb1,2 Skips Row 9 8 1567 11 Mcm1/SFF ? Cdc20/Cdc14 Skips Row 11 7 1650 54 Table 5.5: The edges of the gene regulatory network that when removed do not include all phases of the cell cycle when modeled by the deterministic model. Nsinks represents the number of attractors and NG1? represents the number of states con- verging to the stationary state G1*. The effect on the cell cycle corresponds to differences in protein state over time shown in Table 5.2 and s refers to the cell state (i.e. si = 1 or 0 if protein i is active or inactive). ?Row? corresponds to the row in Table 5.2 and we show how a row is modified according to the edge removal. Removed edge Effect on the main cycle Nsinks NG1? 1 Cln1,2 ? Cdh1 Between Row 3 and Row 8: 9 1755 s = [01110010000] s = [01110110000] s = [01110100100] s = [01110101111] s = [00010101111] 2 Sic1 ? Clb5,6 Between Row 2 and Row 8: 8 1836 s = [01111110000] s = [01110100100] s = [01110101111] s = [00010101111] 3 Clb5,6 ? Clb1,2 Between Row 5 and Row 8: 8 1585 s = [01110100100] s = [01110101111] s = [00010101111] 55 Table 5.5 shows the 3 edges that when removed lead to the skip of the gap phase G2. While the cell states when modeled by the deterministic model still evolve from START to G1*, the states do not go through all phases of the cell cycle as described in Section 5.1.1. In the removal of each of these edges, the cell goes from G1, to S phase, to M phase and finally to stationary G1* but never goes through G2. Specifically, the protein Cdc20/Cdc14 becomes active sooner than expected which causes the cell to go directly from synthesis to mitosis. All other proteins do become active in the correct sequence. Removal of any of the other 20 edges results in the absence of the main cycle. This means that starting from any state s, the pathway including the START state does not reach G1*. For example, the removal of the edge Cln1,2 ? Cln1,2 in particular results in a biological pathway consisting of G1 states, S state, G2 state, and stopping at M state. This would indicate that the cell grows, stops in mitosis and never reaches G1* in order to restart the cell cycle. Since the inactive Cln1,2 protein is essential for completion of mitosis and has no neighboring proteins to repress its expression level, it forces the cell to stay in mitosis. 5.3 A ?Mutation Analysis? of the Stochastic GRN 5.3.1 Exploring Dynamical Network The stochastic transition rules (Eqs.(5.4)-(5.5)) create a dense transition ma- trix P . This results in a complete dynamical network with the degree of each vertex being 2048 and exploring the graph using DFS would take too long to search in its 56 Sic1, Cdh1 Sic1, Cdh1, Mcm1, Cdc20, Swi5 Sic1, Cdh1, Swi5 99% 24% 98% 24% Sic1, Cdh1, 73% Cdc20, Swi5 96% Sic1, Mcm1, Cdc20, Swi5 Cln1,2, Clb1,2, Mcm1, Cdc20, Swi5 23% 9 Sic1, Clb1,2, Mcm1, Cdc20, Swi57% 22% 7 Clb1,2, Mcm1, Cdc20, Swi50% 91% Cln1,2, Clb5,6, Clb1,2, Mcm1, Cdc20 91 SBF, MBF, Cln1,2, Clb5,6, Clb1,2, Mcm1% 90% SBF, MBF, Cln1,2, Clb5,6 85% SBF, MBF, Cln1,2 71% SBF, MBF, Cln1,2, Sic1, Cdh1 74% SBF, MBF, Sic1, Cdh1 Cln3, Sic1, Cdh1 Figure 5.4: Acyclic current through cell cycle for ? = 5, ? = 6 using stochastic model. Using Algorithm 3, 284,460 cycles were removed. As in Table 5.2, the phases of the biological pathway are denoted with different colors: purple, the start of the cell cycle; blue, the other G1 states; green, the S state; yellow, the G2 state; and red, the M states. entirety. We want to extract only the most likely transitions to occur that give all phases of the cell cycle. Thus, to analyze the dynamical network, we use the tran- sition path theory (TPT) (Section 2.3) to single out two specific sets of nodes and analyze the statistical properties of the reactive trajectories by which transitions between these sets occur. We first preprocess the dynamical network to reduce the computational ef- forts. While the transition matrix of the dynamical network is dense, most of the pairwise rates have extremely low probabilities. This is because even after introduc- ing stochasticity to the model, transition between certain cell states are still very unlikely to occur. We eliminate all probabilities less than 10?14 which leads to a 57 .22 .59 .70 6 .26 .54 .66 5 .29 .50 .62 4 ? .23 .43 .563 .32 .43 2 1 0 0 1 2 3 4 5 ? Figure 5.5: The minimax current in pathways from START to G1* for various ? and ? using stochastic model. Blue dots: pathway carrying a minimum of 20% of the current has been produced. Red dots: no pathways carrying a minimum of 20% was produced. The numbers near the blue dots indicate the minimax current for that particular ? and ?. 58 better conditioning of the generator matrix. This decreases the number of edges in our dynamical network from 4,194,304 to 655,195. We compute the necessary quantities in order to use TPT to analyze the network. Let the set A consist of the single START state and let the set B con- sist of the single G1* state. We compute the forward and backward committors, q+ = (q+) ? ?i i?S and q = (qi )i?S. Using the committor functions we compute the probability current of the reactive trajectories, fi,j, and the effective current, f + i,j. The effective current is cyclic and thus we use the cycle removal algorithm (Algo- rithm 3) to compute an acyclic current. Tarjan?s algorithm showed that the network G(S, {f+}) consists of only 3 SCCs, namely: excited G1 state; stationary G1* state; and all remaining 2046 states. Hence, we cannot reduce the cost of Algorithm 3 by preprocessing its input with Tarjan?s Algorithm. Finally, we use DFS on the graph G(S, {e+}), where eij is an acyclic current carried from i to j, in order to find pathways only from START to G1*. We normalize the current by dividing each entry by the total current leaving A. This ensures that the total current leaving A sums to 1. We refer to this as the normalized current. Since the graph G(S, {e+}) has many pathways from START to G1*, we chose a threshold of 20% to extract only those pathways that carry significant amount of the normalized current. Using a threshold less than 20% gave many pathways and made it difficult to account for differences between the pathways. Figure 5.4 shows the pathway of cell states that carries at least 20% of the normalized acyclic current for parameters ? = 5 and ? = 6 in Eq.(5.5). Parameter 59 values ? = 5 and ? = 6 were chosen in order to compare our results to that of Fig. 3 in [2]. We removed 284,460 cycles from the graph G(S, {f+}) using Algorithm 3 to obtain an acyclic current. Figure 5.4 shows that the pathway in which most of the current is carried is the biological pathway of the cell cycle shown in Table 5.2. Figure 5.5 gives a table of the minimax current for various ? and ? values. For a given ? and ?, we find the minimax current by finding the minimum current in a particular pathway, then finding the maximum min current from all pathways. For larger ? and ?, the minimax current is also larger. This is as expected since again, as ? and ? ? ?, the stochastic model recovers the deterministic model and most of the current will be carried along the main biological pathway (Table 5.2). 5.3.2 Addressing the Issue of Nonuniqueness of the Acyclic Current. Section 4.3.1 illustrates an example where Algorithm 3 does not give a unique solution for the acyclic current. To reiterate, the acyclic current obtained by Al- gorithm 3 depends on the order in which the cycles are removed. In this section, we investigate the issue of nonuniqueness by applying Algorithm 3 to our stochastic dynamical network where we randomly change the order in which the cycles are removed. Parameter values ? = 5 and ? = 6 were chosen in order to compare with Fig. 5.4. These values were used primarily by Zhang et al. [2] as they are large enough so that the stochastic model behaves closely to the deterministic model while still adding some stochasticity. In order to remove cycles in a different order, 60 (0.99, 0.00) (0.99, 0.0070) (0.76, 0.0232) (0.98, 0.0095) (0.98, 0.0099) (0.73, 0.0165) (0.92, 0.0123) (0.92, 0.0162) (0.91, 0.0166) (0.85, 0.0067) (0.71, 0.0071 ( )0.74, 0.00) Figure 5.6: Biological pathway found after 20 iterations of removing cycles in dif- ferent order. The values on each edge are the mean and standard deviation of the acyclic current respectively. we randomly permute the indexes of the states of the dynamical network. Using a permutation matrix Q, we compute f+ T +permuted = Q f Q. We then run our cycle removal algorithm on f+permuted, obtain our original enumera- tion using f+ = Qf+ TpermutedQ , and run depth first search to find pathways. This was repeated 10 times. Figure 5.6 illustrates the result of this nonuniqueness analysis. For each ran- dom permutation of rows and columns, the main biological pathway was found. Next to each edge in the main biological pathway is the mean and standard devi- ation of currents through each edge over all permutations. The largest standard 61 99% <= 25% <= 24% 98% - 100% 74% - 79 9 %6 <= 24% % - 10 9 06 %% <= 22% - 99% 70% - 74 8 %9% - 9 9 03 % - 8 % 98 4% % - 9 8 23 %% - 8 6 59 %% - 71% 74% Figure 5.7: All pathways from START to G1* found after 20 iterations of removing cycles in different orders. The values on each edge are the range of acyclic current values found (minimum value - maximum value) after applying Algorithm 3 on the permuted network. 62 0.99, 0 1 .9.0 92 , , 0 0. Effective current, Acyclic current, Difference 98, 0.04 (from original enumeration) 0.79, 0.73 [0, .02, 1 0. . 0. 0 0 04 6 4, ] 0.96 [, 0 0 , 1. . 0.05 0 05 8 ] , 0.97, [ 00 .. 00 4, 0 0. 8 .7 04 8, ] 0.70 [0, .0 0. 60 ,4 00 .0.9 94 ], 0.9 [01 ,, 00 .. 040 ].9 03 3, 0.91 [ 0 , 0 0 , 0. .9 02 .0, 2 4] 0.9 [ [Range of differences] 0, 0 0.0 , 2 0.04] (from permutations)0.85, 0.8 [5 0, , 0.04 0 0 ].71, 0. [0 7 0 1 , 0.7 .04 , 2, 0 ]0.74, [ 00 , 0.02] Figure 5.8: All pathways from START to G1* found after 20 iterations of removing cycles in different orders. The values on left each edge are the effective current (green), acyclic current (black), and the difference (red) found from the original enumeration. On the right of each edge is the range of differences between the effective current and the acyclic current from the 20 iterations (red). 63 deviation value being 0.0232. All pathways from START to G1* found after 10 iter- ations of randomly permuting the indexes of the states is shown in Figure 5.7. Each edge shows the range of acyclic current values that were found from the different permutations. While the order the cycles are removed changes the acyclic current, in any permutation the main biological pathway was still found. None of the acyclic currents found deviated significantly from the acyclic current found in the original enumeration (Fig. 5.4). 5.3.3 Identifying Redundant Edges The algorithm developed for identifying redundant edges in the stochastic GRN is similar to Algorithm 5 for the deterministic model except for the TPT tools and the computation of the acyclic current that are used to analyze GRNs with missing edges instead of solely the DFS algorithm. We distort the GRN shown in Fig. 5.2 by removing one single edge and adjust the influence matrix A defined by Eq.(5.1). We apply the stochastic transition rules (Eqs.(5.4)-(5.5)) to obtain the transition matrix. We preprocess the stochastic matrix by zeroing out all entries that are less than 10?14 and compute the effective current. Finally, we use our cycle removal algorithm to obtain the acyclic current and explore the resulting graph G(S, {e+}). We keep track of pathways from START to G1* that carry a significant amount of current (at least 20% of the normalized current). If such a pathway exists, we compare it to the one in the original dynamical network and mark changes in it if any. We then repeat this process for all 34 edges of the regulatory network in 64 order to determine which edges can be removed without destroying the cell cycle. Input: Cut-off for the transition probability in P Initialization Start with the influence matrix A for the GRN defined by (5.1). Each nonzero entry in A corresponds to an edge in the GRN. Output: List of pathways from START to G1* carrying a minimum of 20% of the normalized current. The main body for each edge in the GRN do Step 1: Make the corresponding entry a 0 in the influence matrix A. Step 2: Update the stochastic transition rules defined by (5.4)-(5.7). Obtain matrix P . Step 3: Remove all entries in P less than cut-off. Step 4: Compute effective current, f+i,j, defined by (2.11). Step 5: Run cycle removal algorithm (Algorithm 3) to obtain acyclic current. Step 6: Run DFS algorithm to find all pathways carrying at least 20% of the normalized acyclic current. end Algorithm 6: Edge Removal Algorithm for Stochastic GRN As shown in Table 5.6, we have established that 26 of the 34 edges in the GRN can be removed from the network without destroying the cell cycle described by the stochastic GRN. This means the removal of these edges will have little to no effect on the biological pathway described in Table 5.2. 6 of the remaining 8 edges can be removed and a pathway from A to B can be achieved but does not follow the biological pathway (Table 5.7). The removal of only 2 of the edges, Cln3 ? SBF and SBF? Cln1,2, results in no path found carrying at least 20% of the normalized acyclic current. Thus, even when stochasticity is added to the model, no meaningful pathway is produced in the removal of these edges. These results suggest that these edges are essential to the survival of the cell cycle. 65 Table 5.6: The edges of the gene regulatory network that can be removed without destroying the cell cycle modeled by the stochastic model [2]. All pathways carrying a minimum of 20% of the current was found after each removal in the GRN. Parameter values ? = 5 and ? = 6 were used. The dominant pathway (pathway carrying most current) in each case was compared to the biological pathway described in Table 5.2. The effect on the main cycle corresponds to differences in protein state over time show in Table 5.2 and s refers to the cell state (i.e. si = 1 or 0 if protein i is active or inactive). ?Row? corresponds to the row in Table 5.2 and we show how a row is modified according to the edge removal. Removed edge Effect on the main cycle 1 Cln3 ? Cln3 Row 2: s = [11101010000] Row 3: s = [11111010000] Row 4: s = [11110000000] Row 5: s = [11110100000] Row 6: s = [11110101100] Between Row 6 and Row 7 s = [11110101110] s = [11110101111] s = [01110101111] s = [00010101111] 2 Cln3 ? MBF Row 2: s = [01001010000] Row 3: s = [01011010000] Row 4: s = [01010000000] Row 5: s = [01010100000] Row 6: s = [01010101100] 3 Cln1,2 ? Cln1,2 Between Row 7 and Row 8: s = [00010001111] 4 Sic1 ? Clb1,2 Between Row 9 and Row 11 s = [00001011111] s = [00001010111] 5 Clb5,6 ? Sic1 No effect 6 Clb5,6 ? Cdh1 No effect 7 Clb5,6 ? Mcm1/SFF Row 6: s = [01110101000] 8 Cdh1 ? Clb1,2 No effect 9 Clb1,2 ? SBF Between Row 6 and Row 8 s = [01010101110] s = [01010001111] s = [00010001111] 10 Clb1,2 ? MBF Between Row 6 and Row 8 66 s = [00110101110] s = [00100101111] s = [00000101111] 11 Clb1,2 ? Sic1 No effect 12 Clb1,2 ? Cdh1 Row 9: s = [00001011111] Row 10: s = [00001010111] 13 Clb1,2 ? Mcm1/SFF Between Row 8 and Row 12 s = [00001001011] s = [00001000010] 14 Clb1,2 ? Cdc20/Cdc14 No effect 15 Clb1,2 ? Swi5 Row 7: s = [00010101111] 16 Mcm1/SFF ? Clb1,2 Skips Row 9 17 Mcm1/SFF ? Mcm1/SFF Between Row 7 and Row 8: s = [00010001111] 18 Mcm1/SFF ? Cdc20/Cdc14 Skips Row 11 19 Mcm1/SFF ? Swi5 Row 8: s = [00000001110] Row 9: s = [00001001110] Row 10: s = [00001000110] 20 Cdc20/Cdc14 ? Sic1 Row 9: s = [00000011111] Row 10: s = [00000010111] 21 Cdc20/Cdc14 ? Clb5,6 Between Row 6 and Row 9 s = [01000101110] s = [00000101111] s = [00001101111] 22 Cdc20/Cdc14 ? Cdh1 Between Row 7 and Row 8 s = [00010001111] Between Row 10 and Row 13 s = [00001000011] s = [00001000001] s = [00001000000] 23 Cdc20/Cdc14 ? Clb1,2 Between Row 9 and Row 11 s = [00001011111] s = [00001010111] 24 Cdc20/Cdc14 ? Cdc20/Cdc14 Between Row 7 and Row 8: s = [00010001111] 25 Cdc20/Cdc14 ? Swi5 Row 8: s = [00000001110] Row 9: s = [00001001110] Row 10: s = [00001000110] Skips Row 12 26 Swi5 ? Swi5 Between Row 7 and Row 8: s = [00010001111] 67 Table 5.7: The edges of the gene regulatory network that when removed do not include all phases of the cell cycle when modeled by the stochastic model. All path- ways carrying a minimum of 20% of the current was found after each removal in the GRN. Parameter values ? = 5 and ? = 6 were used. The dominant pathway (pathway carrying most current) in each case was compared to the biological path- way described in Table 5.2. The effect on the main cycle corresponds to differences in protein state over time shown in Table 5.2 and s refers to the cell state (i.e. si = 1 or 0 if protein i is active or inactive). ?Row? corresponds to the row in Table 5.2 and we show how a row is modified according to the edge removal. Removed edge Effect on the main cycle 1 MBF ? Clb5,6 Between Row 4 and Row 8: s = [01110001000] s = [00010001110] 2 Cln1,2 ? Sic1 Between Row 3 and Row 8: s = [01111000000] s = [01111100000] s = [01110100100] s = [01110101111] s = [00010101111] 3 Cln1,2 ? Cdh1 Between Row 3 and Row 8: s = [01110010000] s = [01110110000] s = [01110100100] s = [01110101111] s = [00010101111] 4 Sic1 ? Clb5,6 Between Row 2 and Row 8: s = [01111110000] s = [01110100100] s = [01110101111] s = [00010101111] 5 Clb5,6 ? Clb1,2 Between Row 5 and Row 8: s = [01110100100] s = [01110101111] s = [00010101111] 6 Swi5 ? Sic1 Skips Row 2 - Row 12 68 5.4 Comparison of the Deterministic and Stochastic Models Figure 5.9 shows a closer comparison of the results of the edge removal algo- rithms for the deterministic and stochastic models. Edges colored green/red corre- spond to ones with green/red arrows in the GRN shown in Fig. 5.2 respectively. We see that 5 of the edges can be individually removed from the GRN without having any effect on the main cycle described by Table 5.2 when modeled by both the deterministic and stochastic model. Namely these edges are Clb5,6 ? Sic1, Clb5,6 ? Cdh1, Cdh1 ? Clb1,2, Clb1,2 ? Sic1, and Clb1,2 ? Cdc20/Cdc14. There are 11 edges that can be removed from the GRN when modeled by the deterministic model having little to no effect on the cell cycle and 3 edges that when removed do not go through all phases of the cell cycle as described in section 5.1.1. These same 11 edges can also be removed from the GRN when modeled by the stochastic model having little to no effect on the cell cycle and the same 3 edges can be removed when modeled by the stochastic model resulting in skipping the gap G2 phase. These 3 edges are again Cln1,2 ? Cdh1, Sic1 ? Clb5,6, and Clb5,6 ? Clb1,2. In both models, when removing any of these 3 edges, the protein complex Cdc20/Cdc14 activates sooner than expected resulting in the G2 phase being skipped. There are additional 15 edges that can be removed from the stochastic model with little to no effect on the cell cycle and additional 3 edges that can be removed but do not go through all the cell cycle phases. These additional 3 edges are MBF ? Clb5,6, Cln1,2 ? Sic1, and Swi5 ? Sic1 (Rows 1,2 and 6 from Table 5.7). In 69 the removal of MBF ? Clb5,6, the protein Clb5,6 never activates causing the cell to never reach synthesis or the G2 phase. Synthesis being a crucial step in the cell cycle, the removal of this edge completely destroys the process. The removal of Cln1,2? Sic1 results in the G2 phase being skipped due to the premature activation of Cdc20/Cdc14. Finally, the removal of Swi5? Sic1 results in a jump directly from the excited G1 state to the stationary G1* state. This removes all states in between including those involved in synthesis and mitosis, both being crucial phases in the cell cycle. Thus, while we still find a path from START to G1*, the removal of this edge also destroys the cell cycle. Our analysis shows that the stochastic model is much more robust than the deterministic one since it allows for the individual removal of more than double the number of edges than the deterministic model allows. We use the term robust in the sense that when applying Algorithm 3 to the model, the biological pathway can be extracted under a wide range of edges being individually removed from the GRN. 70 Deterministic Stochastic No effect: Small effect: ???5,6 ? ???1 ???3 ? ???3 ???5,6 ? ???1 ???3 ? ??? ???1 ? ???1,2 ???1,2 ? ???1,2 ???1,2 ? ???1 ???1 ? ???1,2 ???1,2 ? ???20/???14 ???1,2 ? ??? Small effect: ???1,2 ? ??? ???5,6 ? ???1 /??? ???1 /??? ? ???1/??? ???1,2 ? ???1 ???1 /??? ? ???5 ???1,2 ? ???1 /??? ???20 /???14 ? ???1 ???1,2 ? ???5 ???20 /???14 ? ???5,6 ???1 /??? ? ???1,2 ???20 /???14 ? ???1 ???1 /??? ? ???20/???14 ???20 /???14 ? ???1,2???20 /???14 ? ???20 /???14 Loss of G2 phase: ???20 /???14 ? ???5 ???1,2 ? ???1 ???5 ? ???5 ???1 ? ???5,6 ???5,6 ? ???1,2 Loss of G2 phase: ???1,2 ? ???1 Loss of S and G2 phases: ??? ? ???5,6 Loss of S, G2 and M phases: ???5 ? ???1 Figure 5.9: Comparison of edge removal algorithms for deterministic and stochastic models. The edges are categorized by their removal having either no effect on the cell cycle, a small effect on the cell cycle or the loss of some particular phases of the cell cycle. Edges are colored according to being an activating or inhibiting edge as described in Section 5.1.2. 71 Chapter 6: Conclusion 6.1 Summary of Methodological Advances We have developed a set of analytical and computational tools based on transition path theory (TPT) to analyze flows in time-irreversible Markov chains. Our developments are equally applicable to both discrete-time and continuous-time Markov chains. Since the concept of reactive current has not been defined for time- irreversible networks, we proposed an extension of this. We propose a general recipe for designing modified Markov chains with desired stationary current and desired invariant distribution. With this, we are able to apply our recipe to time-irreversible networks so that the stationary currents in the modified Markov chains are equal to the effective currents and acyclic currents in the original ones. We call the currents generated in this manner the acyclic currents which can be used as a counterpart of the reactive current for time-irreversible Markov chains. To generate these acyclic currents in practice, we developed a cycle removal algorithm (Algorithm 3). As a result, we are able to apply our tools based on TPT to quantify the transitions from one user defined subset of states to another. As a visual example, we applied Algorithm 3 to the Maier-Stein SDE. With 72 this example, we were able to reduce the computational efforts of our cycle removal algorithm (Algorithm 3) by applying the algorithm to strongly connected compo- nents separately. This example also shows that in time-irreversible Markov chains, the forward (or backward) committor values does not have to strictly increase (or de- crease) along no-detour reactive trajectories which is contrary to the time-reversible case. 6.2 Summary of Application to GRN We investigate the dynamics of the GRN using two algorithms: (1) depth-first search algorithm used for the deterministic model and (2) TPT and the cycle removal algorithm used for the stochastic model. TPT and the cycle removal algorithm allow us to find acyclic currents where only the most likely transitions that occur between cell states are found. We then use these algorithms to test the robustness of the GRN by identifying redundant edges in the models. In order to identify redundant edges in models we apply the following steps. We first distort the GRN by removing one edge and adjusting the influence matrix. We then apply the appropriate transition rules for each model to obtain the dynamical networks. We explore these dynamical networks using algorithms (1) and (2) described above. Finally, we check if there is a pathway corresponding to the cell cycle going from the excited growth phase to the stationary growth phase. If such a pathway exists, we compare it to the one in the original dynamical network and mark changes in it if any. We repeat this process for all edges of the regulatory network in order to determine which edges 73 can be removed without a major effect on the cell cycle. Our methodology can be used as a deterministic computational tool for analysis of the dynamical network. We conclude the removal of 11 out of 34 edges make no significant effect on the cell cycle modeled by the deterministic GRN. Thus, if a mutation occurs in protein A that does not allow for protein B to be turned on, the cell cycle will still proceed as expected. In some cases, there may only be minor modifications to the size of the basin of attraction or main cycle pathway. There are 3 edges that when removed only result in the loss of the gap phase G2. The remaining 20 edges are very essential to the gene regulatory network. Therefore, removing any of the remaining 20 edges causes the cell cycle to be destroyed and cell states converge to states not on the biological pathway. We also conclude that 26 out of the possible 34 edges can be removed without notable effects on the cell cycle modeled by the stochastic model. There are 4 edges that result in the loss of the gap phase, 2 edges that result in the loss of synthesis and/or mitosis, and removing either of the remaining 2 edges produce no pathway. Overall, this shows that the stochastic model is notably more robust than the deterministic one. The GRN for the budding yeast cell cycle is a robust network. From a biologi- cal point of view, it is important for us to perform mutation analysis on GRNs since it would be beneficial to be able to specifically determine which mutations allow for the survival of the biological process of interest and which mutations aid in their destruction. Future research would include testing the robustness and investigating the dynamics of more complex GRNs. 74 6.3 Future work One question that arises is, can this methodology be used if we are unable to store the entire dynamical network for a GRN into computer memory? With an N -node GRN, the dynamical network consist of 2N nodes. Hence, given a GRN with say 50 nodes, we are unable to store the entire network with 250 ? 1015 nodes and apply our methodology. However, even when the network is large and complex, we would like our tools to still be efficient in analyzing these networks. Through our analysis, we have observed that even though the dynamical network for some GRNs become huge, the vast majority of states are biologically meaningless. Thus, if we can incorporate in our algorithm a way to extract meaningful states, say through Monte Carlo sampling, then we can apply our methodology to larger and more complex networks. We would like to explore this idea through more examples involving larger networks [9, 44]. Gene regulatory networks are becoming an increasingly-popular tool for the modeling and analysis of biological processes [45?47]. One goal is to use our method- ology to answer questions arising in more complicated GRNs. One network of in- terest is the GRN model representing the segment polarity genes [48, 49]. These genes are a group of genes involved in embryonic pattern formation in the fruit fly Drosophila Melanogaster. Homologs of the segment polarity genes have been identified in vertebrates, including humans, which suggests strong evolutionary con- servation. This GRN is more complex in that these genes refine and maintain their expression through the network of intra- and intercellular regulatory interactions 75 and consists of 16 nodes. Moreover, some models of the GRN introduce a time- delay in the activation or inactivation of nodes [50]. It would be of interest to us to see how these time-delays effect the dynamic of the network and overall robustness. Finally, our analytical and computational tools can also be applied to time- irreversible Markov chains arising in other applications. One application of interest comes from investigating the aggregation process of Lennard-Jones atoms. In 2017, Forman and Cameron [51] proposed expected initial and pre-attachment distribu- tions to analyze the aggregation/deformation in the LJ6?14 network, where LJN is the N -atom Lennard-Jones cluster and the network representing its energy land- scape (this was later extended to the LJ6?15 network). With this they answered the question: If the aggregation process starts at the bicapped tetrahedron local minimum of LJ6, formed as a result of the attachment of an additional atom to the only minimum of LJ5, what configurations are most likely to be observed in each LJN as the aggregation process proceeds to LJ14. This LJ6?15 network is not only time-irreversible but also reducible. However, allowing for the attachment and detachment of particles gives rise to the irreducible network we desire. Thus, we would like to explore the use of our methodology to answer similar questions for this extended network. 76 Bibliography [1] Fangting Li, Tao Long, Ying Lu, Qi Ouyang, and Chao Tang. The yeast cell- cycle network is robustly designed. Proceedings of the National Academy of Sciences, 101(14):4781?4786, 2004. [2] Yuping Zhang, Minping Qian, Qi Ouyang, Minghua Deng, Fangting Li, and Chao Tang. Stochastic model of yeast cell-cycle network. Physica D: Nonlinear Phenomena, 219(1):35?39, 2006. [3] Albert-La?szlo? Baraba?si et al. Network science. Cambridge university press, 2016. [4] Wataru Souma, Yoshi Fujiwara, and Hideaki Aoyama. Complex networks and economics. Physica A: Statistical Mechanics and its Applications, 324(1-2):396? 401, 2003. [5] David Wales et al. Energy landscapes: Applications to clusters, biomolecules and glasses. Cambridge University Press, 2003. [6] Stanley Wasserman, Katherine Faust, et al. Social network analysis: Methods and applications, volume 8. Cambridge university press, 1994. [7] Peter J Carrington, John Scott, and Stanley Wasserman. Models and methods in social network analysis, volume 28. Cambridge university press, 2005. [8] Mark EJ Newman. The structure and function of complex networks. SIAM review, 45(2):167?256, 2003. [9] Ilya Shmulevich, Ilya Gluhovsky, Ronaldo F Hashimoto, Edward R Dougherty, and Wei Zhang. Steady-state analysis of genetic regulatory networks mod- elled by probabilistic boolean networks. Comparative and functional genomics, 4(6):601?608, 2003. [10] Xiao Fan Wang and Guanrong Chen. Complex networks: small-world, scale- free and beyond. IEEE circuits and systems magazine, 3(1):6?20, 2003. 77 [11] Mark EJ Newman. Modularity and community structure in networks. Proceed- ings of the national academy of sciences, 103(23):8577?8582, 2006. [12] Huawei Shen, Xueqi Cheng, Kai Cai, and Mao-Bin Hu. Detect overlapping and hierarchical community structure in networks. Physica A: Statistical Mechanics and its Applications, 388(8):1706?1712, 2009. [13] Anton Bovier, Michael Eckhoff, Ve?ronique Gayrard, and Markus Klein. Metastability and low lying spectra in reversible markov chains. Communi- cations in mathematical physics, 228(2):219?255, 2002. [14] Anton Bovier, Michael Eckhoff, Ve?ronique Gayrard, and Markus Klein. Metastability in reversible diffusion processes i: Sharp asymptotics for capac- ities and exit times. Journal of the European Mathematical Society, 6(4):399? 424, 2004. [15] Weinan E. and Eric Vanden-Eijnden. Towards a theory of transition paths. Journal of statistical physics, 123(3):503, 2006. [16] Philipp Metzner, Christof Schu?tte, and Eric Vanden-Eijnden. Transition path theory for Markov jump processes. Multiscale Modeling & Simulation, 7(3):1192?1219, 2009. [17] Maria Cameron and Eric Vanden-Eijnden. Flows in complex networks: theory, algorithms, and application to Lennard-Jones cluster rearrangement. Journal of Statistical Physics, 156(3):427?454, 2014. [18] Michael Manhart and Alexandre V Morozov. Statistical physics of evolution- ary trajectories on fitness landscapes. In First-Passage Phenomena and Their Applications, pages 416?446. World Scientific, 2014. [19] Michael Manhart and Alexandre V Morozov. Path-based approach to random walks on networks characterizes how proteins evolve new functions. Physical review letters, 111(8):088102, 2013. [20] Eric H Davidson and Douglas H Erwin. Gene regulatory networks and the evolution of animal body plans. Science, 311(5762):796?800, 2006. [21] Andre Ribeiro, Rui Zhu, and Stuart A Kauffman. A general modeling strategy for gene regulatory networks with stochastic dynamics. Journal of Computa- tional Biology, 13(9):1630?1639, 2006. [22] Selene L Fernandez-Valverde, Felipe Aguilera, and Rene? Alexander Ramos- D??az. Inference of developmental gene regulatory networks beyond classical model systems: new approaches in the post-genomic era. Integrative and com- parative biology, 58(4):640?653, 2018. 78 [23] Frank Emmert-Streib, Matthias Dehmer, and Benjamin Haibe-Kains. Gene regulatory networks and their applications: understanding biological and med- ical problems in terms of networks. Frontiers in cell and developmental biology, 2:38, 2014. [24] Debashis Barik, David A Ball, Jean Peccoud, and John J Tyson. A stochastic model of the yeast cell cycle reveals roles for feedback regulation in limiting cellular variability. PLoS computational biology, 12(12), 2016. [25] Katherine C Chen, Attila Csikasz-Nagy, Bela Gyorffy, John Val, Bela Novak, and John J Tyson. Kinetic analysis of a molecular model of the budding yeast cell cycle. Molecular biology of the cell, 11(1):369?391, 2000. [26] Likun Zheng, Meng Chen, and Qing Nie. External noise control in inherently stochastic biological systems. Journal of mathematical physics, 53(11):115616, 2012. [27] Michael B Elowitz, Arnold J Levine, Eric D Siggia, and Peter S Swain. Stochas- tic gene expression in a single cell. Science, 297(5584):1183?1186, 2002. [28] Peter Ha?nggi. Stochastic resonance in biology how noise can enhance de- tection of weak signals and help improve biological information processing. ChemPhysChem, 3(3):285?290, 2002. [29] Jonathan M Raser and Erin K O?Shea. Noise in gene expression: origins, consequences, and control. Science, 309(5743):2010?2013, 2005. [30] Henry Eyring. The activated complex in chemical reactions. The Journal of Chemical Physics, 3(2):107?115, 1935. [31] James R Norris. Markov chains. Number 2. Cambridge university press, 1998. [32] Donald G Truhlar, Bruce C Garrett, and Stephen J Klippenstein. Cur- rent status of transition-state theory. The Journal of physical chemistry, 100(31):12771?12800, 1996. [33] Christoph Dellago, Peter Bolhuis, and Phillip L Geissler. Transition path sam- pling. Advances in chemical physics, 123:1?78, 2002. [34] Jianfeng Lu and James Nolen. Reactive trajectories and the transition path process. Probability Theory and Related Fields, 161(1-2):195?244, 2015. [35] Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. Introduction to algorithms. MIT press, 2009. [36] Robert Tarjan. Depth-first search and linear graph algorithms. SIAM journal on computing, 1(2):146?160, 1972. 79 [37] Dimitri P Bertsekas. Network optimization: continuous and discrete models. Athena Scientific Belmont, 1998. [38] Robert S Maier and Daniel L Stein. A scaling theory of bifurcations in the sym- metric weak-noise escape problem. Journal of statistical physics, 83(3-4):291? 357, 1996. [39] Frederick R Cross, Vincent Archambault, Mary Miller, and Martha Klovstad. Testing a mathematical model of the yeast cell cycle. Molecular biology of the cell, 13(1):52?70, 2002. [40] Katherine C Chen, Laurence Calzone, Attila Csikasz-Nagy, Frederick R Cross, Bela Novak, and John J Tyson. Integrative analysis of cell cycle control in budding yeast. Molecular biology of the cell, 15(8):3841?3862, 2004. [41] John J Tyson, Bela Novak, Kathy Chen, and John Val. Checkpoints in the cell cycle from a modeler?s perspective. In Progress in cell cycle research, pages 1?8. Springer, 1995. [42] Michael D Mendenhall and Amy E Hodge. Regulation of cdc28 cyclin- dependent protein kinase activity during the cell cycle of the yeast saccha- romyces cerevisiae. Microbiol. Mol. Biol. Rev., 62(4):1191?1243, 1998. [43] Alan U Sabino, Miguel FS Vasconcelos, Misaki Yamada Sittoni, Willian W Lautenschlager, Alexandre S Queiroga, Mauro CC Morais, and Alexandre F Ramos. Lessons and perspectives for applications of stochastic models in bio- logical and cancer research. Clinics, 73, 2018. [44] Shane Squires, Andrew Pomerance, Michelle Girvan, and Edward Ott. Stability of boolean networks: The joint effects of topology and update rules. Physical Review E, 90(2):022814, 2014. [45] Fernando M Delgado and Francisco Go?mez-Vela. Computational methods for gene regulatory networks reconstruction and analysis: A review. Artificial intelligence in medicine, 95:133?145, 2019. [46] Gilles Bernot, Jean-Paul Comet, Adrien Richard, Madalena Chaves, Jean-Luc Gouze?, and Fre?de?ric Dayan. Modeling and analysis of gene regulatory networks. In Modeling in Computational Biology and Biomedicine, pages 47?80. Springer, 2013. [47] Ezio Bartocci and Pietro Lio?. Computational modeling, formal analysis, and tools for systems biology. PLoS computational biology, 12(1), 2016. [48] Re?ka Albert and Hans G Othmer. The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in drosophila melanogaster. Journal of theoretical biology, 223(1):1?18, 2003. 80 [49] George Von Dassow, Eli Meir, Edwin M Munro, and Garrett M Odell. The segment polarity network is a robust developmental module. Nature, 406(6792):188?192, 2000. [50] Madalena Chaves, Reka Albert, and Eduardo D Sontag. Robustness and fragility of boolean models for genetic regulatory networks. Journal of the- oretical biology, 235(3):431?449, 2005. [51] Yakir Forman and Maria Cameron. Modeling aggregation processes of lennard-jones particles via stochastic networks. Journal of Statistical Physics, 168(2):408?433, 2017. 81