Fixed Points in Two-Neuron Discrete Time Recurrent Networks: Stability and Bifurcation Considerations

Peter Tino
Department of Computer Science and Engineering, Slovak Technical University, Ilkovicova 3, 812 19 Bratislava, Slovakia
Email: tino@decef.elf.stuba.sk
(Currently with NEC Research Institute, 4 Independence Way, Princeton, NJ 08540; Email: tino@research.nj.nec.com)

Bill G. Horne and C. Lee Giles
NEC Research Institute, 4 Independence Way, Princeton, NJ 08540
Email: {horne,giles}@research.nj.nec.com
(C. Lee Giles is also with UMIACS, University of Maryland, College Park, MD 20742)

Technical Report UMIACS-TR-95-51 and CS-TR-3461
Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742

ABSTRACT

The position, number and stability types of fixed points of a two-neuron recurrent network with nonzero weights are investigated. Using simple geometrical arguments in the space of derivatives of the sigmoid transfer function with respect to the weighted sum of neuron inputs, we partition the network state space into several regions corresponding to stability types of the fixed points. If the neurons have the same mutual interaction pattern, i.e. they either mutually inhibit or mutually excite each other, a lower bound on the rate of convergence of the attractive fixed points towards the saturation values, as the absolute values of the weights on the self-loops grow, is given. The role of the weights in the location of fixed points is explored through an intuitively appealing characterization of neurons according to their inhibition/excitation performance in the network. In particular, each neuron can be of one of four types: greedy, enthusiastic, altruistic or depressed. Both with and without external inhibition/excitation sources, we investigate the position and number of fixed points according to the character of the neurons. When both neurons self-excite themselves and have the same mutual interaction pattern, the mechanism of creation of a new attractive fixed point is shown to be that of a saddle-node bifurcation.

1 Introduction

In this contribution we address issues concerning fixed points of discrete-time recurrent neural networks consisting of two neurons. Nonzero weights are assumed. As pointed out in [3], because of the interest in associative memory applications, a great deal of previous work has focused on the question of how to constrain the weights of recurrent networks so that they exhibit only fixed points (no oscillatory dynamics) [6]. In this context, it is desirable that all fixed points are attractive. Recently, Jin, Nikiforuk and Gupta [16] reported new results on the absolute stability of a rather general class of recurrent neural networks. Conditions under which all fixed points of the network are attractive were determined by the weight matrix of the network.

However, there are many applications where oscillatory dynamics of recurrent networks is desirable. For example, when trained to act as a finite state machine ([7], [9], [11], [12], [17], [19], [21], [22]), the network has to induce a stable representation of the state transitions associated with each input symbol of the machine. A transition may have the character of a loop (do not move from the current state when the symbol x is presented), or of a cycle (when repeatedly presenting the same input, we eventually return to the state where we started).
As reported in [5], [17], and [18], loops and cycles associated with an input symbol x are usually represented as attractive fixed points and periodic orbits, respectively, of the underlying dynamical system corresponding to the input x. In this respect, one can look at the training process from the point of view of bifurcation analysis. The network solves the task of finite state machine simulation by locating point and periodic attractors and shaping their respective basins of attraction [8]. Before training, the connection weights are set to small random values and, as a consequence, the network has only one attractor basin. This implies that the network must undergo several bifurcations [10].

In [18], a preliminary analysis of the two-neuron recurrent network is given. Under some specific conditions on the weight values, the number, position and stability types of fixed points of the underlying dynamical systems are analyzed and the bifurcation mechanism is clarified. The most typical bifurcation responsible for the creation of a new fixed point is the saddle-node bifurcation.

Studies of the asymptotic behaviour of recurrent neural networks usually assume some form of structure in the weight matrix describing the connectivity pattern among the recurrent neurons. For example, symmetric connectivity and the absence of self-interactions enabled Hopfield [14] to interpret the network as a physical system having energy minima at the attractive fixed points of the network. These rather strict conditions were weakened in [6], where more easily satisfied conditions are formulated. Blum and Wang [4] globally analyze networks with nonsymmetrical connectivity patterns of special types. In the case of two recurrent neurons with sigmoidal activation function $g(\ell) = 1/(1+e^{-\ell})$, they give results for weight matrices with diagonal elements equal to zero; in such a case the recurrent network is shown to have only one fixed point and no "genuine" periodic orbits (of period greater than one).

This paper presents a generalization of the results presented in [18]. A similar approach to determining the number and position of fixed points in continuous-time recurrent neural networks can be found in [3]. In section 3, the network state space is partitioned into several regions corresponding to stability types of the fixed points. This is done by first exploring the space of derivatives of the sigmoid transfer function with respect to the weighted sum of neuron inputs. Then, the structure is transformed into the space of neuron activations. It was proved by Hirsch [13] that when all the weights in a recurrent network with exclusively self-exciting (or exclusively self-inhibiting) neurons are multiplied by a larger and larger positive number (neural gain), the attractive fixed points tend to saturated activation values. Due to the analysis in section 3, in the case of a two-neuron network, under the assumption that the neurons have the same mutual interaction pattern (they either mutually inhibit or mutually excite each other), we give a lower bound on the rate of convergence of the attractive fixed points towards the saturation values as the absolute values of the weights on the self-loops grow.

In section 4 the position and the number of fixed points is discussed. The role of the weights in the location of fixed points is investigated through an intuitively appealing characterization of neurons according to their inhibition/excitation performance in the network.
For example, we view a neuron as a greedy one if it self-excites itself but inhibits the other neuron; an enthusiastic neuron excites both itself and the other neuron; etc. In the context of greedy and enthusiastic neurons, the saddle-node bifurcation, as a mechanism responsible for the creation of a new attractive fixed point, is described in section 5. Section 2 briefly introduces some necessary concepts from the theory of discrete time dynamical systems.

2 Dynamical systems

A discrete-time dynamical system can be represented as the iteration of a (differentiable) function $f: A \to A$ ($A \subseteq \mathbb{R}^n$), i.e.

$x_{t+1} = f(x_t), \quad t \in N,$   (1)

where $N$ denotes the set of all natural numbers. For each $x \in A$, the iteration (1) generates a sequence of points defining the orbit, or trajectory, of $x$ under $f$. Hence, the orbit of $x$ under $f$ is the set $\{f^m(x) \mid m \geq 0\}$. For $m \geq 1$, $f^m$ is the composition of $f$ with itself $m$ times. $f^0$ is defined to be the identity map on $A$.

A point $x^* \in A$ is called a fixed point of $f$ if $f^m(x^*) = x^*$ for all $m \in N$. Fixed points can be classified according to the behaviour of the orbits of points in their vicinity. A fixed point $x^*$ is said to be asymptotically stable (or an attractive point of $f$) if there exists a neighborhood $O(x^*)$ of $x^*$ such that $\lim_{m\to\infty} f^m(x) = x^*$ for all $x \in O(x^*)$. As $m$ increases, trajectories of points near an asymptotically stable fixed point tend to it. A fixed point $x^*$ of $f$ is asymptotically stable only if for each eigenvalue $\lambda$ of $Df(x^*)$, the Jacobian of $f$ at $x^*$, $|\lambda| < 1$ holds. The eigenvalues of $Df(x^*)$ govern whether or not the map $f$ has contracting or expanding directions in a vicinity of $x^*$. Eigenvalues larger in absolute value than one lead to expansion, whereas eigenvalues smaller in absolute value than one lead to contraction. If all the eigenvalues of $Df(x^*)$ are outside the unit circle, $x^*$ is a repulsive point, or repellor. All points from a neighborhood of a repellor move away from it as $m$ increases. If some eigenvalues of $Df(x^*)$ are inside and some are outside the unit circle, $x^*$ is said to be a saddle point.

3 Qualitative analysis

The iterative map under consideration can be written as follows:

$(x_{n+1}, y_{n+1}) = (g(ax_n + by_n + t_1),\ g(cx_n + dy_n + t_2)),$   (2)

where $(x_n, y_n) \in (0,1)^2$ is the state of the network at time step $n$, $a, b, c, d \in \mathbb{R} \setminus \{0\}$ are weights and $t_1, t_2 \in \mathbb{R}$ are bias terms. $g$ is the sigmoid function $g(\ell) = 1/(1+e^{-\ell})$. Since the neuron activations $x_n$ and $y_n$ are positive, the signs of the weights determine the type of the corresponding connections: a connection is exciting if its weight is positive and inhibiting if its weight is negative.

The aim of this section is to partition the state space $(0,1)^2$ of neuron activations into several regions according to stability types of the fixed points of (2). Define the map $\Phi: (0,1)^2 \to (0,1/4]^2$ as

$\Phi(x,y) = (x(1-x),\ y(1-y)).$   (3)

Let $F(u,v)$ be a function $F: \mathbb{R}^2 \to \mathbb{R}$. The sets $\{(u,v) \mid F(u,v) < 0\}$, $\{(u,v) \mid F(u,v) = 0\}$ and $\{(u,v) \mid F(u,v) > 0\}$ are denoted by $F^-$, $F^0$ and $F^+$ respectively.

Theorem 1: If $bc > 0$, then all attractive fixed points $(x,y)$ of (2) satisfy

$\Phi(x,y) \in \left(0, \frac{1}{|a|}\right) \times \left(0, \frac{1}{|d|}\right).$

Proof: Any fixed point $(x,y)$ of (2) satisfies

$(x,y) = (g(ax + by + t_1),\ g(cx + dy + t_2)).$   (4)

The Jacobian $J(x,y)$ of (2) at $(x,y)$ is given by

$J(x,y) = \begin{pmatrix} aG_1(x,y) & bG_1(x,y) \\ cG_2(x,y) & dG_2(x,y) \end{pmatrix},$

where $G_1(x,y) = g'(ax + by + t_1)$ and $G_2(x,y) = g'(cx + dy + t_2)$.
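(As an aside, the map (2), the map $\Phi$ of (3) and the Jacobian $J$ are easy to explore numerically. The following Python sketch iterates the network to an attractive fixed point and checks the eigenvalue condition there; the weight values are arbitrary illustrative choices, not part of the analysis.)

```python
# Numerical illustration of the map (2), the map Phi of (3), and the Jacobian J.
# The weights below are arbitrary illustrative choices.
import numpy as np

def g(p):
    """Sigmoid transfer function g(p) = 1 / (1 + exp(-p))."""
    return 1.0 / (1.0 + np.exp(-p))

def step(state, a, b, c, d, t1, t2):
    """One iteration of the two-neuron map (2)."""
    x, y = state
    return np.array([g(a * x + b * y + t1), g(c * x + d * y + t2)])

def jacobian(state, a, b, c, d, t1, t2):
    """Jacobian J(x, y) of (2); uses g'(p) = g(p)(1 - g(p))."""
    x, y = state
    G1 = g(a * x + b * y + t1) * (1.0 - g(a * x + b * y + t1))
    G2 = g(c * x + d * y + t2) * (1.0 - g(c * x + d * y + t2))
    return np.array([[a * G1, b * G1],
                     [c * G2, d * G2]])

# Example: both neurons self-exciting and mutually exciting (a, d > 0, bc > 0).
a, b, c, d, t1, t2 = 6.0, 1.0, 1.0, 6.0, -3.5, -3.5
state = np.array([0.9, 0.9])
for _ in range(1000):                      # iterate towards an attractive fixed point
    state = step(state, a, b, c, d, t1, t2)

J = jacobian(state, a, b, c, d, t1, t2)
eigvals = np.linalg.eigvals(J)
G1, G2 = state * (1.0 - state)             # equation (5): (G1, G2) = Phi(x, y) at a fixed point
print("fixed point (approx):", state)
print("(G1, G2) in (0, 1/|a|) x (0, 1/|d|)?", G1 < 1 / abs(a) and G2 < 1 / abs(d))
print("eigenvalues:", eigvals, "attractive?", bool(np.all(np.abs(eigvals) < 1)))
```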
Since $g'(p) = g(p)(1 - g(p))$, considering (4) and (3) we have

$(G_1(x,y), G_2(x,y)) = \Phi(x,y).$   (5)

The eigenvalues of $J$ are

$\lambda_{1,2} = \frac{aG_1 + dG_2 \pm \sqrt{D(G_1,G_2)}}{2}, \quad \text{where} \quad D(G_1,G_2) = (aG_1 - dG_2)^2 + 4G_1G_2\,bc$   (6)

(to simplify the notation, the identification of the fixed point $(x,y)$ at which (2) is linearized is omitted).

Assume $a, d > 0$, i.e. both neurons self-excite themselves. Then $(0,1)^2 \subseteq D^+ \cap \sigma^+$, where

$\sigma(G_1,G_2) = aG_1 + dG_2.$   (7)

Since $G_1, G_2$ can only be positive, it follows that to identify possible values of $G_1$ and $G_2$ such that $|\lambda_{1,2}| < 1$, it is sufficient to solve the inequality $aG_1 + dG_2 + \sqrt{D(G_1,G_2)} < 2$, or equivalently

$2 - aG_1 - dG_2 > \sqrt{D(G_1,G_2)}.$   (8)

Figure 1: An illustration for the proof of Theorem 1. $a, d > 0$, $bc > 0$. All $(G_1,G_2) \in (0,1/4]^2$ below the left branch (if $ad \geq bc$), or between the branches of $\beta_1^0$ (if $ad < bc$), correspond to the attractive fixed points. Different line styles are associated with the cases $ad > bc$, $ad = bc$ and $ad < bc$, namely the solid, dotted and dashed lines respectively.

Consider only $(G_1,G_2)$ lying under the line $\alpha_1^0: aG_1 + dG_2 = 2$, i.e. such that

$\alpha_1(G_1,G_2) = aG_1 + dG_2 - 2 < 0.$   (9)

All $(G_1,G_2) \in \alpha_1^+$ (above $\alpha_1^0$) lead to at least one eigenvalue of $J$ greater than 1. Squaring both sides of (8) we arrive at

$\beta_1(G_1,G_2) = (ad - bc)G_1G_2 - aG_1 - dG_2 + 1 > 0.$   (10)

If $ad \neq bc$, the "border" curve $\beta_1^0$ is a hyperbola

$G_2 = \frac{1}{\tilde d} + \frac{B}{G_1 - \frac{1}{\tilde a}},$   (11)

where

$\tilde a = \frac{ad - bc}{d} = a - \frac{bc}{d}, \quad \tilde d = \frac{ad - bc}{a} = d - \frac{bc}{a}, \quad B = \frac{bc}{(ad - bc)^2}.$

It is easy to check that $(1/a, 0), (0, 1/d) \in \beta_1^0$. If $ad = bc$, $\beta_1^0$ is a line passing through the points $(0, 1/d)$ and $(1/a, 0)$ (see figure 1).

A fixed point $(x,y)$ of (2) is attractive only if $(G_1,G_2) = \Phi(x,y) \in \beta_1^+ \cap \alpha_1^-$, where the map $\Phi$ is defined by (3). A necessary (not sufficient) condition for $(x,y)$ to be attractive therefore reads

$\Phi(x,y) \in \left(0, \frac{1}{a}\right) \times \left(0, \frac{1}{d}\right).$

Indeed, if $ad > bc$, then $0 < \tilde a < a$ and $0 < \tilde d < d$; the points $(G_1,G_2) \in \beta_1^+$ lie under the "left branch" and above the "right branch" of $\beta_1^0$. Since we are confined to $\alpha_1^-$ (below the line $\alpha_1^0$), only $(G_1,G_2)$ under the "left branch" of $\beta_1^0$ need be considered: $\alpha_1^0$ is a decreasing line going through $(1/a, 1/d)$ and so it never intersects the right branch of $\beta_1^0$. If $ad < bc$, then $\tilde a, \tilde d < 0$ and the points $(G_1,G_2) \in \beta_1^+$ lie between the two branches of $\beta_1^0$.

Consider now the case of self-inhibiting neurons, i.e. $a, d < 0$. Since $(0,1)^2 \subseteq \sigma^-$, in order to identify possible values of $(G_1,G_2)$ such that $|\lambda_{1,2}| < 1$, it is sufficient to solve the inequality $aG_1 + dG_2 - \sqrt{D(G_1,G_2)} > -2$, or equivalently

$2 + aG_1 + dG_2 > \sqrt{D(G_1,G_2)}.$   (12)

Analogously to the previous case, we shall consider only $(G_1,G_2)$ lying under the line $\alpha_2^0: -aG_1 - dG_2 = 2$, i.e. such that

$\alpha_2(G_1,G_2) = aG_1 + dG_2 + 2 > 0.$   (13)

$(G_1,G_2) \in \alpha_2^-$ (above $\alpha_2^0$) lead to at least one eigenvalue of $J$ greater than 1 in absolute value. Squaring both sides of (12) we arrive at

$\beta_2(G_1,G_2) = (ad - bc)G_1G_2 + aG_1 + dG_2 + 1 > 0,$   (14)

which is equivalent to

$((-a)(-d) - bc)G_1G_2 - (-a)G_1 - (-d)G_2 + 1 > 0.$   (15)

Further analysis parallels the analysis of the previous case ($a, d > 0$) with $a$, $\tilde a$, $d$ and $\tilde d$ replaced by $|a|$, $|a| - bc/|d|$, $|d|$ and $|d| - bc/|a|$ respectively. If $ad \neq bc$, the "border" curve $\beta_2^0$ is a hyperbola

$G_2 = \frac{-1}{\tilde d} + \frac{B}{G_1 + \frac{1}{\tilde a}},$   (16)

with $(-1/a, 0), (0, -1/d) \in \beta_2^0$. If $ad = bc$, $\beta_2^0$ is a line passing through $(0, -1/d)$ and $(-1/a, 0)$.
A fixed point $(x,y)$ of (2) is then attractive only if $(G_1,G_2) = \Phi(x,y) \in \beta_2^+ \cap \alpha_2^+$. The $(G_1,G_2)$ corresponding to attractive fixed points of (2) must lie in $(0, 1/|a|) \times (0, 1/|d|)$.

Finally, consider the case when one of the neurons has a self-excitation link while the other self-inhibits itself. Without loss of generality assume that $a > 0$ and $d < 0$. Assume further that $(G_1,G_2) \in \sigma^+ \cup \sigma^0$, i.e. $aG_1 + dG_2$ is nonnegative; such $(G_1,G_2)$ lie under or on the line $\sigma^0: G_2 = aG_1/|d|$. It is sufficient to solve the inequality (8). The relevant $(G_1,G_2)$, i.e. those corresponding to fixed points at which both eigenvalues of $J$ have absolute value less than one, lie in $\beta_1^+ \cap \alpha_1^- \cap (\sigma^+ \cup \sigma^0)$ (figure 2). From $a > 0$, $d < 0$ it follows that $ad - bc$ is negative, and $0 < a < \tilde a$, $\tilde d < d < 0$. For $(G_1,G_2) \in \sigma^-$ the relevant $(G_1,G_2)$ lie in $\beta_2^+ \cap \alpha_2^+$. It can easily be seen that $\beta_1^0$ and $\beta_2^0$ intersect on the line $\sigma^0$ in $(1/\tilde a, 1/a) \times (1/|\tilde d|, 1/|d|)$ (see figure 2). Taken together, a fixed point $(x,y)$ of (2) is attractive only if

$\Phi(x,y) \in \left[\beta_1^+ \cap \alpha_1^- \cap (\sigma^+ \cup \sigma^0)\right] \cup \left[\beta_2^+ \cap \alpha_2^+ \cap \sigma^-\right].$

In particular, if $(x,y)$ is attractive, then $\Phi(x,y)$ must lie in $(0, 1/a) \times (0, 1/|d|)$. Examination of the case $a < 0$, $d > 0$ in the same way leads to the conclusion that all attractive fixed points of (2) have their corresponding $(G_1,G_2)$ in $(0, 1/|a|) \times (0, 1/d)$.

Figure 2: An illustration for the proof of Theorem 1. $a > 0$, $d < 0$, $bc > 0$. All $(G_1,G_2) \in (0,1/4]^2$ below and on the line $\sigma^0$, and between the two branches of $\beta_1^0$ (solid line), correspond to the attractive fixed points. So do all $(G_1,G_2) \in (0,1/4]^2$ above $\sigma^0$, between the two branches of $\beta_2^0$ (dashed line).

Theorem 2: Assume $bc < 0$. Suppose $ad > 0$, or $ad < 0$ with $|ad| \leq |bc|/2$. Then each fixed point $(x,y)$ of (2) such that

$\Phi(x,y) \in \left(0, \frac{1}{|a|}\right) \times \left(0, \frac{1}{|\tilde d|}\right) \cup \left(0, \frac{1}{|\tilde a|}\right) \times \left(0, \frac{1}{|d|}\right)$

is attractive (recall that $\tilde a$ and $\tilde d$ denote $a - bc/d$ and $d - bc/a$ respectively). In particular, all fixed points $(x,y)$ for which

$\Phi(x,y) \in \left(0, \frac{1}{|\tilde a|}\right) \times \left(0, \frac{1}{|\tilde d|}\right)$

are attractive.

Proof: $D(G_1,G_2)$ is no longer exclusively positive. It follows from analytic geometry (see for example [2]) that $D(G_1,G_2) = 0$ defines either a single point or two lines (which can coincide, or disappear). Since $(aG_1 - dG_2)^2 \geq 0$, $D(G_1,G_2) = 0$ is satisfied only by those $(G_1,G_2)$ for which $G_1G_2 \geq 0$. Furthermore, $D(0,0) = 0$. Hence, $D^0$ is either a single point (the origin) or a pair of increasing lines (which may coincide) passing through the origin.

Assume $a, d > 0$. Since $D(1/a, 1/d) = 4bc/ad < 0$ and $D(1/a, 0) = D(0, 1/d) = 1 > 0$, the point $(1/a, 1/d)$ is always in $D^-$ (which is therefore a nonempty region), while $(1/a, 0), (0, 1/d) \in D^+$.

Figure 3: An illustration for the proof of Theorem 2. $a, d > 0$, $bc < 0$. All $(G_1,G_2) \in D^-$ and below the right branch of $\gamma^0$ (dashed line) correspond to the attractive fixed points. So do $(G_1,G_2) \in (0,1/4]^2$ in $D^+$ between the two branches of $\beta_1^0$.

First, we shall examine the case when $D(G_1,G_2)$ is negative. From

$|\lambda_{1,2}|^2 = \frac{(aG_1 + dG_2)^2 + |D|}{4} = G_1G_2(ad - bc)$
it follows that those $(G_1,G_2) \in D^-$ for which $|\lambda_{1,2}| < 1$ lie in $D^- \cap \gamma^-$ (figure 3), where

$\gamma(G_1,G_2) = (ad - bc)G_1G_2 - 1.$   (17)

It is easy to show that

$\left(\frac{1}{a}, \frac{1}{\tilde d}\right), \left(\frac{1}{\tilde a}, \frac{1}{d}\right) \in \gamma^0; \quad \frac{1}{\tilde a} < \frac{1}{a}, \quad \frac{1}{\tilde d} < \frac{1}{d}; \quad \text{and} \quad \left(\frac{1}{a}, \frac{1}{\tilde d}\right), \left(\frac{1}{\tilde a}, \frac{1}{d}\right) \in D^-.$

Turn now to the case $D(G_1,G_2) > 0$. Using the technique from the previous proof we conclude that the relevant $(G_1,G_2)$ lie in $D^+ \cap \alpha_1^- \cap \beta_1^+$, i.e. under the line $\alpha_1^0$ and between the two branches of the hyperbola $\beta_1^0$ (note that here $0 < a < \tilde a$ and $0 < d < \tilde d$). $D^0$, $\alpha_1^0$, $\beta_1^0$ and $\gamma^0$ intersect in two points, as suggested by figure 3. To see this, note that for the points on $\gamma^0$ it holds that $G_1G_2 = 1/(ad - bc)$, and for all $(G_1,G_2) \in D^0 \cap \gamma^0$ we have

$(aG_1 + dG_2)^2 = 4.$   (18)

For $G_1, G_2 > 0$, (18) defines the line $\alpha_1^0$. Similarly, for $(G_1,G_2)$ lying on $\beta_1^0$ and $\gamma^0$, it holds that $aG_1 + dG_2 = 2$, which is the definition of the line $\alpha_1^0$. $\beta_1^0$ and $\gamma^0$ are monotonically increasing and decreasing respectively, and there is exactly one intersection point of the right branch of $\gamma^0$ with each of the two branches of $\beta_1^0$. For $(G_1,G_2) \in D^0$,

$|\lambda_{1,2}| = \frac{aG_1 + dG_2}{2},$

and the relevant $(G_1,G_2)$ are from $D^0 \cap \alpha_1^-$. In summary, if $a, d > 0$, each fixed point $(x,y)$ of (2) such that

$\Phi(x,y) = (G_1,G_2) \in \left(0, \frac{1}{a}\right) \times \left(0, \frac{1}{\tilde d}\right) \cup \left(0, \frac{1}{\tilde a}\right) \times \left(0, \frac{1}{d}\right)$

is attractive.

Assume $a, d < 0$. This case is identical to the case $a, d > 0$ examined above, with $a$, $\tilde a$, $d$, $\tilde d$, $\alpha_1^-$ and $\beta_1^+$ replaced by $|a|$, $|\tilde a|$, $|d|$, $|\tilde d|$, $\alpha_2^+$ and $\beta_2^+$ respectively. First, note that $D^0$ is the same as before, since $(aG_1 - dG_2)^2 = (|a|G_1 - |d|G_2)^2$. Furthermore, $ad - bc = |a||d| - bc$, and so the $(G_1,G_2) \in D^-$ for which $|\lambda_{1,2}| < 1$ lie in $D^- \cap \gamma^-$. Again, it directly follows that

$\left(\frac{1}{|a|}, \frac{1}{|\tilde d|}\right), \left(\frac{1}{|\tilde a|}, \frac{1}{|d|}\right) \in \gamma^0; \quad \frac{1}{|\tilde a|} < \frac{1}{|a|}, \quad \frac{1}{|\tilde d|} < \frac{1}{|d|}; \quad \text{and} \quad \left(\frac{1}{|a|}, \frac{1}{|\tilde d|}\right), \left(\frac{1}{|\tilde a|}, \frac{1}{|d|}\right) \in D^-.$

For $D^+$ the relevant $(G_1,G_2)$ lie in $D^+ \cap \alpha_2^+ \cap \beta_2^+$. All $(G_1,G_2) \in D^0 \cap \alpha_2^+$ lead to $|\lambda_{1,2}| < 1$. Hence, if $a, d < 0$, every fixed point $(x,y)$ of (2) such that

$\Phi(x,y) \in \left(0, \frac{1}{|a|}\right) \times \left(0, \frac{1}{|\tilde d|}\right) \cup \left(0, \frac{1}{|\tilde a|}\right) \times \left(0, \frac{1}{|d|}\right)$

is attractive.

Finally, consider the case $a > 0$, $d < 0$. The case $a < 0$, $d > 0$ would be treated in exactly the same manner. Assume $D^-$ is a nonempty region. Then $ad > bc$ must hold and

$\left(\frac{1}{a}, \frac{1}{|d|}\right) \in D^-.$

This can easily be seen, since for $ad < bc$ we would have $D(G_1,G_2) = (aG_1 - dG_2)^2 + 4G_1G_2\,bc = (aG_1 + dG_2)^2 + 4G_1G_2(bc - ad) \geq 0$ and $D^-$ would be empty. The sign of

$D\!\left(\frac{1}{a}, \frac{1}{|d|}\right) = 4\left(1 + \frac{bc}{a|d|}\right)$

is determined by the sign of $a|d| + bc = bc - ad < 0$. The $(G_1,G_2) \in D^-$ for which $|\lambda_{1,2}| < 1$ lie in $D^- \cap \gamma^-$, and

$\left(\frac{1}{a}, \frac{1}{\tilde d}\right), \left(\frac{1}{|\tilde a|}, \frac{1}{|d|}\right) \in \gamma^0.$

Note that $\tilde d \leq |d|$ and $|\tilde a| \leq a$ if and only if $2a|d| \geq |bc|$. Only those $(G_1,G_2) \in D^0$ are taken into account for which $|aG_1 + dG_2| < 2$. This is true for all $(G_1,G_2) \in D^0 \cap \alpha_1^- \cap \alpha_2^+$, i.e. between the lines $\alpha_1^0$ and $\alpha_2^0$ (figure 4).

Figure 4: An illustration for the proof of Theorem 2. $a > 0$, $d < 0$, $bc < 0$. If $ad - bc < 0$, the branches of $\beta_1^0$ and $\beta_2^0$ intersect on the line $\sigma^0$. As $ad - bc$ grows, the meeting point moves up on the line $\sigma^0$. When $ad = bc$, the branches deform into lines, and as $ad - bc > 0$ grows further, the two branches move towards the axes $G_1 = 0$, $G_2 = 0$.

If $D(G_1,G_2) > 0$, the inequalities to be solved depend on the sign of $aG_1 + dG_2$.
Following the same reasoning as in the proof of Theorem 1, we conclude that the relevant $(G_1,G_2)$ lie in

$\left[\beta_1^+ \cap \alpha_1^- \cap (\sigma^+ \cup \sigma^0)\right] \cup \left[\beta_2^+ \cap \alpha_2^+ \cap \sigma^-\right].$

When $ad - bc < 0$, the branches of $\beta_1^0$ and $\beta_2^0$ intersect on the line $\sigma^0$ in $(1/a, \infty) \times (1/|d|, \infty)$. As $ad - bc$ grows, the meeting point moves up on the line $\sigma^0$. When $ad = bc$, the branches deform into lines, and as $ad - bc > 0$ grows further, the two branches move towards the axes $G_1 = 0$, $G_2 = 0$ (figure 4).

In the proof of Theorem 1 we have seen that if $a > 0$, $d < 0$ and $bc > 0$, then all $(G_1,G_2) \in (0, 1/\tilde a) \times (0, 1/|\tilde d|)$ potentially correspond to attractive fixed points of (2) (figure 2). In the proof of the last theorem it was shown that when $a > 0$, $d < 0$, $bc < 0$, if $2a|d| \geq |bc|$, then $(1/a, 1/|d|)$ is on or under the right branch of $\gamma^0$ and each $(G_1,G_2) \in (0, 1/a) \times (0, 1/|d|)$ potentially corresponds to an attractive fixed point of (2). Hence, the following theorem can be formulated:

Theorem 3: If $ad < 0$ and

- $bc > 0$, then every fixed point $(x,y)$ of (2) such that $\Phi(x,y) \in \left(0, \frac{1}{|\tilde a|}\right) \times \left(0, \frac{1}{|\tilde d|}\right)$ is attractive;
- $bc < 0$ with $|ad| \geq |bc|/2$, then each fixed point $(x,y)$ of (2) satisfying $\Phi(x,y) \in \left(0, \frac{1}{|a|}\right) \times \left(0, \frac{1}{|d|}\right)$ is attractive.

Now we transform our results into the $(x,y)$-space of neuron activations. For $u > 4$, define

$\Delta(u) = \frac{1}{2}\sqrt{1 - \frac{4}{u}}.$

In Theorems 1, 2 and 3 a structure reflecting the stability types of the fixed points of (2) was introduced into the $(G_1,G_2)$-space. The region $(0,1/4]^2$ in the $(G_1,G_2)$-plane corresponds to four regions

$\left(0, \frac{1}{2}\right)^2, \quad \left(0, \frac{1}{2}\right) \times \left(\frac{1}{2}, 1\right), \quad \left(\frac{1}{2}, 1\right) \times \left(0, \frac{1}{2}\right), \quad \text{and} \quad \left(\frac{1}{2}, 1\right)^2$

in the $(x,y)$-space. In particular, for each $(G_1,G_2) \in (0,1/4]^2$, under the map $\Phi$ there are four preimages

$(x,y) = \Phi^{-1}(G_1,G_2) = \left(\frac{1}{2} \pm \Delta\!\left(\frac{1}{G_1}\right),\ \frac{1}{2} \pm \Delta\!\left(\frac{1}{G_2}\right)\right).$   (19)

The results formulated in Theorems 1, 2 and 3 can now be stated for the space of activations of the recurrent neurons. For $\alpha > 4$, $\beta > 4$, the regions of the $(x,y)$-space

$\left(0, \frac{1}{2} - \Delta(\alpha)\right) \times \left(0, \frac{1}{2} - \Delta(\beta)\right),$

$\left(\frac{1}{2} - \Delta(\alpha), \frac{1}{2}\right) \times \left(0, \frac{1}{2} - \Delta(\beta)\right) \cup \left(0, \frac{1}{2} - \Delta(\alpha)\right) \times \left(\frac{1}{2} - \Delta(\beta), \frac{1}{2}\right), \quad \text{and}$

$\left(\frac{1}{2} - \Delta(\alpha), \frac{1}{2}\right) \times \left(\frac{1}{2} - \Delta(\beta), \frac{1}{2}\right)$

are denoted by $R^A_{00}(\alpha,\beta)$, $R^S_{00}(\alpha,\beta)$ and $R^R_{00}(\alpha,\beta)$ respectively. The regions symmetrical to $R^A_{00}(\alpha,\beta)$, $R^S_{00}(\alpha,\beta)$ and $R^R_{00}(\alpha,\beta)$ with respect to the line $x = 1/2$ are denoted by $R^A_{10}(\alpha,\beta)$, $R^S_{10}(\alpha,\beta)$ and $R^R_{10}(\alpha,\beta)$ respectively. Similarly, let $R^A_{01}(\alpha,\beta)$, $R^S_{01}(\alpha,\beta)$ and $R^R_{01}(\alpha,\beta)$ denote the regions symmetrical to $R^A_{00}(\alpha,\beta)$, $R^S_{00}(\alpha,\beta)$ and $R^R_{00}(\alpha,\beta)$ with respect to the line $y = 1/2$. Finally, $R^A_{11}(\alpha,\beta)$, $R^S_{11}(\alpha,\beta)$ and $R^R_{11}(\alpha,\beta)$ denote the regions symmetrical to $R^A_{01}(\alpha,\beta)$, $R^S_{01}(\alpha,\beta)$ and $R^R_{01}(\alpha,\beta)$ with respect to the line $x = 1/2$ (figure 5).

Figure 5: Partitioning of the network state space according to stability types of the fixed points.

Corollary 1: If $bc > 0$, $|a| > 4$, $|d| > 4$, then all attractive fixed points of (2) lie in $\bigcup_{i \in I} R^A_i(|a|,|d|)$, where $I$ is the index set $I = \{00, 10, 01, 11\}$.

Corollary 2: If $bc < 0$, $ad < 0$, $|a| > 4$, $|d| > 4$ and $|ad| \geq |bc|/2$, then all fixed points of (2) lying in $\bigcup_{i \in I} R^A_i(|a|,|d|)$, $I = \{00, 10, 01, 11\}$, are attractive.

Corollary 3: If $|\tilde a|, |\tilde d| > 4$ and one of the following conditions is satisfied:

- $bc > 0$ and $ad < 0$,
- $bc < 0$ and $ad > 0$,
- $bc < 0$, $ad < 0$ and $|ad| \leq |bc|/2$,

then all fixed points of (2) lying in $\bigcup_{i \in I} R^A_i(|\tilde a|,|\tilde d|)$, $I = \{00, 10, 01, 11\}$, are attractive.
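The region tests of Corollaries 1-3 are straightforward to evaluate numerically. The sketch below assumes the notation introduced above ($\Delta(u)$ and the corner regions $R^A_i$); the numerical values used in the example are illustrative only.

```python
# Sketch of the region tests used in Corollaries 1-3, assuming the notation above:
# Delta(u) = (1/2) * sqrt(1 - 4/u) for u > 4, and the corner regions R^A_i(alpha, beta).
import math

def delta(u):
    """Delta(u) = (1/2) * sqrt(1 - 4/u), defined for u > 4."""
    if u <= 4:
        raise ValueError("Delta(u) is only defined for u > 4")
    return 0.5 * math.sqrt(1.0 - 4.0 / u)

def in_attractive_corner_regions(x, y, a, d):
    """True if (x, y) lies in the union of the four corner regions R^A_i(|a|, |d|):
    within 1/2 - Delta(|a|) of x = 0 or x = 1, and within 1/2 - Delta(|d|) of y = 0 or y = 1."""
    return (min(x, 1.0 - x) < 0.5 - delta(abs(a))
            and min(y, 1.0 - y) < 0.5 - delta(abs(d)))

# Corollary 1 (bc > 0, |a| > 4, |d| > 4): every attractive fixed point must pass this test.
# For example, a point close to the (1, 1) vertex with a = d = 6:
print(in_attractive_corner_regions(0.96, 0.96, a=6.0, d=6.0))   # True
```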
For insight into the bifurcation mechanism (explored in section 5) by which attractive fixed points of (2) are created (or destroyed), it is useful to have an idea where the other types of fixed points can lie. For the case when both neurons are either self-exciting or self-inhibiting ($ad > 0$), and their mutual interaction is of the same character ($bc > 0$), we have the following theorem:

Theorem 4: Suppose $ad > 0$, $bc > 0$, $|a| > 4$, $|d| > 4$. Then the following can be said about the fixed points of (2):

- attractive points can lie only in $\bigcup_{i \in I} R^A_i(|a|,|d|)$, $I = \{00, 10, 01, 11\}$;
- if $ad \geq bc/2$, then all fixed points in $\bigcup_{i \in I} R^S_i(|a|,|d|)$ are saddle points, and repulsive points can lie only in $\bigcup_{i \in I} R^R_i(|a|,|d|)$;
- if $|ad - bc| < 4\min\{|a|,|d|\}$, then there are no repellors.

Proof: The regions for attractive fixed points result from Corollary 1. Consider first the case $a, d > 0$. A fixed point $(x,y)$ of (2) is a saddle if $|\lambda_2| < 1$ and $|\lambda_1| = \lambda_1 > 1$. Assume $ad > bc$. Then

$0 < \sqrt{(aG_1 + dG_2)^2 - 4G_1G_2(ad - bc)} = \sqrt{D(G_1,G_2)} < aG_1 + dG_2.$

It follows that if $aG_1 + dG_2 < 2$, i.e. $(G_1,G_2) \in \alpha_1^-$, then $0 < aG_1 + dG_2 - \sqrt{D(G_1,G_2)} < 2$ holds and $0 < \lambda_2 < 1$. For $(G_1,G_2) \in \alpha_1^0 \cup \alpha_1^+$, we solve the inequality $aG_1 + dG_2 - \sqrt{D(G_1,G_2)} < 2$, which is satisfied by $(G_1,G_2)$ from $\beta_1^- \cap (\alpha_1^0 \cup \alpha_1^+)$. It can be seen (figure 1) that at all fixed points $(x,y)$ of (2) with

$\Phi(x,y) \in \left(0, \frac{1}{4}\right] \times \left(0, \min\left\{\frac{1}{\tilde d}, \frac{1}{4}\right\}\right) \cup \left(0, \min\left\{\frac{1}{\tilde a}, \frac{1}{4}\right\}\right) \times \left(0, \frac{1}{4}\right],$

the eigenvalue $\lambda_2 > 0$ is less than 1. This is certainly true for all $(x,y)$ such that $\Phi(x,y) \in (0,1/4] \times (0,1/d) \cup (0,1/a) \times (0,1/4]$. In particular, the preimages of $(G_1,G_2) \in (1/a, 1/4] \times (0, 1/d) \cup (0, 1/a) \times (1/d, 1/4]$ under $\Phi$ define the region $\bigcup_{i \in I} R^S_i(a,d)$ where only saddle fixed points of (2) can lie. Fixed points $(x,y)$ whose images under $\Phi$ lie in $\beta_1^+ \cap \alpha_1^+$ are repellors. No $(G_1,G_2)$ can lie in that region if $\tilde a, \tilde d \leq 4$, that is, if $d(a-4) \leq bc$ and $a(d-4) \leq bc$, which is equivalent to $\max\{a(d-4), d(a-4)\} \leq bc$.

In the case $ad = bc$, we have $\sqrt{D(G_1,G_2)} = aG_1 + dG_2$ and so $\lambda_2 = 0$. Hence, there are no repelling points if $ad = bc$.

Assume $ad < bc$. Then $\sqrt{D(G_1,G_2)} > aG_1 + dG_2$, which implies that $\lambda_2$ is negative. It follows that the inequality to be solved is $aG_1 + dG_2 - \sqrt{D(G_1,G_2)} > -2$. It is satisfied by $(G_1,G_2)$ from $\beta_2^+$. If $2ad \geq bc$, for the coefficients of $\beta_2^0$ we have $|\tilde a| \leq a$ and $|\tilde d| \leq d$. Fixed points $(x,y)$ with

$\Phi(x,y) \in \left(0, \frac{1}{4}\right] \times \left(0, \min\left\{\frac{1}{|\tilde d|}, \frac{1}{4}\right\}\right) \cup \left(0, \min\left\{\frac{1}{|\tilde a|}, \frac{1}{4}\right\}\right) \times \left(0, \frac{1}{4}\right]$

have $|\lambda_2|$ less than 1. If $2ad \geq bc$, this is true for all $(x,y)$ such that $\Phi(x,y) \in (0,1/4] \times (0,1/d) \cup (0,1/a) \times (0,1/4]$, and the preimages of $(G_1,G_2) \in (1/a, 1/4] \times (0, 1/d) \cup (0, 1/a) \times (1/d, 1/4]$ under $\Phi$ define the region $\bigcup_{i \in I} R^S_i(a,d)$ where only saddle fixed points of (2) can lie. There are no repellors if $|\tilde a|, |\tilde d| \leq 4$, that is, if $\min\{a(d+4), d(a+4)\} \geq bc$.

If we examined the case $a, d < 0$ in the same spirit as the case $a, d > 0$, we would conclude that:

- if $ad > bc$, then at all fixed points $(x,y)$ of (2) with $\Phi(x,y) \in \left(0, \frac{1}{4}\right] \times \left(0, \min\left\{\frac{1}{|\tilde d|}, \frac{1}{4}\right\}\right) \cup \left(0, \min\left\{\frac{1}{|\tilde a|}, \frac{1}{4}\right\}\right) \times \left(0, \frac{1}{4}\right]$, $|\lambda_1| < 1$. Certainly, this is true for all $(x,y)$ such that $\Phi(x,y) \in (0,1/4] \times (0,1/|d|) \cup (0,1/|a|) \times (0,1/4]$. The preimages of $(G_1,G_2) \in (1/|a|, 1/4] \times (0, 1/|d|) \cup (0, 1/|a|) \times (1/|d|, 1/4]$ under $\Phi$ define the region $\bigcup_{i \in I} R^S_i(|a|,|d|)$ where only saddle fixed points of (2) can lie. There are no repellors if $|\tilde a|, |\tilde d| \leq 4$, that is, if $|d|(|a|-4) \leq bc$ and $|a|(|d|-4) \leq bc$, which is equivalent to $\max\{|a|(|d|-4), |d|(|a|-4)\} \leq bc$;
- in the case $ad = bc$, we have $\sqrt{D(G_1,G_2)} = |aG_1 + dG_2|$ and so $\lambda_1 = 0$;
hence, there are no repelling points in this case;

- if $ad < bc$, then at all fixed points $(x,y)$ with $\Phi(x,y) \in \left(0, \frac{1}{4}\right] \times \left(0, \min\left\{\frac{1}{\tilde d}, \frac{1}{4}\right\}\right) \cup \left(0, \min\left\{\frac{1}{\tilde a}, \frac{1}{4}\right\}\right) \times \left(0, \frac{1}{4}\right]$, the eigenvalue $\lambda_1$ (which is positive) is less than 1. If $2ad \geq bc$, this is true for all $(x,y)$ such that $\Phi(x,y) \in (0,1/4] \times (0,1/|d|) \cup (0,1/|a|) \times (0,1/4]$, and the preimages of $(G_1,G_2) \in (1/|a|, 1/4] \times (0, 1/|d|) \cup (0, 1/|a|) \times (1/|d|, 1/4]$ under $\Phi$ define the region $\bigcup_{i \in I} R^S_i(|a|,|d|)$ where only saddle fixed points of (2) can lie. There are no repellors if $\tilde a, \tilde d \leq 4$, that is, if $\min\{|a|(|d|+4), |d|(|a|+4)\} \geq bc$.

In general, we have shown that if

- $ad < bc$ and $ad + 4\min\{|a|,|d|\} \geq bc$, or
- $ad = bc$, or
- $ad > bc$ and $ad - 4\min\{|a|,|d|\} \leq bc$,

then there are no repellors.

4 Quantitative analysis

In this section we are concerned with the actual position of the fixed points of (2). We study how the coefficients $a, b, t_1, c, d$ and $t_2$ affect the position and the number of the fixed points.

It is illustrative to first concentrate on a single neuron from the pair. Denote the values of the weights associated with the self-loop of the selected neuron and with the interconnection link from the other neuron to the selected neuron by $s$ and $r$ respectively. The constant input to the selected neuron is denoted by $t$. If the activations of the selected neuron and the other neuron are $u$ and $v$ respectively, then the activation of the selected neuron at the next time step is $g(su + rv + t)$ (recall that $g$ is the sigmoid function $g(\ell) = 1/(1+e^{-\ell})$). If the activation of the selected neuron is not to change, $(u,v)$ should lie on the curve $f_{s,r,t}$:

$v = f_{s,r,t}(u) = \frac{1}{r}\left(-t - su + \ln\frac{u}{1-u}\right).$   (20)

$\ln(u/(1-u)): (0,1) \to \mathbb{R}$ is a monotonically increasing function with

$\lim_{u \to 0^+} \ln\frac{u}{1-u} = -\infty \quad \text{and} \quad \lim_{u \to 1^-} \ln\frac{u}{1-u} = \infty.$

The linear term $-su - t$ cannot influence these asymptotic properties; it can, however, locally influence the "shape" of $f_{s,r,t}$. In particular, while the effect of the constant term $-t$ is just a vertical shift of the whole function, $-su$ (if decreasing, i.e. if $s > 0$, and "sufficiently large") has the power to overcome for a while the increasing tendency of $\ln(u/(1-u))$. More precisely, if $s > 4$ then the term $-su$ causes the function $-su - t + \ln(u/(1-u))$ to "bend" so that it is decreasing on $(1/2 - \Delta(s), 1/2 + \Delta(s))$, while it still increases on $(0, 1/2 - \Delta(s)) \cup (1/2 + \Delta(s), 1)$. The function $-su - t + \ln(u/(1-u))$ is always concave on $(0,1/2)$ and convex on $(1/2,1)$. Finally, the coefficient $r$ scales the whole function and flips it about the $u$-axis if $r < 0$. A graph of $f_{s,r,t}(u)$ is presented in figure 6.

Figure 6: Graph of $f_{s,r,t}(u)$. Solid and dotted lines represent the cases $t = 0, s = 0, r > 0$ and $t = 0, s > 4, r > 0$ respectively. The dashed line shows the graph when $t < 0$, $s > 4$ and $r > 0$. Negative external input $t$ shifts the bent part into $v > 0$.

We characterize the neurons according to the signs of the weights on the links stemming out of them. A neuron is said to be greedy if it self-excites itself but inhibits the other neuron (the weight of the link to the other neuron is negative). A neuron is said to be altruistic if the opposite is true, i.e. if it self-inhibits itself but excites the other neuron. An enthusiastic neuron excites both itself and the other neuron, while a depressed neuron inhibits everything, including itself. There are $\binom{4}{2} + 4 = 10$ possible cases of the coexistence of the two neurons.
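Since the curves (20) are explicit, fixed points of (2) can be located numerically as intersections of $y = f_{a,b,t_1}(x)$ and $x = f_{d,c,t_2}(y)$. The sketch below scans $x$ over $(0,1)$ and brackets sign changes; the weight values are arbitrary illustrative choices (two greedy neurons with no external inputs, a case discussed later in this section).

```python
# Sketch: locating fixed points of (2) as intersections of the curves of equation (20).
# The weight values are illustrative only; here both neurons are greedy (a, d > 0; b, c < 0).
import numpy as np

def f(u, s, r, t):
    """The curve f_{s,r,t}(u) = (1/r) * (-t - s*u + ln(u/(1-u))) of equation (20)."""
    return (-t - s * u + np.log(u / (1.0 - u))) / r

def fixed_points(a, b, c, d, t1, t2, n=40001):
    """Scan x in (0,1); a sign change of x - f_{d,c,t2}(f_{a,b,t1}(x)) brackets a fixed point."""
    xs = np.linspace(1e-6, 1.0 - 1e-6, n)
    ys = f(xs, a, b, t1)                                  # y = f_{a,b,t1}(x)
    inside = (ys > 0.0) & (ys < 1.0)                      # keep only points inside the unit square
    res = np.where(inside,
                   xs - f(np.clip(ys, 1e-9, 1.0 - 1e-9), d, c, t2),
                   np.nan)
    crossing = (np.diff(np.sign(res)) != 0) & np.isfinite(res[:-1]) & np.isfinite(res[1:])
    return [(round(xs[i], 4), round(ys[i], 4)) for i in np.where(crossing)[0]]

# Two greedy neurons, no external input: several fixed points coexist; for these weights
# they appear roughly near (0.07, 1), (0.48, 0.99), (0.94, 0.94), (0.99, 0.48) and (1, 0.07).
print(fixed_points(a=6.0, b=-3.0, c=-3.0, d=6.0, t1=0.0, t2=0.0))
```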
A fixed point represents a "compromise" achieved by both neurons, in the sense that the state of the system, once in a fixed point, does not change. Of course, just as with fixed points, the compromise can be characterized by various forms of stability. Based on the results from the previous section, in some cases we are able to predict the stability type of the fixed points of (2) according to their position in the neurons' activation space. Each fixed point of (2) lies on the intersection of the two curves $y = f_{a,b,t_1}(x)$ and $x = f_{d,c,t_2}(y)$. We present some illustrative examples of the analysis of the position and the number of fixed points of (2) based on the characterization of neurons proposed above. Other cases can be analyzed in a similar manner. The external inputs are treated as artificial means to externally control the state of the system, and the discussion of each case starts with the assumption that $t_1 = t_2 = 0$. Signs of the coefficients $a, b, c, d$ are marked by $+$ (if positive) and $-$ (if negative).

Both neurons are enthusiastic: $(a,b,c,d) = (+,+,+,+)$ (figure 7).

- ($t_1 = t_2 = 0$) Since $a, d > 0$, both functions $f_{a,b,t_1}$ and $f_{d,c,t_2}$ can "bend", but they bend before running into positive values (they bend outside $(0,1)^2$). Since $f_{a,b,t_1}$ and $f_{d,c,t_2}$ pass only through $(1/2,1) \times (0,1)$ and $(0,1) \times (1/2,1)$ respectively, a fixed point can only occur in $(1/2,1)^2$, the region of high activity of both neurons ($f_{a,b,t_1}$ shown as a dashed line). There is no way to create a fixed point in a region, say $(0,1/2) \times (1/2,1)$, corresponding to a state where the second neuron dominates the first neuron.

Figure 7: $f_{a,b,t_1}$ and $f_{d,c,t_2}$ when both neurons are enthusiastic.

- ($t_1, t_2 \neq 0$) The only way to achieve the situation described above would be to use a large negative external input $t_1$ to the first neuron, which would move the graph of $f_{a,b,t_1}$ up so that it intersects $f_{d,c,t_2}$ in $(0,1/2) \times (1/2,1)$. If we artificially inhibit the first neuron by the external input too much, there may no longer be a fixed point in $(1/2,1)^2$ ($f_{a,b,t_1}$ shown as a dotted line). However, if the self-excitation loop of the first neuron is strong enough, the bent shape of $f_{a,b,t_1}$ can retain a fixed point in $(1/2,1)^2$ in spite of the external inhibition of the first neuron ($f_{a,b,t_1}$ shown as a solid line).

Both neurons are depressed: $(a,b,c,d) = (-,-,-,-)$.

- ($t_1 = t_2 = 0$) Since $a, d < 0$, neither of the functions $f_{a,b,t_1}$ and $f_{d,c,t_2}$ can "bend". $f_{a,b,t_1}$ and $f_{d,c,t_2}$ pass through $(0,1/2) \times (0,1)$ and $(0,1) \times (0,1/2)$ respectively. A fixed point occurs in $(0,1/2)^2$, the region of low activity of both neurons.

- ($t_1, t_2 \neq 0$) Positive external input to a neuron can, however, shift the fixed point towards high activity of that neuron.

An enthusiastic and a greedy neuron: $(a,b,c,d) = (+,-,+,+)$ (figure 8).

- ($t_1 = t_2 = 0$) $f_{d,c,t_2}$ passes only through $(0,1) \times (1/2,1)$. The first (enthusiastic) neuron pays for being generous (it excites the second, greedy neuron) and there is no possibility of creating a fixed point in $(1/2,1) \times (0,1/2)$, the region of dominance of the enthusiastic neuron. Besides the possibility that there is a fixed point in $(0,1) \times (1/2,1)$ which may be close to either of the vertices $(0,1)$ or $(1,1)$ (depending on how strong the self-loop of the first neuron is; $f_{a,b,t_1}$ is shown as a dashed and a dotted line), there is a chance of having fixed points near both vertices at one time (in fact, there will then be another fixed point "between" them).
This can be achieved by a "cooperation" between the two neurons, in that the self-loop of the first neuron and the inhibition link from the second neuron have to have the "right" weights (neither too weak nor too strong), so that the bent function $f_{a,b,t_1}$ intersects $f_{d,c,t_2}$ near both of the vertices ($f_{a,b,t_1}$ shown as a solid line).

Figure 8: $f_{a,b,t_1}$ and $f_{d,c,t_2}$ in the case of an enthusiastic and a greedy neuron.

- ($t_1, t_2 \neq 0$) An interesting situation arises when the greedy neuron is externally inhibited but has a strong self-loop, so that the bent part of $f_{d,c,t_2}$ gets into $(0,1)^2$. Nine fixed points can be created. In general, a necessary condition on the weights for nine fixed points to exist is that the weights $a, d$ on the self-loops are positive. This enables both functions $f_{a,b,t_1}$ and $f_{d,c,t_2}$ to "bend" and, by moving the bent parts into $(0,1)^2$, to create a complex intersection pattern.

A greedy and an altruistic neuron: $(a,b,c,d) = (+,+,-,-)$.

- ($t_1 = t_2 = 0$) Only a single fixed point, in $(1/2,1) \times (0,1/2)$, can exist. Everything is under the control of the greedy neuron.

- ($t_1, t_2 \neq 0$) By externally inhibiting the greedy neuron (moving the bent part of $f_{a,b,t_1}$ upwards into $(0,1)^2$), more fixed points can be created. A strong external excitation of the altruistic neuron moves fixed points into $(0,1) \times (1/2,1)$. There is even the possibility of creating a single fixed point in the region of dominance of the altruistic neuron, $(0,1/2) \times (1/2,1)$, if the greedy and altruistic neurons are strongly externally inhibited and excited respectively, but then the system is totally controlled by the external forces.

Both neurons are greedy: $(a,b,c,d) = (+,-,-,+)$.

- ($t_1 = t_2 = 0$) Generally, if there are no external inputs, the case of two greedy neurons is the only one in which there can be fixed points in the regions $(0,1/2) \times (1/2,1)$, $(1/2,1) \times (0,1/2)$ and $(1/2,1) \times (1/2,1)$ at one time. Even though the neurons inhibit each other, they can increase their self-excitation and, through the bent functions $f_{a,b,t_1}$ and $f_{d,c,t_2}$, introduce fixed points near the vertices $(1,0)$ and $(0,1)$, representing "winning" states of the first and second neuron respectively. If they, moreover, "decide to cooperate" by not self-exciting themselves too strongly and not inhibiting each other too much, a third fixed point near the high-activation vertex $(1,1)$ can be created. Hence, to create the most complex intersection pattern of $f_{a,b,t_1}$ and $f_{d,c,t_2}$ without external inputs, the two neurons should be "reasonably" greedy.

5 Creation of a new attractive fixed point through saddle-node bifurcation

In this section we bring together the results from the last two sections. Normally, to detect the stability types of the fixed points of (2), we would compute the position of the fixed points (which cannot, in general, be done analytically) and then linearize the system (2) at those fixed points, or directly use the results of section 3, where we structured the network state space $(0,1)^2$ into areas where fixed points of particular stability types can lie. Fortunately, in some cases these areas correspond to monotonicity intervals of the functions $f_{a,b,t_1}$ and $f_{d,c,t_2}$ defining the fixed points. Reasoning about the stability type of the fixed points can then be based on the knowledge of where the functions intersect.
In this respect, the results of section 3 will be useful when the neurons are enthusiastic or greedy, with a strong tendency to self-excite themselves, so that the functions $f_{a,b,t_1}$ and $f_{d,c,t_2}$ "bend", thus creating the possibility of a complex intersection pattern in $(0,1)^2$.

For $a > 4$, denote the set $\{(x, f_{a,b,t_1}(x)) \mid x \in (0, \frac{1}{2} - \Delta(a))\}$ of points lying on the "first outer branch" of $f_{a,b,t_1}(x)$ by $f^{\#0}_{a,b,t_1}$. Analogously, the set of points $\{(x, f_{a,b,t_1}(x)) \mid x \in (\frac{1}{2} + \Delta(a), 1)\}$ on the "second outer branch" of $f_{a,b,t_1}(x)$ is denoted by $f^{\#1}_{a,b,t_1}$. Finally, let $f^{*}_{a,b,t_1}$ denote the set of points $\{(x, f_{a,b,t_1}(x)) \mid x \in (\frac{1}{2} - \Delta(a), \frac{1}{2} + \Delta(a))\}$ on the "middle branch" of $f_{a,b,t_1}(x)$. Similarly, for $d > 4$, $f^{\#0}_{d,c,t_2}$, $f^{\#1}_{d,c,t_2}$ and $f^{*}_{d,c,t_2}$ are used to denote the sets $\{(f_{d,c,t_2}(y), y) \mid y \in (0, \frac{1}{2} - \Delta(d))\}$, $\{(f_{d,c,t_2}(y), y) \mid y \in (\frac{1}{2} + \Delta(d), 1)\}$ and $\{(f_{d,c,t_2}(y), y) \mid y \in (\frac{1}{2} - \Delta(d), \frac{1}{2} + \Delta(d))\}$ respectively.

Using Theorem 4 we state the following corollary:

Corollary 4: Assume that each of the neurons is either enthusiastic or greedy, and that $ad \geq bc/2$. Then attractive fixed points of (2) can lie only on the intersections of the outer branches of $f_{a,b,t_1}$ and $f_{d,c,t_2}$. Whenever the middle branch of $f_{a,b,t_1}$ intersects an outer branch of $f_{d,c,t_2}$ (or vice versa), the intersection corresponds to a saddle point of (2). In particular, all attractive fixed points of (2) are from

$\bigcup_{i,j = 0,1} \left(f^{\#i}_{a,b,t_1} \cap f^{\#j}_{d,c,t_2}\right).$

Every point from

$f^{*}_{a,b,t_1} \cap \bigcup_{i=0,1} f^{\#i}_{d,c,t_2}, \quad \text{or} \quad f^{*}_{d,c,t_2} \cap \bigcup_{i=0,1} f^{\#i}_{a,b,t_1},$

is a saddle point of (2).

Figure 9: Geometrical illustration of the saddle-node bifurcation in a recurrent neural network with two state neurons. Saddle and attractive points are marked with squares and circles respectively. $a, d > 0$; $b, c < 0$.

When both neurons self-excite themselves, Corollary 4 suggests that the usual scenario for the creation of a new attractive fixed point is that typical of the saddle-node bifurcation, in which an attractive fixed point and a saddle fixed point are created as a pair. Attractive fixed points disappear in the reverse manner: an attractive point coalesces with a saddle and they annihilate each other. This is illustrated in figure 9. $f_{d,c,t_2}(y)$, shown as the dashed curve, intersects $f_{a,b,t_1}(x)$ in three points. By increasing $d$, $f_{d,c,t_2}$ bends further (solid curve) and intersects $f_{a,b,t_1}$ in five points (at the same time, $|c|$ has to be appropriately increased so as to compensate for the increase in $d$, so that the "bent" part of $f_{d,c,t_2}$ does not move radically to higher values of $x$). Saddle and attractive points are marked with squares and circles respectively. Note that as $d$ increases, the attractive fixed points move closer to the vertices of $\{0,1\}^2$.

This tendency, in the context of networks with exclusively self-exciting (or exclusively self-inhibiting) recurrent neurons, is discussed in [13]. Our result, stated in Corollary 1, assumes a two-neuron recurrent network. It only requires that the neurons have the same mutual interaction pattern ($bc > 0$) and gives a lower bound on the rate of convergence of the attractive fixed points of (2) towards some of the vertices of $\{0,1\}^2$, as the absolute values of the weights on the self-loops grow.

Corollary 1.1: Assume $bc > 0$, $|a| > 4$, $|d| > 4$.
Then all attractive fixed points of (2) lie in the $\varepsilon$-neighborhoods of the vertices of the unit square, where

$\varepsilon = \sqrt{\left(\frac{1}{2} - \Delta(|a|)\right)^2 + \left(\frac{1}{2} - \Delta(|d|)\right)^2}.$

6 Conclusion

The regions corresponding to stability types of fixed points of a two-neuron recurrent neural network were described based on the weight matrix of the network. The position of fixed points was investigated in the context of intersections of the functions defining their x- and y-coordinates. It was shown that there is a correspondence between the stability regions for fixed points and the monotonicity intervals of the functions defining their position. When both neurons self-excite themselves and have the same mutual-interaction pattern, a new attractive fixed point is created through a saddle-node bifurcation. Assuming the same mutual interaction pattern between the neurons, we gave a lower bound on the rate of convergence of the attractive fixed points towards the saturated activation values, as the absolute values of the weights on the self-loops grow.

Our ultimate goal is to extend the issues studied in this paper to the general case of an n-neuron recurrent neural network. It remains to be seen whether the reasoning in the space of derivatives of the sigmoid transfer function with respect to the weighted sum of neuron inputs can be simplified to a more straightforward analysis of fixed-point stability regions (as opposed to the case analysis used in the proofs of this paper).

As explained in the introduction, the training process during which recurrent neural networks learn to act as finite state machines can be interpreted from the point of view of bifurcation analysis [18]. Often, loops in the state transition diagram of the finite state machine being learned are represented as attractive fixed points of the network. Understanding the fixed-point potential of recurrent neural networks (number, stability, and bifurcations of fixed points) can shed some light on the problem of the neural complexity of finite state machines, which (to our knowledge) has not been satisfactorily solved so far (see [1], [15] and [20]). The neural complexity of a finite state machine can be characterized as the minimal number of neurons needed so that the network can mimic the finite state machine.

References

[1] N. Alon, A.K. Dewdney, and T.J. Ott. Efficient simulation of finite automata by neural nets. Journal of the Association for Computing Machinery, 38(2):495-514, 1991.

[2] H. Anton. Calculus with Analytic Geometry. John Wiley and Sons, New York, NY, 1980.

[3] R.D. Beer. On the dynamics of small continuous-time recurrent networks. Technical Report CES-94-18, Case Western Reserve University, Cleveland, OH, 1994.

[4] E.K. Blum and X. Wang. Stability of fixed points and periodic orbits and bifurcations in analog neural networks. Neural Networks, (5):577-587, 1992.

[5] M.P. Casey. Computation in Discrete-Time Dynamical Systems. PhD thesis, University of California, San Diego, Department of Mathematics, March 1995.

[6] M.P. Casey. Relaxing the symmetric weight condition for convergent dynamics in discrete-time recurrent networks. Technical Report INC-9504, Institute for Neural Computation, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0112, 1995.

[7] A. Cleeremans, D. Servan-Schreiber, and J.L. McClelland. Finite state automata and simple recurrent networks. Neural Computation, 1(3):372-381, 1989.

[8] F. Cummins. Representation of temporal patterns in recurrent networks. Submitted to the 15th Annual Conference of the Cognitive Science Society, 1993.

[9] S. Das and M.C. Mozer. A unified gradient-descent/clustering architecture for finite state machine induction. In J.D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems 6, pages 19-26. Morgan Kaufmann, 1994.
[10] K. Doya. Bifurcations in the learning of recurrent neural networks. In Proceedings of the 1992 IEEE International Symposium on Circuits and Systems, pages 2777-2780, 1992.

[11] J.L. Elman. Finding structure in time. Cognitive Science, 14:179-211, 1990.

[12] C.L. Giles, C.B. Miller, D. Chen, H.H. Chen, G.Z. Sun, and Y.C. Lee. Learning and extracting finite state automata with second-order recurrent neural networks. Neural Computation, 4(3):393-405, 1992.

[13] M.W. Hirsch. Saturation at high gain in discrete time recurrent networks. Neural Networks, 7(3):449-453, 1994.

[14] J.J. Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences USA, 81:3088-3092, May 1984.

[15] B.G. Horne and D.R. Hush. Bounds on the complexity of recurrent neural network implementations of finite state machines. In J.D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems 6, pages 359-366. Morgan Kaufmann, 1994. Also submitted to Neural Networks.

[16] L. Jin, P.N. Nikiforuk, and M.M. Gupta. Absolute stability conditions for discrete-time recurrent neural networks. IEEE Transactions on Neural Networks, (6):954-963, 1994.

[17] P. Manolios and R. Fanelli. First order recurrent neural networks and deterministic finite state automata. Neural Computation, 6(6):1155-1173, 1994.

[18] P. Tino, B.G. Horne, C.L. Giles, and P.C. Collingwood. Finite state machines and recurrent neural networks: automata and dynamical systems approaches. Technical Report UMIACS-TR-95-1, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, 1995.

[19] P. Tino and J. Sajda. Learning and extracting initial Mealy machines with a modular neural network model. Neural Computation, 7(4), 1995.

[20] H.T. Siegelmann, E.D. Sontag, and C.L. Giles. The complexity of language recognition by neural networks. In J. van Leeuwen, editor, Algorithms, Software, Architecture (Proceedings of the IFIP 12th World Computer Congress), pages 329-335, Amsterdam, 1992. North-Holland.

[21] R.L. Watrous and G.M. Kuhn. Induction of finite-state languages using second-order recurrent networks. Neural Computation, 4(3):406-414, 1992.

[22] Z. Zeng, R.M. Goodman, and P. Smyth. Learning finite state machines with self-clustering recurrent networks. Neural Computation, 5(6):976-990, 1993.