ABSTRACT

Title of dissertation: DYNAMIC PANEL DATA MODELS WITH SPATIALLY CORRELATED DISTURBANCES
Jan Mutl, Doctor of Philosophy, 2006
Dissertation directed by: Professor Ingmar Prucha, Department of Economics

This thesis considers a dynamic panel data model with error components that are correlated both spatially (cross-sectionally) and over time. The model extends the literature on dynamic panel data models with cross-sectionally independent error components; the model for spatial dependence is a Cliff-Ord type model. We introduce a three-step estimation procedure and give formal large sample results for the case of a finite time dimension. In particular, we show that a simple first stage instrumental variable (IV) estimator, which ignores the spatial correlation of the errors, is consistent and √N-consistent, where N denotes the cross-sectional dimension. We then extend the generalized moments estimator introduced by Kelejian and Prucha (1999) for estimating the spatial autoregressive parameter and show that if it is based on √N-consistently estimated disturbances, it is also consistent. Finally, we derive the large sample distribution of a second stage generalized method of moments (GMM) estimator based on a consistent estimator of the spatial autoregressive parameter. We also present results from a small Monte Carlo study to illustrate the small sample performance of the proposed estimation procedure.
JEL Classification and Keywords: Cross-Sectional Models; Spatial Models (C21); Models with Panel Data (C23); Dynamic Panels; Spatial Autocorrelation

DYNAMIC PANEL DATA MODELS WITH SPATIALLY CORRELATED DISTURBANCES

by Jan Mutl

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2006

Advisory Committee:
Professor Ingmar Prucha, Chair
Professor John Chao, Co-Chair
Professor Harry Kelejian
Professor John Rust
Professor Francis Alt

Contents

1 Introduction
2 Review of Literature
  2.1 Dynamic Panel Data Models
    2.1.1 GMM Estimation
    2.1.2 Bias Correction
    2.1.3 MD and ML Estimation
  2.2 Modelling Cross-Sectional Dependence
    2.2.1 Model Specifications
    2.2.2 Estimation
  2.3 Space-Time Models
    2.3.1 Space-Time Autoregressive Moving Average (STARMA) Models
    2.3.2 Models with Contemporaneous Spatial Correlation
3 Model
  3.1 Model Specification
  3.2 Model Implications
4 Estimation and Inference
  4.1 Initial IV Estimation
  4.2 Estimation of the Degree of Spatial Autocorrelation
  4.3 Second Stage GMM Estimation
    4.3.1 Optimal Weighting Matrix
    4.3.2 Feasible GMM Estimator
5 Monte Carlo Study
  5.1 Estimators Considered
    5.1.1 Initial Estimators
    5.1.2 Spatial Parameter Estimators
    5.1.3 Second Stage GMM Estimators
  5.2 Data Generation
  5.3 Designs Considered
  5.4 Tables of Results
  5.5 Conclusions and Comparison with Other Studies
6 Directions for Future Research
A Appendix: Central Limit Theorem for Vectors of Linear Quadratic Forms
B Appendix: Proof of Claims in Chapter 3
C Appendix: Proofs for Chapter 4
  C.1 Proofs for Section 4.1
  C.2 Proofs for Section 4.2
  C.3 Proofs for Section 4.3
D Appendix: Tables of Monte Carlo Results
E Appendix: Symbols and Notation Used
F Appendix: Inequalities
  F.1 Deterministic Inequalities
  F.2 Stochastic Inequalities

LIST OF TABLES
Table 1: Consistency of ML Estimation
Table 2: Estimators Considered
Table D1: Initial IV Estimators of δ
Table D2: Second Stage GMM Estimators of δ
Table D3: Unweighted Spatial GM Estimators of ρ
Table D4: Weighted Spatial GM Estimators of ρ

LIST OF FIGURES
Figure 1: QQ Plot of IV Estimator AH1
Figure 2: QQ Plot of IV Estimator AH2
Figure 3: QQ Plot of IV Estimator AB
Figure 4: QQ Plot of GMM Estimator AB Ignoring Spatial Correlation
Figure 5: QQ Plot of GMM Estimator AB Based on V_mix
Figure 6: QQ Plot of GMM Estimator AB Based on V_E
Figure 7: Normal Probability QQ Plot
Figure 8: Student t Probability QQ Plot

1 Introduction

This thesis considers estimation of panel data models in which the dependent variable is allowed to be correlated in both dimensions. Using a natural terminology, I investigate models in which there is correlation both across time and between the cross-sectional units. Although there are many ways to write down such a model, I choose to concentrate on a concrete specification that arises as an extension of the existing literature on dynamic panel data models and on spatial modelling. In doing so, I hope to offer a useful synthesis of the two strands of the literature.
My model is applicable to situations where the number of time periods over which the data are observed is limited.¹

In the next chapter, I review the existing literature related to this topic. I first focus on theoretical contributions to dynamic panel estimation methods, then briefly outline the specifications used in spatial econometrics, and close with a review of papers that have used specifications in which time and space interact in a nontrivial way. Chapter 3 will then spell out the specification I chose to concentrate on. It will also provide the general assumptions maintained throughout the thesis and discuss some implications of the model. In Chapter 4, I provide an outline of several estimation methods and provide formal statements of their asymptotic properties. I start with an initial instrumental variable (IV) technique suggested by Anderson and Hsiao (1981) to estimate the slope coefficients of the model. Although this method ignores possible cross-sectional correlation in the data, I show that it is still consistent and asymptotically normal under the specification considered in this thesis. Next, I outline a spatial generalized moments estimation technique that estimates the degree of cross-sectional dependence in the disturbances. The method was suggested by Kapoor et al. (2005) for a static model and is based on Kelejian and Prucha (1999). I extend the proofs in Kapoor et al. (2005) to the dynamic case. The last step of the proposed estimation method consists of a generalized method of moments (GMM) estimation of the slope coefficients. I discuss the optimal choice of weighting matrix for a given set of moment conditions.

¹ Of course, if the time dimension of the panel is sufficiently large, one can consider, for example, a seemingly unrelated regression model that allows for a fairly general specification of the correlation pattern in the cross-sectional dimension.
I provide formal large sample results for a generic GMM estimator based on linear moment conditions with stochastic instruments. I also provide formal large sample properties of a feasible GMM estimator and its small sample covariance matrix approximation. In Chapter 5, I investigate the small sample properties of the different estimation methods via a Monte Carlo study. I also provide some simulation evidence that supports the formal large sample claims made in the thesis.

2 Review of Literature

The purpose of this review is not to provide a comprehensive treatment of the econometric work that has been done on panel data methods; for that there are excellent book-length works, such as Hsiao (2003) or Baltagi (2002). Instead, I will provide a more in-depth review of the theoretical work that has been done on dynamic panel data models on the one hand, and then review the literature relaxing the assumption of independently and identically distributed (iid) errors in both panel and purely cross-sectional settings.

It proves useful to introduce the following notational conventions: I use bold letters for matrices and vectors, and regular font letters to denote scalars. Furthermore, I use lower case letters for vectors and upper case letters for matrices. In general, I will denote the cross-sectional dimension of the panel as N and the time dimension as T.

2.1 Dynamic Panel Data Models

Models with individual effects and a limited time dimension face the problem of incidental parameters. Hence these models are estimated after a suitable transformation that removes the individual effects; in most cases this is first differencing. If the model also includes a lagged endogenous variable, the first difference of the error term will then be correlated with the explanatory variables.
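This mechanism is easy to see in a small simulation. The sketch below is illustrative only (it is not part of the thesis, and all parameter values and variable names are arbitrary): it shows that after first differencing, the lagged differenced dependent variable and the differenced innovations are correlated, because both contain the innovation from period t−1.

```python
import numpy as np

# Illustrative sketch: first differencing removes the individual effect mu_i,
# but Delta y_{i,t-1} and Delta nu_it share the innovation nu_{i,t-1},
# so the differenced lagged regressor is endogenous.
rng = np.random.default_rng(0)
N, T, lam = 5000, 6, 0.5

mu = rng.normal(size=N)
nu = rng.normal(size=(N, T + 1))
y = np.empty((N, T + 1))
y[:, 0] = mu / (1 - lam) + nu[:, 0]  # rough stationary start
for t in range(1, T + 1):
    y[:, t] = lam * y[:, t - 1] + mu + nu[:, t]

dy_lag = y[:, 1:T] - y[:, 0:T - 1]   # Delta y_{i,t-1} for t = 2..T
dnu = nu[:, 2:T + 1] - nu[:, 1:T]    # Delta nu_it     for t = 2..T
corr = np.corrcoef(dy_lag.ravel(), dnu.ravel())[0, 1]
print(corr)  # clearly negative: the differenced error is correlated
             # with the differenced lagged dependent variable
```

Under these design values the correlation is strongly negative, which is exactly why OLS on the differenced equation is inconsistent and instruments are needed.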
It has long been recognized in the literature that in this situation the ordinary least squares (OLS) estimator will be biased; see, e.g., Trognon (1978) for an analytical treatment, or Nerlove (1967, 1971), who explores the properties of the bias of the OLS estimation by Monte Carlo work. Trognon (1978), Nickell (1981) and Sevestre and Trognon (1985) derive analytical expressions for the asymptotic biases of the OLS estimator of autoregressive panel data models with fixed time dimension. A small sample bias correction has also been suggested by Kiviet (1995).

The bias of the OLS estimation also drew attention to other estimation methods. Hence Anderson and Hsiao (1981, 1982) discuss maximum likelihood (ML) estimation of various model specifications and provide a comprehensive classification of the different conceptual possibilities of dynamic panel data models. They also suggest a simple instrumental variables (IV) estimator that is consistent. Bhargava and Sargan (1983) provide a framework for maximum likelihood estimation for a panel with a lagged dependent variable and individual effects. As an alternative, Chamberlain (1982) proposed a minimum distance (MD) type of estimator for distributed lag models with heterogeneous coefficients.

The subsequent developments have shifted attention to generalized method of moments (GMM) estimators that utilize linear moment conditions. The literature has focused on exploiting as many moment conditions as possible while keeping the resulting GMM estimator linear. Most of the large sample results are usually backed by a reference to "standard central limit theorems" or assumed to follow from the general results on the asymptotic properties of GMM estimators in, for example, Hansen (1982). The (non)optimality of utilizing redundant moment conditions has also not been explored in detail.
Papers in this line of research include Arellano and Bond (1991), Arellano and Bover (1995), Ahn and Schmidt (1995) and Blundell and Bond (1998). The use of all lags as available instruments was suggested by Holtz-Eakin, Newey and Rosen (1988). Keane and Runkle (1992) provide an alternative method of exploiting the moment conditions.² Large sample results for the GMM estimators are in Alvarez and Arellano (2003), while Harris and Tzavalis (1999) obtain the limiting distributions of pooled OLS, the within-group (WG) and WG with individual trends estimators, under the null of a unit root and normally distributed errors. Observe that, as noted by Kiviet (1995) and Judson and Owen (1999), because the number of possible instruments used by the GMM estimators increases with T², the GMM estimators may perform poorly in samples with moderate and large T.

More recently, several authors have proposed maximum likelihood and quasi-maximum likelihood (ML and QML) procedures, arguing that these are computationally feasible and providing some Monte Carlo evidence of improved small sample performance even for non-normal errors. See the papers by Hsiao, Pesaran and Tahmiscioglu (2002) and Binder, Hsiao and Pesaran (2000) discussed below. Some further Monte Carlo evidence is provided by Binder, Hsiao, Mutl and Pesaran (2002).

Below I will review the papers on GMM, bias corrected OLS, MD and ML estimation mentioned above and compare the various model specifications, assumptions on the disturbance process involved, and estimation methods. When required, I modify the original notation to make the comparison feasible.

² They propose to transform the model by a Cholesky decomposition of an initial estimate of the variance covariance matrix and use the untransformed instruments in the second step of the estimation. See below for a more detailed review.

2.1.1 GMM Estimation

I will now review the papers proposing GMM type estimators in more detail.
The model under consideration can be written as

  y_it = λ y_{i,t−1} + x_it β + u_it,  t = 1,...,T, i = 1,...,N,  (2.1.1)

where y_it and x_it denote the (scalar) dependent variable and the 1 × p vector of exogenous variables corresponding to cross-sectional unit i in period t, λ and β represent the corresponding 1 × 1 and p × 1 parameters, and u_it = μ_i + ν_it denotes the overall disturbance term, consisting of individual effects μ_i and an innovation ν_it. Under different assumptions on the disturbance process we obtain different possible moment restrictions that are exploited by the estimator. The proposed estimator also differs under different exogeneity assumptions on the vector of explanatory variables.

Arellano and Bond (1991) assume that the error terms are distributed as

  μ_i ~ IID(0, σ²_μ),  (2.1.2)

and

  ν_it ~ IID(0, σ²_ν),  (2.1.3)

independent of each other.³ Because the disturbances as well as the endogenous variable contain individual effects, they will be correlated when interacted in levels. Therefore, the moment conditions considered involve first differences of the disturbances; in particular, they are

  E[(u_it − u_{i,t−1}) y_{i,t−k}] = 0,  t = 2,...,T, k = 2,...,t, i = 1,...,N,  (2.1.4)

and, with strictly exogenous variables, also

  E[x′_is (u_it − u_{i,t−1})] = 0_{p×1},  t = 2,...,T, s = 1,...,T, i = 1,...,N,  (2.1.5)

while with the variables being only predetermined these conditions hold only for s = 1,...,t−1.

Stacking the model by grouping the observations first by time and then by individuals,⁴ we can write the first differenced model (after dropping the initial observation) as

  Δy = ΔZ δ + Δν,  (2.1.6)

where Δy and Δν are (T−1)N × 1, ΔZ = [Δy_{−1}, ΔX] is (T−1)N × (1+p), δ is (1+p) × 1, and

  Δy    = (y_12 − y_11, ..., y_1T − y_{1,T−1}, ..., y_N2 − y_N1, ..., y_NT − y_{N,T−1})′,
  Δy_{−1} = (y_11 − y_10, ..., y_{1,T−1} − y_{1,T−2}, ..., y_N1 − y_N0, ..., y_{N,T−1} − y_{N,T−2})′,
  ΔX    = (x_12 − x_11, ..., x_1T − x_{1,T−1}, ..., x_N2 − x_N1, ..., x_NT − x_{N,T−1})′,
  Δν    = (ν_12 − ν_11, ..., ν_1T − ν_{1,T−1}, ..., ν_N2 − ν_N1, ..., ν_NT − ν_{N,T−1})′.  (2.1.7)

We can define the matrix of instruments as H = (H′_1, ..., H′_N)′, where for the case of strictly exogenous variables H_i is block diagonal, with each block a row vector (one row per differenced equation):

  H_i = diag( (y_i0, x_i1, ..., x_iT), (y_i0, y_i1, x_i1, ..., x_iT), ..., (y_i0, ..., y_{i,T−2}, x_i1, ..., x_iT) ).  (2.1.8)

The proposed estimator is of the form

  δ̂ = (ΔZ′ H A⁻¹ H′ ΔZ)⁻¹ ΔZ′ H A⁻¹ H′ Δy,  (2.1.9)

where A is some weighting matrix for the moments. More specifically, the first step of the estimation uses a simple weighting matrix

  A = Σ_{i=1}^N H′_i D D′ H_i = H′ (I_N ⊗ D D′) H,  (2.1.10)

where D is the (T−1) × T first difference operator matrix:

  D = [ −1   1   0  ⋯   0
         0  −1   1  ⋯   0
         ⋮           ⋱
         0   ⋯   0  −1   1 ].  (2.1.11)

In the second step the moment conditions are weighted by their estimated variance covariance matrix, and the authors propose to use

  A = Σ_{i=1}^N H′_i Δû_i Δû′_i H_i = Σ_{i=1}^N H′_i D û_i û′_i D′ H_i,  (2.1.12)

where Δû_i = (Δû_i2, ..., Δû_iT)′ and û_i = (û_i1, ..., û_iT)′ are the fitted residuals from the first step estimator.

³ These assumptions are not formally stated in the paper. However, the asymptotic claims are based on the iid assumptions.
⁴ This stacking is commonly used in the literature on dynamic panels. Observe, however, that we will use a different order of stacking in our model presented in later chapters.
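The first step estimator in (2.1.9)-(2.1.11) can be sketched numerically. The following is an illustrative implementation I add for exposition (not code from the thesis; the simulation design, sample sizes and names are arbitrary), for the pure AR(1) case without exogenous variables, where the instrument block for the period-t differenced equation is (y_i0, ..., y_{i,t−2}):

```python
import numpy as np

# Illustrative sketch of the first step Arellano-Bond GMM estimator for
# y_it = lam*y_{i,t-1} + mu_i + nu_it, using weighting A = sum_i H_i' D D' H_i.
rng = np.random.default_rng(1)
N, T, lam = 4000, 5, 0.5

mu = rng.normal(size=N)
y = np.empty((N, T + 1))
y[:, 0] = mu / (1 - lam) + rng.normal(size=N)
for t in range(1, T + 1):
    y[:, t] = lam * y[:, t - 1] + mu + rng.normal(size=N)

# (T-1) x T first-difference operator D, as in (2.1.11).
D = np.zeros((T - 1, T))
for r in range(T - 1):
    D[r, r], D[r, r + 1] = -1.0, 1.0

L = T * (T - 1) // 2  # number of instruments; grows like T^2
A = np.zeros((L, L))
Szx = np.zeros(L)
Szy = np.zeros(L)
for i in range(N):
    yi = y[i]
    # Block-diagonal instrument matrix: row for equation t holds y_i0..y_{i,t-2}.
    Hi = np.zeros((T - 1, L))
    col = 0
    for t in range(2, T + 1):
        Hi[t - 2, col:col + t - 1] = yi[0:t - 1]
        col += t - 1
    dyi = yi[2:] - yi[1:-1]      # Delta y_it,      t = 2..T
    dylag = yi[1:-1] - yi[:-2]   # Delta y_{i,t-1}, t = 2..T
    A += Hi.T @ D @ D.T @ Hi
    Szx += Hi.T @ dylag
    Szy += Hi.T @ dyi

Ainv = np.linalg.inv(A)
lam_hat = (Szx @ Ainv @ Szy) / (Szx @ Ainv @ Szx)
print(lam_hat)  # close to the true value 0.5
```

Note that L = T(T−1)/2 here, which makes concrete the earlier remark that the number of instruments grows with T².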
Arellano and Bover (1995) consider a general nonsingular transformation of the model that removes the individual effects. Consider again the model in (2.1.1) and let K be any (T−1) × T transformation matrix of rank T−1 such that K e_T = 0_{T−1}, where e_T is a T × 1 vector of ones. That is, the transformation by K is nonsingular and removes the individual effects. Hence K can, for example, be the matrix D considered above, or be equal to the "Within Group" operator, with

  K = [ 1−1/T   −1/T   ⋯   −1/T    −1/T
        −1/T    1−1/T  ⋯   −1/T    −1/T
         ⋮              ⋱
        −1/T    −1/T   ⋯   1−1/T   −1/T ].  (2.1.13)

Arellano and Bover (1995) also suggest the orthogonal deviations operator defined as

  K = [ 1  −1/(T−1)  −1/(T−1)  ⋯  −1/(T−1)  −1/(T−1)
        0      1     −1/(T−2)  ⋯  −1/(T−2)  −1/(T−2)
        ⋮                      ⋱
        0      0         0     ⋯      1        −1    ].  (2.1.14)

This transformation subtracts the mean of the future observations available in the sample from each of the first T−1 observations. The transformed model is then

  (I_N ⊗ K) y = (I_N ⊗ K) Z δ + (I_N ⊗ K) ν.  (2.1.15)

If the transformation matrix is upper triangular and the disturbances ν_it are not serially correlated, then the same moment conditions as considered by Arellano and Bond (1991) remain valid for the transformed model. Arellano and Bover (1995) then show that the resulting GMM estimator is in fact invariant to the choice of the transformation matrix.

If the exogenous variables are uncorrelated with the individual effects, Arellano and Bover (1995) also suggest the use of additional moment conditions of the form

  E[(1/T) Σ_{t=1}^T u_it x_is] = 0_{p×1}.  (2.1.16)

In this case the transformation matrix is appended with a row consisting of e′_T/T and can be denoted as

  C = [ K ; e′_T/T ].  (2.1.17)

The instrument matrix H_i becomes block diagonal as in (2.1.8), with an additional block (x_i1, ..., x_iT) for the appended levels equation:

  H_i = diag( (y_i0, x_i1, ..., x_iT), (y_i0, y_i1, x_i1, ..., x_iT), ..., (y_i0, ..., y_{i,T−2}, x_i1, ..., x_iT), (x_i1, ..., x_iT) ).  (2.1.18)

The GMM estimator of Arellano and Bover (1995) can then be expressed as

  δ̂ = (Z′ (I_N ⊗ C′) H A⁻¹ H′ (I_N ⊗ C) Z)⁻¹ Z′ (I_N ⊗ C′) H A⁻¹ H′ (I_N ⊗ C) y.  (2.1.19)

The preliminary estimates are obtained with A = H′ (I_N ⊗ C C′) H, and the second stage estimator uses, consistently with (2.1.12),

  A = Σ_{i=1}^N H′_i C û_i û′_i C′ H_i,  (2.1.20)

where û_i are the fitted residuals from the preliminary estimation. Given that the estimator is invariant to the choice of the transformation matrix, the filtering is in fact irrelevant and the estimator can be obtained by performing three stage least squares (3SLS).

Ahn and Schmidt (1995) show that there are additional moment conditions that can be exploited. Ahn and Schmidt also make weaker assumptions that lead to the set of moment restrictions utilized by the Arellano and Bond (1991) and Arellano and Bover (1995) estimators. In particular, Ahn and Schmidt assume that the disturbances satisfy

  Cov(ν_it, y_i0) = 0,  t = 1,...,T,  (2.1.21)
  Cov(ν_it, μ_i) = 0,  t = 1,...,T,
  Cov(ν_it, ν_is) = 0,  t, s = 1,...,T, t ≠ s.

The additional moment conditions pointed out by Ahn and Schmidt are

  E[u_iT (ν_it − ν_{i,t−1})] = 0,  t = 2,...,T−1.  (2.1.22)

These restrictions, together with the moment conditions utilized by the Arellano and Bond (1991) estimator, represent all the moment conditions implied by the assumption that the innovations ν_it are mutually uncorrelated among themselves and with μ_i and y_i0. Ahn and Schmidt also point out that further restrictions can be derived from homogeneity and stationarity assumptions.
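The transformations discussed above, the first difference operator (2.1.11), the within-group style operator (2.1.13), and the forward orthogonal deviations operator (2.1.14), all remove the individual effects because every row sums to zero, i.e. K e_T = 0. A quick numerical check (an illustrative sketch I add here; not code from the thesis) makes this concrete:

```python
import numpy as np

# Illustrative check that each transformation operator annihilates e_T,
# and hence removes the individual effect mu_i * e_T.
T = 5
e = np.ones(T)

# First-difference operator, cf. (2.1.11).
D = np.zeros((T - 1, T))
for r in range(T - 1):
    D[r, r], D[r, r + 1] = -1.0, 1.0

# Within-group style operator: first T-1 rows of I_T - e e'/T, cf. (2.1.13).
K_wg = (np.eye(T) - np.ones((T, T)) / T)[:-1, :]

# Forward orthogonal deviations, cf. (2.1.14): subtract the mean of the
# future observations from each of the first T-1 observations.
K_od = np.zeros((T - 1, T))
for r in range(T - 1):
    K_od[r, r] = 1.0
    K_od[r, r + 1:] = -1.0 / (T - 1 - r)

for K in (D, K_wg, K_od):
    assert np.allclose(K @ e, 0.0)
print("all three operators annihilate e_T")
```

Note also that D and K_od are upper triangular, which is the property used above to show that the Arellano-Bond moment conditions remain valid after the transformation.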
The assumption that the innovations ν_it have a variance that does not change over time implies the following additional moment restrictions:

  E[y_{i,t−2} Δν_{i,t−1} − y_{i,t−1} Δν_it] = 0,  t = 4,...,T.  (2.1.23)

In a model without exogenous variables the homogeneity restrictions can be implemented by utilizing the extended instrument set defined as

  H⁺_i = [ H_i
           y_i2 − y_i3
           y_i3 − y_i4
           ⋮
           y_{i,T−2} − y_{i,T−1} ],  (2.1.24)

where H_i is the Arellano and Bond instrument matrix for the case without exogenous variables, i.e.

  H_i = diag( (y_i0), (y_i0, y_i1), ..., (y_i0, ..., y_{i,T−2}) ).  (2.1.25)

Ahn and Schmidt show that the GMM estimator based on the full set of moment restrictions is asymptotically equivalent to Chamberlain's (1982, 1984) optimal minimum distance estimator and that it reaches the semiparametric efficiency bound.

Blundell and Bond (1998) document a potential gain in efficiency arising from exploiting restrictions on the initial observations when the time dimension of the panel is small and the degree of autocorrelation is high. The estimation approaches discussed so far usually drop the first observation. With N going to infinity and T fixed, this amounts to ignoring information from a fixed proportion of the sample and thus can lead to sizeable inefficiency.

In their simulation study, Blundell and Bond consider two types of additional restrictions. The first type of restriction justifies the use of an extended linear GMM estimator that uses lagged differences of y_it as instruments for equations in levels (in addition to lagged levels of y_it as instruments for equations in first differences). The second type of restriction validates the use of the error components GLS estimator on an extended model that conditions on the observed initial values. This provides a consistent estimator under homoscedasticity which, under normality, is asymptotically equivalent to conditional maximum likelihood (see also Blundell and Smith, 1991).

In a model without exogenous variables, Blundell and Bond show that, after removing redundant restrictions, the extended GMM estimator they consider utilizes the following instrument matrix:

  H⁺⁺_i = [ H⁺_i
            Δy_i2
            ⋮
            Δy_{i,T−1} ],  (2.1.26)

where H⁺_i is the instrument matrix employed by the Ahn and Schmidt estimator and is defined in (2.1.24) above.

Their Monte Carlo simulations and asymptotic variance calculations show that this extended GMM estimator offers considerable efficiency gains in situations where the basic GMM estimator performs poorly. The GLS estimator that conditions on the initial values is also found to have good finite sample properties. However, the conditional GLS estimator requires homoscedasticity, and only extends to a model with regressors if the regressors are strictly exogenous, which is not the case for the GMM estimators.

The efficiency gain from incorporating the information in the initial observations is also documented by a simulation study of Hahn (1999).

Alvarez and Arellano (2002) consider the same model (2.1.1) with |λ| < 1 and E(ν_it | μ_i, y_i0, ..., y_{i,t−1}) = 0. They assume y_i0 is also observed. To derive asymptotic results they assume that ν_it, for t = 1,...,T and i = 1,...,N, are independent and identically distributed across time and individuals and independent of μ_i and y_i0, with E(ν_it) = 0, Var(ν_it) = σ² and finite fourth moments. Additionally, they assume that the initial observations are generated as

  y_i0 = μ_i/(1−λ) + Σ_{j=0}^∞ λ^j ν_{i,−j}.  (2.1.27)

The article then establishes the asymptotic properties of the "Within Group" estimator, the GMM estimator, and the Limited Information Maximum Likelihood (LIML) estimator when both T and N tend to infinity.
The WG estimator can be obtained by OLS estimation on the model transformed by the forward orthogonal deviations transformation (see Arellano and Bover, 1995, above). The GMM estimator in their terminology is what I describe above as the first stage GMM estimator on a model transformed by the orthogonal deviations transformation, using the moment conditions of Arellano and Bond (1991). The second stage GMM estimation with an estimated weighting matrix is not considered. Note that my results contain this extension as a special case; see Chapter 4.

The LIML estimator is what has been suggested by Alonso-Borrego and Arellano (1999) as a symmetrically normalized GMM estimator. It can also be regarded as a "continuously updated GMM estimator" in the terminology of Hansen, Heaton and Yaron (1996).⁵ The estimator is only an analogue LIML estimator in the sense of the minimax instrumental variable interpretation given by Sargan (1958) to the original LIML estimator. It is defined as

  δ̂ = argmin_δ [(y − Zδ)′ (I_N ⊗ C′) H (H′H)⁻¹ H′ (I_N ⊗ C) (y − Zδ)] / [(y − Zδ)′ (I_N ⊗ C′)(I_N ⊗ C) (y − Zδ)],  (2.1.28)

where H is an instrument matrix. Alvarez and Arellano show that the asymptotic bias of the WG estimator only disappears when N/T → 0. When N/T tends to a positive constant, all three estimators are asymptotically biased, with negative asymptotic biases of order 1/T, 1/N, and 1/(2N − T), respectively. When N/T tends to infinity, the fixed-T results assumed by the GMM literature remain valid. They also consider a random effects maximum likelihood estimator that leaves the mean and variance of the initial conditions unrestricted and show that this estimator is asymptotically unbiased in all cases.

Keane and Runkle (1992) suggest an alternative estimation procedure that takes into account the variance covariance structure of the disturbances.

⁵ Instead of keeping σ² fixed in the weighting matrix of the GMM criterion, it is continuously updated by making it a function of the argument in the estimating criterion.
First the model is estimated by an initial procedure, such as instrumental variables (IV) with instruments that could, for example, be those suggested by Arellano and Bond (1991). Then an estimate of the inverse of the variance covariance matrix and its Cholesky decomposition is calculated. The model is then transformed and estimated with the original (untransformed) instruments, i.e.

  δ̂ = [Z′ (I_N ⊗ P̂′) H A⁻¹ H′ (I_N ⊗ P̂) Z]⁻¹ Z′ (I_N ⊗ P̂′) H A⁻¹ H′ (I_N ⊗ P̂) y,  (2.1.29)

where P̂ is the Cholesky decomposition of the estimated inverse of the variance covariance matrix and A is a moment weighting matrix chosen analogously to the standard GMM estimators.

2.1.2 Bias Correction

A small sample bias correction procedure for the inconsistent OLS estimation has been proposed by Kiviet (1995). Consider a dynamic panel data model as in (2.1.1). The model in levels can be stacked as in (2.1.6):

  y = Z δ + (I_N ⊗ e_T) μ + ν,  (2.1.30)

where Z = [y_{−1}, X] with

  y    = (y_11, ..., y_1T, ..., y_N1, ..., y_NT)′,
  y_{−1} = (y_10, ..., y_{1,T−1}, ..., y_N0, ..., y_{N,T−1})′,
  X    = (x_11, ..., x_1T, ..., x_N1, ..., x_NT)′,
  ν    = (ν_11, ..., ν_1T, ..., ν_N1, ..., ν_NT)′,
  μ    = (μ_1, ..., μ_N)′.  (2.1.31)

The within group estimator is defined as

  δ̂ = (Z′ A Z)⁻¹ Z′ A y,  (2.1.32)

where the NT × NT within group transformation matrix A is defined as

  A = I_N ⊗ (I_T − e_T e′_T / T).  (2.1.33)

Kiviet (1995) calls this estimator the Least Squares Dummy Variables (LSDV) estimator, while Anderson and Hsiao (1981) refer to it as the Covariance (CV) estimator. This estimator is inconsistent for fixed T due to the presence of individual effects in both the disturbances ν and the regressors y_{−1}.
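This fixed-T inconsistency (the Nickell bias) is easy to reproduce by simulation. The sketch below is illustrative only (not from the thesis; design values are arbitrary): even with a very large N, the within (LSDV) estimate of λ stays well below the truth when T is small.

```python
import numpy as np

# Illustrative sketch of the fixed-T inconsistency of the LSDV (within-group)
# estimator (2.1.32)-(2.1.33) for y_it = lam*y_{i,t-1} + mu_i + nu_it.
rng = np.random.default_rng(2)
N, T, lam = 20000, 6, 0.5

mu = rng.normal(size=N)
y = np.empty((N, T + 1))
y[:, 0] = mu / (1 - lam) + rng.normal(size=N) / np.sqrt(1 - lam**2)
for t in range(1, T + 1):
    y[:, t] = lam * y[:, t - 1] + mu + rng.normal(size=N)

# Within transformation: demean over t = 1..T for each i, then pooled OLS.
Y = y[:, 1:]
Ylag = y[:, :-1]
Yd = Y - Y.mean(axis=1, keepdims=True)
Ylagd = Ylag - Ylag.mean(axis=1, keepdims=True)
lam_lsdv = (Ylagd * Yd).sum() / (Ylagd ** 2).sum()
print(lam_lsdv)  # noticeably below 0.5; the bias does not shrink as N grows
```

The bias is sizeable here despite N = 20000, which is exactly why the bias corrections of Nickell (1981) and Kiviet (1995) discussed in the text are of interest.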
Although consistent estimates can be obtained by IV or GMM procedures, the inconsistent LSDV estimator has a relatively low variance and hence can lead to an estimator with a lower root mean square error after the bias is removed. The asymptotic formulae for the bias given in Nickell (1981) for a model with no exogenous regressors have been found to be accurate in small samples, except for large values of λ. Similar results have been reported by Sevestre and Trognon (1985). Kiviet (1995) provides approximating formulae for the small sample bias that have robust performance over the entire range of parameters.

2.1.3 MD and ML Estimation

Chamberlain (1982, 1984) proposed to treat each time period as an equation in a multivariate equation framework. Such an approach is robust to certain kinds
0 )x i |x i } = ? i e T +?x i , and y i = ? i e T +(I T ?x i )? +? i , (2.1.38) where ? = I T ?? 0 +e T (a 0 1 ,...,a 0 T ), (2.1.39) and? i = y i ?E ? (y i |x i ),and? = vec(?). The proposed estimation procedure is then as follows. Treating the coef- ficients in the above equation as unrestricted, one first obtains initial (usually least-squares) estimate b? of ?. In the second step, the restrictions on ? in (2.1.39) are incorporated by letting?be a function of the parameters of the model ? =(? 0 ,a 0 1 ,..,a 0 T ). The restrictions are imposed by using a minimum-distance estimator, specifically b ? =argmin ? [b???(?)] 0 b ?[b???(?)], (2.1.40) where b ? is the estimated variance covariance matrix of the asymptotic variance 23 of b?: b ? = 1 N N X i=1 nh (y i ?y)? b ?(x i ?x) i (2.1.41) h (y i ?y)? b ?(x i ?x) i 0 ?S ?1 XX (x i ?x)(x i ?x) 0 S ?1 XX ? , where S ?1 XX = 1 N N X i=1 (x i ?x)(x i ?x) 0 . (2.1.42) Anderson and Hsiao (1981) consider the model (2.1.1) with |?| < 1.They distinguishfourdifferentcasesbasedondifferentassumptionsontheinitialvalues of the process (y i0 ): ? Case I. Fixed initial observations: y i0 are fixed observed constants ? Case II. Random initial observations, common mean: y i0 = c+? i (2.1.43) where ? has a mean zero and a finite variance and is independent of ? i and ? it . Here they also suggest that one could assume y i0 = c+? i (2.1.44) so that the initial endowment affects the level. 24 ? Case III. Random initial observations, different means (in this case there the incidental parameter problem arrises and for fixed T the MLE is inconsis- tent): the model is y it = w it +? i t =0,1,..,T, (2.1.45) w it = ?w i,t?1 +? it t =1,..,T, (2.1.46) where w it and ? i are unobservable. In this case w i0 are unknown constants. ? Case IV. Random initial observations with stationary distribution: same as above but w i0 are(a)drawsfromstationarydistribution with mean zero and variance var(? it ) 1?? 
2 or(b)samebutthevarianceisarbitrary.Inthesubcase(a), the y it come from the stationary distribution of the process. To derive the likelihood function they assume normality of the error terms ? it , ? i and when applicable also y i0 . Implicit assumption is that E(? it )=0and Var(? it )=? 2 (uniform over individuals). Anderson and Hsiao (1982) have y it = ?y i,t?1 +x it ? +z i ? +? i +? it t =1,..,T; i =1,...,N, (2.1.47) 25 where |?| < 1 and E(? i )=E(? it )=E(? i z i )=E(? i x it )=E(? i ? it )=0 (2.1.48) t =1,..,T; i =1,...,N, and E ? ? i ? j ? = ? 2 ? for i = j and =0for i 6= j, E(? it ? js )=? 2 ? i = j, t = s, (2.1.49) =0 otherwise They also assume normality of ? i and ? it and first consider the model with only time-invariant exogenous regressors. Again several cases are distinguished: ? (I) y i0 is fixed ? (II) y i0 is random with ? (IIa) y i0 independent of ? i ,or ? (IIb) y i0 correlated with ? i ; in their wording ?If we wish the initial endowment [y i0 ] affects the equilibrium level [ ? i 1?? ] we may let?: y i0 = z i ? +? i . (2.1.50) ? (III) (y i0 ?? i ) is fixed ? (IV) (y i0 ?? i ) is random with 26 ? (IVa) variance ? 2 ? 1?? 2 ? (IVb) unrestricted (but uniform over i)variance Next Anderson and Hsiao consider the model with only time-varying regres- sors and they offer two interpretations of the model: (1) Serial correlation model: y it = ?y i,t?1 +x it ???x it ? +? i +? it . (2.1.51) Here they again assume either that (y i0 ?x i0 ??? i ) is fixed, or random with zero mean and variance ? 2 ? 1?? 2 . (2) State dependence model: y it = ?y i,t?1 +x it ? +? i +? it . (2.1.52) As before, there is a variety of assumptions concerning y i0 considered - the as- sumption correspond exactly to cases I.-IV above, except that in case of IV they distinguish whether (y i0 ?? i ) is random with ? ? (IVa) common mean and variance ? 2 ? 1?? 2 ? (IVb) common mean and unrestricted variance ? (IVc) heterogeneous mean and variance ? 2 ? 1?? 2 ? 
- (IVd) heterogeneous mean and unrestricted variance.

Table 1 below summarizes the consistency findings of Anderson and Hsiao:

Table 1. Consistency of ML Estimation

Case    Estimated parameters                          N fixed, T → ∞    T fixed, N → ∞
I.      γ, β, σ²_ε                                    Consistent        Consistent
        μ, σ²_μ                                       Inconsistent      Consistent
II.a    γ, β, σ²_ε                                    Consistent        Consistent
        μ, σ²_μ, σ²_{y0}, E(y_i0)                     Inconsistent      Consistent
II.b    γ, β, σ²_ε                                    Consistent        Consistent
        μ, σ²_μ, σ²_{y0}, E(y_i0), Cov(y_i0, μ_i)     Inconsistent      Consistent
III.    γ, β, σ²_ε                                    Consistent        Inconsistent
        μ, σ²_μ, (y_i0 − μ_i)                         Inconsistent      Inconsistent
IV.a    γ, β, σ²_ε                                    Consistent        Consistent
        μ, σ²_μ, E(y_i0 − μ_i)                        Inconsistent      Consistent
IV.b    γ, β, σ²_ε                                    Consistent        Consistent
        μ, σ²_μ, E(y_i0 − μ_i), Var(y_i0 − μ_i)       Inconsistent      Consistent
IV.c    γ, β, σ²_ε                                    Consistent        Inconsistent
        μ, σ²_μ, E_i(y_i0 − μ_i), Var(y_i0 − μ_i)     Inconsistent      Inconsistent
IV.d    γ, β, σ²_ε                                    Consistent        Inconsistent
        μ, σ²_μ, E_i(y_i0 − μ_i), Var(y_i0 − μ_i)     Inconsistent      Inconsistent

Bhargava and Sargan (1983) consider the dynamic panel data model with exogenous variables of essentially the same form as (2.1.1). They derive the maximum likelihood function under the assumption that the innovations and the individual effects are normally and independently distributed with constant variances, i.e. ε_it ~ N(0, σ²_ε) and μ_i ~ N(0, σ²_μ). The likelihood is derived first treating the initial values y_i0 as exogenous and then as endogenous by assuming that the initial values are generated from the stationary distribution of the process. In particular, they assume that y_i0 is generated by the sequence of equations (2.1.1) and can be written as

y_i0 = Σ_{k=0}^∞ γ^k (x'_{i,−k} β + μ_i + ε_{i,−k}) = ȳ_i0 + μ_i/(1 − γ) + Σ_{k=0}^∞ γ^k ε_{i,−k}, (2.1.53)

where ȳ_i0 is the exogenous part of the initial values and is in fact assumed to be stochastic with ȳ_i0 ~ N(ȳ*_i0, σ²_{y0}), independent of ε_it and μ_i.

Hsiao, Pesaran and Tahmiscioglu (2002) consider the model (2.1.1) without exogenous variables,7 i.e.
y_it = γ y_{i,t−1} + μ_i + ε_it, t = 1,...,T; i = 1,...,N, (2.1.54)

7 In the second part, the authors extend the model to include both strictly and weakly exogenous variables.

with y_i0 observable. Under the assumption that the process started at time −m, one can express the first difference of the initial observation as

Δy_i1 = γ^m Δy_{i,−m+1} + ξ_i, (2.1.55)

where ξ_i = Σ_{j=0}^{m−1} γ^j Δε_{i,1−j}. Hsiao, Pesaran and Tahmiscioglu then distinguish two assumptions for the initial values of the process:

- Case (3.i): |γ| < 1 and the process has been going on for a long time (m → ∞), with E(Δy_i1) = 0, Var(Δy_i1) = 2 Var(ε_it)/(1 + γ), Cov(ξ_i, Δε_i2) = −Var(ε_it), and Cov(ξ_i, Δε_it) = 0 for t = 3, 4,...,T.

- Case (3.ii): m is finite and E(Δy_i1) = b, Var(Δy_i1) = c · Var(ε_it), where c > 0, Cov(ξ_i, Δε_i2) = −Var(ε_it), and Cov(ξ_i, Δε_it) = 0 for t = 3, 4,...,T.

In both cases, the maximum likelihood function is then derived for the model in first differences under the assumption that the error terms are normally distributed with ε_it ~ N(0, σ²_ε). They also show that the ML function is invariant to the choice of transformation that is used to remove the individual effects from the model.

Hsiao, Pesaran and Tahmiscioglu also define a minimum distance estimator and show that if it ignores the initial conditions, it will be inconsistent when T is fixed. They also study the relationship of the ML estimator to the GMM estimators suggested by Arellano and Bond (1991), Arellano and Bover (1995), and Ahn and Schmidt (1995). Conditional on σ²_ε and the variance of the initial observations, Hsiao, Pesaran and Tahmiscioglu show that the difference between the asymptotic variance covariance matrices of the GMM and the ML (or MD) estimators is positive definite. They conjecture that the same holds even when σ²_ε and the variance of the initial observations are unknown, and document this in a Monte Carlo study.
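The case (3.i) moments can be verified numerically. The following sketch (an illustration of mine, not part of the cited papers; all parameter values are arbitrary) simulates the panel AR(1) process with a long burn-in and compares the sample variance of Δy_i1 with its stationary value 2 Var(ε_it)/(1 + γ):

```python
import numpy as np

# Numerical check of the case (3.i) initial-condition moment: with |gamma| < 1
# and the process running since t = -m with m large, the variance of the first
# difference approaches 2*Var(eps_it)/(1 + gamma).
rng = np.random.default_rng(0)
N, m, gamma, sig2 = 200_000, 300, 0.5, 1.0

mu = rng.normal(size=N)                        # individual effects mu_i
y_prev = np.zeros(N)
y_curr = np.zeros(N)
for _ in range(m + 2):                         # iterate y_t = gamma*y_{t-1} + mu + eps_t
    eps = rng.normal(scale=np.sqrt(sig2), size=N)
    y_prev, y_curr = y_curr, gamma * y_curr + mu + eps

dy1 = y_curr - y_prev                          # first difference after the burn-in
print(dy1.var())                               # close to 2*sig2/(1 + gamma)
```

Note that the individual effects drop out of the stationary variance of the differences, which is why the formula involves only Var(ε_it) and γ.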
Binder, Hsiao and Pesaran (2000) consider a multivariate extension of the dynamic panel data model. Their specification is

w_it = μ_i + δt + Φ[w_{i,t−1} − μ_i − δ(t−1)] + ε_it, (2.1.56)

where w_it, μ_i, δ and ε_it are m × 1 vectors and Φ is an m × m matrix. They define y_it = w_it − μ_i − δt and hence the model becomes

y_it = Φ y_{i,t−1} + ε_it. (2.1.57)

They assume that the model started at time t = −M, M ≥ 0, and that the initial deviations are given by

y_{i,−M} = Σ_{j=0}^∞ (Φ^j − C) ε_{i,−M−j} + C ζ_i, (2.1.58)

where the ε_it, i = 1, 2,...,N, t ≤ T, are i.i.d. across i and over t, and the ζ_i are i.i.d. across i, with

E(ε'_it, ζ'_i)' = 0 and Var(ε'_it, ζ'_i)' = diag(Ω, Ψ_ζ). (2.1.59)

The matrix C is defined recursively as C = Σ_{j=0}^∞ C_j, where C_0 = I_m, C_1 = Φ − I_m, and C_j = C_{j−1} Φ for j ≥ 2. Notice that for m = 1, C can only be zero or one.

Binder, Hsiao and Pesaran then derive the quasi maximum likelihood function for the model under the assumption that the disturbances {ε_it} and {ζ_i} are mutually independent and identically distributed. The authors also extend the GMM and MD estimators to the multivariate context and provide simulation evidence that the QML estimator dominates the GMM and MD procedures even when the underlying disturbances are not normal.8 Binder, Hsiao, Mutl and Pesaran (2002) discuss the same model with a higher order autocorrelation structure and provide further Monte Carlo evidence.

8 The authors consider a case where the underlying disturbances are drawn from a zero mean chi-square distribution.

2.2 Modelling Cross-Sectional Dependence

When T is large and N small, one does not have to parametrically specify the cross-sectional interdependencies and can allow for an arbitrary covariance structure of the disturbances. The model can then be consistently estimated by a generalized least squares method. This is what Zellner (1962) refers to as the seemingly unrelated regressions (SUR) specification.
On the other hand, observe that the dimension of the variance covariance matrix of the dependent variable (or disturbances) grows with the sample size (the number of cross-sections). Therefore, when the time dimension of the data is limited or fixed, it becomes impossible to infer the cross-sectional covariance structure of the model without imposing some parametric restrictions.

Typically, the interaction among the cross-sectional units is modelled as proportional to some observable distance. The most widely used parameterizations are variants of the one considered by Cliff and Ord (1973 and 1981), which I review below. Recent applications include Audretsch and Feldmann (1996), Bernat (1996), Besley and Case (1995), Bollinger and Ihlanfeldt (1997), Buettner (1999), Case (1991), Case, Hines, and Rosen (1993), Dowd and LeSage (1997), Holtz-Eakin (1994), LeSage (1999), Kelejian and Robinson (2000, 1997), Pinkse and Slade (1998), Pinkse, Slade, and Brett (2002), Shroder (1995), and Vigil (1998). See also a host of other papers presented, for example, at the Spatial Econometrics Workshop in Kiel, 2005 (http://www.uni-kiel.de/ifw/konfer/spatial/spatial-econometrics.htm).

In this thesis, I follow the spatial econometrics literature and study a first order spatial autocorrelation model with a known spatial weighting matrix. The panel spatial autocorrelation model is a generalization of the single cross-section models that include Cliff and Ord (1973, 1981), Whittle (1954), Anselin (1988) and Kelejian and Prucha (1998, 1999 and 2004). See also Lee (2004), who provides asymptotic properties of the ML procedure for spatial models. Other recent theoretical developments include Baltagi and Li (2001a,b), Baltagi, Song and Koh (2003), Conley (1999), Das, Kelejian and Prucha (2005), Kelejian and Prucha (2001, 1997), Lee (2003, 2002, 2001a,b), LeSage (2000, 1997), Pace and Barry (1997), Pinkse and Slade (1998), Pinkse, Slade, and Brett (2002), and Rey and Boarnet (2004).
An excellent review of the different specifications in spatial econometrics can be found in Anselin (1988). See also Haining (1990) and references therein.

2.2.1 Model Specifications

I will now present the basic specifications of spatial dependence suggested in the literature. The Cliff-Ord type model of spatial dependence can be written in the following form. Suppose that we have a panel of observations in space, indexed by i = 1,...,N, and time, indexed by t = 1,...,T. The disturbances9 u_it,N can then be specified to follow a spatial autoregressive process of the form

u_it,N = ρ Σ_{j=1}^N w_ij,N u_jt,N + ε_it,N. (2.2.1)

9 Of course, spatial lags can also be applied to the endogenous or explanatory variables in the same manner.

The disturbance u_it,N for a cross-section i at time t consists of a weighted average of contemporaneous disturbances in other cross-sections and a mutually independent innovation term ε_it,N. The weights w_ij,N are assumed to be observable quantities and, therefore, the extent of correlation in the model is a function of a single parameter ρ.

This model for spatial correlation was introduced by Cliff and Ord (1973, 1981). Anselin (1988) refers to this model as a first order spatial autoregressive model or SAR(1). The weights w_ij,N are referred to as spatial weights and are assumed to be known, ρ is called the spatial autoregressive parameter, and Σ_{j=1}^N w_ij,N u_jt,N is referred to as a spatial lag. The spatial weights w_ij,N are typically specified to be nonzero if cross-sectional unit i relates to unit j in a meaningful way. In such cases, units i and j are said to be neighbors. In practice, the spatial weights are often viewed as normalized in the sense that the summation term in (2.2.1) is an average of neighboring observations, e.g. one postulates that Σ_{j=1}^N w_ij,N = 1.

A more general model can include spatial lags in the disturbances as well as in the endogenous variable, denoted by y_it,N, e.g.

y_it,N = x'_it,N β + λ Σ_{j=1}^N m_ij,N y_jt,N + u_it,N, (2.2.2)

where x_it,N is a vector of exogenous variables, β is a vector of parameters, λ is a scalar parameter, the m_ij,N are spatial weights, and the disturbances u_it,N are as in (2.2.1). The term Σ_{j=1}^N m_ij,N y_jt,N is then referred to as a spatial lag of the dependent variable. The weights in the spatial lag of the dependent variable (m_ij,N) can, but do not necessarily have to, correspond to those in the spatial lag in the disturbances (w_ij,N).

Observe that all variables are indexed by the sample size N, i.e. they form triangular arrays. This also applies to situations where the spatial weights are specified as fixed constants. In many cases it is assumed that each cross-sectional location i has a fixed number of neighbors, say q, for which w_ij,N ≠ 0. Hence each w_ij,N is equal either to zero or to a fixed number such as 1/q. Observe that even in such cases, the number of cross-sectional units determines the number of units that enter into the solution of equation (2.2.1). As a result, the disturbances u_it,N that solve (2.2.1) have to be indexed by the sample size. The fact that the disturbances u_it,N are indexed by the sample size leads to certain technical complications; for example, one has to be careful in applying central limit theorems and make sure that these also hold for triangular arrays.

Contiguity Weights. Specifications where each unit is only affected by its neighbors are sometimes referred to as contiguity weights. These could be specified as w_ij,N = 1 when the two units are neighbors, and w_ij,N = 0 otherwise. Denoting by W_N the N × N matrix of the weights w_ij,N, the row-normalized weights are then given by

W*_N = W_N ./ (e'_N ⊗ W_N e_N), (2.2.3)

where e_N is an N × 1 vector of ones and ./ denotes element-by-element division.

In practical applications, the definition of a neighbor often follows a natural geographical interpretation.
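As a concrete illustration of the process (2.2.1) and the row normalization in (2.2.3), the following sketch (my own, with arbitrary parameter values) builds a contiguity matrix for a regular grid whose units are neighbors when they share a border, row-normalizes it, and solves for the disturbances:

```python
import numpy as np

# Sketch: border-contiguity weights on a regular grid, row-normalized so that
# each row sums to one, and the SAR(1) disturbances u = rho*W*u + eps solved
# as u = (I - rho*W)^{-1} eps.
def border_contiguity(nrow, ncol):
    N = nrow * ncol
    W = np.zeros((N, N))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[i, rr * ncol + cc] = 1.0     # units sharing a border
    return W

W = border_contiguity(4, 4)
W_star = W / W.sum(axis=1, keepdims=True)          # row-normalized weights

rho = 0.4
eps = np.random.default_rng(1).normal(size=16)
u = np.linalg.solve(np.eye(16) - rho * W_star, eps)
print(np.allclose(u, rho * W_star @ u + eps))      # True: u solves (2.2.1)
```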
Thus, if the space in question is a geographical space and the units of analysis are regions, two regions are classified as neighbors when they share a common border. Other popular specifications of the contiguity weights are the rook, queen and related configurations. Suppose that the space is divided into equally sized rectangular units. Below, I depict the rook and queen configurations, using ones to indicate the units that are neighbors of the unit x and zeros to indicate units that are not direct neighbors (these then correspond to entries on the x-th row of the spatial weighting matrix W_N):

rook:                 queen:
0 0 0 0 0             0 0 0 0 0
0 0 1 0 0             0 1 1 1 0
0 1 x 1 0             0 1 x 1 0
0 0 1 0 0             0 1 1 1 0
0 0 0 0 0             0 0 0 0 0
                                            (2.2.4)

An alternative is to assume that the spatial process has higher order components and use the so-called double-rook or double-queen specifications, which could be:

double-rook:          double-queen:
0  0  ½  0  0         ½  ½  ½  ½  ½
0  ½  1  ½  0         ½  1  1  1  ½
½  1  x  1  ½         ½  1  x  1  ½
0  ½  1  ½  0         ½  1  1  1  ½
0  0  ½  0  0         ½  ½  ½  ½  ½
                                            (2.2.5)

Of course, the choice of the entries 1 and ½ is arbitrary and these can be replaced by some other constants.

Another possibility is to assume that the cross-sectional units can be ordered linearly in space (as an analogy to the linear ordering of observations in time). The specification that is often referred to as q-ahead, r-behind (in the terminology of Kelejian and Prucha, 1999) uses a weights matrix W_N^(q,r) consisting of zeros except for entries of ones on the first q subdiagonals below the main diagonal and on the first r superdiagonals above it. For example, the 2-ahead, 2-behind matrix is:

              [ 0 1 1 0 ... ... 0 ]
              [ 1 0 1 1 ...     . ]
              [ 1 1 0 1 1 ...   . ]
W_N^(2,2) =   [ .  ...  ...  .. . ]          (2.2.6)
              [ .   ... 1 1 0 1 1 ]
              [ .     ... 1 1 0 1 ]
              [ 0 ... ... 0 1 1 0 ]

An alternative is to assume a circular ordering of the observations in space.
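The banded matrix in (2.2.6) and the circular ordering just mentioned can be constructed directly. The following sketch (mine, for illustration) builds both variants for N = 6 and q = r = 2:

```python
import numpy as np

# Sketch: the "q-ahead, r-behind" weighting matrix of Kelejian and Prucha
# (1999), with ones on the first q and r off-diagonals, plus the circular
# variant in which the ordering wraps around.
def ahead_behind(N, q, r, circular=False):
    W = np.zeros((N, N))
    for i in range(N):
        for k in range(1, q + 1):              # units "ahead" of i
            j = i + k
            if circular:
                W[i, j % N] = 1.0
            elif j < N:
                W[i, j] = 1.0
        for k in range(1, r + 1):              # units "behind" i
            j = i - k
            if circular:
                W[i, j % N] = 1.0
            elif j >= 0:
                W[i, j] = 1.0
    return W

W_lin = ahead_behind(6, 2, 2)                  # band matrix as in (2.2.6)
W_circ = ahead_behind(6, 2, 2, circular=True)  # wraps: unit 1 neighbors N, N-1
print(W_circ.sum(axis=1))                      # every unit has q + r = 4 neighbors
```

Under the circular variant, every row of the matrix has exactly q + r nonzero entries, whereas under the linear ordering the first and last few units have fewer neighbors.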
In this case, the q-ahead, r-behind weights matrices are as above, but with additional nonzero entries in the upper-right corner (positions (i, N−j) for small i and j) and in the lower-left corner (positions (N−k, l) for small k and l), so that the ordering wraps around. For the 2-ahead, 2-behind matrix, circularity implies that the first unit is also a neighbor of units N and N−1; hence the added entries of one in positions (N,1), (N−1,1), (1,N) and (2,N). Additionally, the second and the last (N-th) units, as well as the first and the (N−1)-th units, are neighbors, and hence the entries of one in positions (N,2) and (1,N−1).

Distance Based Weights. When one views the cross-sectional observations as being located in a space, the extent of direct correlation between the disturbances at two locations can be interpreted as related to their distance in the space under consideration. Hence the weights can be interpreted as being (inversely) related to some measure of distance between the observations. In practical applications the space does not necessarily have to be a geographical space. The observations can be located in an abstract space in which their proximity is a known function of some of their observable characteristics. For example, two industries can be considered "close" to each other if they use a similar set of inputs, or two countries can be "close" if they have received financial flows from the same international lenders.

Under the interpretation of the weights w_ij,N as inversely related to a distance measure, one makes an implicit assumption that the weights are symmetric in the sense that w_ij,N = w_ji,N. This is an artefact of the symmetry of distance measures, i.e. the distance from i to j has to be equal to the distance from j to i.10 Observe, however, that the model considered here is more general. In particular, I do not require the weights to be symmetric, and w_ij,N does not have to be equal to w_ji,N. This can be advantageous in situations where the spillover of shocks is not necessarily symmetric. An example is the international transmission of shocks, where a shock originating in a very small country cannot plausibly be assumed to affect a large country in the same way as a shock originating in a large country affects a small country (e.g. US shocks affect, say, Ecuador much more than Ecuador's shocks can affect the US).

10 Observe that the distance based weights can be adjusted (premultiplied) by a factor that accounts for the differences in the direction of the influence. In this case the weights can become asymmetric. Note that the specification in this thesis allows for such asymmetries.

The problem of symmetry of the spatial weights that are based on a distance measure is related to the more general issue of aggregation. Suppose that the data were generated for a larger (disaggregated) sample but are only observed for aggregated spatial units. Mutl (2006) considers such data generating designs in a Monte Carlo study and concludes that only specifications that adjust the spatial weights for the relative size of the units deliver estimates that do not change with increases in the number of units observed in the sample. The appropriate measure of size depends on the units of measurement of the endogenous variable. For example, when the dependent variable is expressed as GDP per capita, then the spatial weights w_ij,N should be multiplied by the population of region i relative to the entire population of all regions in the sample. Constructing the distance based spatial weights in this fashion automatically takes account of the asymmetric effects considered above. See also Giacomini and Granger (2004) for the related issue of forecasting an aggregate of spatially interrelated observations, and LeSage and Pace (2004) for dealing with missing values in models with spatial dependence.
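To illustrate the size adjustment just described, the following sketch (with entirely hypothetical numbers; scaling each weight by the originating unit's population share is one possible convention, and the appropriate direction of the adjustment depends on the units of the dependent variable) shows how adjusting symmetric inverse-distance weights by relative unit size produces an asymmetric weighting matrix:

```python
import numpy as np

# Hypothetical illustration: inverse-distance weights are symmetric, but
# scaling column j by unit j's population share (so larger neighbors exert
# more influence) makes the weighting matrix asymmetric.
dist = np.array([[0.0, 1.0, 2.0],
                 [1.0, 0.0, 1.0],
                 [2.0, 1.0, 0.0]])            # symmetric distances, 3 units
pop = np.array([300.0, 30.0, 3.0])            # very different unit sizes

with np.errstate(divide="ignore"):
    W = np.where(dist > 0, 1.0 / dist, 0.0)   # symmetric inverse-distance weights

W_adj = W * (pop / pop.sum())                 # scale column j by j's pop share

print(np.allclose(W, W.T), np.allclose(W_adj, W_adj.T))
```

The unadjusted matrix is symmetric by construction; the size-adjusted matrix is not, since w_ij pop_j ≠ w_ji pop_i whenever the two units differ in size.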
2.2.2 Estimation

The estimation method for models with spatial autocorrelation suggested by Anselin (1988) or Anselin and Hudak (1992) was maximum likelihood (ML). The asymptotic properties of the ML estimator of a model such as (2.2.1) have been derived only recently, by Lee (2004), for one specific Cliff-Ord model. Furthermore, the maximum likelihood function contains a Jacobian term that is the determinant of a matrix whose dimension increases with the sample size N. Hence, for moderate and large sample sizes, ML estimation might become infeasible. As an alternative, Kelejian and Prucha (1998) introduced the spatial generalized moments (spatial GM) estimator and proved its consistency. The asymptotic distribution of the spatial GM estimator is derived in Kelejian and Prucha (2005). The spatial GM estimator is computationally much simpler and, as a result, is feasible also for large sample sizes.

OLS estimation of a model with SAR disturbances is inefficient but remains consistent. However, when spatial lags of the dependent variable are included, as in (2.2.2), OLS estimation becomes biased, since the stochastic regressor Σ_{j=1}^N m_ij,N y_jt,N on the right hand side is correlated with the error term (endogeneity bias). However, an instrumental variable estimation with spatial lags of the explanatory variables as instruments will be consistent (Kelejian and Prucha, 1998). Alternative instrument sets are considered in Lee (2003) and Kelejian, Prucha and Yuzefovich (2004).

The stacked version of the model given in (2.2.1) and (2.2.2) is

y_N = X_N β + λ M_N y_N + u_N, (2.2.7)
u_N = ρ W_N u_N + ε_N,

where y_N is the N × 1 vector of the dependent variable, X_N is the N × p matrix of exogenous variables, M_N and W_N are N × N spatial weighting matrices, and u_N and ε_N are the N × 1 vectors of disturbances and innovations. Under appropriate regularity conditions, the model can be solved as (see, for example, Das, Kelejian and Prucha, 2003, page 4):

y_N = (I_N − λM_N)^{−1} X_N β + (I_N − λM_N)^{−1} (I_N − ρW_N)^{−1} ε_N. (2.2.8)

Under the assumption that the vector ε_N is normally distributed with ε_N ~ N(0_{N×1}, σ² I_N), the likelihood function is:

ln(L) = −(N/2) ln(2π) − (1/2) ln|Ω_N| − (1/2) [y_N − (I_N − λM_N)^{−1} X_N β]' Ω_N^{−1} [y_N − (I_N − λM_N)^{−1} X_N β], (2.2.9)

where Ω_N is the variance covariance matrix of y_N, given by

Ω_N = σ² (I_N − λM_N)^{−1} (I_N − ρW_N)^{−1} (I_N − ρW'_N)^{−1} (I_N − λM'_N)^{−1}. (2.2.10)

The least squares procedure applied directly to equation (2.2.7) is inconsistent due to the correlation of the spatial lag of y_it,N with u_it,N. However, there are instrumental variables (IV) procedures that are consistent. Observe that for the current model (see Das, Kelejian and Prucha, 2003, page 7):

E(y_N) = (I_N − λM_N)^{−1} X_N β = Σ_{k=0}^∞ λ^k M_N^k X_N β, (2.2.11)

and hence ideal instruments are combinations of the matrices X_N β, M_N X_N β, M²_N X_N β, etc. Kelejian and Prucha (1998) show that an IV estimator that uses at least the linearly independent columns of X_N, M_N X_N, M²_N X_N as instruments is consistent and asymptotically normal.

The spatial autocorrelation parameter ρ can then be estimated with the spatial generalized moments (spatial GM) procedure suggested by Kelejian and Prucha (1999). Denote by û_N the estimated disturbances based on an initial consistent estimator, and let

v_1,N(ρ, σ²) = N^{−1} [(I_N − ρW_N) û_N]' [(I_N − ρW_N) û_N] − σ²,
v_2,N(ρ, σ²) = N^{−1} [(I_N − ρW_N) W_N û_N]' [(I_N − ρW_N) W_N û_N] − σ² N^{−1} tr(W'_N W_N),
v_3,N(ρ, σ²) = N^{−1} [(I_N − ρW_N) W_N û_N]' [(I_N − ρW_N) û_N]. (2.2.12)

The spatial GM estimator is then defined as

(ρ̂, σ̂²) = argmin { Σ_{k=1}^3 v_k,N(ρ, σ²)² : (ρ, σ²) ∈ [−a, a] × [0, s²] }, (2.2.13)

where a ≥ 1 and s² is the upper limit considered for σ². Kelejian and Prucha (1999) show that the spatial GM estimator is consistent. Kelejian and Prucha (1998) also provide a proof that the spatial autoregressive parameter ρ is a "nuisance"
parameter in the sense that the feasible generalized spatial two stage least squares (FGS2SLS) estimator has the same asymptotic distribution when it is based on a consistent estimator of ρ as when it is based on the true value. Initially, the asymptotic distribution of the spatial GM estimator was not determined. As a result, tests for spatial autocorrelation had to be based on statistics such as the Moran I. Kelejian and Prucha (2001) and Pinkse and Slade (1998) provide the asymptotic distribution of the Moran I test statistic. The asymptotic distribution of the spatial GM estimator was then derived for a more general model that includes heteroscedastic disturbances in Kelejian and Prucha (2005).

2.3 Space-Time Models

The interaction of time and space is a key feature of almost all human activities. It has been studied in many disciplines and has received some attention in economics as well. Studies outside economics include many applications in geostatistics (see e.g. Kyriakidis and Journel, 1999 for a review) and geography, but also in epidemiology, medicine, crime prevention and other fields. Short overviews can be found in Cressie (1991: 449-452) and Robinson (1998: 319-328).
In economics and econometrics, some interesting cases complementary to the specification in the present thesis include the following. O'Connell (1998) proposes a generalized least squares test for unit roots in panel data (although without deriving any asymptotic properties of the estimator). Chen and Conley (2001) propose a two-step sieve least squares procedure to estimate a panel vector autoregression (VAR) model with a nondiagonal cross-sectional covariance matrix that is proportional to an observed economic distance measure; they look at asymptotics in the less complicated case where the cross-sectional dimension is fixed. Finally, Chang (2002) derives asymptotic properties of a univariate panel model with a general unrestricted form of cross-sectional heterogeneity, again when the cross-sectional dimension of the panel is fixed.

In this thesis, I will analyze a dynamic model that includes a spatial lag in the disturbance process. This is a special case of the class of stochastic models known as space-time autoregressive (space-time AR) models, introduced by Cliff et al. (1975) and generalized by Pfeifer and Deutsch (1980). A more recent discussion and application of the space-time AR model in econometrics is Elhorst (2001), while a generalization of the model to continuous space is proposed by Brown et al. (2000).

Below I review papers that deal with this class of models in more detail. Note that if contemporaneous correlation is present, the observable data become a nontrivial transformation of the underlying random field, resulting in some technical difficulties. Hence I first focus on specifications that do not allow for contemporaneous correlation in the data but instead assume that spatial interactions act with a time lag. In the second subsection I then present models that allow for such complications.

2.3.1 Space-Time Autoregressive Moving Average (STARMA) Models

Pfeifer and Deutsch (1980) were the first to propose a STARMA model.
Their general STARMA(p, q; λ_1,...,λ_p, m_1,...,m_q) model is:

y_it = Σ_{k=1}^p Σ_{l=0}^{λ_k} φ_kl Σ_{j=1}^N w_ij,l y_j,t−k − Σ_{k=1}^q Σ_{l=0}^{m_k} θ_kl Σ_{j=1}^N w_ij,l ε_j,t−k + ε_it, (2.3.1)

where p is the autoregressive order, q is the moving average order, λ_k is the spatial order of the k-th autoregressive term, m_k is the spatial order of the k-th moving average term, φ_kl and θ_kl are parameters, and the errors are normally distributed with E(ε_it) = 0, E(ε_it ε_js) = σ² for i = j and t = s, and E(ε_it ε_js) = 0 otherwise.

The spatial weights have the usual interpretation (see the previous subsection) and are assumed to be observable; the authors do not impose any restrictions on their structure. Observe that, in contrast to the Cliff-Ord type model considered in this thesis, the STARMA model does not allow for contemporaneous correlation between spatial units; for example, y_it depends on ε_j,t−1 but not on ε_jt. As a result, the likelihood function does not involve a Jacobian term in the form of the determinant of an N × N matrix, and ML estimation is therefore considerably simpler; it is the estimation method suggested by Pfeifer and Deutsch. The authors derive the likelihood function conditional on the initial values of the process and note that it is only appropriate for moderate or large T. However, the restrictions implied by the model on the initial observations are not explicitly derived. The paper also does not provide formal consistency or asymptotic normality results. Abraham (1983) derives the likelihood function for the STARMA model.

Stoffer (1986) outlines a different estimation procedure for a spatial STAR model with missing values (spatial ARX in his terminology). The model combines the time series parametrization of an autoregressive moving average process for missing and noisy data with a Cliff and Ord type spatial structure.
The data generating process is assumed to be a q-th order autoregressive process where the current observation is influenced by q time lags of its spatial neighbors:

y_it = Σ_{k=1}^q Σ_{j=1}^N w_ij,k α_kj y_j,t−k + x'_it β + ε_it, (2.3.2)

where the autoregressive parameters α_kj are allowed to vary with spatial location. The spatial weights w_ij,k have the usual interpretation (e.g. they are inversely related to a distance) and are allowed to differ across time lags. The p explanatory variables in x_it are modelled as a stochastic process independent of the innovations ε_it, and the data sample is observed for i = 1,...,N and t = 1,...,T.

The estimates are solutions to approximated Yule-Walker equations. For example, with no data problems, q = 1 and no explanatory variables, the model can be written as

y_t = W Λ y_{t−1} + ε_t, (2.3.3)

where y_t = (y_1t,...,y_Nt)', ε_t = (ε_1t,...,ε_Nt)', W is the N × N matrix of the spatial weights w_ij, and Λ = diag(α_1,...,α_N). The proposed estimator of Λ is then:

Λ̂ = diag(W^{−1} Γ̂_{−1} Γ̂_0^{−1}), (2.3.4)

where the estimated moments of the data are

Γ̂_0 = Σ_{t=2}^T y_t y'_t, (2.3.5)

and

Γ̂_{−1} = Σ_{t=2}^T y_t y'_{t−1}. (2.3.6)

There are no formal asymptotic claims made for the procedure. The method is illustrated with an application to fish catch data at five locations for 240 time periods, suggesting that the implicit asymptotic consistency and normality claims are for a fixed spatial dimension N and an increasing time dimension of the observations.

Pace et al. (1998) model spatial and temporal dependence in housing price data from Fairfax County, Virginia, between 1961 and 1991. Unlike in standard STAR models, it is not assumed that the autocorrelation in the dependent variable is linearly separable in space and time. Instead, an interaction of the space and time lags is considered. In particular, the model is:

y_it = Σ_{s=1}^T Σ_{j=1}^N w_ij,ts y_js + x'_it β + Σ_{s=1}^T Σ_{j=1}^N w_ij,ts x'_js γ + ε_it, (2.3.7)

where the observable weights w_ij,ts relate observations across time and space simultaneously. It is assumed that w_ij,ts = 0 for s ≥ t, meaning that current and future values of y_js and x_js do not influence the process for y_it. Stacking the w_ij,ts into an NT × NT matrix W, Pace et al. assume that

W = φ_S S + φ_T T + φ_ST ST + φ_TS TS, (2.3.8)

where the S and T matrices are interpreted as filters in space and time respectively. Their entries are related to the distance of the observations in space and in time, respectively. The main limitation of their approach is the assumption that there are no concurrent observations and that only past observations have an effect. If the matrix W is stacked so that the observations are sorted according to time, this assumption implies that both T and S are strictly lower (or upper) triangular. As a result, the model can be estimated by OLS. The paper does not provide formal results and does not spell out assumptions on the disturbance process.

Giacomini and Granger (2004) show that the STARMA class of models can be derived as a transformation of a vector autoregressive moving average (VARMA) model, where the transformation is a restriction involving spatial weighting matrices. When the number of locations is small, the model can be estimated as an overparametrized VARMA specification. With an increasing number of locations, the overparametrized VARMA model has a large number of insignificant parameters. Therefore, estimation can be improved in a Bayesian framework by incorporating these as priors. Hence, LeSage and Krivelyova (1999) propose a class of prior distributions for a Bayesian VAR model that approximately constrain the insignificant parameters to zero.

2.3.2 Models with Contemporaneous Spatial Correlation

The papers cited in the above subsection did not allow for contemporaneous dependence of the observations.
When such interactions are included, the observa- tion become a nonlinear transformation of the innovations and, as a result, maxi- mum likelihood estimation is more difficult. We next review papers that allow for such complications. Congdon (1994) considers the spatiotemporal model of the following form: y it = x 0 it ? +? i +u it , (2.3.9) where t =1,...,T and i =1,...,N and the error term is both spatially and tem- porally autocorrelated: u it = ?u i,t?1 +? N X j=1 w ij u jt +? it . (2.3.10) 50 It is assumed thaty i0 andx i0 are known exogenous constants. The first step of the proposed estimation procedure eliminates the individual effects ? i by subtracting individual means y i and x i and estimating the slope coefficients? by OLS on (y it ?y i )=(x it ?x i ) 0 ? +(v it ?v i ). (2.3.11) In the second step, ? and ? are estimated by minimizing g(?,?)= N X i=1 T X t=1 ? y ? it ?x ?0 it b ? OLS ? 2 , (2.3.12) where y ? it =(y it ?y i )??(y i,t?1 ?y i )?? N X j=1 w ij ? y jt ?y j ? , (2.3.13) x ? it =(x it ?x i )??(x i,t?1 ?x i )?? N X j=1 w ij (x jt ?x j ). Based on Hordijk (1979), the transformation for the first time period is y ? 1 = ? (I??W) 0 (I??W)?? 2 I N ? 1/2 (y 1 ?y), (2.3.14) X ? 1 = ? (I??W) 0 (I??W)?? 2 I N ? 1/2 ? X 1 ?X ? , wherey 1 =(y 11 ,...,y 1N ) 0 ,y =(y 1 ,...,y N ) 0 ,X 1 =(x 0 11 ,...,x 0 1N ) 0 ,X =(x 0 1 ,...,x 0 N ) 0 and W is an N ?N matrix with elements w ij . The slope coefficients ? are esti- 51 matedbyOLSfrom y ? it ? b ?,b? ? = x ? it ? b ?,b? ? 0 ?+? it . (2.3.15) In the third step, the variance components ? 2 ? = Var(? it ) and ? 2 ? = Var(? i ) are estimated, e.g. b ? 2 ? = 1 NT N X i=1 T X t=1 ? y ? it ?x ?0 it b ? ? 2 , (2.3.16) where b ? is from step 2. 11 The final step is a generalized least squares (GLS) procedure to re-estimate?. The paper contains outline and an application of the estimation procedure to mortality rates in London but offers no formal proofs that would support the con- sistency claims. 
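To fix ideas, the filtering step in (2.3.12)-(2.3.13) can be sketched on synthetic data. The sketch below omits the fixed-effect demeaning (the simulated individual effects are zero) and, since the thesis notes that no formal consistency results are given for this procedure, it only illustrates the mechanics of the filter; the weight matrix and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 30, 20
phi_true, rho_true = 0.4, 0.3

# Hypothetical row-normalized weight matrix.
W = rng.random((N, N)); np.fill_diagonal(W, 0.0); W /= W.sum(1, keepdims=True)

# u_t = phi*u_{t-1} + rho*W u_t + eps_t  <=>  u_t = (I - rho*W)^{-1}(phi*u_{t-1} + eps_t)
A = np.linalg.inv(np.eye(N) - rho_true * W)
y = np.zeros((N, T))
for t in range(T):
    prev = y[:, t - 1] if t > 0 else np.zeros(N)
    y[:, t] = A @ (phi_true * prev + rng.standard_normal(N))
# With beta = 0 and no individual effects, y is the error process itself, so the
# filtered series of (2.3.13) reduces to the innovations at the true (phi, rho).

def ssr(phi, rho):
    # Sum of squared filtered residuals over t = 2,...,T, as in (2.3.12).
    resid = y[:, 1:] - phi * y[:, :-1] - rho * (W @ y)[:, 1:]
    return float(np.sum(resid ** 2))

grid = np.linspace(-0.8, 0.8, 81)
phi_hat, rho_hat = min(((p, r) for p in grid for r in grid),
                       key=lambda pr: ssr(*pr))
```

The grid search stands in for whatever numerical minimizer the original paper used; the point is only that the filtered sum of squares is smallest in the neighborhood of the parameters that generated the data.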
The estimated GLS procedure is based on a suggestion in Anselin (1988, p. 111).

[Footnote 11: The expression for \hat{\sigma}^2_\mu in the paper is

\hat{\sigma}^2_\mu = \frac{1}{N} \sum_{i=1}^{N} \{ ( \bar{y}_i - \hat{\phi}\bar{y}_{i,-1} - \hat{\rho}\overline{Wy} ) - \hat{\beta}( \bar{x}_i - \hat{\phi}\bar{x}_{i,-1} - \hat{\rho}\overline{Wx} ) \}^2 - \frac{\hat{\sigma}^2_\varepsilon}{T}.

This does not seem to have the correct dimensions.]

Driscoll and Kraay (1995, 1998) provide a proof of consistency and asymptotic normality of a GMM procedure based on a panel version of the Newey and West (1987) nonparametric heteroscedasticity and autocorrelation consistent (HAC) covariance matrix estimator. [Footnote 12: The cross-sectional dimension of the data is collapsed by taking cross-sectional averages. Hence this is not a complete generalization of HAC estimation to a panel setting.] The limit is taken with respect to the time dimension of the data. Their specification requires that the data form an \alpha-mixing random field of the same size as the number of moment restrictions and hence places only weak restrictions on the form of spatial and temporal correlations.

They consider r orthogonality conditions E[h_{it}(z_{it},\theta)] = 0, where z_{it}, i = 1,...,N, t = 1,...,T, is the data and \theta is a vector of parameters. The restrictions are assumed to identify the parameters. Their GMM estimator is

\hat{\theta}_T = \arg\min_\theta \left[ \frac{1}{T}\sum_{t=1}^{T} h_t(\theta,z_t) \right]' \hat{S}_T^{-1} \left[ \frac{1}{T}\sum_{t=1}^{T} h_t(\theta,z_t) \right],   (2.3.17)

where z_t = (z_{1t},...,z_{Nt})', h_t(\theta,z_t) = N^{-1}\sum_{i=1}^{N} h_{it}(z_{it},\theta), and \hat{S}_T is the standard HAC estimator applied to the sequence of cross-sectional averages of h_{it}(z_{it},\theta).

Bronnenberg and Mahajan (2001) estimate a model of retailer behavior in which market shares are related to marketing variables. Their model is

y_{it} = \beta_0 + x_{it}'\beta + \theta_i + u_{it},   (2.3.18)

where the disturbances are composed of innovations autocorrelated in time and individual effects autocorrelated in space:

\theta_i = \lambda \sum_{j=1}^{N} w_{ij}\theta_j + \zeta_i,   (2.3.19)
u_{it} = \rho_1 u_{i,t-1} + v_{it}.

The explanatory variables are also modelled as a stochastic process based on the same individual effects \theta_i, with the j-th explanatory variable x_{j,it} specified as

x_{j,it} = \delta_{jt} + \pi_j \theta_i + \omega_{j,it},   (2.3.20)

where

\omega_{j,it} = \rho_{2j}\omega_{j,i,t-1} + \kappa_j \xi_t + \eta_{j,it}.   (2.3.21)

The model is estimated by maximum likelihood under the assumption that the innovations \zeta_i, \xi_t, v_{it}, \eta_{j,it} are all jointly normally distributed.

Elhorst (2001) derives the likelihood function for a STAR(1,1) model in which he also allows for contemporaneous spatial lags. His general model is

y_{it} = \phi y_{i,t-1} + \delta_0 \sum_{j=1}^{N} w_{ij} y_{jt} + \delta_1 \sum_{j=1}^{N} w_{ij} y_{j,t-1}   (2.3.22)
        + \beta_1 x_{it} + \beta_2 x_{i,t-1} + \beta_3 \sum_{j=1}^{N} w_{ij} x_{jt} + \beta_4 \sum_{j=1}^{N} w_{ij} x_{j,t-1} + u_{it}.

The likelihood is derived under the assumption that the disturbances u_{it} are normally distributed with E(u_{it}) = 0, E(u_{it}^2) = \sigma^2, and E(u_{it}u_{sj}) = 0 if t \neq s or i \neq j. The paper assumes that the matrix of spatial weights W = (w_{ij}) has zeros on the diagonal and that the spatial autoregressive parameter is bounded by the inverses of the largest and smallest eigenvalues of W. It is also implicitly assumed that the matrix W is symmetric and that the model is dynamically stable (this places a nontrivial condition on the parameters \phi and \delta_0). [Footnote 13: Such a condition could be, for example, |\phi| + |\delta_0|\cdot\lambda_{max}(W) < 1, where \lambda_{max}(W) is the largest (in absolute value) eigenvalue of the matrix W that consists of the spatial weights w_{ij}.] The likelihood is not conditioned on the initial values; instead it is assumed that the initial observations are draws from the stationary distribution of the process.

Kapoor et al. (2005) extend the GM estimator of Kelejian and Prucha to panel data. The contribution of this thesis relative to Kapoor et al. (2005) is to allow for autocorrelation in the time dimension as well. Their specification is

y_{it,N} = x_{it,N}'\beta + u_{it,N},   (2.3.23)

where the disturbances are an SAR(1) process with individual effects:

u_{it} = \rho \sum_{j=1}^{N} w_{ij} u_{jt} + \mu_i + \varepsilon_{it}.   (2.3.24)

The paper provides a formal consistency proof for the spatial GM estimator (with alternative weighting schemes) of \rho, as well as asymptotic normality of a generalized least squares (GLS) estimator of \beta.

Baltagi et al. (2003) derive formulae for various Lagrange multiplier tests in a model that includes spatially correlated disturbances. The paper also provides experimental evidence on their performance in small samples. They consider the following model:

y_{it} = x_{it}'\beta + \mu_i + u_{it},   (2.3.25)

with the disturbances being an SAR(1) process:

u_{it} = \rho \sum_{j=1}^{N} w_{ij} u_{jt} + \varepsilon_{it}.   (2.3.26)

Observe that when the spatial lag does not operate on the individual effects, this specification implies that the covariance between y_{it} and y_{js} is zero for i \neq j and t \neq s. This is in contrast to the specification in Kapoor et al. (2005), where the individual effects are spatially correlated and, as a result, the covariance between y_{it} and y_{js} is nonzero for all values of i, j, t and s.

Building on the work of Hahn and Kuersteiner (2002), Korniotis (2005) considers a bias-corrected OLS estimator in a dynamic panel data model that also includes a spatial lag of the dependent variable. The specification is

y_{it} = \phi y_{i,t-1} + \delta_1 \sum_{j=1}^{N} w_{ij} y_{jt} + \delta_2 \sum_{j=1}^{N} w_{ij} y_{j,t-1} + x_{it}'\beta + \mu_i + \varepsilon_{it},   (2.3.27)

where the disturbances are independent in the time dimension but are allowed to have an arbitrary covariance matrix (constant over time) in the cross-sectional dimension. The paper gives asymptotic formulas for the biases of the OLS estimators when both N and T simultaneously approach infinity.

Yang (2005) extends the proofs of asymptotic normality in Lee (2004) to a static panel data model with random individual and fixed time effects. His model is

y_{it} = x_{it}\beta + \xi_t + \mu_i + u_{it},   (2.3.28)

where the disturbances u_{it} are an SAR(1) process, i.e.,

u_{it} = \rho \sum_{j=1}^{N} w_{ij} u_{jt} + \varepsilon_{it}.   (2.3.29)

The QML function is derived under the assumption that \{\varepsilon_{it}\} and \{\mu_i\} are mutually independent and identically distributed random variables with finite 4+\delta
moments for some \delta > 0.

3 Model

In this chapter I specify the model and provide a discussion of the maintained assumptions. It proves useful to restate the following notational conventions and definitions. I use bold letters for matrices and vectors, and regular font letters to denote scalars. Furthermore, I use lower case letters for vectors and upper case letters for matrices. Let (A_N)_{N \in \mathbb{N}} be some sequence of Np \times Np matrices, where p \geq 1 is some fixed positive integer, and denote the (i,j)-th element by a_{ij,N}. I say that the row and column sums of the sequence of matrices A_N are uniformly bounded in absolute value if there exists a positive finite constant c, independent of N, such that

\max_{1 \leq i \leq Np} \sum_{j=1}^{Np} |a_{ij,N}| \leq c   and   \max_{1 \leq j \leq Np} \sum_{i=1}^{Np} |a_{ij,N}| \leq c.   (3.0.1)

For future reference, I note that any finite sum and/or product of matrices with row and column sums uniformly bounded in absolute value will also have row and column sums uniformly bounded in absolute value; see Kelejian and Prucha (2004). As a consequence, if B is a matrix of constants with fixed dimensions and A_N is a sequence of matrices with row and column sums uniformly bounded in absolute value, then the sequence of matrices (B \otimes A_N) will also have row and column sums uniformly bounded in absolute value.

3.1 Model Specification

Consider the following dynamic panel data model (1 \leq i \leq N, 1 \leq t \leq T):

y_{it,N} = \lambda y_{i,t-1,N} + x_{it,N}\beta + u_{it,N},   (3.1.1)

where y_{it,N} and x_{it,N} denote the (scalar) dependent variable and the 1 \times p vector of exogenous variables corresponding to cross-sectional unit i in period t, \lambda and \beta represent the corresponding 1 \times 1 and p \times 1 parameters, and u_{it,N} denotes the overall disturbance term. In contrast to the existing dynamic panel data literature, I do not assume that the disturbances u_{it,N} are cross-sectionally uncorrelated, and I consider potentially heteroscedastic errors.
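As a concrete illustration of definition (3.0.1), the row and column sums of a candidate weight matrix can be computed directly. For a nonnegative row-normalized matrix the maximum absolute row sum is exactly one, and the row sums of (I - \rho W)^{-1} are exactly 1/(1-\rho), illustrating how the bound on the inverse can depend on \rho. The matrix and values below are hypothetical:

```python
import numpy as np

def max_row_col_sums(A):
    """Largest absolute row sum and column sum of A -- the quantities bounded in (3.0.1)."""
    absA = np.abs(A)
    return float(absA.sum(axis=1).max()), float(absA.sum(axis=0).max())

rng = np.random.default_rng(1)
N = 50
W = rng.random((N, N))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)      # row-normalized weights: each row sums to 1

row_max, col_max = max_row_col_sums(W)

# With nonnegative W and 0 < rho < 1, P = (I - rho*W)^{-1} = sum_k rho^k W^k has
# nonnegative entries, and P e = e/(1 - rho), so every row of P sums to 1/(1 - rho).
rho = 0.5
P = np.linalg.inv(np.eye(N) - rho * W)
```

For a single N this is of course only a spot check; uniformity over the whole sequence of matrices is a maintained assumption, not something a finite computation can verify.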
Given that I will derive the asymptotic properties of the model when the cross-sectional dimension tends to infinity, the cross-sectional covariance structure is parametrized with a finite number of parameters. In particular, I assume that the disturbances u_{it,N} follow a spatial autoregressive process of the form

u_{it,N} = \rho \sum_{j=1}^{N} w_{ij,N} u_{jt,N} + \varepsilon_{it,N},   (3.1.2)

where the overall disturbance u_{it,N} consists of a spatial lag of the contemporaneous disturbances in other cross-sections and an innovation \varepsilon_{it,N}. Anselin (1988) refers to this model as a first order spatial autoregressive model or SAR(1). See the previous chapter for a more detailed discussion of this specification. The process for the disturbances contains one parameter \rho and N^2 observable spatial weights w_{ij,N}. The \varepsilon_{it,N} are the innovations that enter the spatial process. They are allowed to be correlated over time, and I assume that they have the following error component structure:

\varepsilon_{it,N} = \mu_{i,N} + \nu_{it,N},   (3.1.3)

where \mu_{i,N} are unit-specific error components and \nu_{it,N} are the error components that vary both over cross-sectional units and time periods.

The spatial weights, as well as the endogenous, exogenous and disturbance processes, are all allowed to depend on the sample size, i.e., to depend on N. Observe that even if the innovations \varepsilon_{it,N} did not depend on the sample size, the disturbances u_{it,N} would still have to be indexed by the sample size due to the presence of the spatial lag \rho\sum_{j=1}^{N} w_{ij,N}u_{jt,N} in (3.1.2). [Footnote 14: The N \times 1 vector of disturbances u_{t,N} is given by u_{t,N} = (I_N - \rho W_N)^{-1}\varepsilon_{t,N} (see equation 3.2.1). Note that the elements of (I_N - \rho W_N)^{-1} must depend on the sample size N. This would be true even if the elements w_{ij,N} did not depend on the sample size.]

Stacking across units, the model becomes (1 \leq t \leq T)

y_{t,N} = \lambda y_{t-1,N} + X_{t,N}\beta + u_{t,N},   (3.1.4)
u_{t,N} = \rho W_N u_{t,N} + \varepsilon_{t,N},

where

\varepsilon_{t,N} = \mu_N + \nu_{t,N},   (3.1.5)
and where the stacked vectors and matrices are

y_{t,N} = ( y_{1t,N},...,y_{Nt,N} )' and u_{t,N} = ( u_{1t,N},...,u_{Nt,N} )', both N \times 1;
X_{t,N} = ( x_{1t,N}',...,x_{Nt,N}' )', N \times p;
\mu_N = ( \mu_{1,N},...,\mu_{N,N} )' and \nu_{t,N} = ( \nu_{1t,N},...,\nu_{Nt,N} )', both N \times 1;
W_N = ( w_{ij,N} ), the N \times N matrix of spatial weights.   (3.1.6)

In all of the ensuing discussion T is fixed and N \to \infty. I maintain the following assumptions:

Assumption 1 For each N > 1, the innovations \{\nu_{it,N} : 1 \leq i \leq N, t \leq T\} are independently distributed with zero mean and constant variance \sigma^2_{\nu,N}, with 0 < \sigma^2_{\nu,N} \leq \bar{c}_\nu < \infty. They possess finite absolute moments of order 4+\delta for some \delta > 0, and those moments are uniformly bounded by some finite constant.

Assumption 2 For each N > 1, the individual effects \{\mu_{i,N} : 1 \leq i \leq N\} are independently distributed with zero mean and are independent of the innovations \{\nu_{it,N} : 1 \leq i \leq N, t \leq T\}. Furthermore, the individual effects have constant variance \sigma^2_{\mu,N}, with 0 \leq \sigma^2_{\mu,N} \leq \bar{c}_\mu < \infty, and possess finite absolute moments of order 4+\delta for some \delta > 0; those moments are uniformly bounded by some finite constant.

Assumption 3 The nonstochastic matrix W_N has the following properties:
(a) All diagonal elements of W_N are zero.
(b) The true parameter \rho satisfies |\rho| < 1; the matrix I_N - rW_N is nonsingular for all |r| < 1.
(c) The row and column sums of W_N and P_N(\rho) = (I_N - \rho W_N)^{-1} are bounded uniformly in absolute value by, respectively, k_W < \infty and k_P < \infty, where k_P may depend on \rho.

It will be shown in the next section that the following assumption guarantees that the variances of the disturbances u_{it,N} are bounded away from zero:

Assumption 4 \lambda_{min}( P_N P_N' ) \geq c_P > 0 for some c_P, where c_P may depend on \rho.

The analysis is conditioned on the realized values of the exogenous variables, and I henceforth view them as constants. I make the following assumptions on the exogenous variables:

Assumption 5 (a) The matrix of exogenous (nonstochastic) regressors X_{t,N}, t \leq T, has full column rank (for N sufficiently large).
(b) The elements of X_{t,N} are uniformly bounded in absolute value.

I complete the model by specifying a process that generates the initial observation of the dependent variable:

Assumption 6 The model defined in (3.1.4) is dynamically stable, i.e., |\lambda| < 1, and has been in operation for an infinite period of time. [Footnote 15: Note that Assumptions 1 and 2 have been consistently specified to hold for \sigma^2_{\mu,N} \geq 0. In the fixed effects case, the central limit theorems would be applied to a vector of random variables that excludes \mu_N. Observe that the sequence of vectors \mu_N would in this case be required to satisfy some regularity condition such as Assumption A2 in Appendix A.]

The error specification adopted in this thesis corresponds to that of a classical one-way error component model; see, e.g., Baltagi (1995, p. 9). It is also a generalization of the literature on dynamic panel data models with independent innovations. Notice that with \rho = 0, my specification becomes, for example, that of Arellano and Bond (1991), Ahn and Schmidt (1995), Arellano and Bover (1995), Blundell and Bond (1998),[16] or Anderson and Hsiao (1981 and 1982), case IVb.[17] Finally, note that the same error component specification of the disturbance process was adopted in Kapoor et al. (2005), who consider a random effects specification in the context of a static panel data model.

3.2 Model Implications

I examine the asymptotic properties of the proposed estimation procedure when the time dimension of the panel is fixed. I assume slope homogeneity of the autoregressive parameters (\lambda does not have an i subscript)[18] and I also assume homogeneity of the coefficients on the explanatory variables. From (3.1.4), we have for 1 \leq t \leq T:

y_{t,N} = \lambda y_{t-1,N} + X_{t,N}\beta + u_{t,N}   (3.2.2)
       = \lambda [ \lambda y_{t-2,N} + X_{t-1,N}\beta + u_{t-1,N} ] + X_{t,N}\beta + u_{t,N}
       \vdots
       = \sum_{j=0}^{t-1} \lambda^j [ X_{t-j,N}\beta + u_{t-j,N} ] + \lambda^t y_{0,N}
       = \sum_{j=0}^{t-1} \lambda^j [ X_{t-j,N}\beta + P_N\varepsilon_{t-j,N} ] + \lambda^t y_{0,N},

and hence y_{t,N} is a well defined transformation of the innovations \varepsilon_{t,N}, the initial values of the process y_{0,N}, and the exogenous variables X_{t,N}.

Assumption 3(c) restricts the degree of permissible cross-sectional correlation in the sample. Note that some restriction on the correlation is necessary for any large sample results to hold. In practice in the spatial literature, with T fixed and N \to \infty, it is often assumed that each cross-sectional unit has a finite number of neighbors, or that the rows of the weight matrices are normalized to sum to unity. It is also often the case that although the matrices may not be sparse, the weights are proportional to the inverse of some distance measure. Therefore, under reasonable conditions, the weight matrices will have row and column sums uniformly bounded in absolute value.

Assumption 4 rules out degenerate weighting matrices that would imply zero variance of the disturbances u_{t,N}. Observe that from Assumption 3 we have u_{t,N} = P_N( \mu_N + \nu_{t,N} ), and hence the variance covariance matrix of the disturbances u_{t,N} is

VC( u_{t,N} ) = ( \sigma^2_{\mu,N} + \sigma^2_{\nu,N} ) P_N P_N'.   (3.2.3)

In particular, notice that each diagonal element of VC(u_{t,N}) is bounded from below by the smallest eigenvalue [Footnote 21: See, e.g., Lemma 2 in Kelejian and Prucha (2003).] and hence the assumption implies that each u_{it,N} has variance bounded away from zero. In a model without spatial correlation, P_N = I_N and this assumption is trivially satisfied.

Assumption 5 is an exogeneity assumption on the explanatory variables. Finally, under Assumption 6, together with the assumptions on the exogenous variables and the spatial weighting matrix, we have by backward substitution:

y_{0,N} = \sum_{j=0}^{\infty} \lambda^j ( X_{-j,N}\beta + u_{-j,N} )   (3.2.4)
        = \sum_{j=0}^{\infty} \lambda^j ( X_{-j,N}\beta + P_N\nu_{-j,N} ) + (1-\lambda)^{-1} P_N \mu_N.

Hence y_{0,N} is a random variable that in general depends on N, with a mean that is not necessarily equal to zero. Notice that \{u_{it,N} : 1 \leq i \leq N, -\infty < t \leq T\} possesses finite absolute moments of order 4+\delta for some \delta > 0, and those moments are uniformly bounded by some finite constant (see the appendix for a proof).
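The data generating process above can be simulated by running the recursion from a long burn-in, which approximates the infinite history of Assumption 6 so that y_0 is (approximately) a draw from the stationary distribution (3.2.4); the sketch also verifies (3.2.3) by Monte Carlo. All parameter values and the weight matrix below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T, burn = 40, 6, 200
lam, rho, beta = 0.5, 0.4, 1.0                  # hypothetical parameter values
sig_mu, sig_nu = 1.0, 1.0

# Row-normalized spatial weight matrix (illustrative choice).
W = rng.random((N, N)); np.fill_diagonal(W, 0.0); W /= W.sum(1, keepdims=True)
P = np.linalg.inv(np.eye(N) - rho * W)          # P_N = (I_N - rho*W_N)^{-1}

mu = sig_mu * rng.standard_normal(N)            # individual effects mu_N
X = rng.standard_normal((burn + T + 1, N))      # one exogenous regressor (p = 1)

# Long burn-in stands in for "in operation for an infinite period of time".
y = np.zeros(N); path = []
for t in range(burn + T + 1):
    u = P @ (mu + sig_nu * rng.standard_normal(N))   # u_t = P_N(mu_N + nu_t)
    y = lam * y + beta * X[t] + u
    path.append(y)
Y = np.array(path[burn:])                       # rows y_0, y_1, ..., y_T

# Monte Carlo check of (3.2.3): VC(u_t) = (sig_mu^2 + sig_nu^2) * P P'
reps = 20_000
eps = sig_mu * rng.standard_normal((reps, N)) + sig_nu * rng.standard_normal((reps, N))
U = eps @ P.T
vc_mc = U.T @ U / reps
vc_th = (sig_mu ** 2 + sig_nu ** 2) * P @ P.T
```

The check draws fresh individual effects in each replication, i.e., it targets the unconditional covariance of u_t under the random effects Assumption 2.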
For future reference, I note that the variance of y_{0,N} is

VC( y_{0,N} ) = \left( \frac{\sigma^2_{\nu,N}}{1-\lambda^2} + \frac{\sigma^2_{\mu,N}}{(1-\lambda)^2} \right) P_N P_N'.   (3.2.5)

[Footnote 22: Similarly, it can be shown that the stochastic process y_{it,N} has finite absolute moments of order 4+\delta_y for some \delta_y > 0, and that those moments are uniformly bounded by some finite constant. The proof of this claim is also in the appendix.]

4 Estimation and Inference

This chapter presents the key results of the thesis. I present a procedure to estimate the parameters of the model outlined in Chapter 3 and derive its asymptotic properties. The proposed estimation method consists of three steps. In the first step, I propose to use an instrumental variables (IV) estimator of the slope coefficients \lambda and \beta without efficiently accounting for the spatial correlation of the disturbances. [Footnote 23: I do not account for the spatial correlation in formulating the initial IV estimator. However, it is taken into account in the analysis of its properties.] In the second step, the estimated disturbances from the first stage are utilized in a spatial generalized moments (GM) estimator to estimate the degree of spatial autocorrelation in the disturbances (\rho). In the last step of the procedure, I propose a GMM estimator of \lambda and \beta with an optimal weighting of the moments that is based on the initial estimators.

For expositional purposes, I choose to present for the first stage an IV estimator that uses a simple set of instruments due to Anderson and Hsiao (1981). Observe, however, that the results on the third stage generalized method of moments (GMM) estimators presented subsequently are sufficiently general to guarantee consistency of IV estimators that use an extended set of instruments, such as the one in Arellano and Bond (1991).

4.1 Initial IV Estimation

In this section I propose a simple procedure to estimate the parameters \theta = [\lambda, \beta']' of the model (3.1.1) and demonstrate that the method is consistent and
asymptotically normal. Since the model contains individual effects, these cannot be consistently estimated with fixed T. Hence the model is considered after a transformation that removes the individual effects from the dependent variable. I follow the literature on dynamic panels and use first differences. Note that it would also be possible to use other transformations, such as central differences. I use moment conditions based on the fact that the first difference of the disturbances is uncorrelated with the level of the endogenous variable lagged twice (or more). [Footnote 24: This claim is formally proved in Lemma 2.] In particular, the estimator corresponds to the one suggested by Anderson and Hsiao (1982). Inspection of the proofs reveals that the random effects Assumption 2 is not strictly necessary for the initial estimator to work. [Footnote 25: Note that it is not the case that no assumption has to be made on the individual effects, as is often claimed in the literature. Since the lagged endogenous variable is used as an instrument, one still needs to maintain that the individual effects are uncorrelated with the idiosyncratic disturbances and satisfy certain moment restrictions as well. This would of course be satisfied if we view the individual effects as constants.]

I write the model in first differences as (t = 2,...,T):

\Delta y_{t,N} = \lambda \Delta y_{t-1,N} + \Delta X_{t,N}\beta + \Delta u_{t,N},   (4.1.1)

where \Delta is the first difference operator, so that \Delta y_{t,N} = y_{t,N} - y_{t-1,N}, \Delta X_{t,N} = X_{t,N} - X_{t-1,N} and \Delta u_{t,N} = u_{t,N} - u_{t-1,N}. Stacking the observations over time yields

\Delta y_N = \Delta Z_N \theta + \Delta u_N,   (4.1.2)

where \Delta y_N and \Delta u_N are (T-1)N \times 1, \theta is (1+p) \times 1, and the (T-1)N \times (1+p) matrix of regressors is

\Delta Z_N = [ \Delta y_{-1,N}, \Delta X_N ],   (4.1.3)

with [Footnote 26: Note that most of the dynamic panel data literature stacks the data by first collecting the T observations of each unit in a vector and then stacking those N vectors. The grouping used in this thesis is more convenient for modelling spatial correlation via (3.1.2).]

\Delta y_N = ( \Delta y_{2,N}',...,\Delta y_{T,N}' )',   \Delta y_{-1,N} = ( \Delta y_{1,N}',...,\Delta y_{T-1,N}' )',   (4.1.4)
\Delta X_N = ( \Delta X_{2,N}',...,\Delta X_{T,N}' )',   \Delta u_N = ( \Delta u_{2,N}',...,\Delta u_{T,N}' )'.
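The stacking in (4.1.2)-(4.1.4) can be checked mechanically: on any data generated from (3.1.1), the first-differenced system holds as an identity. A minimal sketch (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, p = 6, 4, 2
lam, beta = 0.5, np.array([1.0, -0.5])

# Arbitrary levels consistent with (3.1.1): y_t = lam*y_{t-1} + X_t beta + u_t
Y = [rng.standard_normal(N)]                 # y_0
Xs, Us = [None], [None]                      # placeholders for t = 0
for t in range(1, T + 1):
    Xt, ut = rng.standard_normal((N, p)), rng.standard_normal(N)
    Y.append(lam * Y[-1] + Xt @ beta + ut)
    Xs.append(Xt); Us.append(ut)

# Stack first differences for t = 2,...,T in the time-major order of (4.1.4).
dy  = np.concatenate([Y[t]  - Y[t-1]  for t in range(2, T + 1)])
dyl = np.concatenate([Y[t-1] - Y[t-2] for t in range(2, T + 1)])
dX  = np.vstack([Xs[t] - Xs[t-1] for t in range(2, T + 1)])
du  = np.concatenate([Us[t] - Us[t-1] for t in range(2, T + 1)])
dZ  = np.column_stack([dyl, dX])
theta = np.concatenate([[lam], beta])

# (4.1.2) then holds exactly: dy = dZ @ theta + du
```

The time-major ordering (all N units for t = 2, then t = 3, and so on) is the one footnote 26 describes, chosen because spatial lags then act block-by-block.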
Since \Delta y_{t-1,N} is correlated with \Delta u_{t,N}, the ordinary least squares estimator of \theta in the above model will generally be inconsistent. However, the level of the dependent variable lagged twice (or more) is not correlated with the disturbances \Delta u_{t,N}. Motivated by this, I define the N \times (1+p) instrument matrix

H_{t,N} = [ y_{t-2,N}, \Delta X_{t,N} ].   (4.1.5)

Given the model assumptions we have, as demonstrated in Lemma 2 below:

E( H_{t,N}' \Delta u_{t,N} ) = 0_{(1+p) \times 1},   t = 2,...,T.   (4.1.6)

The initial IV estimator of \theta utilizes H_{t,N} as instruments [Footnote 27: We note that it is possible to use additional lags and/or levels of the dependent variable as instruments and obtain a consistent initial estimator as well. For example, we could use the instruments suggested in Section 4.3, i.e., H_t = [y_{t-2,N},...,y_{0,N},X_{t,N},...,X_{1,N}].] for \Delta y_{t-1,N} and is defined as

\hat{\theta}_N = [ \Delta\hat{Z}_N' \Delta Z_N ]^{-1} \Delta\hat{Z}_N' \Delta y_N,   (4.1.7)

where

\Delta\hat{Z}_N = H_N ( H_N'H_N )^{-1} H_N' \Delta Z_N,   (4.1.8)

and

H_N = ( H_{2,N}',...,H_{T,N}' )'   (4.1.9)

is a (T-1)N \times (1+p) matrix of instruments. [Footnote 28: Writing the instruments in this fashion leads to an estimator that is based on moment conditions that are averaged both over N and T.]
[Footnote 28, continued: It is also possible to define the H_N matrix as H_N = diag(H_2,...,H_T), in which case the moment conditions are only averaged over N; the expressions in Lemmas 1 and 2 would then have to be modified. Note that these two specifications of the instrument matrix lead to different estimators: the projection matrix H_N(H_N'H_N)^{-1}H_N' has blocks of the form H_{t,N}( \sum_{s=2}^{T} H_{s,N}'H_{s,N} )^{-1}H_{t,N}' in the first case, and H_{t,N}( H_{t,N}'H_{t,N} )^{-1}H_{t,N}' in the second. The case of estimators based on moments averaged only over T is considered in Section 4.2 below.]

The initial Anderson and Hsiao IV estimator is a special case of the more general GMM estimator discussed in Section 4.3. However, for expositional purposes I derive its asymptotic properties here. Substituting the definition of the IV estimator into equation (4.1.7) yields

\hat{\theta}_N = \theta + [ \Delta\hat{Z}_N' \Delta Z_N ]^{-1} \Delta\hat{Z}_N' \Delta u_N   (4.1.10)
             = \theta + [ \Delta Z_N' H_N ( H_N'H_N )^{-1} H_N' \Delta Z_N ]^{-1} \Delta Z_N' H_N ( H_N'H_N )^{-1} H_N' \Delta u_N.

For the instruments to be valid, I make the following assumption.

Assumption IV1 The (1+p) \times (1+p) matrix

M_{H\Delta Z} = plim \frac{1}{(T-1)N} H_N' \Delta Z_N

exists and is finite with full column rank. The (1+p) \times (1+p) matrix

M_{HH} = plim \frac{1}{(T-1)N} H_N' H_N

exists and is nonsingular.

We can also define

M_{\Delta Z} = plim \frac{1}{(T-1)N} \Delta\hat{Z}_N' \Delta Z_N.   (4.1.11)

Observe that \Delta\hat{Z}_N'\Delta Z_N = \Delta Z_N' H_N ( H_N'H_N )^{-1} H_N' \Delta Z_N and hence M_{\Delta Z} = M_{H\Delta Z}' M_{HH}^{-1} M_{H\Delta Z}. Assumption IV1 thus implies that M_{\Delta Z} exists and is finite. Also note that the assumption that the M matrices are finite can be derived from earlier restrictions. [Footnote 29: For example, the elements of M_{HH} consist of first and second moments of the stochastic process y_{it} interacted with the exogenous variables. These are bounded by Assumptions 1-6.] However, the existence and invertibility of M_{\Delta Z} and M_{HH} is not guaranteed by Assumptions 1-6. [Footnote 30: For example, Arellano (1989) examines a univariate AR(1) model with first-order autoregressive exogenous variables and finds that, when the first differences of endogenous variables lagged twice are used as instruments, there exists a significant range of parameters for which the estimator has a singularity point. The paper also suggests that the estimator that uses second lags of the levels of the endogenous variables does not have the singularity problem for a reasonable range of parameters. However, this conclusion does not readily generalize to all possible exogenous variables.] Observe that one could derive Assumption IV1 from the existence and nonsingularity of limits such as \lim (TN)^{-1} \sum_{t=j+1}^{T} X_{t-j,N}' X_{t,N}.

To derive the asymptotic distribution of \hat{\theta}_N, I note that given Assumption IV1, it remains to be shown that the term H_N'\Delta u_N converges in distribution (when appropriately normalized).
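Putting (4.1.5)-(4.1.8) together, the following is a numerical sketch of the initial IV estimator on simulated data. The data generating process below (one regressor, a dense row-normalized weight matrix, a burn-in approximating Assumption 6) is an assumption of this illustration, not the thesis's Monte Carlo design.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, burn = 400, 5, 100
lam, rho, beta = 0.4, 0.3, 1.0

W = rng.random((N, N)); np.fill_diagonal(W, 0.0); W /= W.sum(1, keepdims=True)
P = np.linalg.inv(np.eye(N) - rho * W)
mu = rng.standard_normal(N)

Xall = rng.standard_normal((burn + T + 1, N))
y = np.zeros(N); lev = []
for t in range(burn + T + 1):
    y = lam * y + beta * Xall[t] + P @ (mu + rng.standard_normal(N))
    lev.append(y)
Y = np.array(lev[burn:])       # rows y_0, ..., y_T
X = Xall[burn:]                # rows x_0, ..., x_T

# First differences for t = 2,...,T and instruments H_t = [y_{t-2}, dX_t]  (4.1.5)
dy  = np.concatenate([Y[t] - Y[t-1]   for t in range(2, T + 1)])
dyl = np.concatenate([Y[t-1] - Y[t-2] for t in range(2, T + 1)])
dX  = np.concatenate([X[t] - X[t-1]   for t in range(2, T + 1)])
H   = np.column_stack([np.concatenate([Y[t-2] for t in range(2, T + 1)]), dX])
dZ  = np.column_stack([dyl, dX])

# theta_hat = (Zhat' dZ)^{-1} Zhat' dy, with Zhat the projection of dZ on H  (4.1.7)
Zhat = H @ np.linalg.solve(H.T @ H, H.T @ dZ)
theta_hat = np.linalg.solve(Zhat.T @ dZ, Zhat.T @ dy)   # [lam_hat, beta_hat]
```

Note that the spatial correlation is deliberately ignored in constructing the estimator, exactly as in the first step of the thesis's procedure; it only affects the precision of the estimates, not their consistency.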
It will prove convenient to introduce some additional notation. Define the (T-1)N \times p matrix of lagged exogenous variables

X_{-2,N} = ( 0_{p \times N}', X_{1,N}',...,X_{T-2,N}' )',   (4.1.12)

the (T+2)N \times 1 vector collecting all of the model's orthogonal innovations

\xi_N = ( \mu_N', \eta_N', \nu_{1,N}',...,\nu_{T,N}' )',   with   \eta_N = \sum_{j=0}^{\infty} \lambda^j \nu_{-j,N},   (4.1.13)

and the (T-1) \times T difference operator D and the (T-1) \times (T-1) matrix \Lambda:

D = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix},   \Lambda = \begin{pmatrix} 1 & \lambda & \cdots & \lambda^{T-2} \\ 0 & 1 & \ddots & \vdots \\ \vdots & & \ddots & \lambda \\ 0 & \cdots & 0 & 1 \end{pmatrix}.   (4.1.14)

Observe that given Assumptions 1, 2, and 6, the variance covariance matrix of \xi_N is

E( \xi_N \xi_N' ) = \Omega_{\xi,N} \otimes I_N,   (4.1.15)

where the (T+2) \times (T+2) diagonal matrix \Omega_{\xi,N} is

\Omega_{\xi,N} = diag\left( \sigma^2_{\mu,N}, \frac{\sigma^2_{\nu,N}}{1-\lambda^2}, \sigma^2_{\nu,N},...,\sigma^2_{\nu,N} \right).   (4.1.16)

I first express the elements of H_N'\Delta u_N (which are y_{-2,N}'\Delta u_N and \Delta X_N'\Delta u_N) in terms of lagged model disturbances and dependent variables:

Lemma 1 Under the specification (3.1.4) with Assumptions 1-6 and IV1, we have that

y_{-2,N}'\Delta u_N = f_N'( I_{T+2} \otimes P_N )\xi_N + \xi_N'( F \otimes P_N'P_N )\xi_N,

where the (T+2)N \times 1 vector f_N is given by

f_N = \left( \begin{pmatrix} 0_{2 \times (T-1)} \\ D' \end{pmatrix} \otimes I_N \right)( \Lambda' \otimes I_N )\left( X_{-2,N}\beta + \begin{pmatrix} E(y_{0,N}) \\ 0_{(T-2)N \times 1} \end{pmatrix} \right)

and the (T+2) \times (T+2) matrix F is

F = \begin{pmatrix} \frac{1}{1-\lambda} & 1_{1 \times (T-2)} \\ 1 & 0_{1 \times (T-2)} \\ 0_{(T-2) \times 1} & I_{T-2} \\ 0_{2 \times 1} & 0_{2 \times (T-2)} \end{pmatrix} \Lambda \left( 0_{(T-1) \times 2}, D \right).

Furthermore, \Delta X_N'\Delta u_N can also be expressed as a linear function of \xi_N:

\Delta X_N'\Delta u_N = \Delta X_N' \left[ \left( 0_{(T-1) \times 2}, D \right) \otimes P_N \right] \xi_N.

Proof. See Appendix C.1.

Notice that, as indicated by the subscript, the size of the vector f_N depends on the sample size. Since T is fixed, I do not use subscripts for the matrices F and D, whose size and elements depend only on T and not on N.

To determine the asymptotic variance of the estimator, I make use of the following lemma, which gives expressions for the expected value and variance covariance matrix of the moment conditions:

Lemma 2 Suppose Assumptions 1-6 hold. The expected value of the vector of linear-quadratic forms H_N'\Delta u_N is zero. Its variance covariance matrix is given by

V_N = E( H_N'\Delta u_N \Delta u_N' H_N ) = S_N'( \Omega_{\xi,N} \otimes P_N P_N' ) S_N + \begin{pmatrix} \Phi_N & 0_{1 \times p} \\ 0_{p \times 1} & 0_{p \times p} \end{pmatrix},

where the (T+2)N \times (1+p) matrix S_N is

S_N = \left[ f_N,\; \left( \left( 0_{(T-1) \times 2}, D \right)' \otimes I_N \right)\Delta X_N \right]

and

\Phi_N = 2\,tr( F^S \Omega_{\xi,N} F^S \Omega_{\xi,N} )\cdot tr( P_N P_N' P_N P_N' ),   with   F^S = \frac{1}{2}( F + F' ).

Proof. See Appendix C.1.

To rule out cases where the moment conditions have zero asymptotic variance, I make the following assumption:

Assumption IV2 The smallest eigenvalue of [(T-1)N]^{-1} S_N'S_N is uniformly bounded away from zero for T \geq 2.
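The structure of V_N in Lemma 2 mirrors a generic fact about linear-quadratic forms: if \xi is mean zero with covariance \Sigma and, for intuition, Gaussian, then Var(c'\xi + \xi'A\xi) = c'\Sigma c + 2 tr(A^S \Sigma A^S \Sigma) with A^S = (A+A')/2; the first term corresponds to the S_N'(\Omega \otimes P P')S_N piece and the trace term to \Phi_N. The following Monte Carlo check verifies this identity (not Lemma 2 itself, which does not rest on normality); all matrices below are arbitrary illustrative draws.

```python
import numpy as np

rng = np.random.default_rng(3)
k, reps = 4, 200_000

A = rng.standard_normal((k, k))
As = 0.5 * (A + A.T)                   # symmetrized, as F^S = (F + F')/2 in Lemma 2
c = rng.standard_normal(k)
L = rng.standard_normal((k, k))
Sigma = L @ L.T                        # a positive definite covariance matrix

xi = rng.standard_normal((reps, k)) @ L.T            # xi ~ N(0, Sigma), reps draws
q = xi @ c + np.einsum('ri,ij,rj->r', xi, As, xi)    # c'xi + xi'A xi, per draw

# Under normality the linear and quadratic parts are uncorrelated, so
# Var(c'xi + xi'A xi) = c'Sigma c + 2 tr(A^S Sigma A^S Sigma).
var_theory = c @ Sigma @ c + 2.0 * np.trace(As @ Sigma @ As @ Sigma)
var_mc = q.var()
```

With non-Gaussian innovations, additional third- and fourth-moment terms enter the exact variance, which is why the thesis's assumptions control absolute moments of order 4+\delta.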
Although S_N depends on the sample size, the dimensions of S_N'S_N do not change with N. Furthermore, notice that the assumption also implies that E(H_N'H_N) has eigenvalues uniformly bounded away from zero and, therefore, also implies the invertibility of M_{HH} in Assumption IV1. [Footnote 31: However, it does not guarantee the existence of the limit in Assumption IV1.] The above assumption, together with Assumption 4, allows us to prove the following lemma:

Lemma 3 Suppose Assumptions 1-4 and IV2 hold. The smallest eigenvalue of [(T-1)N]^{-1}V_N is uniformly bounded away from zero for T \geq 2.

Proof. See Appendix C.1.

The representation of y_{-2,N}'\Delta u_N and \Delta X_N'\Delta u_N as linear-quadratic forms in \xi_N lets us apply a central limit theorem for quadratic forms of triangular arrays and derive the asymptotic distribution of the IV estimator. The central limit theorem (CLT) I use is given in Appendix A. It is based on a result from Kelejian and Prucha (2005) and is an extension of a CLT in Kelejian and Prucha (2001).

Proposition 1 Under Assumptions 1-6, IV1 and IV2, we have that

V_N^{-1/2} H_N' \Delta u_N \stackrel{d}{\to} N( 0, I_{p+1} ),

where V_N^{1/2}( V_N^{1/2} )' = V_N.

Proof. See Appendix C.1.

To be able to write down an explicit asymptotic distribution of the estimator, I make the following assumption.

Assumption IV3 \lim_{N\to\infty} \frac{1}{(T-1)N} V_N = \bar{V}, where \bar{V} is finite.

We then have the following theorem:

Theorem 1 Under Assumptions 1-6 and IV1-IV3, we have that

\sqrt{(T-1)N}\,( \hat{\theta}_N - \theta ) \stackrel{d}{\to} N( 0, \Psi ),

with

\Psi = M_{\Delta Z}^{-1} M_{H\Delta Z}' M_{HH}^{-1}\,\bar{V}\,M_{HH}^{-1} M_{H\Delta Z} M_{\Delta Z}^{-1}.

Proof. See Appendix C.1.

I do not provide an estimate of \Psi, since it would depend on an estimate of the matrix P_N = ( I_N - \rho W_N )^{-1}, which involves the unknown parameter \rho. I will provide small sample guidance for the second stage estimator in Section 4.3.
Note that by Theorem 17 in Pötscher and Prucha (2001), the result in the above theorem implies that $\sqrt{(T-1)N}\,(\hat\delta_N - \delta)$ is $O_p(1)$; the initial IV estimator therefore satisfies the conditions required in the following section and hence can be used in the subsequent estimation steps.

4.2 Estimation of the Degree of Spatial Autocorrelation

The specification in this thesis reduces to that of Kapoor et al. (2005) in the static case ($\lambda = 0$), which is in turn a generalization of the single cross-section case in Kelejian and Prucha (1999). In this section, I show that the procedure adopted in Kapoor et al. (2005) provides a consistent estimate of the spatial autoregressive parameter in a dynamic panel data model as well. To do that, I define the generalized moments (GM) estimator following Kapoor et al. (2005) and then extend their proofs to the dynamic case. For simplicity, I only consider one of the weighting schemes for the moment conditions in Kapoor et al. (2005).

Observe that the spatial GM estimator in this section is essentially the same as the estimator in Kapoor et al. (2005). However, the presence of stochastic regressors (the lagged dependent variable) renders the proofs in that paper inapplicable to the specification considered in this thesis. Nevertheless, the proofs in this section, with small exceptions (most notably Lemmas C4 and C6 in Appendix C.2), are a direct analogy of those in Kapoor et al. (2005).

I take an initial consistent estimate of the spatially correlated errors and use it to estimate the spatial autocorrelation parameter based on a set of moment conditions. The initial consistent estimate of the errors can be based, for example, on the IV estimator in the previous section. The moment conditions are chosen so that the estimator has an analysis of variance interpretation.

Consider an estimator $\hat\delta_N$ ($(p+1)\times 1$) of the parameter vector $\delta$ such that $\sqrt{(T-1)N}\,(\hat\delta_N - \delta) = O_p(1)$, and denote the predictors of $u_t$ by $\hat u_t$:
$$\hat u_{t,N} = y_{t,N} - (y_{t-1,N}, X_{t,N})\,\hat\delta_N, \qquad 1 \le t \le T. \quad (4.2.1)$$
The model implies (see equation 3.1.2 in Chapter 3) that
$$u_{t,N} = \rho\, W_N u_{t,N} + \varepsilon_{t,N}, \qquad 1 \le t \le T, \quad (4.2.2)$$
where $\varepsilon_{t,N} = \nu_{t,N} + \mu_N$. In stacked notation this becomes
$$u_N = \rho\,(I_T \otimes W_N)\, u_N + \varepsilon_N, \quad (4.2.3)$$
where $u_N = (u_{1,N}',\ldots,u_{T,N}')'$ and $\varepsilon_N = \nu_N + (e_T \otimes \mu_N)$, with $\nu_N = (\nu_{1,N}',\ldots,\nu_{T,N}')'$, $e_T$ a $T\times 1$ vector of unit elements, and $\mu_N$ the $N\times 1$ vector of individual effects. It will prove convenient to introduce the following notation:
$$\bar u_N = (I_T \otimes W_N)\, u_N, \qquad \bar{\bar u}_N = (I_T \otimes W_N)\, \bar u_N, \qquad \bar\varepsilon_N = (I_T \otimes W_N)\, \varepsilon_N. \quad (4.2.4)$$
I will also use the following transformation matrices, utilized in the error component literature:
$$Q_{0,N} = \Bigl(I_T - \frac{J_T}{T}\Bigr) \otimes I_N, \qquad Q_{1,N} = \frac{J_T}{T} \otimes I_N, \quad (4.2.5)$$
where $J_T = e_T e_T'$ is a $T\times T$ matrix of unit elements. (Footnote 32: The $Q_1$ transformation calculates unit-specific sample means, while the $Q_0$ transformation subtracts them from the original variable.) Note that using the transformation matrices we can express the variance-covariance matrix of the innovations as
$$E(\varepsilon_N \varepsilon_N') = \sigma^2_{\nu,N} I_{NT} + \sigma^2_{\mu,N}(J_T \otimes I_N) = \sigma^2_{\nu,N} Q_{0,N} + \sigma^2_{1,N} Q_{1,N}, \quad (4.2.6)$$
where $\sigma^2_{1,N} = \sigma^2_{\nu,N} + T\sigma^2_{\mu,N}$.

The spatial GM estimator is based on the following moment conditions:
$$E(\varepsilon_N' Q_{0,N}\varepsilon_N) = N(T-1)\sigma^2_{\nu,N}, \qquad E(\bar\varepsilon_N' Q_{0,N}\bar\varepsilon_N) = (T-1)\sigma^2_{\nu,N}\operatorname{tr}(W_N'W_N), \qquad E(\bar\varepsilon_N' Q_{0,N}\varepsilon_N) = 0,$$
$$E(\varepsilon_N' Q_{1,N}\varepsilon_N) = N\sigma^2_{1,N}, \qquad E(\bar\varepsilon_N' Q_{1,N}\bar\varepsilon_N) = \sigma^2_{1,N}\operatorname{tr}(W_N'W_N), \qquad E(\bar\varepsilon_N' Q_{1,N}\varepsilon_N) = 0. \quad (4.2.7)$$
For the derivation of the moment conditions, see Kapoor et al. (2005). Notice that based on (4.2.3), the moment conditions can be rewritten in terms of the transformed (by $Q_{j,N}$) disturbance vectors $u_N$, $\bar u_N$ and $\bar{\bar u}_N$:
$$\gamma_N = \Gamma_N\,\alpha, \qquad \alpha = (\rho, \rho^2, \sigma^2_{\nu,N}, \sigma^2_{1,N})', \quad (4.2.8)$$
where
$$\Gamma_N = E\begin{pmatrix} \gamma^0_{11,N} & \gamma^0_{12,N} & \gamma^0_{13,N} & 0 \\ \gamma^0_{21,N} & \gamma^0_{22,N} & \gamma^0_{23,N} & 0 \\ \gamma^0_{31,N} & \gamma^0_{32,N} & \gamma^0_{33,N} & 0 \\ \gamma^1_{11,N} & \gamma^1_{12,N} & 0 & \gamma^1_{13,N} \\ \gamma^1_{21,N} & \gamma^1_{22,N} & 0 & \gamma^1_{23,N} \\ \gamma^1_{31,N} & \gamma^1_{32,N} & 0 & \gamma^1_{33,N} \end{pmatrix}, \qquad \gamma_N = E\begin{pmatrix} \gamma^0_{1,N} \\ \gamma^0_{2,N} \\ \gamma^0_{3,N} \\ \gamma^1_{1,N} \\ \gamma^1_{2,N} \\ \gamma^1_{3,N} \end{pmatrix}, \quad (4.2.9)$$
with (for $j = 0,1$)
$$\gamma^j_{11,N} = \tfrac{2}{N(T-1)^{1-j}}\, u_N'Q_{j,N}\bar u_N, \qquad \gamma^j_{12,N} = \tfrac{-1}{N(T-1)^{1-j}}\, \bar u_N'Q_{j,N}\bar u_N,$$
$$\gamma^j_{21,N} = \tfrac{2}{N(T-1)^{1-j}}\, \bar u_N'Q_{j,N}\bar{\bar u}_N, \qquad \gamma^j_{22,N} = \tfrac{-1}{N(T-1)^{1-j}}\, \bar{\bar u}_N'Q_{j,N}\bar{\bar u}_N,$$
$$\gamma^j_{31,N} = \tfrac{1}{N(T-1)^{1-j}}\bigl(\bar u_N'Q_{j,N}\bar u_N + \bar{\bar u}_N'Q_{j,N}u_N\bigr), \qquad \gamma^j_{32,N} = \tfrac{-1}{N(T-1)^{1-j}}\, \bar{\bar u}_N'Q_{j,N}\bar u_N, \quad (4.2.10)$$
$$\gamma^j_{13,N} = 1, \qquad \gamma^j_{1,N} = \tfrac{1}{N(T-1)^{1-j}}\, u_N'Q_{j,N}u_N,$$
$$\gamma^j_{23,N} = \tfrac{1}{N}\operatorname{tr}(W_N'W_N), \qquad \gamma^j_{2,N} = \tfrac{1}{N(T-1)^{1-j}}\, \bar u_N'Q_{j,N}\bar u_N,$$
$$\gamma^j_{33,N} = 0, \qquad \gamma^j_{3,N} = \tfrac{1}{N(T-1)^{1-j}}\, \bar u_N'Q_{j,N}u_N.$$
The sample counterparts of the six equations in (4.2.9) replace $u_N$ with $\hat u_N = (\hat u_{1,N}',\ldots,\hat u_{T,N}')'$ based on (4.2.1), with the implied notation $\bar{\hat u}_N = (I_T\otimes W_N)\hat u_N$ and $\bar{\bar{\hat u}}_N = (I_T\otimes W_N)\bar{\hat u}_N$:
$$g_N = G_N\,\alpha + \varsigma_N, \quad (4.2.11)$$
where $\varsigma_N$ can be viewed as a vector of regression residuals, and the $6\times 4$ matrix $G_N$ and $6\times 1$ vector $g_N$ are defined as $\Gamma_N$ and $\gamma_N$ in (4.2.9), with the expectations dropped and $u_N$, $\bar u_N$, $\bar{\bar u}_N$ replaced by $\hat u_N$, $\bar{\hat u}_N$, $\bar{\bar{\hat u}}_N$ in the expressions (4.2.10); in particular, $g^j_{13,N} = 1$, $g^j_{23,N} = N^{-1}\operatorname{tr}(W_N'W_N)$ and $g^j_{33,N} = 0$ remain nonstochastic. (4.2.12)-(4.2.13)
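The sample moments $g_N$ and $G_N$ are simple quadratic forms in the residual vector and its spatial lags. The sketch below assembles them with NumPy; the stacking convention (residuals ordered period by period, $u = (u_1',\ldots,u_T')'$) and the function name are assumptions of this example.

```python
import numpy as np

def gm_moments(u_hat, W, T):
    """Sample counterparts g_N (6,) and G_N (6,4) of the spatial GM moment
    conditions, cf. (4.2.11)-(4.2.13). u_hat is the stacked residual vector of
    length N*T ordered period by period, W the N x N spatial weights matrix."""
    N = W.shape[0]
    e = np.ones((T, 1))
    J = e @ e.T
    Q0 = np.kron(np.eye(T) - J / T, np.eye(N))   # within (time-demeaning) transform
    Q1 = np.kron(J / T, np.eye(N))               # between (time-means) transform
    IW = np.kron(np.eye(T), W)
    u = u_hat
    ub = IW @ u                                  # u-bar
    ubb = IW @ ub                                # u-double-bar
    trWW = np.trace(W.T @ W)
    G = np.zeros((6, 4))
    g = np.zeros(6)
    for j, Q in ((0, Q0), (1, Q1)):
        c = 1.0 / (N * (T - 1) ** (1 - j))
        sig_col = 2 + j                          # sigma_nu^2 column (j=0) or sigma_1^2 column (j=1)
        G[3 * j, 0] = 2 * c * (u @ Q @ ub)
        G[3 * j, 1] = -c * (ub @ Q @ ub)
        G[3 * j, sig_col] = 1.0
        G[3 * j + 1, 0] = 2 * c * (ub @ Q @ ubb)
        G[3 * j + 1, 1] = -c * (ubb @ Q @ ubb)
        G[3 * j + 1, sig_col] = trWW / N
        G[3 * j + 2, 0] = c * (ub @ Q @ ub + ubb @ Q @ u)
        G[3 * j + 2, 1] = -c * (ubb @ Q @ ub)
        g[3 * j] = c * (u @ Q @ u)
        g[3 * j + 1] = c * (ub @ Q @ ub)
        g[3 * j + 2] = c * (ub @ Q @ u)
    return g, G
```

A useful property is that, for any residual vector, the $\rho$-part of each equation is an exact algebraic identity: with $\varepsilon = u - \rho\bar u$, we have $g_1 - G_{11}\rho - G_{12}\rho^2 = [N(T-1)]^{-1}\varepsilon'Q_0\varepsilon$ exactly, which makes the construction easy to check.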
The generalized moments (GM) estimator of $\theta = (\rho, \sigma^2_{\nu,N}, \sigma^2_{1,N})'$, say $\hat\theta_N$ ($3\times 1$), can be written as
$$\hat\theta_N = \operatorname*{argmin}_{\theta\in\Theta}\Bigl\{\bigl(g_N - G_N\,\alpha(\theta)\bigr)'\,A_N\,\bigl(g_N - G_N\,\alpha(\theta)\bigr)\Bigr\}, \quad (4.2.14)$$
where $\alpha(\theta) = (\rho, \rho^2, \sigma^2_{\nu,N}, \sigma^2_{1,N})'$ and $\Theta$ is the admissible optimization space; in particular, it is assumed that $\Theta = \{(\rho, \sigma^2_{\nu,N}, \sigma^2_{1,N}) : \rho \in [0, b_1],\ \sigma^2_{\nu,N} \in [0, b_2],\ \sigma^2_{1,N} \in [0, b_3]\}$, with $b_1$, $b_2$ and $b_3$ being predetermined constants. The moments are weighted by a sequence of weighting matrices $A_N$. Following Kapoor et al. (2005), two choices for $A_N$ are considered. An initial unweighted spatial GM estimator uses $A_N = I_6$. The second choice is to use an approximation to the variance-covariance matrix of the moments. In particular, Kapoor et al. (2005) show that under normality the variance-covariance matrix of the six moment conditions in (4.2.7) is given by
$$\Xi_N = \begin{pmatrix} \frac{1}{T-1}\sigma^4_{\nu N} & 0 \\ 0 & \sigma^4_{1N} \end{pmatrix} \otimes T_{W,N}, \quad (4.2.15)$$
where the $3\times 3$ matrix $T_{W,N}$ is
$$T_{W,N} = \begin{pmatrix} 2 & 2\operatorname{tr}\!\bigl(\tfrac{W_N'W_N}{N}\bigr) & 0 \\ 2\operatorname{tr}\!\bigl(\tfrac{W_N'W_N}{N}\bigr) & 2\operatorname{tr}\!\bigl(\tfrac{W_N'W_NW_N'W_N}{N}\bigr) & \operatorname{tr}\!\bigl(\tfrac{W_N'W_N(W_N+W_N')}{N}\bigr) \\ 0 & \operatorname{tr}\!\bigl(\tfrac{W_N'W_N(W_N+W_N')}{N}\bigr) & \operatorname{tr}\!\bigl(\tfrac{W_NW_N+W_N'W_N}{N}\bigr) \end{pmatrix}. \quad (4.2.16)$$
The weighted spatial GM estimator then replaces $\sigma^2_{\nu N}$ and $\sigma^2_{1N}$ by their initial estimators and utilizes the weighting matrices $A_N = \hat\Xi_N^{-1}(\hat\sigma^2_{\nu N}, \hat\sigma^2_{1N})$, where
$$\hat\Xi_N(\hat\sigma^2_{\nu N}, \hat\sigma^2_{1N}) = \begin{pmatrix} \frac{1}{T-1}\hat\sigma^4_{\nu N} & 0 \\ 0 & \hat\sigma^4_{1N} \end{pmatrix} \otimes T_{W,N}, \quad (4.2.17)$$
and the estimators $\hat\sigma^2_{\nu N}$, $\hat\sigma^2_{1N}$ are based on the initial unweighted spatial GM estimator.

The following additional assumption is required in order to establish consistency of $\hat\rho_{GM,N}$ (the assumption is used to demonstrate that the estimator is identifiably unique):

Assumption GM1 The smallest eigenvalue of $\Gamma_N'\Gamma_N$ is uniformly bounded away from zero. Furthermore, $0 < \ldots$

$\ldots$ Lemma 4 and Assumption 3 guarantee that $(d_t'\otimes P_N')\,a_{rt,N}$ and $(b_{rt}'d_t\otimes P_N'P_N)$ satisfy Assumption A2. The condition
$$E\Bigl(H_N'\,\Delta u_N\Bigr) = 0_{k\times 1} \quad (4.3.11)$$
then implies that the matrix $(b_{rt}'d_t\otimes P_N'P_N)$ has zeros on the main diagonal and, therefore, the quadratic forms satisfy the conditions of Lemma A1. In particular, their variances and covariances can be derived using the expressions in that lemma.

The following lemma shows that under regularity conditions the quadratic forms $h_{rt,N}'\Delta u_{t,N}$ converge in distribution when normalized by their standard errors.

Lemma 5 Consider a set of $k$ instruments $H_N$ given in (4.3.7), with the diagonal blocks $H_{t,N} = (h_{1t,N},\ldots,h_{k_t t,N})$ being $N\times k_t$ matrices ($k = k_2 + \ldots + k_T$) with columns $h_{rt,N} = a_{rt,N} + (b_{rt}\otimes P_N)\,\xi_N$, where the sequence of nonstochastic $N\times 1$ vectors $a_{rt,N}$ and the sequence of nonstochastic $1\times(T+2)$ vectors $b_{rt}$ have elements uniformly bounded in absolute value. Under Assumptions 1-6, and given that the instruments are such that $E(H_N'\Delta u_N) = 0_{k\times 1}$, $E(H_N'\Delta u_N\Delta u_N'H_N) = V_N$ and $[(T-1)N]^{-1}\lambda_{\min}(V_N) \ge c > 0$, we have that
$$V_N^{-1/2} H_N'\Delta u_N \xrightarrow{d} N(0, I_k),$$
where $V_N = E(H_N'\Delta u_N\Delta u_N'H_N) = V_N^{1/2}(V_N^{1/2})'$.

Proof. See Appendix C.3.

(Footnote 34: Observe that $(d_t'\otimes P_N')a_{rt,N}$ then corresponds to the sequence of vectors $b_t$, while $(b_{rt}'d_t\otimes P_N'P_N)$ corresponds to the sequence of matrices $A_n$, and $\xi_N$ corresponds to the sequence of vectors of random variables $\varepsilon_n$ in Theorem A1 in Appendix A.)

Given that the moment conditions converge in distribution, the GMM estimator defined in (4.3.5) will, under appropriate regularity conditions, also converge in distribution:

Lemma 6 Consider a set of stochastic instruments $H_N$ such that $V_N^{-1/2}H_N'\Delta u_N \xrightarrow{d} N(0, I_k)$, where $V_N = E(H_N'\Delta u_N\Delta u_N'H_N) = V_N^{1/2}(V_N^{1/2})'$, with $\operatorname{plim}_{N\to\infty}[(T-1)N]^{-1}V_N = V$, where $V$ is finite. Furthermore, consider a sequence of (possibly stochastic) weighting matrices $A_N$ with nonsingular (probability) limit $\operatorname{plim}_{N\to\infty} A_N = A$. Under Assumptions 1-6, and given that $M_{H\Delta Z} = \operatorname{plim}_{N\to\infty}[(T-1)N]^{-1}H_N'\Delta Z_N$ exists and has full column rank, we have that the GMM estimator defined in (4.3.5) converges in distribution and
$$\sqrt{(T-1)N}\,\bigl(\tilde\delta_N - \delta\bigr) \xrightarrow{d} N(0, \Omega),$$
where
$$\Omega = \bigl(M_{\Delta ZH} A^{-1} M_{\Delta ZH}'\bigr)^{-1} M_{\Delta ZH} A^{-1} V A^{-1} M_{\Delta ZH}' \bigl(M_{\Delta ZH} A^{-1} M_{\Delta ZH}'\bigr)^{-1}.$$

Proof. See Appendix C.3.

I give a small sample approximation for $\Omega$ for the specific GMM estimator considered below. Note that given Lemmas 4 and 5, the asymptotic result in Lemma 6 applies to a general class of GMM estimators which includes the initial IV estimator discussed in Section 4.1 (Footnote 35: The lemma is directly applicable when the moment conditions in the initial IV estimator are averaged only over the cross-sectional units. Note that in Section 4.1, the moment conditions are averaged over both cross-sectional units and time. I have provided the asymptotic results for this initial IV estimator in Theorem 1 above.), as well as the different variants of the GMM estimators in Arellano and Bond (1991) and, in particular, the feasible GMM estimator discussed below. Note that in applying the above lemma to these estimators it remains to be checked whether in the particular application the instruments satisfy the stipulated regularity conditions, e.g. that $\operatorname{plim}_{N\to\infty}[(T-1)N]^{-1}H_N'\Delta Z_N$ exists and has full column rank, and that the variance-covariance matrix of the moment conditions has its smallest eigenvalue uniformly bounded away from zero.

I now consider the issue of an optimal choice of the sequence of weighting matrices, given a set of instruments. I close this section by proving consistency and asymptotic normality of, and providing small sample guidance for, a feasible second stage GMM estimator based on moment conditions considered in the literature.

4.3.1 Optimal Weighting Matrix

Consider now the optimal choice of the sequence of weighting matrices $A_N$. It can be shown (Footnote 36: See, e.g., Hansen (1982), Bates and White (1993), Newey and McFadden (1994), or Wooldridge (2002), Ch. 8 and 14.) that given a set of instruments, the asymptotic variance-covariance matrix of an estimator defined as a minimizer of (4.3.4) is minimized (Footnote 37: In the sense that the difference with respect to any other VC matrix of an estimator that is a minimizer of (4.3.4) is positive semi-definite.) when
$$\operatorname*{plim}_{N\to\infty}\,[(T-1)N]^{-1} A_N = V. \quad (4.3.12)$$
Given that $\operatorname{plim}_{N\to\infty}[(T-1)N]^{-1}V_N = V$, the small sample weighting matrices $A_N$ can be chosen to be estimators of the small sample variance-covariance matrix $V_N = E(H_N'\Delta u_N\Delta u_N'H_N)$. Observe that the matrix $V_N$ can be partitioned as
$$V_N = \begin{pmatrix} V_{22,N} & \cdots & V_{2T,N} \\ \vdots & & \vdots \\ V_{T2,N} & \cdots & V_{TT,N} \end{pmatrix}, \quad (4.3.13)$$
where $V_{ts,N} = E\bigl(H_{t,N}'\Delta u_{t,N}\Delta u_{s,N}'H_{s,N}\bigr)$. I denote the $ij$-th element of $V_{ts,N}$ as $v_{ij,ts,N} = E\bigl(h_{it,N}'\Delta u_{t,N}\Delta u_{s,N}'h_{js,N}\bigr)$. Given the structure of the instruments assumed in this section, the moment conditions are linear-quadratic forms in $\xi_N$ and satisfy the conditions of Lemma A1 in Appendix A (see the discussion preceding Lemma 5). In particular, we have as in (4.3.10) above:
$$h_{it,N}'\Delta u_{t,N} = a_{it,N}'(d_t\otimes P_N)\xi_N + \xi_N'(b_{it}'d_t\otimes P_N'P_N)\xi_N, \quad (4.3.14)$$
and hence from Lemma A1 in Appendix A, the covariance of $h_{it,N}'\Delta u_{t,N}$ and $h_{js,N}'\Delta u_{s,N}$, denoted $v_{ij,ts,N}$, is given by
$$v_{ij,ts,N} = a_{it,N}'\bigl(d_t\Omega_{\xi,N}d_s'\otimes P_NP_N'\bigr)a_{js,N} + 2\operatorname{tr}\bigl(b_{it}'d_t\Omega_{\xi,N}d_s'b_{js}\Omega_{\xi,N}\otimes P_N'P_NP_N'P_N\bigr), \quad (4.3.15)$$
where $\Omega_{\xi,N}$ is defined in (4.1.16).

Observe that for $|s-t| > 1$ we have $d_t\Omega_{\xi,N}d_s' = 0$ and hence the above covariance is zero. An expectations-based estimator of $V_N$, say $\hat V^E_N$, would then replace the true values of the parameters in the above expression by their initial consistent estimates. Note that in addition to $\Omega_{\xi,N}$ and $P_N$, the terms $a_{it,N}$ and $b_{it}$ also potentially depend on the parameters of the model (compare, e.g., the expressions for $a_{t,N}$ and $b_t$ in the proof of Lemma 4 in Appendix C.3). The exact form depends on the choice of the instruments.
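The "mixed" weighting matrix of (4.3.16)-(4.3.18) is straightforward to compute. The sketch below builds $\hat\Omega_{\Delta u,N} = \hat\sigma^2_{\nu}(D\otimes\hat P_N)Q_{0,N}(D'\otimes\hat P_N')$ and then $\hat V^{mix}_N = [(T-1)N]^{-1}H'\hat\Omega_{\Delta u,N}H$; the function name and array layout are assumptions of this example, and $D$ is taken to be the standard $(T-1)\times T$ first-difference operator.

```python
import numpy as np

def vc_mix(H, W, rho_hat, sigma_nu2_hat, T):
    """'Mixed' GMM weighting matrix, cf. (4.3.16)-(4.3.18):
    V_mix = [(T-1)N]^-1 H' Omega_hat H with
    Omega_hat = sigma_nu^2 (D kron P) Q0 (D' kron P').
    H: ((T-1)N, k) instrument matrix; W: (N, N) spatial weights."""
    N = W.shape[0]
    P = np.linalg.inv(np.eye(N) - rho_hat * W)       # P_hat = (I - rho_hat W)^-1
    D = np.zeros((T - 1, T))                         # first-difference operator
    for t in range(T - 1):
        D[t, t], D[t, t + 1] = -1.0, 1.0
    e = np.ones((T, 1))
    Q0 = np.kron(np.eye(T) - e @ e.T / T, np.eye(N)) # within transform
    DP = np.kron(D, P)
    Omega = sigma_nu2_hat * DP @ Q0 @ DP.T
    return (H.T @ Omega @ H) / ((T - 1) * N)
```

Because $De_T = 0$, the within transform drops out: $(D\otimes P)Q_0(D'\otimes P') = (DD')\otimes(PP')$, which gives a cheap consistency check on the construction.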
In Section 4.3.3 below, I consider a set of instruments utilized in the literature (e.g., Arellano and Bond, 1991) and also provide an expression for such an expectations-based variance-covariance matrix estimator given that choice of instruments. Note that the instruments considered in Section 4.1 are also of the form assumed here; see the proof of Lemma 1. The expression for $V_N$ is then given by Lemma 2.

As an alternative to $\hat V^E_N$, the small sample weighting matrices can be constructed based on approximations to $H_N'E(\Delta u_N\Delta u_N')H_N$. For stochastic instruments, such an estimator will not in general be a consistent estimator of $E(H_N'\Delta u_N\Delta u_N'H_N)$. Nevertheless, based on Lemma 6, the resulting second stage GMM estimator is consistent. It is also computationally simpler and has reasonable small sample properties (see Chapter 5). This estimator, denoted by $\hat V^{mix}_N$, ignores the fact that the instruments collected in $H_N$ are stochastic and replaces the disturbances $\Delta u_N\Delta u_N'$ by an estimate of their expected value:
$$\hat V^{mix}_N = [(T-1)N]^{-1} H_N'\,\hat\Omega_{\Delta u,N}\,H_N, \quad (4.3.16)$$
where $\hat\Omega_{\Delta u,N}$ is an estimator of the variance-covariance matrix of the disturbances. In our case this could be
$$\hat\Omega_{\Delta u,N} = \hat\sigma^2_{\nu N}\bigl(D\otimes\hat P_N\bigr)\,Q_{0,N}\,\bigl(D'\otimes\hat P_N'\bigr), \quad (4.3.17)$$
where $\hat\rho_N$ and $\hat\sigma^2_{\nu N}$ are initial estimates and
$$\hat P_N = (I_N - \hat\rho_N W_N)^{-1}. \quad (4.3.18)$$

4.3.2 Feasible GMM Estimator

Consider now a GMM estimator based on moment conditions of the form
$$E\bigl[\tilde H_N'\,\Delta u_N\bigr] = 0, \quad (4.3.19)$$
where
$$\tilde H_N = \begin{pmatrix} \tilde H_{2,N} & & 0 \\ & \ddots & \\ 0 & & \tilde H_{T,N} \end{pmatrix} \quad \bigl(N(T-1)\times k\bigr), \quad (4.3.20)$$
with $\tilde H_{t,N} = (y_{t-2,N},\ldots,y_{0,N}, X_{t,N},\ldots,X_{1,N})$ being the $N\times k_t$ matrix of instruments at time $t$. Note that $k_t = (t-1) + t\,p$ and $k = k_2 + \ldots + k_T$. Let
$$\tilde V_N = E\bigl(\tilde H_N'\Delta u_N\Delta u_N'\tilde H_N\bigr); \quad (4.3.21)$$
then the estimator is given by
$$\tilde\delta_N = \bigl[\Delta Z_N'\tilde H_N\tilde V_N^{-1}\tilde H_N'\Delta Z_N\bigr]^{-1}\Delta Z_N'\tilde H_N\tilde V_N^{-1}\tilde H_N'\Delta y_N. \quad (4.3.22)$$
The instrument matrix in (4.3.20) utilizes moment conditions of the form
$$E\bigl[(u_{t,i}-u_{t-1,i})\,y_{t-1-s,i}\bigr] = 0, \quad s = 1,\ldots,t-1, \quad (4.3.23)$$
$$E\bigl[(u_{t,i}-u_{t-1,i})\,X_{t-s,i}\bigr] = 0_{1\times p}, \qquad E\bigl[(u_{t,i}-u_{t-1,i})\,X_{t,i}\bigr] = 0_{1\times p}.$$
Observe that the instruments consist of $y_{t-1-s,N}$, $X_{t,N}$ and $X_{t-s,N}$, $s = 1,\ldots,t-1$, and hence by Lemma 4 they are linear forms in the innovations of the form considered above, i.e., they satisfy the conditions in Lemma 5. To complete the verification of the conditions stipulated in Lemma 5, I consider the smallest eigenvalues of the sequence of matrices $\tilde V_N = E(\tilde H_N'\Delta u_N\Delta u_N'\tilde H_N)$.

Note that from Lemma 4 it follows that $y_{t,N} = a_{t,N} + (b_t\otimes P_N)\xi_N$. Let us denote
$$\tilde S_{t,N} = [a_{t-2,N},\ldots,a_{0,N}, X_{t,N},\ldots,X_{1,N}] \quad (N\times k_t), \quad (4.3.24)$$
and
$$\Phi_{t,N} = \Bigl[\bigl((b_{t-2},\ldots,b_0)\otimes P_N\bigr)\bigl(I_{t-1}\otimes\xi_N\bigr),\ 0_{N\times tp}\Bigr], \quad (4.3.25)$$
where $(b_{t-2},\ldots,b_0)$ is $1\times(t-1)(T+2)$, $\xi_N$ is $N(T+2)\times 1$ and $\Phi_{t,N}$ is $N\times k_t$. The instruments can then be expressed as
$$\tilde H_{t,N} = \tilde S_{t,N} + \Phi_{t,N}. \quad (4.3.26)$$
As a result, the full matrix of instruments is
$$\tilde H_N = \tilde S_N + \Phi_N, \quad (4.3.27)$$
where the matrix $\tilde S_N$ contains the nonstochastic elements of the instruments and is defined as the $N(T-1)\times k$ block-diagonal matrix
$$\tilde S_N = \operatorname{diag}\bigl(\tilde S_{2,N},\ldots,\tilde S_{T,N}\bigr), \quad (4.3.28)$$
while the stochastic components of the instrument matrix are collected in
$$\Phi_N = \operatorname{diag}\bigl(\Phi_{2,N},\ldots,\Phi_{T,N}\bigr) \quad \bigl(N(T-1)\times k\bigr). \quad (4.3.29)$$
To guarantee that the smallest eigenvalue of $[(T-1)N]^{-1}\tilde V_N$ is uniformly bounded away from zero, I make the following assumption:

Assumption GMM1 The smallest eigenvalue of $[(T-1)N]^{-1}\tilde S_N'\tilde S_N$ is uniformly bounded away from zero.

Given the above assumption, we have by Lemma 5 that the normalized moment conditions converge in distribution. I next show that the estimator $\tilde\delta_N$, where the weighting matrix for the moment conditions is based on the true values of the parameters, is consistent and asymptotically normal. Corresponding to Assumptions IV1 and IV3 for the first stage estimator, I introduce the following assumptions.
Let
$$\tilde M_{H\Delta Z} = \operatorname*{plim}_{N\to\infty}\,\frac{1}{(T-1)N}\,\tilde H_N'\Delta Z_N. \quad (4.3.30)$$

Assumption GMM2 The matrix $\tilde M_{H\Delta Z}$ exists and is finite with full column rank.

Assumption GMM3 The matrix $\tilde V = \operatorname{plim}_{N\to\infty}[(T-1)N]^{-1}\tilde V_N$ exists and is finite and invertible.

As a consequence of Lemma 6, we now have the following theorem.

Theorem 3 Under Assumptions 1-6 and GMM1-GMM3, we have
$$\sqrt{(T-1)N}\,\bigl(\tilde\delta_N - \delta\bigr) \xrightarrow{d} N(0, \Omega), \qquad \Omega = \bigl[\tilde M_{H\Delta Z}'\tilde V^{-1}\tilde M_{H\Delta Z}\bigr]^{-1}.$$

Proof. See the appendix.

The above estimator is based on the true values of the parameters, which are unknown and have to be estimated. I now provide an expression for the expectations-based estimator of the variance-covariance matrix of the moment conditions $\tilde V_N$, denoted by $\hat V_N(\hat\vartheta_N)$, where $\hat\vartheta_N$ is an initial consistent estimator of $\vartheta = (\rho, \sigma^2_{\nu N}, \sigma^2_{\mu}, \lambda)'$. I then show that when the feasible GMM estimator uses $[\hat V_N(\hat\vartheta_N)]^{-1}$ as the moment weighting matrix, the parameters collected in the vector $\vartheta$ are nuisance parameters.

The variance-covariance matrix of the moment conditions collected in $\tilde H_N'\Delta u_N$, with $\tilde H_N$ defined in (4.3.20), can be written analogously to (4.3.13) as
$$\tilde V_N = \begin{pmatrix} \tilde V_{22,N} & \cdots & \tilde V_{2T,N} \\ \vdots & & \vdots \\ \tilde V_{T2,N} & \cdots & \tilde V_{TT,N} \end{pmatrix}, \quad (4.3.31)$$
where $\tilde V_{ts,N} = E\bigl(\tilde H_{t,N}'\Delta u_{t,N}\Delta u_{s,N}'\tilde H_{s,N}\bigr)$. Since $\tilde H_{t,N}$ consists of a stochastic part $(y_{t-2,N},\ldots,y_{0,N})$ and a nonstochastic part $(X_{t,N},\ldots,X_{1,N})$, the $k_t\times k_s$ matrix $\tilde V_{ts,N}$ is partitioned accordingly (Footnote 38: I show that the off-diagonal blocks of $\tilde V_{ts,N}$ are matrices of zeros as part of the proof of Lemma 7 below.):
$$\tilde V_{ts,N} = \begin{pmatrix} \tilde V^y_{ts,N} & 0_{(t-1)\times sp} \\ 0_{tp\times(s-1)} & \tilde V^X_{ts,N} \end{pmatrix}, \quad (4.3.32)$$
where the upper block is
$$\tilde V^y_{ts,N} = \bigl[\tilde v^y_{qr,ts,N}\bigr]_{q=1,\ldots,t-1;\ r=1,\ldots,s-1}, \quad (4.3.33)$$
with $\tilde v^y_{qr,ts,N} = E\bigl(y_{t-1-q}'\Delta u_{t,N}\Delta u_{s,N}'y_{s-1-r}\bigr)$. Given the expressions for $y_{t-1-q}$ and $y_{s-1-r}$ in Lemma 4 and the expressions for $\Delta u_{t,N}$ and $\Delta u_{s,N}$ in (4.3.9), the moment conditions $y_{t-1-q}'\Delta u_{t,N}$ and $y_{s-1-r}'\Delta u_{s,N}$ are linear-quadratic forms in $\xi_N$ and their covariance is (using Lemma A1 in Appendix A) given by
$$\tilde v^y_{qr,ts,N} = a_{t-1-q,N}'\bigl(d_t\Omega_{\xi,N}d_s'\otimes P_NP_N'\bigr)a_{s-1-r,N} + 2\operatorname{tr}\bigl(b_{t-1-q,N}'d_t\Omega_{\xi,N}d_s'b_{s-1-r,N}\Omega_{\xi,N}\otimes P_N'P_NP_N'P_N\bigr). \quad (4.3.34)$$
Note that by (4.3.9) the disturbances $\Delta u_{t,N}$ are linear forms in the innovations $\xi_N$. From Lemma A1 in Appendix A it then follows that the variance-covariance matrix of $\Delta u_{t,N}$ and $\Delta u_{s,N}$ is $(d_t\Omega_{\xi,N}d_s'\otimes P_NP_N')$. Hence the second block of $\tilde V_{ts,N}$ is
$$\tilde V^X_{ts,N} = (X_{t,N},\ldots,X_{1,N})'\,E\bigl(\Delta u_{t,N}\Delta u_{s,N}'\bigr)\,(X_{s,N},\ldots,X_{1,N}) = (X_{t,N},\ldots,X_{1,N})'\bigl(d_t\Omega_{\xi,N}d_s'\otimes P_NP_N'\bigr)(X_{s,N},\ldots,X_{1,N}). \quad (4.3.35)$$
The estimator $\hat V_N(\hat\vartheta_N)$ replaces the true values in the expressions (4.3.31)-(4.3.35) by their initial estimates collected in the vector $\hat\vartheta_N = (\hat\rho_N, \hat\sigma^2_{\nu N}, \hat\sigma^2_{\mu}, \hat\lambda)'$. In particular, it replaces $\Omega_{\xi,N}$, $P_N$, $a_{t,N}$ and $b_{t,N}$ with
$$\hat\Omega_{\xi,N} = \operatorname{diag}\Bigl(\hat\sigma^2_{\mu,N},\ \frac{\hat\sigma^2_{\nu,N}}{1-\hat\lambda_N^2},\ \hat\sigma^2_{\nu,N},\ldots,\hat\sigma^2_{\nu,N}\Bigr), \qquad \hat P_N = (I_N - \hat\rho_N W_N)^{-1}, \quad (4.3.36)$$
$$\hat a_{t,N} = \sum_{j=0}^{t-1}\hat\lambda_N^j X_{t-j,N}\hat\beta_N, \qquad \hat b_{t,N} = \Bigl(\frac{1}{1-\hat\lambda_N},\ \hat\lambda_N^t,\ \hat\lambda_N^{t-1},\ldots,\hat\lambda_N^0,\ 0_{1\times(T-t)}\Bigr).$$
Note that in order for the estimator of the variance-covariance matrix of the moment conditions to be feasible, this implicitly assumes that the past values of the exogenous variables are such that $\sum_{j=0}^{\infty}\lambda^j X_{-j,N}\beta = 0$, i.e., there are no individual effects other than those contained in $\mu_i$ (footnote 39).

The following lemma shows that the estimator $\hat V_N$ is consistent.

Lemma 7 Under Assumptions 1-6 and GMM1-GMM3, and given that $\hat\vartheta_N \xrightarrow{p} \vartheta$ as $N\to\infty$ and the row and column sums of the matrices $rW_N$ are uniformly bounded in absolute value for some $r$ with $|\rho| < \ldots$

$\ldots$ Note that a sufficient condition for Assumption A2 is that the row and column sums of $A_n$ and the elements of $b_n$ are uniformly bounded in absolute value.

Assumption A3 For $r = 1,\ldots,m$ we assume that one of the following two conditions holds: (a) $\sup_{1\le i\le n,\,n\ge 1} E|\varepsilon_{i,n}|^{2+\delta_2} < \infty$ for some $\delta_2 > 0$ and $a_{ii,r,n} = 0$; (b) $\sup_{1\le i\le n,\,n\ge 1} E|\varepsilon_{i,n}|^{4+\delta_2} < \infty$ for some $\delta_2 > 0$ (but possibly $a_{ii,r,n} \ne 0$).

Consider the linear-quadratic forms
$$q_{r,n} = \varepsilon_n'A_{r,n}\varepsilon_n + b_{r,n}'\varepsilon_n \quad (A.2)$$
and define the vector of linear-quadratic forms $q_n = (q_{1,n},\ldots,q_{m,n})'$ (A.3), and let
$$\mu_{q_n} = E q_n, \qquad \Sigma_{q_n} = E(q_n - Eq_n)(q_n - Eq_n)' \quad (A.4)$$
denote the mean vector and the variance-covariance matrix of $q_n$, whose elements $\mu_{q_r,n}$ and $\sigma_{q_{rs},n}$ denote the mean of $q_{r,n}$ and the covariance between $q_{r,n}$ and $q_{s,n}$, respectively, for $r,s = 1,\ldots,m$ (A.5). We now have the following CLT.

Theorem A1 Suppose Assumptions A1-A3 hold and $n^{-1}\lambda_{\min}(\Sigma_{q_n}) \ge c$ for some $c > 0$. Let $\Sigma_{q_n} = \Sigma_{q_n}^{1/2}\bigl(\Sigma_{q_n}^{1/2}\bigr)'$; then
$$\Sigma_{q_n}^{-1/2}\bigl(q_n - \mu_{q_n}\bigr) \xrightarrow{d} N(0, I_m).$$

Of course, the theorem remains valid if all assumptions are only assumed to hold for $n > n_0$, where $n_0$ is finite. The above theorem can also be applied to situations where $n = TN$ with $T$ finite and $N\to\infty$; see footnote 13 in Kelejian and Prucha (2001). I now illustrate this in more detail. Suppose we have sample sizes $T, 2T, 3T, \ldots, NT, \ldots$ as $N\to\infty$, and the random variables are the triangular arrays
$$\varepsilon_1 = (\varepsilon_{11,1},\ldots,\varepsilon_{T1,1})', \quad \varepsilon_2 = (\varepsilon_{11,2},\varepsilon_{12,2},\ldots,\varepsilon_{T1,2},\varepsilon_{T2,2})', \quad \ldots, \quad \varepsilon_N = (\varepsilon_{11,N},\ldots,\varepsilon_{1N,N},\varepsilon_{21,N},\ldots,\varepsilon_{2N,N},\ldots,\varepsilon_{T1,N},\ldots,\varepsilon_{TN,N})'. \quad (A.6)$$
Consider the sequence of vectors of linear-quadratic forms
$$v_N = (v_{1,N},\ldots,v_{m,N})', \quad (A.7) \qquad v_{r,N} = \varepsilon_N'A_{r,TN}\varepsilon_N + b_{r,TN}'\varepsilon_N. \quad (A.8)$$
As above, we denote by $\mu_{v_N}$ and $\Sigma_{v_N}$ the mean vector and variance-covariance matrix of the vector $v_N$. Suppose that the random variables collected in $\varepsilon_N$ satisfy Assumptions A1 and A3, and the sequences of matrices $A_{r,TN}$ and vectors $b_{r,TN}$ satisfy Assumption A2.

We can define additional triangular arrays of sizes between $tN$ and $(t+1)N$ to obtain a sequence
$$\varepsilon_1 = (\varepsilon_{11,1}), \quad \varepsilon_2 = (\varepsilon_{11,1},\varepsilon_{21,1})', \quad \ldots, \quad \varepsilon_T = (\varepsilon_{11,1},\ldots,\varepsilon_{T1,1})', \quad \varepsilon_{T+1} = (\varepsilon_{11,2},\ldots,\varepsilon_{T1,2},\varepsilon_{12,2})', \quad \varepsilon_{T+2} = (\varepsilon_{11,2},\ldots,\varepsilon_{T1,2},\varepsilon_{12,2},\varepsilon_{22,2})', \quad \ldots, \quad \varepsilon_{2T} = (\varepsilon_{11,2},\ldots,\varepsilon_{T1,2},\varepsilon_{12,2},\ldots,\varepsilon_{T2,2})', \quad \ldots, \quad \varepsilon_{NT} = (\varepsilon_{11,N},\ldots,\varepsilon_{TN,N})'. \quad (A.9)-(A.10)$$
Observe that the new sequence $\varepsilon_n$ satisfies Assumptions A1 and A3, and that for $n = NT$ we have $\varepsilon_n = \varepsilon_N$. Similarly, we can extend the sequence of vectors of linear-quadratic forms to
$$q_n = (q_{1,n},\ldots,q_{m,n})', \quad (A.11) \qquad q_{r,n} = \varepsilon_n'A_{r,n}\varepsilon_n + b_{r,n}'\varepsilon_n, \quad (A.12)$$
with
$$A_{r,n} = \begin{pmatrix} A_{r,[n/T]T} & 0_{[n/T]T\times k} \\ 0_{k\times[n/T]T} & \bigl(a_{ij,r,[n/T]T+1}\bigr)_{i,j=1,\ldots,k} \end{pmatrix}, \qquad b_{r,n} = \begin{pmatrix} b_{r,[n/T]T} \\ b_{1,r,[n/T]T+1} \\ \vdots \\ b_{k,r,[n/T]T+1} \end{pmatrix}, \quad (A.13)$$
and $k = n - [n/T]\,T$, where I use $[r/s]$ to denote the whole part of the rational number $r/s$.

Observe that by definition for $n = NT$ we have $q_n = v_N$. Furthermore, since $A_{r,n}$ and $b_{r,n}$ satisfy Assumption A2 for $n = NT$, it follows from the construction of $A_{r,NT}$ and $b_{r,NT}$ that they satisfy Assumption A2 for all $n$. As a result, the quadratic forms $q_n$ fulfill the conditions of Theorem A1 and $\Sigma_{q_n}^{-1/2}(q_n - \mu_{q_n}) \xrightarrow{d} N(0, I_m)$ as $n\to\infty$, where as before $\mu_{q_n}$ and $\Sigma_{q_n}$ denote the mean vector and variance-covariance matrix of the vector $q_n$. Hence the sequence of distribution functions of $\Sigma_{q_n}^{-1/2}(q_n - \mu_{q_n})$ converges weakly to the distribution function of $N(0, I_m)$. We now select the subsequence of the distribution functions of $\Sigma_{q_n}^{-1/2}(q_n - \mu_{q_n})$ corresponding to $n = NT$ (we treat $T$ as a fixed constant) and observe that these are equivalent to the sequence of distribution functions of $\Sigma_{v_N}^{-1/2}(v_N - \mu_{v_N})$. This subsequence must have the same limit and, as a consequence, we have that
$$\Sigma_{v_N}^{-1/2}\bigl(v_N - \mu_{v_N}\bigr) \xrightarrow{d} N(0, I_m), \quad (A.14)$$
as $N\to\infty$.

B Appendix: Proof of Claims in Chapter 3

Lemma B1 Let $\xi_j$, $j\in\mathbb{N}$, be a sequence of totally independent real-valued random variables with $E|\xi_j|^p \le k_\xi < \infty$ for some $2 \le p < \infty$, $\ldots$

$\ldots$ For every $\varepsilon^* > 0$ there exists an index $N^*$ such that $\sum_{i=m+1}^{m+k}|a_i| < \varepsilon$, with $\varepsilon = \varepsilon^*/(k_a^p k_\xi)$; then by argumentation analogous to the above,
$$E\bigl|\eta_{m+k} - \eta_m\bigr|^p = E\Bigl|\sum_{i=1}^{m+k}a_i\xi_i - \sum_{i=1}^{m}a_i\xi_i\Bigr|^p \le k_a^{p/q}\,k_\xi\sum_{i=m+1}^{m+k}|a_i| \le k_a^p k_\xi\,\varepsilon = \varepsilon^*, \quad (B.4)$$
for all $m \ge N^*$ and $k \ge 0$. Thus under the maintained assumptions the sequence $\eta_m$ is Cauchy in $L_p$. By Theorem 7 in Shiryayev (1984, p. 258) we then have that the sequence $\eta_m$ converges in $p$-th mean to a random variable in $L_p$, which implies that $\eta$ exists as a limit in $p$-th mean. Of course, since for $r \le p$
$$\|\eta_m - \eta\|_r \le \|\eta_m - \eta\|_p \quad (B.5)$$
by Lyapunov's inequality, it follows that $\eta_m$ converges to $\eta$ also in $r$-th mean for $0 < r \le p$.

$\ldots$ there exists $k_y > 0$ such that $E|y_{it,N}|^r \le k_y$ $\ldots$ $j+1$, and hence $F_{ii} = \sum_{j=1}^{T-1}A_{ij}\{-B\}_{ji} = 0$. Hence I can use Lemma A1 to derive the means, variances and covariances of $y_{-2,N}'\Delta u_N$ and $\Delta X_N'\Delta u_N$. In particular, we have that
$$E\bigl(y_{-2,N}'\Delta u_N\bigr) = E\bigl(\Delta X_N'\Delta u_N\bigr) = 0, \quad (C.1.17)$$
and
$$VC\bigl(y_{-2,N}'\Delta u_N\bigr) = f_N'\bigl(\Omega_{\xi,N}\otimes P_NP_N'\bigr)f_N + 2\operatorname{tr}\bigl(F^S\Omega_{\xi,N}F^S\Omega_{\xi,N}\otimes P_N'P_NP_N'P_N\bigr) = f_N'\bigl(\Omega_{\xi,N}\otimes P_NP_N'\bigr)f_N + \Psi_N, \quad (C.1.18)$$
$$VC\bigl(\Delta X_N'\Delta u_N\bigr) = \Delta X_N'\bigl[(0_{(T-1)\times 2}, D)\otimes I_N\bigr]\bigl(\Omega_{\xi,N}\otimes P_NP_N'\bigr)\bigl[(0_{(T-1)\times 2}, D)'\otimes I_N\bigr]\Delta X_N. \quad (C.1.19)$$
(Footnote 46: Note that both matrices are upper diagonal, in the sense that their $ij$-th elements are zero for $i$ $\ldots$)

$\ldots$ Since $\lambda_{\min}\bigl([(T-1)N]^{-1}S_N'S_N\bigr) \ge c_S > 0$ by Assumption IV2, and since $\Omega_{\xi,N}$ is diagonal, we have $\lambda_{\min}(\Omega_{\xi,N}) = \min\bigl(\sigma^2_\mu, \operatorname{var}(\zeta_{i,N}), \sigma^2_\nu\bigr) = \min\bigl(\sigma^2_\mu, \sigma^2_\nu/(1-\lambda^2), \sigma^2_\nu\bigr) \ge c_\xi > 0$, and hence $[(T-1)N]^{-1}\lambda_{\min}(V_N) \ge c_S c_\xi c_P > 0$.

Proof of Proposition 1: The result in the proposition is a special case of the general result in Lemma 5 in Section 4.3 (footnote 47), which is in turn based on the CLT in Theorem A1 in Appendix A. Here I verify directly that the conditions of Theorem A1 hold.
(Footnote 47: The conditions of that lemma are satisfied since by Lemma 1 (and also Lemma 4 in Section 4.3) the instruments $y_{-2,N}$ and $\Delta X_N$ are linear forms in the innovations of the form assumed in Lemma 5. Furthermore, by Lemma 3, the smallest eigenvalue of $V_N$ is uniformly bounded away from zero. Finally, the moment conditions are valid since by Lemma 2 we have $E(H_N'\Delta u_N) = 0$. Therefore, the conditions of Lemma 5 are satisfied and we have that $V_N^{-1/2}H_N'\Delta u_N \xrightarrow{d} N(0, I)$.)

The moment conditions are
$$H_N'\Delta u_N = \begin{pmatrix} H_{2,N} \\ \vdots \\ H_{T,N} \end{pmatrix}'\Delta u_N = \begin{pmatrix} (y_{0,N}, \Delta X_{2,N}) \\ \vdots \\ (y_{T-2,N}, \Delta X_{T,N}) \end{pmatrix}'\Delta u_N = \begin{pmatrix} y_{-2,N}'\Delta u_N \\ \Delta X_N'\Delta u_N \end{pmatrix}. \quad (C.1.29)$$
Observe that by Lemma 1 the instruments $y_{-2,N}$ and $\Delta X_N$ are linear forms in the innovations and, as a result, the moment conditions collected in $H_N'\Delta u_N$ are linear-quadratic forms in the innovations
$$\xi_N = \bigl(\mu_N', \zeta_N', \nu_{1,N}',\ldots,\nu_{T,N}'\bigr)', \quad (C.1.30)$$
where $\zeta_N = \sum_{j=0}^{\infty}\lambda^j\nu_{-j,N}$. By Assumptions 1 and 6, it follows from Lemma B1 in Appendix B that the random variable $\zeta_N$ satisfies condition A3 in Appendix A. Therefore, by Assumptions 1 and 2, the elements of the innovations $\xi_N$ satisfy conditions A1 and A3 in Appendix A.

By Lemma 2, the variance-covariance matrix of the moment conditions collected in $H_N'\Delta u_N$ is $V_N$, and by Lemma 3 the smallest eigenvalue of $[(T-1)N]^{-1}V_N$ is uniformly bounded away from zero. Hence it remains to be shown that the linear-quadratic forms collected in $H_N'\Delta u_N$ satisfy condition A2 in Appendix A.

Note that from Lemma 1 we have that the elements of $H_N'\Delta u_N$ are
$$y_{-2,N}'\Delta u_N = f_N'\bigl(I_{T+2}\otimes P_N\bigr)\xi_N + \xi_N'\bigl(F\otimes P_N'P_N\bigr)\xi_N, \quad (C.1.31)$$
and
$$\Delta X_N'\Delta u_N = \Delta X_N'\bigl[(0_{(T-1)\times 2}, D)\otimes P_N\bigr]\xi_N. \quad (C.1.32)$$
Observe that any finite sum, product or Kronecker product of matrices with row and column sums uniformly bounded in absolute value will also have row and column sums uniformly bounded in absolute value; see Kelejian and Prucha (2001d) for details. From Lemma 1 we have that
$$f_N' = \Bigl[\beta'X_{-2,N}' + \bigl(E(y_{0,N}'),\,0_{1\times(T-2)N}\bigr)\Bigr]\bigl[(0_{(T-1)\times 2}, D)\otimes I_N\bigr]. \quad (C.1.33)$$
The elements and dimensions of $(0_{(T-1)\times 2}, D)$ do not depend on $N$, and hence trivially $[(0_{(T-1)\times 2}, D)\otimes I_N]$ has row and column sums uniformly bounded in absolute value. The elements of the vector $\beta'X_{-2,N}'$ are uniformly bounded in absolute value by Assumption 5, and the elements of $(E(y_{0,N}'), 0_{1\times(T-2)N})$ are uniformly bounded in absolute value since, as demonstrated by Lemma B3 in Appendix B, $y_{it}$ has uniformly bounded $4+\delta$ moments for some $\delta > 0$. Together we then have that $f_N$ has elements uniformly bounded in absolute value. The sequence of matrices $P_N$ has row and column sums uniformly bounded in absolute value (Assumption 3), and hence the elements of $f_N'(I_{T+2}\otimes P_N)$ are uniformly bounded in absolute value. Similarly, by Assumptions 5 and 3, $\Delta X_N'[(0_{(T-1)\times 2}, D)\otimes P_N]$ has row and column sums uniformly bounded in absolute value. Finally, since the dimensions of $F$ do not change with $N$ and its elements are also independent of $N$, the matrix $(F\otimes P_N'P_N)$ has row and column sums uniformly bounded in absolute value. This completes the verification of the conditions of Theorem A1 and, therefore, we have that $V_N^{-1/2}H_N'\Delta u_N \xrightarrow{d} N(0, I)$.

Proof of Theorem 1: From equation (4.1.10) we have
$$\sqrt{(T-1)N}\,\bigl(\hat\delta_N - \delta\bigr) = \sqrt{(T-1)N}\,\bigl[\Delta Z_N'H_N(H_N'H_N)^{-1}H_N'\Delta Z_N\bigr]^{-1}\Delta Z_N'H_N(H_N'H_N)^{-1}H_N'\Delta u_N \quad (C.1.34)$$
$$= \left[\frac{\Delta Z_N'H_N}{(T-1)N}\left(\frac{H_N'H_N}{(T-1)N}\right)^{-1}\frac{H_N'\Delta Z_N}{(T-1)N}\right]^{-1}\frac{\Delta Z_N'H_N}{(T-1)N}\left(\frac{H_N'H_N}{(T-1)N}\right)^{-1}\frac{H_N'\Delta u_N}{\sqrt{(T-1)N}}.$$
Given Assumptions IV1 and IV3, the result follows from Proposition 1 in this thesis and Corollary 5 in Pötscher and Prucha (2001).

C.2 Proofs for Section 4.2

I now give a sequence of lemmas that will be used to prove Theorem 2. I use the notation $\|\cdot\|$ to denote the matrix norm $\|M\| := [\operatorname{tr}(M'M)]^{1/2}$.

Lemma C4 Let $\hat u_N$ be based on an $N^{1/2}$-consistent estimate of $\delta$. Then under Assumptions 1-6 we can write
$$u_N - \hat u_N = D_N\,\Delta_N,$$
where the random matrix $D_N$ has elements $d_{ij,N}$ with uniformly bounded absolute $4+\delta$ moments for some $\delta > 0$, i.e., $E|d_{ij,N}|^{4+\delta} \le c_d < \infty$ $\ldots$

The nonstochastic elements of $D_N$ are uniformly bounded in absolute value by Assumption 5, and hence also their $4+\delta$ power is uniformly bounded in absolute value. Thus $D_N$ has uniformly bounded absolute $4+\delta$ moments for some $\delta > 0$. Note that the claim in the above lemma also holds for $2+\delta$ moments, since by Lyapunov's inequality
$$E|y_{i,t-1,N}|^{2+\delta} \le \bigl[E|y_{i,t-1,N}|^{4+\delta}\bigr]^{(2+\delta)/(4+\delta)} \le k_y^{(2+\delta)/(4+\delta)} \ldots$$
$\ldots$ is a condition stipulated in the lemma.

Proof of Lemma 6: Substituting the model (equation 4.3.1) into the definition of the GMM estimator in (4.3.5) leads to
$$\sqrt{(T-1)N}\,\bigl(\tilde\delta_N - \delta\bigr) = \sqrt{(T-1)N}\,\bigl(\Delta Z_N'H_NA_N^{-1}H_N'\Delta Z_N\bigr)^{-1}\Delta Z_N'H_NA_N^{-1}H_N'\Delta u_N \quad (C.3.5)$$
$$= \left[\frac{\Delta Z_N'H_N}{(T-1)N}\left(\frac{A_N}{(T-1)N}\right)^{-1}\frac{H_N'\Delta Z_N}{(T-1)N}\right]^{-1}\frac{\Delta Z_N'H_N}{(T-1)N}\left(\frac{A_N}{(T-1)N}\right)^{-1}\frac{H_N'\Delta u_N}{\sqrt{(T-1)N}}.$$
By assumption in the lemma we have that $V_N^{-1/2}H_N'\Delta u_N \xrightarrow{d} N(0, I_k)$ with $[(T-1)N]^{-1}V_N \xrightarrow{p} V$ finite. Hence by Corollary 5 in Pötscher and Prucha (2001) we have
$$\left(\frac{V_N}{(T-1)N}\right)^{1/2}V_N^{-1/2}H_N'\Delta u_N = \frac{H_N'\Delta u_N}{\sqrt{(T-1)N}} \xrightarrow{d} N(0, V). \quad (C.3.6)$$
Furthermore, the lemma assumes that
$$\frac{\Delta Z_N'H_N}{(T-1)N} \xrightarrow{p} M_{\Delta ZH}, \qquad \frac{A_N}{(T-1)N} \xrightarrow{p} A, \quad (C.3.7)$$
where $M_{\Delta ZH}$ is finite with full column rank and $A$ is finite and invertible. Hence, by Corollary 5 in Pötscher and Prucha (2001), we have the desired result.
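The covariances in (4.3.15) and (4.3.34) all come from the Lemma A1 formula for linear-quadratic forms $q = \xi'A\xi + b'\xi$. A minimal sketch of that formula in the simplest special case follows: $A$ symmetric with zero diagonal (as guaranteed here by (4.3.11)), $E(\xi\xi') = I$, and $\xi$ symmetric about zero, in which case $\operatorname{Var}(q) = 2\operatorname{tr}(A^2) + b'b$. The function name is an assumption of this example; the test verifies the formula exactly by enumerating Rademacher innovations, for which the same second moments apply.

```python
import itertools
import numpy as np

def lq_variance(A, b):
    """Variance of q = xi'A xi + b'xi for mean-zero innovations xi with
    E(xi xi') = I, assuming A is symmetric with zero diagonal and the
    distribution of xi is symmetric about 0 (so the linear and quadratic
    parts are uncorrelated). A simplified instance of the Lemma A1 formula:
    Var(q) = 2 tr(A Sigma A Sigma) + b' Sigma b with Sigma = I."""
    return 2.0 * np.trace(A @ A) + float(b @ b)
```

Because the quadratic part has zero diagonal, $E(q) = 0$ as well, mirroring how the validity of the moment conditions and the zero-diagonal structure go together in the text.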
Proof of Theorem 3: Observe that the instruments collected in $\tilde H_N$ consist of $y_{t,N}$ and columns of $X_{t,N}$, and hence by Lemma 4 they are linear forms in the innovations of the form assumed in Lemma 5 and satisfy its conditions. Below I verify that $[(T-1)N]^{-1}\tilde V_N$ has its smallest eigenvalue uniformly bounded away from zero. This will complete the verification of the conditions of Lemma 5, and hence we will have that $\tilde V_N^{-1/2}\tilde H_N'\Delta u_N \xrightarrow{d} N(0, I_k)$.

Observe that using the expression $\tilde H_N = \tilde S_N + \Phi_N$, where $\tilde S_N$ is the nonstochastic part of the instruments (see Section 4.3.3), we have
$$[(T-1)N]^{-1}\tilde V_N = [(T-1)N]^{-1}E\bigl(\tilde H_N'\Delta u_N\Delta u_N'\tilde H_N\bigr) = [(T-1)N]^{-1}E\bigl[(\tilde S_N' + \Phi_N')\Delta u_N\Delta u_N'(\tilde S_N + \Phi_N)\bigr] = [(T-1)N]^{-1}\bigl(\tilde V_{1,N} + \tilde V_{2,N} + \tilde V_{3,N} + \tilde V_{4,N}\bigr), \quad (C.3.8)$$
where
$$\tilde V_{1,N} = \tilde S_N'E\bigl(\Delta u_N\Delta u_N'\bigr)\tilde S_N, \qquad \tilde V_{2,N} = \tilde S_N'E\bigl(\Delta u_N\Delta u_N'\Phi_N\bigr), \qquad \tilde V_{3,N} = E\bigl(\Phi_N'\Delta u_N\Delta u_N'\bigr)\tilde S_N, \qquad \tilde V_{4,N} = E\bigl(\Phi_N'\Delta u_N\Delta u_N'\Phi_N\bigr). \quad (C.3.9)$$
In the following I show that the smallest eigenvalue of $[(T-1)N]^{-1}\tilde V_{1,N}$ is uniformly bounded away from zero. I also show that $\tilde V_{2,N} = 0$ and $\tilde V_{3,N} = 0$. Since the eigenvalues of $\tilde V_{4,N}$ are nonnegative, it then follows from Lemma C1 that the smallest eigenvalue of $[(T-1)N]^{-1}\tilde V_N$ is uniformly bounded away from zero.

Using
$$\Delta u_N = \bigl[(0_{(T-1)\times 2}, D)\otimes P_N\bigr]\xi_N, \quad (C.3.10)$$
where as in (4.1.15) $E(\xi_N\xi_N') = (\Omega_{\xi,N}\otimes I_N)$, it follows that
$$\tilde V_{1,N} = \tilde S_N'\Bigl[(0_{(T-1)\times 2}, D)\,\Omega_{\xi,N}\,(0_{(T-1)\times 2}, D)'\otimes P_NP_N'\Bigr]\tilde S_N. \quad (C.3.11)$$
By Lemma C2 the smallest eigenvalue of $\tilde V_{1,N}$ then satisfies
$$\lambda_{\min}\bigl(\tilde V_{1,N}\bigr) \ge \lambda_{\min}\bigl(\tilde S_N'\tilde S_N\bigr)\cdot\lambda_{\min}\Bigl[(0_{(T-1)\times 2}, D)\,\Omega_{\xi,N}\,(0_{(T-1)\times 2}, D)'\otimes(P_NP_N')\Bigr] \ge \lambda_{\min}\bigl(\tilde S_N'\tilde S_N\bigr)\cdot\lambda_{\min}\bigl(DD'\bigr)\cdot\lambda_{\min}\bigl(\Omega_{\xi,N}\bigr)\cdot\lambda_{\min}\bigl(P_NP_N'\bigr), \quad (C.3.12)$$
where I also used Theorem 4.2.12 in Horn and Johnson (1991).
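The eigenvalue chain in (C.3.12) rests on two facts: $\lambda_{\min}(S'MS) \ge \lambda_{\min}(M)\,\lambda_{\min}(S'S)$ for symmetric positive semidefinite $M$, and $\lambda_{\min}(C\otimes B) = \lambda_{\min}(C)\,\lambda_{\min}(B)$ for positive semidefinite $C$ and $B$ (Horn and Johnson 1991, Thm 4.2.12). The script below checks the combined bound numerically; all matrices are arbitrary test data, not the thesis objects.

```python
import numpy as np

def eigmin(M):
    """Smallest eigenvalue of a symmetric matrix (eigvalsh returns ascending order)."""
    return float(np.linalg.eigvalsh(M)[0])

# lambda_min(S'(C kron PP')S) >= lambda_min(S'S) * lambda_min(C) * lambda_min(PP')
rng = np.random.default_rng(4)
N, T = 5, 3
P = np.linalg.inv(np.eye(N) - 0.2 * rng.random((N, N)) / N)   # invertible, like P_N
PP = P @ P.T                                                  # positive definite
Cs = rng.normal(size=(T - 1, T - 1))
C = Cs @ Cs.T + np.eye(T - 1)                                 # positive definite
S = rng.normal(size=((T - 1) * N, 4))                         # full column rank (a.s.)
lhs = eigmin(S.T @ np.kron(C, PP) @ S)
rhs = eigmin(S.T @ S) * eigmin(C) * eigmin(PP)
assert lhs + 1e-9 >= rhs
```

The bound is what lets Assumption GMM1 (on the nonstochastic part $\tilde S_N$ alone) control the smallest eigenvalue of the full moment variance matrix.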
Observe that from the definition of the first-difference operator matrix $D$ (see 4.1.14), it follows that $DD' = 2 I_{T-1}$ and hence $\lambda_{\min}(DD') = 2$. Since $\Sigma_{\varepsilon,N}$ is diagonal, we have
$$\lambda_{\min}(\Sigma_{\varepsilon,N}) = \min\bigl[ \sigma_\varepsilon^2,\ \operatorname{var}(\varepsilon_{i,N}),\ \sigma_\varepsilon^2 \bigr] = \min\Bigl[ \sigma_\varepsilon^2,\ \frac{\sigma_\varepsilon^2}{1 - \lambda^2},\ \sigma_\varepsilon^2 \Bigr] \ge c_\varepsilon > 0.$$
By Assumption 4 we have that $\lambda_{\min}(P_N P_N') \ge c_P > 0$ and, therefore,
$$\lambda_{\min}\bigl( \tilde{V}_{1,N} \bigr) \ge 2\, c_\varepsilon c_P\, \lambda_{\min}\bigl( \tilde{S}_N' \tilde{S}_N \bigr). \qquad (C.3.13)$$
From Assumption GMM1 we have that $\lambda_{\min}\bigl( [(T-1)N]^{-1} \tilde{S}_{t,N}' \tilde{S}_{t,N} \bigr) \ge c_S > 0$ and hence
$$[(T-1)N]^{-1} \lambda_{\min}\bigl( \tilde{V}_{1,N} \bigr) \ge 2\, c_\varepsilon c_P c_S > 0. \qquad (C.3.14)$$

Next, I show that $\tilde{V}_{2,N}$ and $\tilde{V}_{3,N}$ are matrices of zeros. Recall that $\Pi_N$ consists of blocks $\Pi_{t,N}$ on the main diagonal and zeros elsewhere. Thus $\Pi_N' \Delta u_N \Delta u_N'$ consists of blocks $\Pi_{t,N}' \Delta u_{t,N} \Delta u_{t,N}'$ on the main diagonal and zeros elsewhere. Observe that
$$\Pi_{t,N} = \bigl[ \bigl( (b_{t-2}, \ldots, b_0) \otimes P_N \bigr) (I_{t-1} \otimes \varepsilon_N),\ 0_{N \times tp} \bigr], \qquad (C.3.15)$$
and thus
$$\Pi_{t,N}' \Delta u_{t,N} \Delta u_{t,N}' = \begin{bmatrix} \varepsilon_N' (b_0' \otimes P_N') \\ \vdots \\ \varepsilon_N' (b_{t-2}' \otimes P_N') \\ 0_{tp \times N} \end{bmatrix} \Delta u_{t,N} \Delta u_{t,N}'. \qquad (C.3.16)$$
Observe that $\Delta u_{t,N} = (d_t \otimes P_N) \varepsilon_N$ (as in 4.3.9) and thus
$$\varepsilon_N' (b_{t-s}' \otimes P_N') \Delta u_{t,N} = \varepsilon_N' (b_{t-s}' d_t \otimes P_N' P_N) \varepsilon_N, \qquad (C.3.17)$$
where $d_t$ is the $(t+1)$-th row of $(0_{(T-1)\times 2}, D)$, with the $(T-1) \times T$ matrix $D$ defined in (4.1.14). Hence the $1 \times (T+2)$ vector $d_t$ is a row vector with zeros in the first $t$ positions. Furthermore, the $1 \times (T+2)$ vector $b_{t-s}$ (defined in the proof of Lemma 4 above) has zero entries starting from position $(t-2+s)$. As a result, for $s > 1$, the product $b_{t-s}' d_t$ is a $(T+2) \times (T+2)$ matrix with zeros on the main diagonal. Hence $\varepsilon_N' (b_{t-s}' \otimes P_N') \Delta u_{t,N}$ is a quadratic form in the innovations $\varepsilon_N$ with zeros on the main diagonal (and no linear component). Each element of $\Delta u_{t,N}$ is a linear form in the innovations $\varepsilon_N$ and hence can also be treated as a linear-quadratic form in $\varepsilon_N$ where the matrix defining the quadratic component consists of zeros.
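The eigenvalue factorizations used in (C.3.12) rest on the Kronecker-product rule that, for symmetric positive semidefinite $A$ and $B$, $\lambda_{\min}(A \otimes B) = \lambda_{\min}(A)\,\lambda_{\min}(B)$ (Horn and Johnson 1991, Theorem 4.2.12). A small numerical sketch with arbitrary positive definite matrices (not the thesis matrices) confirms the rule:

```python
# lambda_min(A kron B) = lambda_min(A) * lambda_min(B) for PSD A, B.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 5))
Y = rng.normal(size=(4, 4))
A = X @ X.T + 0.5 * np.eye(5)   # symmetric positive definite stand-in
B = Y @ Y.T + 0.5 * np.eye(4)   # symmetric positive definite stand-in

lmin = lambda M: np.linalg.eigvalsh(M)[0]  # eigvalsh returns ascending order
lhs = lmin(np.kron(A, B))
rhs = lmin(A) * lmin(B)
```

The rule holds because the eigenvalues of $A \otimes B$ are all products $\lambda_i(A)\,\mu_j(B)$, and with nonnegative eigenvalues the smallest product is the product of the smallest factors.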
As a result, we can apply Lemma A1 in Appendix A to obtain that the covariance of $\varepsilon_N' (b_{t-s}' \otimes P_N') \Delta u_{t,N}$ and $\Delta u_{it,N}$ is zero. Thus it follows that
$$E\bigl[ \varepsilon_N' (b_{t-s}' \otimes P_N') \Delta u_{t,N} \Delta u_{t,N}' \bigr] = 0, \qquad (C.3.18)$$
where $s > 1$, implying that $E(\Pi_N' \Delta u_N \Delta u_N')$ is a matrix of zeros. As a consequence,
$$\tilde{V}_{3,N} = E(\Pi_N' \Delta u_N \Delta u_N') \tilde{S}_N = 0_{k \times k}. \qquad (C.3.19)$$
The same argument implies that $\tilde{V}_{2,N}$ is a matrix of zeros. Finally, observe that the matrix $\tilde{V}_{4,N}$ is itself a variance-covariance matrix (i.e., symmetric positive semidefinite) and thus it has nonnegative eigenvalues. This completes the verification of the conditions of Lemma 5 and hence we have that $\tilde{V}_N^{-1/2} \tilde{H}_N' \Delta u_N \stackrel{d}{\rightarrow} N(0, \tilde{V})$.

We can now write the estimator as
$$\tilde{\delta}_N = \delta + \bigl[ \Delta Z_N' \tilde{H}_N \tilde{V}_N^{-1} \tilde{H}_N' \Delta Z_N \bigr]^{-1} \Delta Z_N' \tilde{H}_N \tilde{V}_N^{-1} \tilde{H}_N' \Delta u_N, \qquad (C.3.20)$$
where by Assumptions GMM2 and GMM3,
$$\operatorname*{plim}_{N \to \infty} \frac{1}{(T-1)N} \Delta Z_N' \tilde{H}_N = \tilde{M}_{H \Delta Z}, \qquad (C.3.21)$$
and
$$\operatorname*{plim}_{N \to \infty} \frac{1}{(T-1)N} \tilde{V}_N = \tilde{V}. \qquad (C.3.22)$$
Therefore by Lemma 6, the estimator converges in distribution with
$$\sqrt{(T-1)N}\, \bigl( \tilde{\delta}_N - \delta \bigr) \stackrel{d}{\rightarrow} N(0, \Psi), \qquad (C.3.23)$$
where
$$\Psi = \bigl( \tilde{M}_{\Delta ZH} \tilde{V}^{-1} \tilde{M}_{\Delta ZH}' \bigr)^{-1} \tilde{M}_{\Delta ZH} \tilde{V}^{-1} \tilde{V} \tilde{V}^{-1} \tilde{M}_{\Delta ZH}' \bigl( \tilde{M}_{\Delta ZH} \tilde{V}^{-1} \tilde{M}_{\Delta ZH}' \bigr)^{-1} = \bigl( \tilde{M}_{\Delta ZH} \tilde{V}^{-1} \tilde{M}_{\Delta ZH}' \bigr)^{-1}, \qquad (C.3.24)$$
which is the claim in the theorem.

To prove Lemma 7, I will use Lemma C.6 in Kelejian and Prucha (2005). For the convenience of the reader, I restate a simplified version of that lemma:

Lemma C7 Let $a_n$ and $b_n$ be sequences of $n \times 1$ vectors and let $W_n$ be a sequence of $n \times n$ matrices. Assume that the vectors $a_n$ and $b_n$ have elements uniformly bounded in absolute value and that the matrices $(r W_n)$ have row and column sums uniformly bounded in absolute value for $r < 1$ by one and some finite constant, respectively. Consider a sequence of random variables $\tilde{\rho}_n$ converging in probability to $\rho$ as $n \to \infty$, where $|\rho| < 1$.

... it follows that the diagonal elements of the quadratic forms are zeros. Because elements of $X_{t-q,N}' \Delta u_{t,N}$ are linear forms in $\varepsilon_N$, it follows from Lemma A1 that
$$E\bigl( X_{t-q,N}' \Delta u_{t,N} \Delta u_{s,N}' y_{s-1-r,N} \bigr) = 0_{p \times 1},$$
and hence the off-diagonal blocks in both $\tilde{V}_{ts,N}$ and $\hat{V}_{ts,N}$ are matrices of zeros. Thus we have together that
$$[N(T-1)]^{-1} \bigl( \tilde{V}_{ts,N} - \hat{V}_{ts,N} \bigr) \stackrel{p}{\rightarrow} 0_{k_t \times k_s}, \qquad (C.3.45)$$
or, by repeating the above arguments for other values of $t$ and $s$, that
$$[N(T-1)]^{-1} \bigl( \tilde{V}_N - \hat{V}_N \bigr) \stackrel{p}{\rightarrow} 0_{k \times k}. \qquad (C.3.46)$$
From $[(T-1)N]^{-1} \tilde{V}_N \stackrel{p}{\rightarrow} \tilde{V}$ (Assumption GMM3) it now follows that $[(T-1)N]^{-1} \hat{V}_N \stackrel{p}{\rightarrow} \tilde{V}$.

Proof of Theorem 4: The feasible second-stage GMM estimator is
$$\hat{\delta}_N(\hat{\rho}_N) = \bigl[ \Delta Z_N' \tilde{H}_N \hat{V}_N^{-1}(\hat{\rho}_N) \tilde{H}_N' \Delta Z_N \bigr]^{-1} \Delta Z_N' \tilde{H}_N \hat{V}_N^{-1}(\hat{\rho}_N) \tilde{H}_N' \Delta y_N. \qquad (C.3.47)$$
To prove the claim it suffices to show, see e.g. Schmidt (1976), p. 71, that
$$\Delta_{1,N} = [N(T-1)]^{-1} \Delta Z_N' \tilde{H}_N \hat{V}_N^{-1}(\hat{\rho}_N) \tilde{H}_N' \Delta Z_N - [N(T-1)]^{-1} \Delta Z_N' \tilde{H}_N \tilde{V}_N^{-1} \tilde{H}_N' \Delta Z_N \stackrel{p}{\rightarrow} 0, \qquad (C.3.48)$$
and
$$\Delta_{2,N} = [N(T-1)]^{-1/2} \Delta Z_N' \tilde{H}_N \hat{V}_N^{-1}(\hat{\rho}_N) \tilde{H}_N' \Delta u_N - [N(T-1)]^{-1/2} \Delta Z_N' \tilde{H}_N \tilde{V}_N^{-1} \tilde{H}_N' \Delta u_N \stackrel{p}{\rightarrow} 0. \qquad (C.3.49)$$
Note that
$$\Delta_{1,N} = [N(T-1)]^{-1} \Delta Z_N' \tilde{H}_N \Bigl\{ \bigl[ [N(T-1)]^{-1} \hat{V}_N(\hat{\rho}_N) \bigr]^{-1} - \bigl[ [N(T-1)]^{-1} \tilde{V}_N \bigr]^{-1} \Bigr\} [N(T-1)]^{-1} \tilde{H}_N' \Delta Z_N. \qquad (C.3.50)$$
From Lemma 7 and Assumption GMM3, it follows that the matrices $[(T-1)N]^{-1} \hat{V}_N(\hat{\rho}_N)$ and $[N(T-1)]^{-1} \tilde{V}_N$ both converge to $\tilde{V}$ in probability. Since by Assumption GMM3 the matrix $\tilde{V}$ is finite and nonsingular, it follows from Theorem 14 in Pötscher and Prucha (2001) that
$$\bigl[ [N(T-1)]^{-1} \hat{V}_N(\hat{\rho}_N) \bigr]^{-1} - \bigl[ [N(T-1)]^{-1} \tilde{V}_N \bigr]^{-1} = o_p(1). \qquad (C.3.51)$$
Given Assumption GMM2, it then follows that $\Delta_{1,N} \stackrel{p}{\rightarrow} 0$.

Similarly, we have for $\Delta_{2,N}$:
$$\Delta_{2,N} = [N(T-1)]^{-1} \Delta Z_N' \tilde{H}_N \Bigl\{ \bigl[ [N(T-1)]^{-1} \hat{V}_N(\hat{\rho}_N) \bigr]^{-1} - \bigl[ [N(T-1)]^{-1} \tilde{V}_N \bigr]^{-1} \Bigr\} [N(T-1)]^{-1/2} \tilde{H}_N' \Delta u_N, \qquad (C.3.52)$$
where, as above,
$$[N(T-1)]^{-1} \Delta Z_N' \tilde{H}_N \stackrel{p}{\rightarrow} \tilde{M}_{H \Delta Z}', \qquad (C.3.53)$$
and
$$\bigl[ [N(T-1)]^{-1} \hat{V}_N(\hat{\rho}_N) \bigr]^{-1} - \bigl[ [N(T-1)]^{-1} \tilde{V}_N \bigr]^{-1} \stackrel{p}{\rightarrow} 0_{k \times k}. \qquad (C.3.54)$$
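The simplification in (C.3.24), where the sandwich form collapses to $(\tilde{M}_{\Delta ZH} \tilde{V}^{-1} \tilde{M}_{\Delta ZH}')^{-1}$ once the weighting matrix equals $\tilde{V}^{-1}$, is a matrix identity that can be checked numerically. The matrices below are arbitrary stand-ins (any full-row-rank $M$ and positive definite $V$), not the thesis quantities:

```python
# With weight W = V^{-1}, the GMM sandwich (M W M')^{-1} M W V W M' (M W M')^{-1}
# collapses to the "bread" (M V^{-1} M')^{-1}.
import numpy as np

rng = np.random.default_rng(3)
k, q = 3, 6
M = rng.normal(size=(k, q))              # stand-in for M_{dZH}, full row rank
X = rng.normal(size=(q, q))
V = X @ X.T + np.eye(q)                  # stand-in for V~, positive definite

Vi = np.linalg.inv(V)
bread = np.linalg.inv(M @ Vi @ M.T)
sandwich = bread @ (M @ Vi @ V @ Vi @ M.T) @ bread
```

This is exactly why the $\tilde{V}^{-1}$-weighted estimator is the efficient member of the class: any other weight leaves a strictly larger sandwich.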
Note that from Lemma 5, it follows that $\tilde{V}_N^{-1/2} \tilde{H}_N' \Delta u_N \stackrel{d}{\rightarrow} N(0, I_k)$. Given Assumption GMM3, it follows from Theorem 15 in Pötscher and Prucha (2001) that
$$\frac{\tilde{H}_N' \Delta u_N}{[N(T-1)]^{1/2}} = \left( \frac{\tilde{V}_N}{N(T-1)} \right)^{1/2} \tilde{V}_N^{-1/2} \tilde{H}_N' \Delta u_N \stackrel{d}{\rightarrow} N(0, \tilde{V}). \qquad (C.3.55)$$
Hence by Corollary 5, part (a), in Pötscher and Prucha (2001), we have that $\Delta_{2,N} \stackrel{p}{\rightarrow} 0$.

Proof of Lemma 8: Given Assumption GMM3, the claim follows directly from (C.3.48).

D Appendix: Tables of Monte Carlo Results

Table D1: Initial IV Estimators of λ
    λ     ρ   W | AH1: RMSE    Bias | AH2: RMSE    Bias |  AB: RMSE    Bias
-0.90 -0.90  1 | 0.0977  0.0057 | 0.0948  0.0088 | 0.0883 -0.0169
-0.75 -0.90  1 | 0.1213  0.0079 | 0.1094  0.0087 | 0.1030 -0.0252
-0.25 -0.90  1 | 0.2367  0.0139 | 0.1601  0.0110 | 0.1634 -0.0858
 0.00 -0.90  1 | 0.3246  0.0151 | 0.1885  0.0110 | 0.2089 -0.1369
 0.25 -0.90  1 | 0.5168  0.0140 | 0.2223  0.0100 | 0.2758 -0.2090
 0.75 -0.90  1 | 0.9098 -0.1899 | 0.2126  0.0099 | 0.3447 -0.2852
 0.90 -0.90  1 | 0.3725 -0.0178 | 0.1594  0.0050 | 0.2694 -0.2165
-0.90 -0.50  1 | 0.0452  0.0014 | 0.0428  0.0016 | 0.0411 -0.0066
-0.75 -0.50  1 | 0.0554  0.0021 | 0.0502  0.0020 | 0.0477 -0.0097
-0.25 -0.50  1 | 0.1004  0.0044 | 0.0784  0.0013 | 0.0731 -0.0284
 0.00 -0.50  1 | 0.1386  0.0047 | 0.0927  0.0041 | 0.0862 -0.0427
 0.25 -0.50  1 | 0.2112  0.0070 | 0.1098  0.0059 | 0.1030 -0.0638
 0.75 -0.50  1 | 1.0751 -0.2148 | 0.1156  0.0055 | 0.1167 -0.0864
 0.90 -0.50  1 | 0.2612 -0.0031 | 0.0877  0.0043 | 0.0988 -0.0734
-0.90 -0.25  1 | 0.0372  0.0011 | 0.0362  0.0025 | 0.0343 -0.0046
-0.75 -0.25  1 | 0.0458  0.0008 | 0.0421  0.0021 | 0.0410 -0.0068
-0.25 -0.25  1 | 0.0866  0.0026 | 0.0655  0.0014 | 0.0606 -0.0199
 0.00 -0.25  1 | 0.1187  0.0022 | 0.0781  0.0024 | 0.0701 -0.0296
 0.25 -0.25  1 | 0.1711  0.0030 | 0.0902  0.0037 | 0.0816 -0.0441
 0.75 -0.25  1 | 1.2234 -0.3342 | 0.0948  0.0052 | 0.0914 -0.0606
 0.90 -0.25  1 | 0.2557 -0.0095 | 0.0733  0.0019 | 0.0783 -0.0533
-0.90  0.00  1 | 0.0364  0.0018 | 0.0362  0.0011 | 0.0346 -0.0048
-0.75  0.00  1 | 0.0446  0.0015 | 0.0413  0.0010 | 0.0396 -0.0064
-0.25  0.00  1 | 0.0845  0.0016 | 0.0644  0.0020 | 0.0591 -0.0181
 0.00  0.00  1 | 0.1135  0.0022 | 0.0736  0.0014 | 0.0651 -0.0257
 0.25  0.00  1 | 0.1660  0.0028 | 0.0856  0.0008 | 0.0766 -0.0391
 0.75  0.00  1 | 1.3519 -0.3439 | 0.0884  0.0058 | 0.0849 -0.0538
 0.90  0.00  1 | 0.2572 -0.0050 | 0.0713  0.0050 | 0.0725 -0.0487
-0.90  0.25  1 | 0.0385  0.0000 | 0.0365  0.0018 | 0.0357 -0.0055
-0.75  0.25  1 | 0.0477  0.0012 | 0.0430  0.0022 | 0.0427 -0.0084
-0.25  0.25  1 | 0.0884  0.0012 | 0.0669  0.0030 | 0.0629 -0.0196
 0.00  0.25  1 | 0.1229  0.0022 | 0.0804  0.0032 | 0.0714 -0.0295
 0.25  0.25  1 | 0.1824  0.0022 | 0.0927  0.0019 | 0.0825 -0.0423
 0.75  0.25  1 | 1.2421 -0.3463 | 0.0979  0.0032 | 0.0912 -0.0622
 0.90  0.25  1 | 0.2583 -0.0029 | 0.0768  0.0046 | 0.0792 -0.0544
-0.90  0.50  1 | 0.0473  0.0002 | 0.0443  0.0013 | 0.0426 -0.0071
-0.75  0.50  1 | 0.0574  0.0010 | 0.0528  0.0016 | 0.0497 -0.0107
-0.25  0.50  1 | 0.1058  0.0028 | 0.0807  0.0047 | 0.0732 -0.0264
 0.00  0.50  1 | 0.1449  0.0019 | 0.0992  0.0040 | 0.0885 -0.0428
 0.25  0.50  1 | 0.2197  0.0029 | 0.1153  0.0064 | 0.1037 -0.0616
 0.75  0.50  1 | 1.0355 -0.2428 | 0.1204  0.0041 | 0.1190 -0.0896
 0.90  0.50  1 | 0.2672 -0.0110 | 0.0948  0.0042 | 0.1001 -0.0751
-0.90  0.90  1 | 0.0950 -0.0029 | 0.0960  0.0013 | 0.0916 -0.0229
-0.75  0.90  1 | 0.1178 -0.0037 | 0.1145  0.0013 | 0.1100 -0.0321
-0.25  0.90  1 | 0.2298 -0.0047 | 0.1761  0.0073 | 0.1691 -0.0896
 0.00  0.90  1 | 0.3335 -0.0058 | 0.2008  0.0084 | 0.2131 -0.1363
 0.25  0.90  1 | 0.5477 -0.0176 | 0.2251  0.0143 | 0.2764 -0.2062
 0.75  0.90  1 | 0.9974 -0.1566 | 0.2144  0.0061 | 0.3543 -0.2889
 0.90  0.90  1 | 0.3929 -0.0086 | 0.1662 -0.0005 | 0.2672 -0.2119

-0.90 -0.90  2 | 0.0408  0.0006 | 0.0379  0.0002 | 0.0372 -0.0067
-0.75 -0.90  2 | 0.0498 -0.0002 | 0.0448  0.0004 | 0.0434 -0.0091
-0.25 -0.90  2 | 0.0937  0.0001 | 0.0676  0.0001 | 0.0655 -0.0235
 0.00 -0.90  2 | 0.1300 -0.0027 | 0.0821 -0.0015 | 0.0788 -0.0353
 0.25 -0.90  2 | 0.1905  0.0008 | 0.0923  0.0003 | 0.0881 -0.0509
 0.75 -0.90  2 | 1.0960 -0.2450 | 0.0989  0.0011 | 0.0985 -0.0708
 0.90 -0.90  2 | 0.2442 -0.0024 | 0.0770  0.0018 | 0.0853 -0.0604
-0.90 -0.50  2 | 0.0367  0.0015 | 0.0356  0.0009 | 0.0349 -0.0052
-0.75 -0.50  2 | 0.0451  0.0003 | 0.0413  0.0010 | 0.0412 -0.0074
-0.25 -0.50  2 | 0.0864 -0.0011 | 0.0638  0.0004 | 0.0603 -0.0202
 0.00 -0.50  2 | 0.1170  0.0020 | 0.0770 -0.0002 | 0.0696 -0.0283
 0.25 -0.50  2 | 0.1750  0.0060 | 0.0859 -0.0002 | 0.0806 -0.0423
 0.75 -0.50  2 | 1.1873 -0.3030 | 0.0921  0.0035 | 0.0897 -0.0599
 0.90 -0.50  2 | 0.2555 -0.0040 | 0.0723  0.0025 | 0.0762 -0.0521
-0.90 -0.25  2 | 0.0364  0.0015 | 0.0349  0.0011 | 0.0347 -0.0043
-0.75 -0.25  2 | 0.0441  0.0010 | 0.0403  0.0009 | 0.0400 -0.0064
-0.25 -0.25  2 | 0.0849 -0.0002 | 0.0640  0.0017 | 0.0583 -0.0180
 0.00 -0.25  2 | 0.1141  0.0037 | 0.0747  0.0000 | 0.0677 -0.0270
 0.25 -0.25  2 | 0.1697  0.0049 | 0.0844  0.0015 | 0.0782 -0.0406
 0.75 -0.25  2 | 1.3170 -0.3437 | 0.0901  0.0032 | 0.0863 -0.0561
 0.90 -0.25  2 | 0.2563 -0.0088 | 0.0703  0.0040 | 0.0732 -0.0489
-0.90  0.00  2 | 0.0364  0.0018 | 0.0362  0.0011 | 0.0346 -0.0048
-0.75  0.00  2 | 0.0446  0.0015 | 0.0413  0.0010 | 0.0396 -0.0064
-0.25  0.00  2 | 0.0845  0.0016 | 0.0644  0.0020 | 0.0591 -0.0181
 0.00  0.00  2 | 0.1135  0.0022 | 0.0736  0.0014 | 0.0651 -0.0257
 0.25  0.00  2 | 0.1660  0.0028 | 0.0856  0.0008 | 0.0766 -0.0391
 0.75  0.00  2 | 1.3519 -0.3439 | 0.0884  0.0058 | 0.0849 -0.0538
 0.90  0.00  2 | 0.2572 -0.0050 | 0.0713  0.0050 | 0.0725 -0.0487
-0.90  0.25  2 | 0.0370  0.0011 | 0.0360  0.0015 | 0.0345 -0.0047
-0.75  0.25  2 | 0.0465  0.0011 | 0.0427  0.0017 | 0.0399 -0.0069
-0.25  0.25  2 | 0.0855  0.0013 | 0.0656  0.0034 | 0.0604 -0.0202
 0.00  0.25  2 | 0.1192  0.0034 | 0.0783  0.0033 | 0.0693 -0.0289
 0.25  0.25  2 | 0.1760  0.0026 | 0.0886  0.0029 | 0.0794 -0.0413
 0.75  0.25  2 | 1.2863 -0.3508 | 0.0909  0.0043 | 0.0880 -0.0584
 0.90  0.25  2 | 0.2585  0.0024 | 0.0755  0.0047 | 0.0759 -0.0516
-0.90  0.50  2 | 0.0416  0.0011 | 0.0409  0.0024 | 0.0391 -0.0063
-0.75  0.50  2 | 0.0507  0.0012 | 0.0479  0.0030 | 0.0460 -0.0090
-0.25  0.50  2 | 0.0958  0.0016 | 0.0721  0.0041 | 0.0679 -0.0250
 0.00  0.50  2 | 0.1344  0.0075 | 0.0894  0.0060 | 0.0797 -0.0354
 0.25  0.50  2 | 0.1997  0.0094 | 0.1063  0.0060 | 0.0935 -0.0526
 0.75  0.50  2 | 1.2256 -0.2925 | 0.1091  0.0093 | 0.1046 -0.0755
 0.90  0.50  2 | 0.2678 -0.0126 | 0.0885  0.0055 | 0.0907 -0.0659
-0.90  0.90  2 | 0.1252 -0.0041 | 0.1163  0.0077 | 0.1105 -0.0267
-0.75  0.90  2 | 0.1538 -0.0030 | 0.1380  0.0121 | 0.1285 -0.0365
-0.25  0.90  2 | 0.2908 -0.0019 | 0.2099  0.0137 | 0.2005 -0.1060
 0.00  0.90  2 | 0.4118 -0.0013 | 0.2549  0.0156 | 0.2457 -0.1611
 0.25  0.90  2 | 0.6497 -0.0186 | 0.2939  0.0166 | 0.3227 -0.2403
 0.75  0.90  2 | 1.2519 -0.3148 | 0.2742  0.0071 | 0.4062 -0.3263
 0.90  0.90  2 | 0.5361 -0.0507 | 0.2255  0.0068 | 0.3408 -0.2655

-0.90 -0.90  3 | 0.0392  0.0016 | 0.0370  0.0020 | 0.0364 -0.0052
-0.75 -0.90  3 | 0.0474  0.0021 | 0.0431  0.0023 | 0.0419 -0.0075
-0.25 -0.90  3 | 0.0900  0.0035 | 0.0664  0.0026 | 0.0635 -0.0184
 0.00 -0.90  3 | 0.1228  0.0058 | 0.0790  0.0015 | 0.0728 -0.0291
 0.25 -0.90  3 | 0.1857  0.0068 | 0.0916  0.0021 | 0.0834 -0.0442
 0.75 -0.90  3 | 1.1327 -0.3200 | 0.0931  0.0023 | 0.0901 -0.0631
 0.90 -0.90  3 | 0.2562 -0.0151 | 0.0741  0.0041 | 0.0777 -0.0534
-0.90 -0.50  3 | 0.0372  0.0014 | 0.0359  0.0012 | 0.0344 -0.0042
-0.75 -0.50  3 | 0.0455  0.0021 | 0.0417  0.0007 | 0.0411 -0.0067
-0.25 -0.50  3 | 0.0886  0.0012 | 0.0642  0.0015 | 0.0605 -0.0181
 0.00 -0.50  3 | 0.1181  0.0045 | 0.0764  0.0016 | 0.0685 -0.0278
 0.25 -0.50  3 | 0.1757  0.0043 | 0.0863  0.0009 | 0.0791 -0.0404
 0.75 -0.50  3 | 1.2685 -0.3750 | 0.0893  0.0046 | 0.0855 -0.0572
 0.90 -0.50  3 | 0.2589 -0.0104 | 0.0714  0.0056 | 0.0736 -0.0499
-0.90 -0.25  3 | 0.0363  0.0019 | 0.0358  0.0005 | 0.0348 -0.0038
-0.75 -0.25  3 | 0.0454  0.0014 | 0.0413  0.0008 | 0.0397 -0.0062
-0.25 -0.25  3 | 0.0858  0.0008 | 0.0641  0.0015 | 0.0596 -0.0183
 0.00 -0.25  3 | 0.1163  0.0041 | 0.0744  0.0018 | 0.0668 -0.0270
 0.25 -0.25  3 | 0.1672  0.0051 | 0.0857  0.0020 | 0.0764 -0.0384
 0.75 -0.25  3 | 1.3017 -0.3856 | 0.0886  0.0052 | 0.0843 -0.0555
 0.90 -0.25  3 | 0.2542 -0.0107 | 0.0702  0.0051 | 0.0720 -0.0484
-0.90  0.00  3 | 0.0364  0.0018 | 0.0362  0.0011 | 0.0346 -0.0048
-0.75  0.00  3 | 0.0446  0.0015 | 0.0413  0.0010 | 0.0396 -0.0064
-0.25  0.00  3 | 0.0845  0.0016 | 0.0644  0.0020 | 0.0591 -0.0181
 0.00  0.00  3 | 0.1135  0.0022 | 0.0736  0.0014 | 0.0651 -0.0257
 0.25  0.00  3 | 0.1660  0.0028 | 0.0856  0.0008 | 0.0766 -0.0391
 0.75  0.00  3 | 1.3519 -0.3439 | 0.0884  0.0058 | 0.0849 -0.0538
 0.90  0.00  3 | 0.2572 -0.0050 | 0.0713  0.0050 | 0.0725 -0.0487
-0.90  0.25  3 | 0.0374  0.0009 | 0.0364  0.0015 | 0.0344 -0.0045
-0.75  0.25  3 | 0.0456  0.0013 | 0.0417  0.0016 | 0.0395 -0.0054
-0.25  0.25  3 | 0.0869  0.0003 | 0.0651  0.0021 | 0.0597 -0.0193
 0.00  0.25  3 | 0.1154  0.0028 | 0.0772  0.0018 | 0.0693 -0.0276
 0.25  0.25  3 | 0.1709  0.0037 | 0.0889  0.0025 | 0.0795 -0.0396
 0.75  0.25  3 | 1.3535 -0.3464 | 0.0878  0.0023 | 0.0864 -0.0568
 0.90  0.25  3 | 0.2564  0.0049 | 0.0739  0.0034 | 0.0743 -0.0494
-0.90  0.50  3 | 0.0405  0.0006 | 0.0397  0.0020 | 0.0379 -0.0052
-0.75  0.50  3 | 0.0493  0.0014 | 0.0472  0.0022 | 0.0447 -0.0081
-0.25  0.50  3 | 0.0940  0.0036 | 0.0720  0.0021 | 0.0674 -0.0227
 0.00  0.50  3 | 0.1316  0.0046 | 0.0855  0.0039 | 0.0774 -0.0328
 0.25  0.50  3 | 0.1919  0.0072 | 0.1013  0.0039 | 0.0889 -0.0471
 0.75  0.50  3 | 1.2627 -0.3083 | 0.1014  0.0098 | 0.0954 -0.0670
 0.90  0.50  3 | 0.2714 -0.0080 | 0.0802  0.0042 | 0.0847 -0.0588
-0.90  0.90  3 | 0.1371 -0.0009 | 0.1252  0.0076 | 0.1180 -0.0288
-0.75  0.90  3 | 0.1699 -0.0027 | 0.1503  0.0126 | 0.1369 -0.0393
-0.25  0.90  3 | 0.3219  0.0020 | 0.2256  0.0187 | 0.2058 -0.1003
 0.00  0.90  3 | 0.4354  0.0032 | 0.2673  0.0204 | 0.2551 -0.1572
 0.25  0.90  3 | 0.6794  0.0100 | 0.3249  0.0148 | 0.3273 -0.2381
 0.75  0.90  3 | 1.3234 -0.3345 | 0.3122  0.0082 | 0.4046 -0.3253
 0.90  0.90  3 | 0.5756 -0.0721 | 0.2463  0.0068 | 0.3432 -0.2676
Table D2: Second Stage GMM Estimators of λ (weighting schemes "ignoring", "mix", "exp")
    λ     ρ   W | ignoring: RMSE  Bias | mix: RMSE    Bias | exp: RMSE    Bias
-0.90 -0.90  1 | 0.0853 -0.0065 | 0.0713 -0.0082 | 0.0850 -0.0016
-0.75 -0.90  1 | 0.0987 -0.0093 | 0.0845 -0.0147 | 0.0987 -0.0070
-0.25 -0.90  1 | 0.1419 -0.0425 | 0.1333 -0.0536 | 0.1468 -0.0411
 0.00 -0.90  1 | 0.1676 -0.0678 | 0.1616 -0.0822 | 0.1736 -0.0735
 0.25 -0.90  1 | 0.1989 -0.0998 | 0.1934 -0.1158 | 0.2113 -0.1165
 0.75 -0.90  1 | 0.1773 -0.0866 | 0.1757 -0.1082 | 0.2431 -0.1575
 0.90 -0.90  1 | 0.1279 -0.0562 | 0.1291 -0.0787 | 0.1783 -0.0911
-0.90 -0.50  1 | 0.0417 -0.0030 | 0.0404 -0.0027 | 0.0426  0.0003
-0.75 -0.50  1 | 0.0490 -0.0039 | 0.0462 -0.0040 | 0.0499 -0.0022
-0.25 -0.50  1 | 0.0703 -0.0140 | 0.0681 -0.0138 | 0.0693 -0.0113
 0.00 -0.50  1 | 0.0845 -0.0224 | 0.0783 -0.0186 | 0.0773 -0.0162
 0.25 -0.50  1 | 0.0983 -0.0317 | 0.0927 -0.0306 | 0.0883 -0.0236
 0.75 -0.50  1 | 0.0901 -0.0319 | 0.0868 -0.0378 | 0.0928 -0.0410
 0.90 -0.50  1 | 0.0668 -0.0221 | 0.0688 -0.0274 | 0.0777 -0.0279
-0.90 -0.25  1 | 0.0347 -0.0017 | 0.0349 -0.0017 | 0.0367  0.0005
-0.75 -0.25  1 | 0.0408 -0.0027 | 0.0406 -0.0025 | 0.0424 -0.0008
-0.25 -0.25  1 | 0.0585 -0.0094 | 0.0589 -0.0085 | 0.0589 -0.0078
 0.00 -0.25  1 | 0.0688 -0.0147 | 0.0667 -0.0146 | 0.0658 -0.0118
 0.25 -0.25  1 | 0.0796 -0.0232 | 0.0765 -0.0210 | 0.0737 -0.0150
 0.75 -0.25  1 | 0.0744 -0.0260 | 0.0771 -0.0281 | 0.0784 -0.0309
 0.90 -0.25  1 | 0.0584 -0.0180 | 0.0594 -0.0188 | 0.0683 -0.0202
-0.90  0.00  1 | 0.0338 -0.0012 | 0.0338 -0.0013 | 0.0349  0.0012
-0.75  0.00  1 | 0.0386 -0.0021 | 0.0388 -0.0024 | 0.0403 -0.0003
-0.25  0.00  1 | 0.0572 -0.0091 | 0.0568 -0.0090 | 0.0556 -0.0058
 0.00  0.00  1 | 0.0649 -0.0127 | 0.0646 -0.0126 | 0.0634 -0.0086
 0.25  0.00  1 | 0.0738 -0.0189 | 0.0734 -0.0191 | 0.0707 -0.0133
 0.75  0.00  1 | 0.0744 -0.0252 | 0.0742 -0.0252 | 0.0764 -0.0284
 0.90  0.00  1 | 0.0572 -0.0167 | 0.0573 -0.0168 | 0.0657 -0.0205
-0.90  0.25  1 | 0.0345 -0.0024 | 0.0344 -0.0024 | 0.0378 -0.0005
-0.75  0.25  1 | 0.0410 -0.0029 | 0.0403 -0.0035 | 0.0422 -0.0008
-0.25  0.25  1 | 0.0594 -0.0099 | 0.0585 -0.0099 | 0.0572 -0.0072
 0.00  0.25  1 | 0.0696 -0.0143 | 0.0688 -0.0149 | 0.0641 -0.0086
 0.25  0.25  1 | 0.0770 -0.0228 | 0.0776 -0.0213 | 0.0739 -0.0118
 0.75  0.25  1 | 0.0765 -0.0271 | 0.0787 -0.0291 | 0.0790 -0.0319
 0.90  0.25  1 | 0.0619 -0.0197 | 0.0611 -0.0206 | 0.0696 -0.0236
-0.90  0.50  1 | 0.0400 -0.0016 | 0.0390 -0.0034 | 0.0432 -0.0021
-0.75  0.50  1 | 0.0465 -0.0032 | 0.0471 -0.0047 | 0.0502 -0.0021
-0.25  0.50  1 | 0.0699 -0.0115 | 0.0678 -0.0131 | 0.0642 -0.0105
 0.00  0.50  1 | 0.0792 -0.0191 | 0.0779 -0.0191 | 0.0736 -0.0131
 0.25  0.50  1 | 0.0928 -0.0306 | 0.0890 -0.0314 | 0.0821 -0.0203
 0.75  0.50  1 | 0.0938 -0.0340 | 0.0962 -0.0396 | 0.0923 -0.0426
 0.90  0.50  1 | 0.0710 -0.0237 | 0.0723 -0.0284 | 0.0788 -0.0329
-0.90  0.90  1 | 0.0899 -0.0101 | 0.0765 -0.0138 | 0.0863 -0.0055
-0.75  0.90  1 | 0.1042 -0.0139 | 0.0882 -0.0171 | 0.1009 -0.0072
-0.25  0.90  1 | 0.1466 -0.0369 | 0.1290 -0.0453 | 0.1445 -0.0418
 0.00  0.90  1 | 0.1733 -0.0592 | 0.1529 -0.0719 | 0.1709 -0.0700
 0.25  0.90  1 | 0.1917 -0.0890 | 0.1819 -0.1069 | 0.2045 -0.1085
 0.75  0.90  1 | 0.1767 -0.0865 | 0.1862 -0.1129 | 0.2410 -0.1530
 0.90  0.90  1 | 0.1372 -0.0623 | 0.1417 -0.0823 | 0.1722 -0.0912

-0.90 -0.90  2 | 0.0367 -0.0028 | 0.0372 -0.0028 | 0.0399 -0.0005
-0.75 -0.90  2 | 0.0421 -0.0038 | 0.0439 -0.0040 | 0.0456 -0.0018
-0.25 -0.90  2 | 0.0611 -0.0108 | 0.0604 -0.0112 | 0.0595 -0.0069
 0.00 -0.90  2 | 0.0713 -0.0175 | 0.0696 -0.0163 | 0.0681 -0.0126
 0.25 -0.90  2 | 0.0834 -0.0265 | 0.0828 -0.0262 | 0.0780 -0.0185
 0.75 -0.90  2 | 0.0812 -0.0278 | 0.0828 -0.0304 | 0.0871 -0.0365
 0.90 -0.90  2 | 0.0615 -0.0175 | 0.0631 -0.0196 | 0.0703 -0.0230
-0.90 -0.50  2 | 0.0342 -0.0024 | 0.0345 -0.0021 | 0.0373  0.0007
-0.75 -0.50  2 | 0.0400 -0.0031 | 0.0407 -0.0033 | 0.0432 -0.0014
-0.25 -0.50  2 | 0.0579 -0.0093 | 0.0579 -0.0085 | 0.0561 -0.0060
 0.00 -0.50  2 | 0.0655 -0.0131 | 0.0658 -0.0137 | 0.0628 -0.0088
 0.25 -0.50  2 | 0.0763 -0.0211 | 0.0767 -0.0211 | 0.0731 -0.0148
 0.75 -0.50  2 | 0.0752 -0.0246 | 0.0766 -0.0261 | 0.0802 -0.0308
 0.90 -0.50  2 | 0.0586 -0.0171 | 0.0596 -0.0181 | 0.0666 -0.0208
-0.90 -0.25  2 | 0.0339 -0.0016 | 0.0340 -0.0013 | 0.0347  0.0011
-0.75 -0.25  2 | 0.0399 -0.0027 | 0.0400 -0.0025 | 0.0408 -0.0008
-0.25 -0.25  2 | 0.0563 -0.0086 | 0.0574 -0.0093 | 0.0560 -0.0057
 0.00 -0.25  2 | 0.0645 -0.0123 | 0.0649 -0.0130 | 0.0615 -0.0093
 0.25 -0.25  2 | 0.0762 -0.0188 | 0.0763 -0.0185 | 0.0693 -0.0140
 0.75 -0.25  2 | 0.0756 -0.0255 | 0.0753 -0.0256 | 0.0779 -0.0303
 0.90 -0.25  2 | 0.0577 -0.0168 | 0.0585 -0.0171 | 0.0651 -0.0206
-0.90  0.00  2 | 0.0338 -0.0012 | 0.0337 -0.0016 | 0.0349  0.0013
-0.75  0.00  2 | 0.0386 -0.0021 | 0.0387 -0.0021 | 0.0401 -0.0005
-0.25  0.00  2 | 0.0572 -0.0091 | 0.0565 -0.0090 | 0.0549 -0.0060
 0.00  0.00  2 | 0.0649 -0.0127 | 0.0650 -0.0127 | 0.0621 -0.0084
 0.25  0.00  2 | 0.0738 -0.0189 | 0.0734 -0.0192 | 0.0704 -0.0133
 0.75  0.00  2 | 0.0744 -0.0252 | 0.0745 -0.0253 | 0.0766 -0.0284
 0.90  0.00  2 | 0.0572 -0.0167 | 0.0570 -0.0166 | 0.0655 -0.0203
-0.90  0.25  2 | 0.0341 -0.0007 | 0.0346 -0.0013 | 0.0362  0.0001
-0.75  0.25  2 | 0.0401 -0.0028 | 0.0397 -0.0026 | 0.0411 -0.0009
-0.25  0.25  2 | 0.0581 -0.0084 | 0.0573 -0.0085 | 0.0571 -0.0059
 0.00  0.25  2 | 0.0674 -0.0137 | 0.0680 -0.0137 | 0.0635 -0.0089
 0.25  0.25  2 | 0.0754 -0.0203 | 0.0766 -0.0208 | 0.0723 -0.0138
 0.75  0.25  2 | 0.0748 -0.0275 | 0.0741 -0.0277 | 0.0773 -0.0299
 0.90  0.25  2 | 0.0589 -0.0191 | 0.0584 -0.0191 | 0.0673 -0.0213
-0.90  0.50  2 | 0.0384 -0.0023 | 0.0379 -0.0029 | 0.0383 -0.0016
-0.75  0.50  2 | 0.0453 -0.0030 | 0.0447 -0.0030 | 0.0449 -0.0023
-0.25  0.50  2 | 0.0661 -0.0106 | 0.0614 -0.0109 | 0.0611 -0.0080
 0.00  0.50  2 | 0.0736 -0.0155 | 0.0708 -0.0173 | 0.0673 -0.0121
 0.25  0.50  2 | 0.0850 -0.0258 | 0.0814 -0.0235 | 0.0752 -0.0156
 0.75  0.50  2 | 0.0859 -0.0305 | 0.0858 -0.0350 | 0.0830 -0.0357
 0.90  0.50  2 | 0.0676 -0.0223 | 0.0684 -0.0258 | 0.0724 -0.0271
-0.90  0.90  2 | 0.1070 -0.0068 | 0.0657 -0.0109 | 0.0730 -0.0073
-0.75  0.90  2 | 0.1243 -0.0118 | 0.0759 -0.0150 | 0.0837 -0.0103
-0.25  0.90  2 | 0.1726 -0.0500 | 0.1142 -0.0420 | 0.1173 -0.0326
 0.00  0.90  2 | 0.2035 -0.0835 | 0.1381 -0.0656 | 0.1349 -0.0479
 0.25  0.90  2 | 0.2406 -0.1138 | 0.1691 -0.0929 | 0.1492 -0.0674
 0.75  0.90  2 | 0.2403 -0.1234 | 0.2314 -0.1464 | 0.1812 -0.1127
 0.90  0.90  2 | 0.1807 -0.0827 | 0.1811 -0.1088 | 0.1585 -0.0905

-0.90 -0.90  3 | 0.0356 -0.0014 | 0.0359 -0.0015 | 0.0390  0.0004
-0.75 -0.90  3 | 0.0413 -0.0038 | 0.0417 -0.0028 | 0.0443 -0.0011
-0.25 -0.90  3 | 0.0589 -0.0095 | 0.0603 -0.0088 | 0.0595 -0.0063
 0.00 -0.90  3 | 0.0688 -0.0148 | 0.0691 -0.0138 | 0.0679 -0.0116
 0.25 -0.90  3 | 0.0821 -0.0215 | 0.0819 -0.0206 | 0.0762 -0.0152
 0.75 -0.90  3 | 0.0779 -0.0252 | 0.0791 -0.0270 | 0.0840 -0.0339
 0.90 -0.90  3 | 0.0610 -0.0172 | 0.0613 -0.0177 | 0.0712 -0.0232
-0.90 -0.50  3 | 0.0344 -0.0019 | 0.0342 -0.0018 | 0.0370  0.0011
-0.75 -0.50  3 | 0.0395 -0.0034 | 0.0397 -0.0027 | 0.0418  0.0000
-0.25 -0.50  3 | 0.0574 -0.0088 | 0.0569 -0.0078 | 0.0574 -0.0066
 0.00 -0.50  3 | 0.0658 -0.0128 | 0.0664 -0.0121 | 0.0643 -0.0098
 0.25 -0.50  3 | 0.0783 -0.0189 | 0.0783 -0.0187 | 0.0711 -0.0138
 0.75 -0.50  3 | 0.0742 -0.0249 | 0.0747 -0.0249 | 0.0781 -0.0307
 0.90 -0.50  3 | 0.0587 -0.0170 | 0.0603 -0.0174 | 0.0670 -0.0217
-0.90 -0.25  3 | 0.0338 -0.0016 | 0.0339 -0.0015 | 0.0357  0.0015
-0.75 -0.25  3 | 0.0391 -0.0023 | 0.0396 -0.0021 | 0.0407 -0.0004
-0.25 -0.25  3 | 0.0564 -0.0085 | 0.0569 -0.0083 | 0.0562 -0.0064
 0.00 -0.25  3 | 0.0659 -0.0130 | 0.0665 -0.0135 | 0.0626 -0.0094
 0.25 -0.25  3 | 0.0756 -0.0179 | 0.0745 -0.0182 | 0.0700 -0.0140
 0.75 -0.25  3 | 0.0744 -0.0247 | 0.0746 -0.0244 | 0.0752 -0.0292
 0.90 -0.25  3 | 0.0582 -0.0172 | 0.0588 -0.0173 | 0.0662 -0.0208
-0.90  0.00  3 | 0.0338 -0.0012 | 0.0336 -0.0012 | 0.0349  0.0013
-0.75  0.00  3 | 0.0386 -0.0021 | 0.0387 -0.0022 | 0.0401 -0.0006
-0.25  0.00  3 | 0.0572 -0.0091 | 0.0569 -0.0089 | 0.0554 -0.0059
 0.00  0.00  3 | 0.0649 -0.0127 | 0.0651 -0.0129 | 0.0621 -0.0091
 0.25  0.00  3 | 0.0738 -0.0189 | 0.0738 -0.0190 | 0.0706 -0.0136
 0.75  0.00  3 | 0.0744 -0.0252 | 0.0741 -0.0253 | 0.0763 -0.0286
 0.90  0.00  3 | 0.0572 -0.0167 | 0.0573 -0.0167 | 0.0658 -0.0204
-0.90  0.25  3 | 0.0347 -0.0009 | 0.0349 -0.0012 | 0.0354  0.0000
-0.75  0.25  3 | 0.0395 -0.0023 | 0.0398 -0.0023 | 0.0405 -0.0008
-0.25  0.25  3 | 0.0574 -0.0094 | 0.0582 -0.0090 | 0.0551 -0.0058
 0.00  0.25  3 | 0.0658 -0.0137 | 0.0665 -0.0137 | 0.0614 -0.0091
 0.25  0.25  3 | 0.0758 -0.0201 | 0.0777 -0.0208 | 0.0713 -0.0136
 0.75  0.25  3 | 0.0741 -0.0270 | 0.0741 -0.0274 | 0.0776 -0.0284
 0.90  0.25  3 | 0.0588 -0.0186 | 0.0591 -0.0186 | 0.0666 -0.0203
-0.90  0.50  3 | 0.0381 -0.0015 | 0.0364 -0.0029 | 0.0378 -0.0015
-0.75  0.50  3 | 0.0449 -0.0027 | 0.0439 -0.0039 | 0.0429 -0.0021
-0.25  0.50  3 | 0.0633 -0.0097 | 0.0604 -0.0100 | 0.0580 -0.0079
 0.00  0.50  3 | 0.0720 -0.0162 | 0.0688 -0.0156 | 0.0646 -0.0117
 0.25  0.50  3 | 0.0809 -0.0219 | 0.0805 -0.0232 | 0.0729 -0.0157
 0.75  0.50  3 | 0.0816 -0.0309 | 0.0850 -0.0340 | 0.0808 -0.0325
 0.90  0.50  3 | 0.0650 -0.0225 | 0.0677 -0.0253 | 0.0698 -0.0244
-0.90  0.90  3 | 0.1131 -0.0056 | 0.0570 -0.0086 | 0.0666 -0.0091
-0.75  0.90  3 | 0.1307 -0.0109 | 0.0674 -0.0131 | 0.0740 -0.0130
-0.25  0.90  3 | 0.1833 -0.0499 | 0.0986 -0.0335 | 0.1008 -0.0252
 0.00  0.90  3 | 0.2080 -0.0836 | 0.1128 -0.0459 | 0.1135 -0.0371
 0.25  0.90  3 | 0.2467 -0.1224 | 0.1455 -0.0736 | 0.1227 -0.0489
 0.75  0.90  3 | 0.2560 -0.1356 | 0.2234 -0.1442 | 0.1462 -0.0799
 0.90  0.90  3 | 0.1988 -0.0946 | 0.1803 -0.1092 | 0.1385 -0.0702
Table D3: Unweighted Spatial GM Estimators of ρ (by initial estimator AH1, AH2, AB, and using true disturbances)
    λ     ρ   W | AH1: RMSE  Bias | AH2: RMSE  Bias |  AB: RMSE  Bias | True: RMSE  Bias
-0.90 -0.90  1 | 0.028  0.013 | 0.028  0.013 | 0.033  0.016 | 0.018 -0.001
-0.75 -0.90  1 | 0.028  0.014 | 0.028  0.013 | 0.033  0.016 | 0.018 -0.001
-0.25 -0.90  1 | 0.033  0.019 | 0.030  0.014 | 0.036  0.019 | 0.018 -0.001
 0.00 -0.90  1 | 0.039  0.024 | 0.030  0.016 | 0.040  0.024 | 0.018 -0.001
 0.25 -0.90  1 | 0.052  0.032 | 0.030  0.017 | 0.048  0.030 | 0.018 -0.001
 0.75 -0.90  1 | 0.069  0.046 | 0.029  0.017 | 0.055  0.037 | 0.018 -0.001
 0.90 -0.90  1 | 0.039  0.024 | 0.029  0.015 | 0.045  0.030 | 0.018 -0.001
-0.90 -0.50  1 | 0.047  0.003 | 0.047  0.003 | 0.047  0.005 | 0.048 -0.001
-0.75 -0.50  1 | 0.047  0.004 | 0.047  0.004 | 0.048  0.005 | 0.048 -0.001
-0.25 -0.50  1 | 0.048  0.006 | 0.047  0.004 | 0.047  0.007 | 0.048 -0.001
 0.00 -0.50  1 | 0.047  0.008 | 0.048  0.005 | 0.048  0.007 | 0.048 -0.001
 0.25 -0.50  1 | 0.052  0.014 | 0.047  0.005 | 0.050  0.011 | 0.048 -0.001
 0.75 -0.50  1 | 0.115  0.067 | 0.047  0.006 | 0.055  0.018 | 0.048 -0.001
 0.90 -0.50  1 | 0.057  0.020 | 0.047  0.005 | 0.052  0.014 | 0.048 -0.001
-0.90 -0.25  1 | 0.057  0.001 | 0.056  0.000 | 0.057  0.001 | 0.057 -0.001
-0.75 -0.25  1 | 0.056  0.001 | 0.056  0.000 | 0.057  0.001 | 0.057 -0.001
-0.25 -0.25  1 | 0.058  0.001 | 0.057  0.001 | 0.056  0.002 | 0.057 -0.001
 0.00 -0.25  1 | 0.057  0.002 | 0.057  0.001 | 0.058  0.003 | 0.057 -0.001
 0.25 -0.25  1 | 0.057  0.005 | 0.057  0.001 | 0.057  0.004 | 0.057 -0.001
 0.75 -0.25  1 | 0.088  0.041 | 0.056  0.002 | 0.057  0.007 | 0.057 -0.001
 0.90 -0.25  1 | 0.061  0.011 | 0.057  0.001 | 0.058  0.006 | 0.057 -0.001
-0.90  0.00  1 | 0.061 -0.001 | 0.061 -0.001 | 0.060 -0.001 | 0.061 -0.001
-0.75  0.00  1 | 0.061 -0.001 | 0.061 -0.001 | 0.060 -0.001 | 0.061 -0.001
-0.25  0.00  1 | 0.061 -0.001 | 0.060 -0.002 | 0.061 -0.001 | 0.061 -0.001
 0.00  0.00  1 | 0.062 -0.001 | 0.060 -0.002 | 0.061 -0.001 | 0.061 -0.001
 0.25  0.00  1 | 0.061 -0.001 | 0.060 -0.002 | 0.061 -0.001 | 0.061 -0.001
 0.75  0.00  1 | 0.075 -0.001 | 0.062 -0.001 | 0.060 -0.001 | 0.061 -0.001
 0.90  0.00  1 | 0.063  0.001 | 0.061 -0.001 | 0.060  0.000 | 0.061 -0.001
-0.90  0.25  1 | 0.059 -0.003 | 0.060 -0.003 | 0.058 -0.003 | 0.058 -0.001
-0.75  0.25  1 | 0.059 -0.003 | 0.059 -0.003 | 0.057 -0.003 | 0.058 -0.001
-0.25  0.25  1 | 0.059 -0.004 | 0.059 -0.004 | 0.057 -0.004 | 0.058 -0.001
 0.00  0.25  1 | 0.058 -0.005 | 0.058 -0.005 | 0.058 -0.005 | 0.058 -0.001
 0.25  0.25  1 | 0.061 -0.007 | 0.059 -0.005 | 0.060 -0.006 | 0.058 -0.001
 0.75  0.25  1 | 0.100 -0.049 | 0.060 -0.005 | 0.061 -0.008 | 0.058 -0.001
 0.90  0.25  1 | 0.064 -0.011 | 0.060 -0.005 | 0.061 -0.006 | 0.058 -0.001
-0.90  0.50  1 | 0.051 -0.006 | 0.051 -0.006 | 0.051 -0.006 | 0.049 -0.001
-0.75  0.50  1 | 0.051 -0.006 | 0.050 -0.005 | 0.051 -0.007 | 0.049 -0.001
-0.25  0.50  1 | 0.050 -0.007 | 0.051 -0.007 | 0.051 -0.008 | 0.049 -0.001
 0.00  0.50  1 | 0.052 -0.009 | 0.050 -0.007 | 0.052 -0.010 | 0.049 -0.001
 0.25  0.50  1 | 0.055 -0.013 | 0.051 -0.008 | 0.054 -0.013 | 0.049 -0.001
 0.75  0.50  1 | 0.120 -0.075 | 0.051 -0.009 | 0.058 -0.020 | 0.049 -0.001
 0.90  0.50  1 | 0.059 -0.018 | 0.051 -0.008 | 0.054 -0.014 | 0.049 -0.001
-0.90  0.90  1 | 0.028 -0.013 | 0.027 -0.013 | 0.031 -0.015 | 0.019  0.000
-0.75  0.90  1 | 0.028 -0.014 | 0.028 -0.013 | 0.031 -0.016 | 0.019  0.000
-0.25  0.90  1 | 0.034 -0.019 | 0.029 -0.016 | 0.035 -0.019 | 0.019  0.000
 0.00  0.90  1 | 0.039 -0.023 | 0.031 -0.017 | 0.038 -0.023 | 0.019  0.000
 0.25  0.90  1 | 0.051 -0.032 | 0.031 -0.018 | 0.049 -0.030 | 0.019  0.000
 0.75  0.90  1 | 0.070 -0.048 | 0.031 -0.017 | 0.059 -0.040 | 0.019  0.000
 0.90  0.90  1 | 0.040 -0.025 | 0.028 -0.015 | 0.045 -0.029 | 0.019  0.000

-0.90 -0.90  2 | 0.132  0.005 | 0.132  0.006 | 0.131  0.007 | 0.132 -0.002
-0.75 -0.90  2 | 0.132  0.005 | 0.132  0.006 | 0.130  0.007 | 0.132 -0.002
-0.25 -0.90  2 | 0.131  0.011 | 0.133  0.007 | 0.134  0.012 | 0.132 -0.002
 0.00 -0.90  2 | 0.136  0.016 | 0.133  0.008 | 0.134  0.015 | 0.132 -0.002
 0.25 -0.90  2 | 0.143  0.032 | 0.136  0.009 | 0.140  0.025 | 0.132 -0.002
 0.75 -0.90  2 | 0.325  0.223 | 0.132  0.012 | 0.149  0.044 | 0.132 -0.002
 0.90 -0.90  2 | 0.154  0.046 | 0.133  0.009 | 0.135  0.028 | 0.132 -0.002
-0.90 -0.50  2 | 0.128 -0.001 | 0.127  0.000 | 0.127  0.001 | 0.126 -0.003
-0.75 -0.50  2 | 0.127 -0.001 | 0.126  0.001 | 0.126  0.001 | 0.126 -0.003
-0.25 -0.50  2 | 0.124  0.002 | 0.126  0.000 | 0.123  0.002 | 0.126 -0.003
 0.00 -0.50  2 | 0.125  0.005 | 0.125  0.001 | 0.124  0.003 | 0.126 -0.003
 0.25 -0.50  2 | 0.126  0.012 | 0.125  0.002 | 0.127  0.007 | 0.126 -0.003
 0.75 -0.50  2 | 0.209  0.118 | 0.128  0.001 | 0.131  0.016 | 0.126 -0.003
 0.90 -0.50  2 | 0.136  0.020 | 0.127  0.000 | 0.127  0.008 | 0.126 -0.003
-0.90 -0.25  2 | 0.120 -0.004 | 0.119 -0.003 | 0.119 -0.002 | 0.116 -0.003
-0.75 -0.25  2 | 0.120 -0.003 | 0.118 -0.002 | 0.118 -0.002 | 0.116 -0.003
-0.25 -0.25  2 | 0.117 -0.001 | 0.117 -0.002 | 0.115 -0.002 | 0.116 -0.003
 0.00 -0.25  2 | 0.118 -0.001 | 0.116 -0.003 | 0.118 -0.001 | 0.116 -0.003
 0.25 -0.25  2 | 0.115  0.004 | 0.117 -0.002 | 0.118  0.000 | 0.116 -0.003
 0.75 -0.25  2 | 0.161  0.055 | 0.119 -0.003 | 0.123  0.003 | 0.116 -0.003
 0.90 -0.25  2 | 0.129  0.005 | 0.118 -0.004 | 0.120  0.001 | 0.116 -0.003
-0.90  0.00  2 | 0.106 -0.005 | 0.107 -0.004 | 0.106 -0.004 | 0.103 -0.003
-0.75  0.00  2 | 0.107 -0.004 | 0.106 -0.005 | 0.104 -0.003 | 0.103 -0.003
-0.25  0.00  2 | 0.107 -0.003 | 0.107 -0.006 | 0.104 -0.004 | 0.103 -0.003
 0.00  0.00  2 | 0.106 -0.004 | 0.105 -0.005 | 0.104 -0.004 | 0.103 -0.003
 0.25  0.00  2 | 0.105 -0.005 | 0.105 -0.005 | 0.107 -0.004 | 0.103 -0.003
 0.75  0.00  2 | 0.139  0.004 | 0.107 -0.005 | 0.108 -0.005 | 0.103 -0.003
 0.90  0.00  2 | 0.116 -0.006 | 0.106 -0.006 | 0.108 -0.005 | 0.103 -0.003
-0.90  0.25  2 | 0.089 -0.006 | 0.090 -0.006 | 0.090 -0.005 | 0.087 -0.002
-0.75  0.25  2 | 0.088 -0.006 | 0.090 -0.005 | 0.089 -0.005 | 0.087 -0.002
-0.25  0.25  2 | 0.089 -0.006 | 0.090 -0.007 | 0.088 -0.006 | 0.087 -0.002
 0.00  0.25  2 | 0.089 -0.006 | 0.088 -0.006 | 0.087 -0.006 | 0.087 -0.002
 0.25  0.25  2 | 0.090 -0.008 | 0.088 -0.007 | 0.089 -0.008 | 0.087 -0.002
 0.75  0.25  2 | 0.130 -0.037 | 0.090 -0.007 | 0.095 -0.010 | 0.087 -0.002
 0.90  0.25  2 | 0.099 -0.013 | 0.089 -0.006 | 0.093 -0.009 | 0.087 -0.002
-0.90  0.50  2 | 0.068 -0.007 | 0.068 -0.006 | 0.069 -0.006 | 0.068 -0.002
-0.75  0.50  2 | 0.068 -0.006 | 0.068 -0.007 | 0.068 -0.006 | 0.068 -0.002
-0.25  0.50  2 | 0.069 -0.007 | 0.069 -0.007 | 0.069 -0.007 | 0.068 -0.002
 0.00  0.50  2 | 0.069 -0.008 | 0.068 -0.007 | 0.070 -0.009 | 0.068 -0.002
 0.25  0.50  2 | 0.067 -0.009 | 0.069 -0.007 | 0.071 -0.010 | 0.068 -0.002
 0.75  0.50  2 | 0.124 -0.058 | 0.070 -0.009 | 0.077 -0.016 | 0.068 -0.002
 0.90  0.50  2 | 0.080 -0.020 | 0.070 -0.009 | 0.073 -0.013 | 0.068 -0.002
-0.90  0.90  2 | 0.028 -0.007 | 0.027 -0.007 | 0.027 -0.008 | 0.025  0.000
-0.75  0.90  2 | 0.028 -0.008 | 0.027 -0.007 | 0.027 -0.008 | 0.025  0.000
-0.25  0.90  2 | 0.031 -0.012 | 0.028 -0.009 | 0.030 -0.011 | 0.025  0.000
 0.00  0.90  2 | 0.033 -0.016 | 0.029 -0.010 | 0.033 -0.014 | 0.025  0.000
 0.25  0.90  2 | 0.040 -0.022 | 0.030 -0.012 | 0.039 -0.020 | 0.025  0.000
 0.75  0.90  2 | 0.056 -0.031 | 0.030 -0.011 | 0.050 -0.027 | 0.025  0.000
 0.90  0.90  2 | 0.041 -0.018 | 0.029 -0.010 | 0.044 -0.022 | 0.025  0.000

-0.90 -0.90  3 | 0.194  0.003 | 0.193  0.001 | 0.189  0.001 | 0.187 -0.011
-0.75 -0.90  3 | 0.193  0.003 | 0.193  0.000 | 0.187  0.002 | 0.187 -0.011
-0.25 -0.90  3 | 0.189  0.006 | 0.189  0.000 | 0.185  0.005 | 0.187 -0.011
 0.00 -0.90  3 | 0.187  0.015 | 0.187  0.000 | 0.183  0.007 | 0.187 -0.011
 0.25 -0.90  3 | 0.191  0.034 | 0.185  0.001 | 0.179  0.013 | 0.187 -0.011
 0.75 -0.90  3 | 0.395  0.255 | 0.183  0.004 | 0.189  0.039 | 0.187 -0.011
 0.90 -0.90  3 | 0.204  0.047 | 0.188  0.002 | 0.187  0.023 | 0.187 -0.011
-0.90 -0.50  3 | 0.170 -0.005 | 0.171 -0.005 | 0.170 -0.004 | 0.168 -0.010
-0.75 -0.50  3 | 0.171 -0.004 | 0.171 -0.005 | 0.168 -0.002 | 0.168 -0.010
-0.25 -0.50  3 | 0.169 -0.002 | 0.170 -0.006 | 0.166 -0.003 | 0.168 -0.010
 0.00 -0.50  3 | 0.167  0.002 | 0.170 -0.006 | 0.165 -0.003 | 0.168 -0.010
 0.25 -0.50  3 | 0.167  0.009 | 0.167 -0.007 | 0.164 -0.002 | 0.168 -0.010
 0.75 -0.50  3 | 0.261  0.127 | 0.167 -0.003 | 0.170  0.011 | 0.168 -0.010
 0.90 -0.50  3 | 0.172  0.020 | 0.171 -0.006 | 0.167  0.006 | 0.168 -0.010
-0.90 -0.25  3 | 0.154 -0.006 | 0.155 -0.007 | 0.154 -0.006 | 0.152 -0.009
-0.75 -0.25  3 | 0.155 -0.006 | 0.155 -0.006 | 0.152 -0.006 | 0.152 -0.009
-0.25 -0.25  3 | 0.154 -0.005 | 0.155 -0.007 | 0.150 -0.007 | 0.152 -0.009
 0.00 -0.25  3 | 0.152 -0.005 | 0.154 -0.008 | 0.149 -0.007 | 0.152 -0.009
 0.25 -0.25  3 | 0.150 -0.001 | 0.152 -0.009 | 0.150 -0.006 | 0.152 -0.009
 0.75 -0.25  3 | 0.199  0.054 | 0.150 -0.007 | 0.151  0.001 | 0.152 -0.009
 0.90 -0.25  3 | 0.152  0.001 | 0.153 -0.007 | 0.151  0.001 | 0.152 -0.009
-0.90  0.00  3 | 0.136 -0.007 | 0.135 -0.008 | 0.133 -0.009 | 0.132 -0.009
-0.75  0.00  3 | 0.136 -0.007 | 0.136 -0.007 | 0.133 -0.007 | 0.132 -0.009
-0.25  0.00  3 | 0.135 -0.007 | 0.134 -0.009 | 0.131 -0.008 | 0.132 -0.009
 0.00  0.00  3 | 0.132 -0.008 | 0.134 -0.009 | 0.131 -0.009 | 0.132 -0.009
 0.25  0.00  3 | 0.134 -0.007 | 0.133 -0.010 | 0.132 -0.008 | 0.132 -0.009
 0.75  0.00  3 | 0.166  0.001 | 0.134 -0.009 | 0.132 -0.005 | 0.132 -0.009
 0.90  0.00  3 | 0.136 -0.009 | 0.132 -0.010 | 0.132 -0.005 | 0.132 -0.009
-0.90  0.25  3 | 0.111 -0.007 | 0.112 -0.008 | 0.109 -0.008 | 0.109 -0.007
-0.75  0.25  3 | 0.111 -0.008 | 0.113 -0.008 | 0.109 -0.008 | 0.109 -0.007
-0.25  0.25  3 | 0.112 -0.009 | 0.111 -0.009 | 0.109 -0.010 | 0.109 -0.007
 0.00  0.25  3 | 0.111 -0.008 | 0.111 -0.009 | 0.110 -0.011 | 0.109 -0.007
 0.25  0.25  3 | 0.111 -0.011 | 0.110 -0.009 | 0.111 -0.011 | 0.109 -0.007
 0.75  0.25  3 | 0.148 -0.040 | 0.111 -0.011 | 0.111 -0.013 | 0.109 -0.007
 0.90  0.25  3 | 0.115 -0.019 | 0.112 -0.012 | 0.109 -0.011 | 0.109 -0.007
-0.90  0.50  3 | 0.083 -0.007 | 0.084 -0.007 | 0.083 -0.007 | 0.082 -0.006
-0.75  0.50  3 | 0.083 -0.007 | 0.085 -0.007 | 0.083 -0.008 | 0.082 -0.006
-0.25  0.50  3 | 0.085 -0.009 | 0.084 -0.009 | 0.084 -0.010 | 0.082 -0.006
 0.00  0.50  3 | 0.085 -0.010 | 0.085 -0.009 | 0.085 -0.011 | 0.082 -0.006
 0.25  0.50  3 | 0.084 -0.013 | 0.084 -0.009 | 0.085 -0.011 | 0.082 -0.006
 0.75  0.50  3 | 0.139 -0.059 | 0.087 -0.012 | 0.088 -0.017 | 0.082 -0.006
 0.90  0.50  3 | 0.094 -0.022 | 0.087 -0.012 | 0.086 -0.016 | 0.082 -0.006
-0.90  0.90  3 | 0.033 -0.007 | 0.033 -0.006 | 0.032 -0.006 | 0.031  0.001
-0.75  0.90  3 | 0.034 -0.007 | 0.033 -0.007 | 0.032 -0.007 | 0.031  0.001
-0.25  0.90  3 | 0.035 -0.012 | 0.033 -0.009 | 0.035 -0.010 | 0.031  0.001
 0.00  0.90  3 | 0.038 -0.016 | 0.035 -0.011 | 0.037 -0.015 | 0.031  0.001
 0.25  0.90  3 | 0.044 -0.022 | 0.036 -0.012 | 0.044 -0.020 | 0.031  0.001
 0.75  0.90  3 | 0.066 -0.035 | 0.037 -0.012 | 0.054 -0.029 | 0.031  0.001
 0.90  0.90  3 | 0.052 -0.020 | 0.037 -0.010 | 0.053 -0.023 | 0.031  0.001

Table D4: Weighted Spatial GM Estimators of ρ (by initial estimator AH1, AH2, AB, and using true disturbances)
    λ     ρ   W | AH1: RMSE  Bias | AH2: RMSE  Bias |  AB: RMSE  Bias | True: RMSE  Bias
-0.90 -0.90  1 | 0.036  0.017 | 0.036  0.017 | 0.041  0.021 | 0.023  0.000
-0.75 -0.90  1 | 0.037  0.018 | 0.036  0.017 | 0.041  0.022 | 0.023  0.000
-0.25 -0.90  1 | 0.044  0.026 | 0.039  0.020 | 0.047  0.027 | 0.023  0.000
 0.00 -0.90  1 | 0.051  0.032 | 0.039  0.022 | 0.053  0.032 | 0.023  0.000
 0.25 -0.90  1 | 0.069  0.044 | 0.041  0.024 | 0.063  0.041 | 0.023  0.000
 0.75 -0.90  1 | 0.087  0.060 | 0.039  0.021 | 0.073  0.051 | 0.023  0.000
 0.90 -0.90  1 | 0.052  0.032 | 0.037  0.019 | 0.059  0.039 | 0.023  0.000
-0.90 -0.50  1 | 0.048  0.002 | 0.049  0.003 | 0.048  0.004 | 0.047 -0.001
-0.75 -0.50  1 | 0.048  0.003 | 0.049  0.003 | 0.049  0.004 | 0.047 -0.001
-0.25 -0.50  1 | 0.049  0.006 | 0.049  0.004 | 0.049  0.006 | 0.047 -0.001
 0.00 -0.50  1 | 0.049  0.009 | 0.048  0.004 | 0.048  0.007 | 0.047 -0.001
 0.25 -0.50  1 | 0.052  0.015 | 0.048  0.006 | 0.050  0.010 | 0.047 -0.001
 0.75 -0.50  1 | 0.117  0.070 | 0.049  0.006 | 0.054  0.018 | 0.047 -0.001
 0.90 -0.50  1 | 0.058  0.020 | 0.048  0.004 | 0.051  0.013 | 0.047 -0.001
-0.90 -0.25  1 | 0.054  0.000 | 0.054  0.000 | 0.054  0.000 | 0.054 -0.002
-0.75 -0.25  1 | 0.054  0.000 | 0.055  0.000 | 0.053  0.000 | 0.054 -0.002
-0.25 -0.25  1 | 0.054  0.001 | 0.054  0.000 | 0.053  0.000 | 0.054 -0.002
 0.00 -0.25  1 | 0.053  0.002 | 0.053  0.001 | 0.053  0.001 | 0.054 -0.002
 0.25 -0.25  1 | 0.054  0.006 | 0.053  0.001 | 0.053  0.002 | 0.054 -0.002
 0.75 -0.25  1 | 0.089  0.042 | 0.054  0.002 | 0.055  0.007 | 0.054 -0.002
 0.90 -0.25  1 | 0.058  0.012 | 0.055  0.000 | 0.052  0.004 | 0.054 -0.002
-0.90  0.00  1 | 0.057 -0.002 | 0.057 -0.002 | 0.057 -0.002 | 0.056 -0.003
-0.75  0.00  1 | 0.057 -0.001 | 0.057 -0.002 | 0.056 -0.002 | 0.056 -0.003
-0.25  0.00  1 | 0.057 -0.002 | 0.057 -0.002 | 0.056 -0.003 | 0.056 -0.003
 0.00  0.00  1 | 0.055 -0.003 | 0.056 -0.003 | 0.056 -0.003 | 0.056 -0.003
 0.25  0.00  1 | 0.056 -0.002 | 0.055 -0.002 | 0.055 -0.003 | 0.056 -0.003
 0.75  0.00  1 | 0.074 -0.001 | 0.057 -0.002 | 0.057 -0.002 | 0.056 -0.003
 0.90  0.00  1 | 0.062 -0.001 | 0.057 -0.001 | 0.056 -0.001 | 0.056 -0.003
-0.90  0.25  1 | 0.056 -0.003 | 0.056 -0.003 | 0.055 -0.003 | 0.054 -0.002
-0.75  0.25  1 | 0.056 -0.003 | 0.056 -0.004 | 0.056 -0.004 | 0.054 -0.002
-0.25  0.25  1 | 0.056 -0.004 | 0.055 -0.004 | 0.055 -0.005 | 0.054 -0.002
 0.00  0.25  1 | 0.055 -0.006 | 0.056 -0.005 | 0.055 -0.006 | 0.054 -0.002
 0.25  0.25  1 | 0.057 -0.008 | 0.055 -0.005 | 0.056 -0.007 | 0.054 -0.002
 0.75  0.25  1 | 0.095 -0.050 | 0.056 -0.006 | 0.057 -0.010 | 0.054 -0.002
 0.90  0.25  1 | 0.061 -0.011 | 0.056 -0.004 | 0.057 -0.007 | 0.054 -0.002
W RMSE Bias RMSE Bias RMSE Bias RMSE Bias -0.90 0.50 1 0.049 -0.006 0.049 -0.006 0.050 -0.007 0.047 -0.002 -0.75 0.50 1 0.050 -0.006 0.049 -0.006 0.050 -0.007 0.047 -0.002 -0.25 0.50 1 0.050 -0.008 0.050 -0.007 0.050 -0.009 0.047 -0.002 0.00 0.50 1 0.053 -0.011 0.051 -0.008 0.051 -0.010 0.047 -0.002 0.25 0.50 1 0.056 -0.015 0.051 -0.008 0.053 -0.013 0.047 -0.002 0.75 0.50 1 0.120 -0.076 0.050 -0.010 0.057 -0.021 0.047 -0.002 0.90 0.50 1 0.059 -0.018 0.050 -0.007 0.053 -0.015 0.047 -0.002 -0.90 0.90 1 0.037 -0.018 0.037 -0.017 0.041 -0.020 0.022 -0.001 -0.75 0.90 1 0.038 -0.019 0.038 -0.018 0.042 -0.021 0.022 -0.001 -0.25 0.90 1 0.044 -0.025 0.039 -0.021 0.047 -0.026 0.022 -0.001 0.00 0.90 1 0.053 -0.032 0.041 -0.023 0.052 -0.031 0.022 -0.001 0.25 0.90 1 0.071 -0.047 0.041 -0.024 0.065 -0.041 0.022 -0.001 0.75 0.90 1 0.094 -0.066 0.040 -0.022 0.078 -0.054 0.022 -0.001 0.90 0.90 1 0.055 -0.035 0.037 -0.020 0.061 -0.041 0.022 -0.001 -0.90 -0.90 2 0.115 0.005 0.115 0.005 0.118 0.007 0.118 -0.001 -0.75 -0.90 2 0.117 0.007 0.114 0.006 0.116 0.009 0.118 -0.001 -0.25 -0.90 2 0.116 0.011 0.115 0.009 0.118 0.014 0.118 -0.001 0.00 -0.90 2 0.120 0.019 0.116 0.009 0.121 0.018 0.118 -0.001 0.25 -0.90 2 0.126 0.034 0.117 0.012 0.128 0.025 0.118 -0.001 0.75 -0.90 2 0.307 0.210 0.119 0.016 0.134 0.041 0.118 -0.001 0.90 -0.90 2 0.140 0.046 0.117 0.011 0.125 0.026 0.118 -0.001 -0.90 -0.50 2 0.111 0.002 0.110 0.001 0.110 0.002 0.110 -0.001 -0.75 -0.50 2 0.110 0.002 0.110 0.001 0.110 0.002 0.110 -0.001 -0.25 -0.50 2 0.109 0.003 0.108 0.003 0.109 0.004 0.110 -0.001 0.00 -0.50 2 0.110 0.006 0.110 0.003 0.110 0.006 0.110 -0.001 0.25 -0.50 2 0.111 0.014 0.110 0.006 0.114 0.008 0.110 -0.001 0.75 -0.50 2 0.191 0.108 0.112 0.005 0.115 0.013 0.110 -0.001 0.90 -0.50 2 0.123 0.021 0.110 0.005 0.113 0.010 0.110 -0.001 -0.90 -0.25 2 0.102 -0.001 0.103 -0.001 0.103 0.001 0.102 -0.002 -0.75 -0.25 2 0.102 -0.001 0.102 -0.001 0.102 0.001 0.102 -0.002 -0.25 -0.25 2 0.101 0.000 0.102 -0.001 0.101 0.000 
0.102 -0.002 0.00 -0.25 2 0.100 0.001 0.102 0.001 0.102 0.000 0.102 -0.002 0.25 -0.25 2 0.101 0.005 0.104 0.002 0.103 0.002 0.102 -0.002 0.75 -0.25 2 0.145 0.051 0.103 0.001 0.105 0.006 0.102 -0.002 0.90 -0.25 2 0.111 0.006 0.103 0.001 0.105 0.003 0.102 -0.002 Initial Estimator AH1 AH2 AB True Table D4 cont. Weighted Spatial GM Estimators of ? 199 True Values ?? W RMSE Bias RMSE Bias RMSE Bias RMSE Bias -0.90 0.00 2 0.092 -0.002 0.092 -0.002 0.093 -0.002 0.091 -0.002 -0.75 0.00 2 0.091 -0.002 0.092 -0.002 0.093 -0.002 0.091 -0.002 -0.25 0.00 2 0.091 -0.002 0.092 -0.001 0.094 -0.002 0.091 -0.002 0.00 0.00 2 0.091 -0.002 0.092 -0.001 0.094 -0.001 0.091 -0.002 0.25 0.00 2 0.090 0.000 0.092 -0.001 0.093 -0.001 0.091 -0.002 0.75 0.00 2 0.129 0.005 0.091 -0.003 0.094 -0.003 0.091 -0.002 0.90 0.00 2 0.104 -0.003 0.093 -0.003 0.096 -0.003 0.091 -0.002 -0.90 0.25 2 0.080 -0.003 0.080 -0.003 0.082 -0.003 0.079 -0.002 -0.75 0.25 2 0.080 -0.003 0.080 -0.002 0.081 -0.004 0.079 -0.002 -0.25 0.25 2 0.078 -0.005 0.080 -0.003 0.080 -0.004 0.079 -0.002 0.00 0.25 2 0.078 -0.004 0.080 -0.003 0.081 -0.004 0.079 -0.002 0.25 0.25 2 0.079 -0.004 0.080 -0.004 0.080 -0.005 0.079 -0.002 0.75 0.25 2 0.122 -0.033 0.079 -0.005 0.083 -0.009 0.079 -0.002 0.90 0.25 2 0.086 -0.010 0.081 -0.005 0.084 -0.006 0.079 -0.002 -0.90 0.50 2 0.065 -0.005 0.064 -0.005 0.066 -0.004 0.065 -0.001 -0.75 0.50 2 0.065 -0.005 0.064 -0.004 0.065 -0.004 0.065 -0.001 -0.25 0.50 2 0.064 -0.005 0.066 -0.005 0.066 -0.007 0.065 -0.001 0.00 0.50 2 0.064 -0.006 0.065 -0.005 0.067 -0.008 0.065 -0.001 0.25 0.50 2 0.064 -0.009 0.065 -0.007 0.067 -0.009 0.065 -0.001 0.75 0.50 2 0.123 -0.060 0.065 -0.008 0.071 -0.014 0.065 -0.001 0.90 0.50 2 0.072 -0.016 0.067 -0.007 0.069 -0.010 0.065 -0.001 -0.90 0.90 2 0.033 -0.009 0.032 -0.009 0.033 -0.010 0.029 0.000 -0.75 0.90 2 0.034 -0.011 0.033 -0.010 0.033 -0.010 0.029 0.000 -0.25 0.90 2 0.037 -0.016 0.034 -0.012 0.037 -0.014 0.029 0.000 0.00 0.90 2 0.040 -0.020 0.035 -0.013 0.040 
-0.017 0.029 0.000 0.25 0.90 2 0.049 -0.028 0.037 -0.015 0.048 -0.025 0.029 0.000 0.75 0.90 2 0.068 -0.040 0.039 -0.014 0.059 -0.035 0.029 0.000 0.90 0.90 2 0.048 -0.023 0.036 -0.013 0.053 -0.028 0.029 0.000 -0.90 -0.90 3 0.166 0.013 0.165 0.014 0.167 0.016 0.165 0.003 -0.75 -0.90 3 0.167 0.013 0.164 0.014 0.167 0.015 0.165 0.003 -0.25 -0.90 3 0.164 0.018 0.160 0.012 0.163 0.016 0.165 0.003 0.00 -0.90 3 0.165 0.021 0.162 0.014 0.161 0.019 0.165 0.003 0.25 -0.90 3 0.173 0.033 0.162 0.016 0.167 0.027 0.165 0.003 0.75 -0.90 3 0.366 0.239 0.167 0.018 0.177 0.043 0.165 0.003 0.90 -0.90 3 0.182 0.049 0.167 0.010 0.164 0.027 0.165 0.003 Initial Estimator AH1 AH2 AB Table D4 cont. True Weighted Spatial GM Estimators of ? 200 True Values ?? W RMSE Bias RMSE Bias RMSE Bias RMSE Bias -0.90 -0.50 3 0.148 0.005 0.148 0.007 0.146 0.007 0.147 0.002 -0.75 -0.50 3 0.147 0.006 0.145 0.006 0.146 0.006 0.147 0.002 -0.25 -0.50 3 0.148 0.009 0.144 0.006 0.146 0.007 0.147 0.002 0.00 -0.50 3 0.146 0.012 0.142 0.007 0.145 0.007 0.147 0.002 0.25 -0.50 3 0.144 0.017 0.143 0.008 0.144 0.013 0.147 0.002 0.75 -0.50 3 0.240 0.122 0.147 0.006 0.144 0.018 0.147 0.002 0.90 -0.50 3 0.157 0.022 0.145 0.002 0.144 0.012 0.147 0.002 -0.90 -0.25 3 0.135 0.004 0.133 0.004 0.133 0.004 0.134 0.002 -0.75 -0.25 3 0.135 0.003 0.134 0.004 0.132 0.003 0.134 0.002 -0.25 -0.25 3 0.133 0.004 0.130 0.003 0.131 0.003 0.134 0.002 0.00 -0.25 3 0.132 0.005 0.129 0.004 0.128 0.004 0.134 0.002 0.25 -0.25 3 0.131 0.011 0.128 0.003 0.130 0.007 0.134 0.002 0.75 -0.25 3 0.177 0.058 0.132 0.002 0.131 0.005 0.134 0.002 0.90 -0.25 3 0.140 0.011 0.133 -0.001 0.132 0.005 0.134 0.002 -0.90 0.00 3 0.118 0.002 0.117 0.002 0.117 0.001 0.117 0.000 -0.75 0.00 3 0.118 0.001 0.117 0.001 0.116 0.001 0.117 0.000 -0.25 0.00 3 0.116 0.001 0.114 0.002 0.114 0.001 0.117 0.000 0.00 0.00 3 0.115 0.001 0.115 0.001 0.114 0.001 0.117 0.000 0.25 0.00 3 0.113 0.002 0.113 0.000 0.114 0.001 0.117 0.000 0.75 0.00 3 0.145 0.006 0.115 -0.002 0.117 -0.001 
0.117 0.000 0.90 0.00 3 0.123 -0.003 0.119 -0.004 0.119 -0.002 0.117 0.000 -0.90 0.25 3 0.099 -0.001 0.098 -0.002 0.098 -0.003 0.098 -0.001 -0.75 0.25 3 0.098 -0.001 0.098 -0.001 0.098 -0.003 0.098 -0.001 -0.25 0.25 3 0.095 -0.002 0.096 -0.002 0.097 -0.003 0.098 -0.001 0.00 0.25 3 0.097 -0.002 0.096 -0.002 0.096 -0.004 0.098 -0.001 0.25 0.25 3 0.097 -0.005 0.097 -0.003 0.097 -0.003 0.098 -0.001 0.75 0.25 3 0.133 -0.038 0.097 -0.006 0.099 -0.006 0.098 -0.001 0.90 0.25 3 0.105 -0.011 0.103 -0.006 0.100 -0.008 0.098 -0.001 -0.90 0.50 3 0.076 -0.004 0.075 -0.004 0.075 -0.004 0.076 -0.002 -0.75 0.50 3 0.075 -0.004 0.075 -0.004 0.076 -0.005 0.076 -0.002 -0.25 0.50 3 0.075 -0.005 0.075 -0.004 0.076 -0.006 0.076 -0.002 0.00 0.50 3 0.075 -0.006 0.076 -0.004 0.077 -0.007 0.076 -0.002 0.25 0.50 3 0.078 -0.009 0.076 -0.005 0.078 -0.008 0.076 -0.002 0.75 0.50 3 0.137 -0.064 0.078 -0.008 0.081 -0.012 0.076 -0.002 0.90 0.50 3 0.088 -0.016 0.078 -0.008 0.079 -0.010 0.076 -0.002 Table D4 cont. AH1 AH2 AB True Weighted Spatial GM Estimators of ? Initial Estimator 201 True Values ?? W RMSE Bias RMSE Bias RMSE Bias RMSE Bias -0.90 0.90 3 0.038 -0.011 0.038 -0.010 0.037 -0.011 0.035 -0.001 -0.75 0.90 3 0.038 -0.011 0.039 -0.010 0.037 -0.011 0.035 -0.001 -0.25 0.90 3 0.042 -0.015 0.039 -0.012 0.040 -0.014 0.035 -0.001 0.00 0.90 3 0.046 -0.021 0.040 -0.014 0.044 -0.019 0.035 -0.001 0.25 0.90 3 0.055 -0.029 0.042 -0.016 0.053 -0.027 0.035 -0.001 0.75 0.90 3 0.077 -0.045 0.041 -0.016 0.066 -0.036 0.035 -0.001 0.90 0.90 3 0.057 -0.027 0.041 -0.014 0.059 -0.030 0.035 -0.001 True Initial Estimator AH1 AH2 AB Table D4 cont. Weighted Spatial GM Estimators of ? 
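Each cell in Tables D3 and D4 summarizes Monte Carlo draws of an estimator of ρ around its true value. As a minimal sketch of how such columns can be computed, assuming the conventional definitions RMSE = [mean of squared deviations]^(1/2) and Bias = mean of the draws minus the true value (the thesis' exact summary measures are defined in Chapter 5; the simulated draws below are a hypothetical stand-in, not the spatial GM estimator itself):

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Summarize Monte Carlo draws of an estimator.

    Uses the conventional definitions (an assumption, not necessarily
    the exact measures used in the thesis):
      bias = mean(estimates) - true_value
      rmse = sqrt(mean((estimates - true_value)**2))
    """
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    return rmse, bias

# Hypothetical stand-in: draws of an estimator centered near the true rho.
rng = np.random.default_rng(0)
true_rho = 0.50
draws = true_rho - 0.006 + 0.082 * rng.standard_normal(1000)

rmse, bias = mc_summary(draws, true_rho)
print(f"RMSE {rmse:.3f}  Bias {bias:.3f}")
```

The same routine applied to the draws of each estimator (AH1, AH2, AB, True) at each (λ, ρ, W) design point would produce one row of the tables above.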
[Figures 1-8: QQ plots; graphics not recoverable from the extraction]
Figure 1: QQ Plot of IV Estimator AH1
Figure 2: QQ Plot of IV Estimator AH2
Figure 3: QQ Plot of IV Estimator AB
Figure 4: QQ Plot of GMM Estimator AB Ignoring Spatial Correlation
Figure 5: QQ Plot of GMM Estimator AB based on V̂_mix
Figure 6: QQ Plot of GMM Estimator AB based on V̂_E
Figure 7: Normal Probability QQ Plot
Figure 8: Student t Probability QQ Plot

E Appendix: Symbols and Notation Used

In this Appendix, I provide a brief explanation of the different (standard) symbols used throughout the thesis.

N                    cross-sectional dimension of the data under consideration
T                    time dimension of the data under consideration
I_N                  N × N identity matrix
e_T                  T × 1 vector of ones
J_T                  T × T matrix of ones
Q_0                  transformation matrix that subtracts location-specific sample means
Q_1                  transformation matrix that calculates location-specific sample means
Δ                    first difference operator (in the time dimension)
D                    first difference transformation matrix
∀                    for all (logical predicate)
∃                    there exists (logical predicate)
∈                    relation operator "belongs to a set"
∞                    infinity
ℝ                    set of real numbers
ℕ                    set of natural numbers
ε(x)                 neighborhood of a real number x
sup                  supremum
inf                  infimum
min                  minimum
argmin_{θ∈Θ}{·}      argument that minimizes the minimization problem in brackets, with parameters θ restricted to a set Θ
limsup_{n→∞} a_n     limes superior of the sequence a_n
⊗                    Kronecker product operator
‖M‖                  matrix norm [tr(M′M)]^{1/2}
λ_min(·)             smallest eigenvalue of a matrix
diag(d_1, ..., d_N)  diagonal matrix with d_1, ..., d_N on the main diagonal
E(y)                 expected value of a vector/scalar y
VC(y)                variance-covariance matrix of a vector y
Cov(z_1, z_2)        covariance of two scalar random variables
→_d                  convergence in distribution
→_p                  convergence in probability
→_r                  convergence in r-th mean
N(x, Σ)              multivariate normal distribution with mean x and variance-covariance matrix Σ
L_p                  space of random variables with finite p-th absolute moments
|x|                  absolute value of a number/random variable
‖ξ‖_r                [E(|ξ|^r)]^{1/r}
O_p(k)               sequence of random variables is of order in probability of at most N^k
O(k)                 deterministic sequence is of order of at most N^k

2SLS                 two-stage least squares
3SLS                 three-stage least squares
CV                   covariance (estimator)
GLS                  generalized least squares
GM                   generalized moments
GMM                  generalized method of moments
HAC                  heteroscedasticity and autocorrelation consistent
IV                   instrumental variable
LIML                 limited information maximum likelihood
LSDV                 least-squares dummy variable (estimator)
MD                   minimum distance
ML                   maximum likelihood
OLS                  ordinary least squares
SAR                  spatial autoregressive
STAR                 space-time autoregressive
STARMA               space-time autoregressive moving average
SUR                  seemingly unrelated regressions
VAR                  vector autoregressive
VARMA                vector autoregressive moving average
WG                   within group

F Appendix: Inequalities

In this Appendix, I provide a list of inequalities used throughout the thesis. The following is based on, e.g., Bierens (1994), Section 1.4.

F.1 Deterministic Inequalities

(Bernoulli) Let x ∈ ℝ, x > -1, and n ∈ ℕ. Then

    (1 + x)^n ≥ 1 + nx,                                           (C.1.1)

with the inequality being strict for x ≠ 0 and n > 1.

(Triangle) Let x, y ∈ ℂ. Then

    |x| - |y| ≤ |x - y| ≤ |x| + |y|.                              (C.1.2)

F.2 Stochastic Inequalities

(Chebyshev) Let X be a non-negative random variable with a finite mean μ_X and finite variance σ_X². Then for any ε ∈ ℝ, ε > 0,

    P(|X - μ_X| > (σ_X²/ε)^{1/2}) ≤ ε.                            (C.2.3)

(Hölder) Let X and Y be random variables and let p, q ∈ ℝ, p > 1, 1/p + 1/q = 1. Then

    E(|XY|) ≤ [E(|X|^p)]^{1/p} [E(|Y|^q)]^{1/q}.                  (C.2.4)

(Cauchy-Schwarz) For p = q = 2 we have

    E(|XY|) ≤ [E(|X|²)]^{1/2} [E(|Y|²)]^{1/2}.                    (C.2.5)

(Lyapunov) For Y = 1 we have, for p > 1,

    E(|X|) ≤ [E(|X|^p)]^{1/p}.                                    (C.2.6)

(Minkowski) If for some p ≥ 1, E(|X|^p) < ∞ and E(|Y|^p) < ∞, then

    [E(|X + Y|^p)]^{1/p} ≤ [E(|X|^p)]^{1/p} + [E(|Y|^p)]^{1/p}.   (C.2.7)

Deterministic counterparts hold for finite sums. From Hölder's inequality, for numbers x_i, y_i and p > 1, 1/p + 1/q = 1:

    |Σ_{i=1}^m x_i y_i| ≤ (Σ_{i=1}^m |x_i|^p)^{1/p} (Σ_{i=1}^m |y_i|^q)^{1/q}.   (C.2.9)

Similarly, from Lyapunov's inequality (or by selecting y_i = 1 in the above):

    |Σ_{i=1}^m x_i|^p ≤ m^{p-1} Σ_{i=1}^m |x_i|^p,   p ≥ 1.       (C.2.10)

Finally, by Minkowski's inequality,

    (Σ_{i=1}^m |x_i + y_i|^p)^{1/p} ≤ (Σ_{i=1}^m |x_i|^p)^{1/p} + (Σ_{i=1}^m |y_i|^p)^{1/p}.   (C.2.11)

Note that if x_i and y_i are random variables, then the last three inequalities hold for all their realizations. As a result, we can apply these inequalities also in cases where x_i and y_i are stochastic. The same holds for the triangle inequality.

References

[1] Abraham, B., 1983, The Exact Likelihood for a Space Time Model, Metrika, 30, 239-243.
[2] Ahn, S.C. and P. Schmidt, 1995, Efficient Estimation of Models for Dynamic Panel Data, Journal of Econometrics, 68, 5-27.
[3] Alonso-Borrego, C. and M. Arellano, 1999, Symmetrically Normalized Instrumental-Variable Estimation Using Panel Data, Journal of Business & Economic Statistics, 17, 36-49.
[4] Alvarez, J. and M. Arellano, 2003, The Time-Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators, Econometrica, 71(4), 1121-1159.
[5] Anderson, T.W. and C. Hsiao, 1981, Estimation of Dynamic Models with Error Components, Journal of the American Statistical Association, 76, 598-606.
[6] Anderson, T.W. and C. Hsiao, 1982, Formulation and Estimation of Dynamic Models Using Panel Data, Journal of Econometrics, 18, 47-82.
[7] Anselin, L., 1988, Spatial Econometrics: Methods and Models (Kluwer Academic Publishers, Boston).
[8] Anselin, L. and S. Hudak, 1992, Spatial Econometrics in Practice: A Review of Software Options, Regional Science and Urban Economics, 22, 509-536.
[9] Arellano, M., 1989, A Note on the Anderson-Hsiao Estimator for Panel Data, Economics Letters, 31, 337-341.
[10] Arellano, M. and S.R. Bond, 1991, Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations, Review of Economic Studies, 58, 277-297.
[11] Arellano, M. and O. Bover, 1995, Another Look at the Instrumental Variable Estimation of Error-Components Models, Journal of Econometrics, 68, 29-51.
[12] Audretsch, D.B. and M.P. Feldman, 1996, R&D Spillovers and the Geography of Innovation and Production, American Economic Review, 86, 630-640.
[13] Baltagi, B.H., 1995 and 2002, Econometric Analysis of Panel Data (Wiley, New York).
[14] Baltagi, B.H. and D. Li, 2001a, Double Length Artificial Regressions for Testing Spatial Dependence, Econometric Reviews, 20, 31-40.
[15] Baltagi, B.H. and D. Li, 2001b, LM Test for Functional Form and Spatial Error Correlation, International Regional Science Review, 24, 194-225.
[16] Baltagi, B.H., S.H. Song and W. Koh, 2003, Testing Panel Data Regression Models with Spatial Error Correlation, Journal of Econometrics, 117(1), 123-150.
[17] Bartsch, H.J., 1987, Mathematische Formeln (VEB Fachbuchverlag, Leipzig, DDR).
[18] Bernat Jr., G., 1996, Does Manufacturing Matter? A Spatial Econometric View of Kaldor's Laws, Journal of Regional Science, 36, 463-477.
[19] Besley, T. and A. Case, 1995, Incumbent Behavior: Vote-Seeking, Tax-Setting, and Yardstick Competition, American Economic Review, 85, 25-45.
[20] Bhargava, A. and J.D. Sargan, 1983, Estimating Dynamic Random Effects Models from Panel Data Covering Short Time Periods, Econometrica, 51, 1635-1659.
[21] Bierens, H.J., 1994, Topics in Advanced Econometrics (Cambridge University Press, Cambridge, U.K.).
[22] Binder, M., C. Hsiao and M.H. Pesaran, 2002, Estimation and Inference in Short Panel Vector Autoregressions with Unit Roots and Cointegration, Working Paper, University of Maryland, College Park.
[23] Binder, M., C. Hsiao, J. Mutl and M.H. Pesaran, 2002, Computational Issues in the Estimation of Higher-Order Panel Vector Autoregressions, mimeo, University of Maryland, College Park.
[24] Blundell, R.W. and S.R. Bond, 1998, Initial Conditions and Moment Restrictions in Dynamic Panel Data Models, Journal of Econometrics, 87, 115-143.
[25] Blundell, R.W. and R.J. Smith, 1991, Initial Conditions and Efficient Estimation in Dynamic Panel Data Models, Annales d'Économie et de Statistique, 20/21, 109-123.
[26] Bollinger, C. and K. Ihlanfeldt, 1997, The Impact of Rapid Rail Transit on Economic Development: The Case of Atlanta's MARTA, Journal of Urban Economics, 42, 179-204.
[27] Bronnenberg, B.J. and V. Mahajan, 2001, Unobserved Retailer Behavior in Multimarket Data: Joint Spatial Dependence in Market Shares and Promotion Variables, Marketing Science, 20, 284-299.
[28] Brown, P.E., K.F. Kåresen, G.O. Roberts and S. Tonellato, 2000, Blur-Generated Non-Separable Space-Time Models, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 62(4), 847-860.
[29] Buettner, T., 1999, The Effect of Unemployment, Aggregate Wages, and Spatial Contiguity on Local Wages: An Investigation with German District Level Data, Papers in Regional Science, 78, 47-67.
[30] Case, A., 1991, Spatial Patterns in Household Demand, Econometrica, 59, 953-966.
[31] Case, A., J. Hines Jr. and H. Rosen, 1993, Budget Spillovers and Fiscal Policy Interdependence: Evidence from the States, Journal of Public Economics, 52, 285-307.
[32] Chamberlain, G., 1982, Multivariate Regression Models for Panel Data, Journal of Econometrics, 18, 5-46.
[33] Chamberlain, G., 1984, Panel Data, Ch. 22 in: Z. Griliches and M. Intriligator, eds., Handbook of Econometrics, Vol. II (North-Holland, Amsterdam).
[34] Chang, Y., 2002, Nonlinear IV Unit Root Tests in Panels with Cross-Sectional Dependency, Journal of Econometrics, 110(2), 261-292.
[35] Chen, X. and T. Conley, 2001, A New Semiparametric Spatial Model for Panel Time Series, Journal of Econometrics, 105(1), 59-83.
[36] Cliff, A. and J. Ord, 1973, Spatial Autocorrelation (Pion, London).
[37] Cliff, A. and J. Ord, 1981, Spatial Processes: Models and Applications (Pion, London).
[38] Conley, T., 1999, GMM Estimation with Cross Sectional Dependence, Journal of Econometrics, 92, 1-45.
[39] Cressie, N., 1993, Statistics for Spatial Data (Wiley, New York).
[40] Das, D., H.H. Kelejian and I.R. Prucha, 2003, Small Sample Properties of Estimators of Spatial Autoregressive Models with Autoregressive Disturbances, Papers in Regional Science, 82, 1-26.
[41] Dhrymes, P.J., 1984, Mathematics for Econometrics (Springer-Verlag, New York).
[42] Došlá, Z. and O. Došlý, 1999, Diferenciální počet funkcí více proměnných (Masarykova univerzita v Brně, Czech Republic).
[43] Dowd, M.R. and J.P. LeSage, 1997, Analysis of Spatial Contiguity Influences on State Price Level Formation, International Journal of Forecasting, 13, 245-253.
[44] Driscoll, J. and A. Kraay, 1995, Spatial Correlations in Panel Data, Policy Research Working Paper 1553, The World Bank.
[45] Driscoll, J. and A. Kraay, 1998, Consistent Covariance Matrix Estimation with Spatially Dependent Panel Data, The Review of Economics and Statistics, 80, 549-560.
[46] Elhorst, J.P., 2001, Dynamic Models in Space and Time, Geographical Analysis, 33, 119-140.
[47] Gänssler, P. and W. Stute, Wahrscheinlichkeitstheorie (Springer-Verlag, New York).
[48] Giacomini, R. and C.W.J. Granger, 2004, Aggregation of Space-Time Processes, Journal of Econometrics, 118, 7-26.
[49] Hahn, J., 1999, How Informative Is the Initial Condition in the Dynamic Panel Model with Fixed Effects?, Journal of Econometrics, 93, 309-326.
[50] Hahn, J. and G. Kuersteiner, 2002, Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both n and T Are Large, Econometrica, 70(4), 1639-1657.
[51] Haining, R., 1990, Spatial Data Analysis in the Social and Environmental Sciences (Cambridge University Press, Cambridge).
[52] Hansen, L.P., 1982, Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, 1029-1054.
[53] Hansen, L.P., J. Heaton and A. Yaron, 1996, Finite Sample Properties of Some Alternative GMM Estimators, Journal of Business & Economic Statistics, 14, 262-280.
[54] Harris, R.D.F. and E. Tzavalis, 1999, Inference for Unit Roots in Dynamic Panels Where the Time Dimension Is Fixed, Journal of Econometrics, 91, 201-226.
[55] Horn, R.A. and C.R. Johnson, 1985, Matrix Analysis (Cambridge University Press, Cambridge).
[56] Horn, R.A. and C.R. Johnson, 1991, Topics in Matrix Analysis (Cambridge University Press, Cambridge).
[57] Holtz-Eakin, D., 1994, Public Sector Capital and the Productivity Puzzle, Review of Economics and Statistics, 76, 12-21.
[58] Holtz-Eakin, D., W. Newey and H. Rosen, 1988, Estimating Vector Autoregressions with Panel Data, Econometrica, 56, 1371-1395.
[59] Hordijk, L., 1979, Problems in Estimating Econometric Relations in Space, Papers of the Regional Science Association, 42, 99-115.
[60] Hsiao, C., M.H. Pesaran and A.K. Tahmiscioglu, 2002, Maximum Likelihood Estimation of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods, Journal of Econometrics, 109(1), 107-150.
[61] Judson, R.A. and A.L. Owen, 1999, Estimating Dynamic Panel Data Models: A Guide for Macroeconomists, Economics Letters, 65, 9-15.
[62] Karr, A.F., 1993, Probability (Springer-Verlag, New York).
[63] Kapoor, M., H.H. Kelejian and I.R. Prucha, 2005, Panel Data Models with Spatially Correlated Error Components, Journal of Econometrics, forthcoming.
[64] Keane, M.P. and D.E. Runkle, 1992, On the Estimation of Panel-Data Models with Serial Correlation When Instruments Are Not Strictly Exogenous, Journal of Business & Economic Statistics, 10(1), 1-9.
[65] Kelejian, H.H. and I.R. Prucha, 1997, Estimation of Spatial Regression Models with Autoregressive Errors by Two-Stage Least Squares Procedures: A Serious Problem, International Regional Science Review, 20, 103-111.
[66] Kelejian, H.H. and I.R. Prucha, 1998, A Generalized Spatial Two-Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive Disturbances, Journal of Real Estate Finance and Economics, 17, 99-121.
[67] Kelejian, H.H. and I.R. Prucha, 1999, A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model, International Economic Review, 40, 509-533.
[68] Kelejian, H.H. and I.R. Prucha, 2001, On the Asymptotic Distribution of the Moran I Test Statistic with Applications, Journal of Econometrics, 104, 219-257.
[69] Kelejian, H.H. and I.R. Prucha, 2005, HAC Estimation in a Spatial Framework, Department of Economics, University of Maryland, College Park, mimeo.
[70] Kelejian, H.H. and I.R. Prucha, 2004, Estimation of Simultaneous Systems of Spatially Interrelated Cross Sectional Equations, Journal of Econometrics, 118, 27-50.
[71] Kelejian, H.H., I.R. Prucha and Y. Yuzefovich, 2004, Instrumental Variable Estimation of a Spatial Autoregressive Model with Autoregressive Disturbances: Large and Small Sample Results, Department of Economics, University of Maryland; forthcoming in J. LeSage and R.K. Pace, eds., Advances in Econometrics (Elsevier, New York).
[72] Kelejian, H.H. and D. Robinson, 1997, Infrastructure Productivity Estimation and Its Underlying Econometric Specifications, Papers in Regional Science, 76, 115-131.
[73] Kelejian, H.H. and D. Robinson, 2000, Returns to Investment in Navigation Infrastructure: An Equilibrium Approach, Annals of Regional Science, 34, 83-108.
[74] Kiviet, J.F., 1995, On Bias, Inconsistency and Efficiency in Various Estimators of Dynamic Panel Data Models, Journal of Econometrics, 68, 53-78.
[75] Korniotis, G.M., 2005, A Dynamic Panel Estimator with Both Fixed and Spatial Effects, mimeo, University of Notre Dame.
[76] Kyriakidis, P.C. and A.G. Journel, 1999, Geostatistical Space-Time Models: A Review, Mathematical Geology, 31(6), 651-684.
[77] Lee, L.-F., 2001a, Generalized Method of Moments Estimation of Spatial Autoregressive Processes, Department of Economics, Ohio State University, mimeo.
[78] Lee, L.-F., 2001b, GMM and 2SLS Estimation of Mixed Regressive, Spatial Autoregressive Models, Department of Economics, Ohio State University, mimeo.
[79] Lee, L.-F., 2002, Consistency and Efficiency of Least Squares Estimation for Mixed Regressive, Spatial Autoregressive Models, Econometric Theory, 18, 252-277.
[80] Lee, L.-F., 2003, Best Spatial Two-Stage Least Squares Estimators for a Spatial Autoregressive Model with Autoregressive Disturbances, Econometric Reviews, 22, 307-335.
[81] Lee, L.-F., 2004, Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Autoregressive Models, Econometrica, 72(6), 1899-1925.
[82] LeSage, J.P., 1997, Bayesian Estimation of Spatial Autoregressive Models, International Regional Science Review, 20, 113-129.
[83] LeSage, J.P., 1999, A Spatial Econometric Analysis of China's Economic Growth, Journal of Geographic Information Sciences, 5, 143-153.
[84] LeSage, J.P., 2000, Bayesian Estimation of Limited Dependent Variable Spatial Autoregressive Models, Geographical Analysis, 32, 19-35.
[85] LeSage, J.P. and A. Krivelyova, 1999, A Spatial Prior for Bayesian Vector Autoregressive Models, Journal of Regional Science, 39(2), 297-317.
[86] LeSage, J.P. and R.K. Pace, 2004, Models for Spatially Dependent Missing Data, Journal of Real Estate Finance and Economics, 29(2), 233-254.
[87] Mutl, J., 2006, Misspecification of Space: An Illustration Using Growth Convergence Regressions, Workshop on Spatial Econometrics and Statistics, Rome, Italy.
[88] Nerlove, M., 1967, Experimental Evidence on the Estimation of Dynamic Economic Relations from a Time Series of Cross-Sections, Economic Studies Quarterly, 18, 42-74.
[89] Nerlove, M., 1971, Further Evidence on the Estimation of Dynamic Economic Relations from a Time Series of Cross Sections, Econometrica, 39, 359-382.
[90] Newey, W.K. and K.D. West, 1987, A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix, Econometrica, 55(3), 703-708.
[91] Nickell, S., 1981, Biases in Dynamic Models with Fixed Effects, Econometrica, 49, 1417-1426.
[92] Pace, R. and R. Barry, 1997, Sparse Spatial Autoregressions, Statistics and Probability Letters, 33, 291-297.
[93] Pace, R.K., R. Barry, J. Clapp and M. Rodriquez, 1998, Spatiotemporal Autoregressive Models of Neighborhood Effects, Journal of Real Estate Finance and Economics, 17, 15-33.
[94] Pfeifer, P.E. and S.J. Deutsch, 1980, A Three-Stage Iterative Procedure for Space-Time Modeling, Technometrics, 22, 35-47.
[95] Pinkse, J. and M.E. Slade, 1998, Contracting in Space: An Application of Spatial Statistics to Discrete-Choice Models, Journal of Econometrics, 85, 125-154.
[96] Pinkse, J., M.E. Slade and C. Brett, 2002, Spatial Price Competition: A Semiparametric Approach, Econometrica, 70, 1111-1153.
[97] Pötscher, B.M. and I.R. Prucha, 1994, Generic Uniform Convergence and Equicontinuity Concepts for Random Functions: An Exploration of the Basic Structure, Journal of Econometrics, 60, 23-63.
[98] Pötscher, B.M. and I.R. Prucha, 1997, Dynamic Nonlinear Econometric Models: Asymptotic Theory (Springer-Verlag, New York).
[99] Pötscher, B.M. and I.R. Prucha, 2001, Basic Elements of Asymptotic Theory, in B.H. Baltagi, ed., A Companion to Theoretical Econometrics (Blackwell, New York), 201-229.
[100] Prucha, I.R., 1985, Maximum Likelihood and Instrumental Variable Estimation in Simultaneous Equation Systems with Error Components, International Economic Review, 26, 491-506.
[101] Prucha, I.R., 2004, Econ 721 Handout: Lag Operators and Difference Equations, mimeo, University of Maryland.
[102] Prucha, I.R., 2005, Econ 624 Handout: Classical Nonlinear Econometric Models, mimeo, University of Maryland.
[103] Rao, C.R., 1973, Linear Statistical Inference and Its Applications (Wiley, New York).
[104] Rey, S. and M. Boarnet, 1999, A Taxonomy of Spatial Econometric Models for Simultaneous Systems, in L. Anselin and R. Florax, eds., Advances in Spatial Econometrics (Springer-Verlag, New York).
[105] Sargan, J.D., 1958, The Estimation of Economic Relationships Using Instrumental Variables, Econometrica, 26, 393-415.
[106] Schmidt, P., 1976, Econometrics (Marcel Dekker, New York).
[107] Sevestre, P. and A. Trognon, 1985, A Note on Autoregressive Error-Component Models, Journal of Econometrics, 29, 231-245.
[108] Shroder, M., 1995, Games the States Don't Play: Welfare Benefits and the Theory of Fiscal Federalism, Review of Economics and Statistics, 77, 183-191.
[109] Stoffer, D.S., 1986, Estimation and Identification of Space-Time ARMAX Models in the Presence of Missing Data, Journal of the American Statistical Association, 81(395), 762-772.
[110] Trognon, A., 1978, Miscellaneous Asymptotic Properties of Ordinary Least Squares and Maximum Likelihood Estimators in Dynamic Error Component Models, Annales de l'INSEE, 30, 632-657.
[111] Vigil, R., 1998, Interactions among Municipalities in the Provision of Police Services: A Spatial Econometric Approach, Ph.D. Thesis, University of Maryland.
[112] Whittle, P., 1954, On Stationary Processes in the Plane, Biometrika, 41, 434-449.
[113] Yang, Z., 2005, Quasi-Maximum Likelihood Estimation for Spatial Panel Data Regressions, mimeo, Singapore Management University.
[114] Zellner, A., 1962, An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias, Journal of the American Statistical Association, 57, 348-368.