ABSTRACT 
 
 
 
 
Title of dissertation:  DYNAMIC PANEL DATA MODELS WITH  
    SPATIALLY CORRELATED DISTURBANCES 
 
 
Jan Mutl, Doctor of Philosophy, 2006 
 
 
Dissertation directed by: Professor Ingmar Prucha 
Department of Economics 
 
 
 
 
This thesis considers a dynamic panel data model with error components that 
are correlated both spatially (cross-sectionally) and time-wise. The model extends the 
literature on dynamic panel data models with cross-sectionally independent error 
components. The model for spatial dependence is a Cliff-Ord type model.  
We introduce a three step estimation procedure and give formal large sample 
results for the case of a finite time dimension. In particular, we show that a simple first 
stage instrumental variable (IV) estimator, that ignores the spatial correlation of the 
errors, is consistent and ?N-consistent, where N denotes the cross-sectional dimension. 
We then extend the generalized moments estimator introduced by Kelejian and Prucha 
(1999) for estimating the spatial autoregressive parameter and show that if it is based 
on a ?N-consistently estimated disturbances, it will also be consistent. Finally, we 
derive a large sample distribution of a second stage generalized method of moments 
(GMM) estimator based on a consistent estimator of the spatial autoregressive 
parameter. We also present results from a small Monte Carlo study to illustrate the 
small sample performance of the proposed estimation procedure. 
 
JEL Classification and Keywords: Cross-Sectional Models; Spatial Models (C21), 
Models with Panel Data (C23); Dynamic panels, Spatial Autocorrelation 
 
 
 
DYNAMIC PANEL DATA MODELS WITH SPATIALLY CORRELATED 
DISTURBANCES 
 
 
by 
 
Jan Mutl 
 
 
 
 
Dissertation submited to the Faculty of the Graduate School of the 
University of Maryland, College Park in partial fullfilment 
of the requirements for the degree of 
Doctor of Philosophy 
2006 
 
 
 
 
 
 
 
 
Advisory Committee: 
 
Professor Ingmar Prucha, Chair 
Professor John Chao, Co-Chair 
Professor Harry Kelejian 
Professor John Rust 
Professor Francis Alt 
 
Contents
1Introduction 1
2ReviewofLiterature 3
2.1 Dynamic Panel Data Models.................... 3
2.1.1 GMM Estimation . . .................... 6
2.1.2 Bias Correction . . . .................... 20
2.1.3 MDandMLEstimation .................. 21
2.2 ModellingCross-SectionalDependence .............. 32
2.2.1 Model Specifications.................... 34
2.2.2 Estimation ......................... 40
2.3 Space-Time Models . . . . ..................... 44
2.3.1 Space-Time Autoregressive Moving Average
(STARMA)Models..................... 46
2.3.2 Models with Contemporaneous Spatial Correlation . . . . 50
3Model 58
3.1 Model Specification......................... 59
3.2 Model Implications . . . . ..................... 63
4 Estimation and Inference 68
4.1 Initial IV Estimation ........................ 68
4.2 Estimation of the Degree of SpatialAutocorrelation........ 79
4.3 SecondStageGMMEstimation .................. 87
ii
4.3.1 OptimalWeightingMatrix................. 94
4.3.2 FeasibleGMMEstimator.................. 97
5 Monte Carlo Study 106
5.1 Estimators Considered . ......................106
5.1.1 Initial Estimators . .....................107
5.1.2 SpatialParameterEstimators................110
5.1.3 SecondStageGMMEstimators ..............111
5.2 Data Generation . . . . . ......................113
5.3 Designs Considered . . . . .....................114
5.4 Tables of Results . . . . ......................117
5.5 Conclusions and Comparison with Other Studies . . . . . . . . . 118
6 Directions for Future Research 122
A Appendix: Central Limit Theorem for Vectors of Linear Quadratic
Forms 123
B Appendix: Proof of Claims in Chapter 3 131
C Appendix: Proofs for Chapter 4 140
C.1 Proofs for Section 4.1 . . ......................142
C.2 Proofs for Section 4.2 . . ......................153
C.3 Proofs for Section 4.3 . . ......................159
D Appendix: Tables of Monte Carlo Results 182
iii
E Appendix: Symbols and Notation Used 210
F Appendix: Inequalities 214
F.1 Deterministic Inequalities......................214
F.2 Stochastic Inequalities . ......................214
iv
 v
LIST OF TABLES 
 
Table 1: Consistency of ML Estimation       28 
Table 2: Estimators Considered        110 
Table D1: Initial IV Estimators of ?       183 
Table D2: Second Stage GMM Estimators of ?      188 
Table D3: Unweighted Spatial GM Estimators of ?     193 
Table D4: Weighted Spatial GM Estimators of ?      198 
 vi
LIST OF FIGURES 
 
Figure 1: QQ Plot of IV Estimator AH1      203 
Figure 2: QQ Plot of IV Estimator AH2      203 
Figure 3: QQ Plot of IV Estimator AB      204 
Figure 4: QQ Plot of GMM Estimator AB Ignoring Spatial Correlation  204 
Figure 5: QQ Plot of GMM Estimator AB based on V
mix
    205 
Figure 6: QQ Plot of GMM Estimator AB based on V
E
    205 
Figure 7: Normal Probabilty QQ Plot      206 
Figure 8: Student t Probability QQ Plot      206 
 
1Introduction
This thesisconsiders estimationof paneldata modelswhenthe dependentvariable
isallowedtobecorrelatedinbothdimensions. Usinganaturalterminology,I
investigatemodelsinwhichthereiscorrelation both across time and between
the cross-sectional units. Although there might be many ways to write down such
model, Ichoosetoconcentrateonconcretespecificationthatarisesasanextension
of the existing literature on dynamic panel data models and on spatial modelling.
In doing so, I hope to offer a useful synthesis of the two strands of the literature.
My model is applicable to situations where the number of time periods over which
the data are observed is limited.
1
In the next chapter, I review the existing literature related to this topic. I first
focus on theoretical contributions to dynamic panels estimation methods, then
briefly outline the specifications used in spatial econometrics, and close with a
review of papers that have used specifications in which time and space are inter-
actinginanontrivialway.
Chapter 3 will then spell out the specification I chose to concentrate on. It will
also provide the generalassumptions maintained throughout thethesis and discuss
some implication of the model.
In Chapter 4, I provide an outline of several estimation methods and provide
a formal statements of their asymptotic properties. I start with an initial instru-
1
Of course, if the time dimension of the panel is sufficiently large, one can consider, for ex-
ample, a seemingly unrelated regression model that allows for a fairly general specification of the
correlation pattern in the cross-sectional dimension.
1
mental variable (IV) technique suggested by Anderson and Hsiao (1981) to esti-
mate the slope coefficients of the model. Although this method ignores possible
cross-sectional correlation in the data, I show that it is still consistent and asymp-
totically normal under the specification considered in this thesis. Next, I outline
a spatial generalized moments estimation technique that estimates the degree of
cross-sectional dependence in the disturbances. The method was suggested by
Kapoor et al. (2005) for a static model and is based on Kelejian and Prucha
(1999). I extend the proofs in Kapoor et al. (2005) for the dynamic case. The last
step of the proposed estimation method consists of a generalized method of mo-
ments (GMM) estimation of the slope coefficients. I discuss the optimal choice
weighting matrix for a given set of moment conditions. I provide formal large
sample results for a generic GMM estimator based on linear moment conditions
with stochastic instruments. I also provide formal large sample properties of a
feasible GMM estimator and its small sample covariance matrix approximation.
In Chapter 5, I investigate small sample properties of the different estimation
method via a Monte Carlo study. I also provide some simulation evidence that
supports the formal large sample claims made in the thesis.
2
2ReviewofLiterature
The purpose of this review is not to provide a comprehensive treatment of the
econometric work that has been done on panel data methods. For such there are
excellent book-length works, such as Hsiao (2003) or Baltagi (2002). Instead, I
will provide a more in depth review of the theoretical work that has been done
on dynamic panel data models on the one hand and then review the literature
relaxing the assumption of independently and identically distributed (iid) errors
both in panel and purely cross-sectional setting.
It proves to be useful to introduce the following notational conventions: I use
bold letters for matrices and vectors, and regular font letters to denote scalars.
Furthermore, I use lower case letters for vectors and upper case letters for matri-
ces. In general I will denote the cross-sectional dimension of the panel as N and
thetimedimensionasT.
2.1 Dynamic Panel Data Models
Models with individual effects and limited time dimension face the problem of in-
cidental parameters. Hence these are estimated after a suitable transformation that
removes the individual effects. In most cases this would be after first differencing.
If the model also includes a lagged endogenous variable, the first difference of the
error term will then be correlated with the explanatory variables. It has been long
recognized in the literature that in this situation, the ordinary least squares (OLS)
estimator will be biased, see, e.g., Trognon (1978) for an analytical treatment,
3
or Nerlove (1967 and 1971) who explores the properties of the bias of the OLS
estimation by Monte Carlo work. Trognon (1978), Nickell (1981) and Sevestre
and Trognon (1985) derive analytical expressions for the asymptotic biases of the
OLS estimator of an autoregressive panel data models with fixed time dimension.
Small sample bias correction has also been suggested by Kiviet (1995).
The bias of the OLS estimation also resulted in attention to other estimation
methods. Hence Anderson and Hsiao (1981, 1982) discuss maximum likelihood
(ML) estimation of various model specifications and provide a comprehensive
classification of the different conceptual possibilities of dynamic panel data mod-
els. They also suggest a simple instrumental variables (IV) estimator that is con-
sistent. Bhargava and Sargan (1983) provide a framework for maximum likeli-
hood estimation for a panel with lagged dependent variable and individual effects.
As an alternative, Chamberlain (1982) proposed a minimum distance (MD) type
of estimator for distributed lag models with heterogenous coefficients.
The subsequent developments have shifted attention to generalized method of
moments (GMM) estimators that utilize linear moment conditions. The literature
has focused on exploiting as many possible moment conditions while keeping
the resulting GMM estimator linear. Most of the large sample results are usually
backed by a reference to ?standard central limit theorems? or assumed to follow
from the general results on the asymptotic properties of GMM estimators in, for
example, Hansen(1982). The(non)optimalityofutilizingredundantmomentcon-
ditions has also not been explored in detail. Papers in this line of research include
Arellano and Bond (1991), Arellano and Bover (1995), Ahn and Schmidt (1995)
4
and Blundell and Bond (1998). The use of all lags as available instruments was
suggestedbyHoltz-Eakin,NeweyandRosen (1988). Keane and Runkle (1992)
provide an alternative method of exploiting the moment conditions.
2
Large sam-
ple results for the GMM estimators are in Alvarez and Arellano (2003), while
Harris and Tzavalis (1999) obtain the limiting distributions of pooled OLS, the
within-group (WG) and WG with individual trends estimators, under the null of a
unit root and normally distributed errors. Observe that, as noted by Kiviet (1995)
and Judson and Owen (1999), the number of possible instruments used by the
GMM estimators increases with T
2
, the GMM estimators may perform poorly in
samples with moderate and large T.
More recently several authors have proposed maximum likelihood and quasi-
maximum likelihood (ML and QML) procedures arguing that these are compu-
tationally feasible and providing some Monte Carlo evidence of improved small
sample performance even for non-normal errors. See the papers by Hsiao, Pe-
saran and Tahmicsioglu (2002) and Binder, Hsiao and Pesaran (2000) discussed
below. Some further Monte Carlo evidence is provided by Binder, Hsiao, Mutl
and Pesaran (2002).
Below I will review papers on the GMM, bias corrected OLS, MD and ML
estimation mentioned above and compare the various model specifications, as-
sumptions on the disturbance process involved and estimation methods. When
required, I modify the original notation to make the comparison feasible.
2
They propose to transform the model by a Cholesky decomposition of an initial estimate of
the variance covariance matrix and use the untransformed instruments in the second step of the
estimation. See below for a more detailed review.
5
2.1.1 GMM Estimation
I will now review the papers proposing GMM type of estimators in more detail.
The model under consideration can be written as
y
it
= ?y
i,t?1
+x
it
?+u
it
,t=1,..,T, i =1,...,N, (2.1.1)
where y
it
and x
it
denote the (scalar) dependent variable and the 1 ? p vector of
exogenous variables corresponding to cross sectional unit i in period t, ? and ?
represent corresponding 1 ? 1 and p ? 1 parameters, and u
it
= ?
i
+ ?
it
denotes
the overall disturbance term consisting of individual effects ?
i
and an innovation
?
it
. Under different assumptions on the disturbance process we obtain different
possible moment restrictions that are exploited by the estimator. The proposed
estimator also differs under different exogeneity assumptions on the p? 1 vector
of explanatory variables.
Arellano and Bond (1991) assume that the error terms are distributed as
?
i
?IID
?
0,?
2
?
?
, (2.1.2)
and
?
it
?IID
?
0,?
2
?
?
, (2.1.3)
6
independent of each other.
3
Because the disturbances as well as the endogenous
variable contain individual effects, they will be correlated when interacted in lev-
els. Therefore, the moment conditions considered involve first differences of the
disturbances and in particular they are
E[(u
it
?u
i,t?1
)y
i,t?k
]=0,t=2,..,T, k =2,..,t?1 i =1,...,N,
(2.1.4)
and with strictly exogenous variables also
E[x
0
is
(u
it
?u
i,t?1
)] = 0
p?1
,t=2,..,T, s =1,..,T i=1,...,N,
(2.1.5)
while with the variables being only predetermined these conditions hold only for
s =1,..,t?1.
Stacking the model by grouping the observation first by time and then by in-
dividuals
4
we can write the first differenced model (after dropping the initial ob-
servation) as
?y
(T?1)N?1
= ?Z?
(T?1)N?2
?
2?1
+ ??
(T?1)N?1
, (2.1.6)
3
These assumptions are not formally stated in the paper. However, the asymptotic claims are
based on the iid assumptions.
4
This stacking is commonly used in the literature on dynamic panel. Observe, however, that
we will use a different order of stacking in our model presented in later chapters.
7
where?Z =[?y
?1
,X] with
?y =
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
y
12
?y
11
.
.
.
y
1T
?y
1,T?1
.
.
.
y
N2
?y
N1
.
.
.
y
NT
?y
N,T?1
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
, ?y
?1
=
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
y
11
?y
10
.
.
.
y
1,T?1
?y
1,T?2
.
.
.
y
N1
?y
N0
.
.
.
y
N,T?1
?y
N,T?2
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
,
?X =
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
x
12
?x
11
.
.
.
x
1T
?x
1,T?1
.
.
.
x
N2
?x
N1
.
.
.
x
NT
?x
N,T?1
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
, ?? =
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
12
??
11
.
.
.
?
1T
??
1,T?1
.
.
.
?
N2
??
N1
.
.
.
?
NT
??
N,T?1
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
. (2.1.7)
We can define the matrix of instruments as H =(H
0
1
,...,H
0
N
)
0
whereforthecase
8
of strictly exogenous variables we have
H
0
i
=
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
y
i0
x
i1
.
.
.
x
i,T
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
y
i0
y
i2
x
i1
.
.
.
x
i,T
?
?
?
?
?
?
?
?
?
?
?
?
.
.
.
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
y
i0
.
.
.
y
i,T?2
x
i1
.
.
.
x
i,T
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
. (2.1.8)
The proposed estimator is of the form
b
? =
?
?Z
0
HA
?1
H
0
?Z
?
?1
?Z
0
HA
?1
H
0
?y, (2.1.9)
whereAis some weights matrix for the moments. More specifically, the first step
9
of the estimation uses a simple weighting matrix
A =
N
X
i=1
H
0
i
DD
0
H
i
(2.1.10)
= H
0
(I
N
?DD
0
)H,
where D is a T ?1 ?T first difference operator matrix:
D =
?
?
?
?
?
?
?
?
?
?11 0??? 0
0 ?11
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0
0 ??? 0 ?11
?
?
?
?
?
?
?
?
?
T?1?T
. (2.1.11)
In thesecond step themoment conditions areweightedbytheir estimatedvariance
covariance matrix and the authors propose to use
A =
N
X
i=1
H
0
i
?bu
i
?bu
0
i
H (2.1.12)
= H
0
"
I
N
?D
?
N
X
i=1
bu
i
bu
0
i
!
D
0
#
H,
where ?bu
i
=(?bu
i2
,...,?bu
iT
)
0
and bu
i
=(bu
i1
,...,bu
iT
)
0
are the fitted residuals
from the first step estimator.
Arellano and Bover (1995) consider a general nonsingular transformation of
the model that removes the individual effects. Consider again the model in (2.1.1)
10
and let K be any (T ?1) ? T transformation matrix of rank (T ?1) such that
Ke
T
= 0
T?1
,wheree
T
is a T ? 1 vector of ones. That is, the transformation by
K is nonsingular and removes the individual effects. Hence K can, for example,
be the matrix D considered above, or be equal to the ?Within Group? operator,
with
K =
?
?
?
?
?
?
?
?
?
?
1?
1
T
?
?
1
T
??? ?
1
T
?
1
T
?
1
T
?
1?
1
T
?
??? ?
1
T
?
1
T
.
.
.
.
.
.
.
.
.
.
.
.
?
1
T
?
1
T
???
?
1?
1
T
?
?
1
T
?
?
?
?
?
?
?
?
?
. (2.1.13)
ArellanoandBover(1995)alsosuggesttheorthogonaldeviationsoperatordefined
as:
K =
?
?
?
?
?
?
?
?
?
?
?
?
1 ?
1
(T?1)
?
1
(T?1)
??? ?
1
(T?1)
?
1
(T?1)
?
1
(T?1)
01?
1
(T?2)
??? ?
1
(T?2)
?
1
(T?2)
?
1
(T?2)
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
00 0??? 1 ?
1
2
?
1
2
00 0??? 01?1
?
?
?
?
?
?
?
?
?
?
?
?
. (2.1.14)
Thistransformationsubtractsthemeanoffutureobservationsavailableinthesam-
ple from the first T ?1 observations.
The transformed model is then
(I
N
?K)y =(I
N
?K)Z? +(I
N
?K)?, (2.1.15)
11
If the transformation matrix is upper triangular and the disturbances ?
it
are not
serially correlated, then the same moment conditions as consider by Arellano and
Bond (1991) remain valid for the transformed model. Arellano and Bover (1995)
then show that the resulting GMM estimator is in fact invariant to the choice of
the transformation matrix.
If the exogenous variables are uncorrelated with the individual effects, Arel-
lano and Bover (1995) also suggest the use of additional moment conditions in the
form of
E
"?
1
T
T
X
t=1
u
it
!
x
is
#
= 0
p?1
. (2.1.16)
In this case the transformation matrix is appended with a row consisting of e
T
/T
and can be denoted as:
C =
?
?
?
K
e
T
/T
?
?
?
. (2.1.17)
12
The instrument matrixH
i
becomes
H
0
i
=
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
y
i?0
x
i1
.
.
.
x
i,T
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
y
i0
y
i2
x
i1
.
.
.
x
i,T
?
?
?
?
?
?
?
?
?
?
?
?
.
.
.
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
y
i0
.
.
.
y
i,T?2
x
i1
.
.
.
x
i,T
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
x
i1
.
.
.
x
i,T
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
. (2.1.18)
13
The GMM estimator of Arellano and Bover (1995) can then be expressed as
b
? =
?
Z
0
(I
N
?C
0
)HA
?1
H
0
(I
N
?C)Z
?
?1
Z
0
(I
N
?C
0
)HA
?1
H
0
(I
N
?C)y.
(2.1.19)
The preliminary estimates are obtained with A = H
0
(I
N
?CC
0
)H and the sec-
ond stage estimator uses consistently with (2.1.12):
A = H
0
"
I
N
?C
?
N
X
i=1
bu
i
bu
0
i
!
C
0
#
H, (2.1.20)
where bu
i
are the fitted residuals from the preliminary estimation. Given that the
estimator is invariant to the choice of the transformation matrix, the filtering is in
fact irrelevant and the estimator can be obtained by performing three stage least
squares (3SLS).
AhnandSchmidt(1995) show that there are additional moment conditions that
can be exploited. Ahn and Schmidt also make weaker assumptions that lead to the
set of moment restriction utilized by the Arellano and Bond (1991) and Arellano
and Bover (1995) estimators. In particular, Ahn and Schmidt assume that the
disturbances satisfy:
Cov(?
it
,y
i0
)=0,t=1,..,T (2.1.21)
Cov(?
it
,?
i
)=0,t=1,..,T
Cov(?
it
,?
is
)=0,t,s=1,..,T;t 6= s
14
The additional moment conditions pointed out by Ahn and Schmidt are
E[u
iT
(?
it
??
i,t?1
)] = 0,t=2,..,T?1. (2.1.22)
These restrictions, together with the moment conditions utilized by the Arellano
and Bond (1991) estimator, represent all the moment conditions implied by the
assumption that the innovations ?
it
are mutually uncorrelated among themselves
and with ?
i
and y
i0
.
Ahn and Schmidt also point out that further restrictions can be derived from
homogeneity and stationarity assumptions. The assumption that the innovations
?
it
have a variance that does not change over time implies the following additional
moment restrictions:
E[y
i,t?2
??
i,t?1
?y
i,t?1
??
it
]=0,t=4,..,T. (2.1.23)
In a model without exogenous variables the homogeneity restrictions can be im-
plemented by utilizing the extended instrument set defined as
H
+
i
=
?
?
?
?
?
?
?
?
?
?
?
?
H
i
y
i2
?y
i3
y
i3
?y
i4
.
.
.
.
.
.
y
i,T?2
?y
i,T?3
?
?
?
?
?
?
?
?
?
?
?
?
, (2.1.24)
15
where H
i
is the Arellano and Bond instrument matrix for the case without exoge-
nous variables, i.e.
H
i
=
?
?
?
?
?
?
?
?
?
y
i0
y
i0
y
i1
.
.
.
y
i0
??? y
i,T?2
?
?
?
?
?
?
?
?
?
. (2.1.25)
Ahn and Schmidt show that the GMM estimator based on the full set of mo-
ment restrictions is asymptotically equivalent to Chamberlain?s (1982, 1984) opti-
mal minimum distance estimator and that it reaches the semiparametric efficiency
bound.
Blundell and Bond (1998) document a potential gain in efficiency arising from
exploiting restrictions on the initial observations when the time dimension of the
panelissmallandthedegreeofautocorrelationishigh. Theestimationapproaches
discussedsofarusuallydropthefirst observation. WithN going to infinity andT
fixed this amounts to ignoring information from a fixedproportionofthesample
andthuscanleadtosizeableinefficiency.
In their simulation study Blundell and Bond consider two types of additional
restrictions. The first type of restriction justifies the use of an extended linear
GMM estimator that uses lagged differences of y
it
as instruments for equations
in levels (in addition to lagged levels of y
it
as instruments for equations in first
differences). The second type of restriction validates the use of the error compo-
16
nents GLS estimator on an extended model that conditions on the observed initial
values. This provides a consistent estimator under homoscedasticity which, under
normality, is asymptotically equivalent to conditional maximum likelihood (see
also Blundell and Smith, 1991).
Ina model without exogenous variables, Blundell and Bond show that after re-
moving redundant restrictions the extended GMM estimator they consider utilizes
the following instrument matrix:
H
++
i
=
?
?
?
?
?
?
?
?
?
H
+
i
?y
i2
.
.
.
?y
i,T?1
?
?
?
?
?
?
?
?
?
, (2.1.26)
where H
+
i
is the instrument matrix employedbytheAnhandSchmidtestimator
and is defined in (2.1.24) above.
Their Monte Carlosimulations andasymptoticvariancecalculations showthat
this extended GMM estimator offers considerable efficiency gains in situations
where the basic GMM estimator performs poorly. The GLS estimator that con-
ditions on the initial values is also found to have good finite sample properties.
However, the conditional GLS estimator requires homoscedasticity, and only ex-
tends to a model with regressors if the regressors are strictly exogenous which is
not the case for the GMM estimators.
The efficiency gain from incorporating the information in the initial observa-
tionisalsodocumentedbyasimulation study of Hahn (1999).
17
Alvarez and Arellano (2002) consider the same model (2.1.1) with |?| < 1 and
E(?
it
|?
i
,y
i0
,...,y
it?1
)=0.Theyassumey
i0
is also observed. To derive asymp-
totic results they assume that ?
it
for t =1,...,T and i =1,...,N are independent
and identically distributed across time and individuals and independent of ?
i
and
y
i0
, with E(?
it
)=0, Var(?
it
)=?
2
and finite fourth moments. Additionally
they assume that the initial observation are generated as
y
i0
=
?
i
1??
+
?
X
j=0
?
j
?
i,?j
. (2.1.27)
The article than establishes asymptotic properties of the ?Within Group? es-
timator, the GMM estimator, and the Limited Information Maximum Likelihood
(LIML) estimator when both T and N tend to infinity. The WG estimator can be
obtained by OLS estimation on the model transformed by the forward orthogonal
means transformation (see above Arellano and Bover, 1995). The GMM estima-
tor in their terminology is what I describe above as the first stage GMM estimator
on a model transformed by the orthogonal deviations transformation, using the
moment conditions of Arellano and Bond (1991). The second stage GMM esti-
mation with an estimated weighting matrix is not considered. Note that my results
contain this extension as a special case. See Chapter 4.
The LIML estimator is what has been suggested by Alonso-Borrego and Arel-
lano (1999) as a symmetrically normalized GMM estimator. It can also be re-
garded as a ?continuously updated GMM estimator? in terminology of Hansen,
18
Heaton and Yaron (1997).
5
The estimator is only an analogue LIML estimator
in the sense of the minimax instrumental variable interpretation given by Sargan
(1958) to the original LIML estimator. It is defined as
b
? =argmin
?
(y?Z?)
0
(I
N
?C
0
)H(H
0
H)
?1
H
0
(I
N
?C)(y?Z?)
(y?Z?)
0
(I
N
?C
0
)(I
N
?C)(y?Z?)
, (2.1.28)
where His an instrument matrix.
Alvarez and Arellano show that the asymptotic bias of the WG estimator only
disappears when N/T ? 0.WhenN/T tends to a positive constant, all three
estimators are asymptotically biased with negative asymptotic biases of order 1/T
, 1/N,and1/(2N ?T), respectively. When N/T tends to infinity, the fixed
T results assumed by the GMM literature remain valid. They also consider a
random effects maximum likelihood estimator that leaves the mean and variance
of the initial conditions unrestricted and show that this estimator is asymptotically
unbiased for all cases.
KeaneandRunkle(1992) suggestanalternativeestimationprocedurethattakes
into account the variance covariance structure of the disturbances. First the model
is estimated by an initial procedure, such as the instrumental variables (IV) with
instruments that could, for example, be the instruments suggested by Arellano and
Bond (1991). Then an estimate of the inverse of the variance covariance matrix
and its Cholesky decomposition is calculated. The model is then transformed and
5
Instead of keeping ?
2
fixed in the weighting matrix of the GMM criterion, it is continuously
updated by making it a function of the argument in the estimating criterion.
19
estimated with original (untransformed) instruments, i.e.
b
? =
h
Z
0
?
I
N
?
b
P
0
?
HA
?1
H
0
?
I
N
?
b
P
?
Z
i
?1
(2.1.29)
?Z
0
?
I
N
?
b
P
0
?
HA
?1
H
0
?
I
N
?
b
P
?
y,
where
b
P is Cholesky decomposition of the estimated inverse of the variance co-
variance matrix and A is moment weighting matrix that is chosen analogously to
the standard GMM estimators.
2.1.2 Bias Correction
Small sample bias correction procedure of the inconsistent OLS estimation has
been proposed by Kiviet (1995). Consider a dynamic panel data model as in
(2.1.1). The model in levels can be stacked as in (2.1.6)
y = Z? +(I
N
?e
T
)?+?, (2.1.30)
where Z =[y
?1
,X] with
y =(y
11
,...,y
1T
,...,y
N1
,...,y
NT
)
0
, (2.1.31)
y
?1
=(y
10
,...,y
1,T?1
,...,y
N0
,...,y
N,T?1
)
0
,
X =(x
11
,...,x
1T
,...,x
N1
,...,x
NT
)
0
,
? =(?
11
,...,?
1T
,...,?
N1
,...,?
NT
)
0
,
? =(?
1
,...,?
N
)
0
.
20
The within group estimator is defined as
b
? =(Z
0
AZ)
?1
Z
0
Ay, (2.1.32)
where the NT ?NT within group transformation matrix Ais defined as
A = I
N
?
?
I
T
?
e
T
e
0
T
T
?
. (2.1.33)
Kiviet (1995) calls this estimator Least-SquaresDummyVariables(LSDV)
while Anderson and Hsiao (1981) refer to is as Covariance (CV) estimator. This
estimator is inconsistent for fixed T due to presence of individual effects in both
the disturbances ? and the regressors y
?1
. Although consistent estimates can be
obtained by IV or GMM procedures, the inconsistent LSDV estimator has a rel-
atively low variance and hence can lead to an estimator with lower root mean
square error after the bias is removed. Theasymptoticformulaeforthebiasgiven
in Nickell (1981) for a model with no exogenous regressors has been found to be
accurate in small samples, except for large values ?. Similar results have been
reported by Sevestre and Trognon (1985). Kiviet (1995) provides approximating
formulae for the small sample bias that have robust performance over the entire
range of parameters.
2.1.3 MD and ML Estimation
Chamberlain?s (1982, 1984) proposed to treat each timeperiodasanequation
in a multivariate equation framework. Such approach is robust to certain kinds
21
of heteroscedasticity as well as autocorrelation in the errors without imposing a
priori restrictions on the variance covariance matrix.
To demonstrate the method assume for simplicity that the model is:
y
it
= x
0
it
? +?
i
+?
it
t =1,..,T; i =1,...,N, (2.1.34)
and
E(?
it
|x
i1
,...,x
iT
,?
i
)=0, (2.1.35)
where the p ? 1 vector of explanatory variables is assumed to be stochastic and
hence the model also covers the lagged dependent variable case. The variables
can be stacked by grouping observations for each individual into a vector y
i
=
(y
i1
,...,y
iT
)
0
and x
i
=(x
i1
,...,x
iT
)
0
.Assumethat(y
i
,x
i
) is an independent
draw from a common unknown multivariate distribution with finite fourth-order
moments and with E(x
i
x
0
i
) positive definite. The individual effects are possibly
correlated with the explanatory variables. Chamberlain (1984) assumes that the
minimum-mean-squared-error linear projection of ?
i
onto x
i
is given by
6
E
?
(?
i
|x
i
)=?+
T
X
t=1
a
0
t
x
it
. (2.1.36)
6
If the conditional expectation of ?
i
are linear, we have E
?
(?
i
|x
i
)=E (?
i
|x
i
).
22
The model can be rewritten as
E
?
(y
i
|x
i
)=E
?
{E
?
(?
i
|x
i
,?
i
)|x
i
} (2.1.37)
= E
?
{?
i
e
T
+(I
T
??
0
)x
i
|x
i
}
= ?
i
e
T
+?x
i
,
and
y
i
= ?
i
e
T
+(I
T
?x
i
)? +?
i
, (2.1.38)
where
? = I
T
??
0
+e
T
(a
0
1
,...,a
0
T
), (2.1.39)
and?
i
= y
i
?E
?
(y
i
|x
i
),and? = vec(?).
The proposed estimation procedure is then as follows. Treating the coef-
ficients in the above equation as unrestricted, one first obtains initial (usually
least-squares) estimate b? of ?. In the second step, the restrictions on ? in
(2.1.39) are incorporated by letting?be a function of the parameters of the model
? =(?
0
,a
0
1
,..,a
0
T
). The restrictions are imposed by using a minimum-distance
estimator, specifically
b
? =argmin
?
[b???(?)]
0
b
?[b???(?)], (2.1.40)
where
b
? is the estimated variance covariance matrix of the asymptotic variance
23
of b?:
b
? =
1
N
N
X
i=1
nh
(y
i
?y)?
b
?(x
i
?x)
i
(2.1.41)
h
(y
i
?y)?
b
?(x
i
?x)
i
0
?S
?1
XX
(x
i
?x)(x
i
?x)
0
S
?1
XX
?
,
where
S
?1
XX
=
1
N
N
X
i=1
(x
i
?x)(x
i
?x)
0
. (2.1.42)
Anderson and Hsiao (1981) consider the model (2.1.1) with |?| < 1.They
distinguishfourdifferentcasesbasedondifferentassumptionsontheinitialvalues
of the process (y
i0
):
? Case I. Fixed initial observations: y
i0
are fixed observed constants
? Case II. Random initial observations, common mean:
y
i0
= c+?
i
(2.1.43)
where ? has a mean zero and a finite variance and is independent of ?
i
and
?
it
. Here they also suggest that one could assume
y
i0
= c+?
i
(2.1.44)
so that the initial endowment affects the level.
24
? Case III. Random initial observations, different means (in this case there the
incidental parameter problem arrises and for fixed T the MLE is inconsis-
tent): the model is
y
it
= w
it
+?
i
t =0,1,..,T, (2.1.45)
w
it
= ?w
i,t?1
+?
it
t =1,..,T, (2.1.46)
where w
it
and ?
i
are unobservable. In this case w
i0
are unknown constants.
? Case IV. Random initial observations with stationary distribution: same as
above but w
i0
are(a)drawsfromstationarydistribution with mean zero and
variance
var(?
it
)
1??
2
or(b)samebutthevarianceisarbitrary.Inthesubcase(a),
the y
it
come from the stationary distribution of the process.
To derive the likelihood function they assume normality of the error terms
?
it
, ?
i
and when applicable also y
i0
. Implicit assumption is that E(?
it
)=0and
Var(?
it
)=?
2
(uniform over individuals).
Anderson and Hsiao (1982) have
y
it
= ?y
i,t?1
+x
it
? +z
i
? +?
i
+?
it
t =1,..,T; i =1,...,N, (2.1.47)
25
where |?| < 1 and
E(?
i
)=E(?
it
)=E(?
i
z
i
)=E(?
i
x
it
)=E(?
i
?
it
)=0 (2.1.48)
t =1,..,T; i =1,...,N,
and E
?
?
i
?
j
?
= ?
2
?
for i = j and =0for i 6= j,
E(?
it
?
js
)=?
2
?
i = j, t = s, (2.1.49)
=0 otherwise
They also assume normality of ?
i
and ?
it
and first consider the model with only
time-invariant exogenous regressors. Again several cases are distinguished:
? (I) y
i0
is fixed
? (II) y
i0
is random with
? (IIa) y
i0
independent of ?
i
,or
? (IIb) y
i0
correlated with ?
i
; in their wording ?If we wish the initial
endowment [y
i0
] affects the equilibrium level [
?
i
1??
] we may let?:
y
i0
= z
i
? +?
i
. (2.1.50)
? (III) (y
i0
??
i
) is fixed
? (IV) (y
i0
??
i
) is random with
26
? (IVa) variance
?
2
?
1??
2
? (IVb) unrestricted (but uniform over i)variance
Next Anderson and Hsiao consider the model with only time-varying regres-
sors and they offer two interpretations of the model:
(1) Serial correlation model:
y
it
= ?y
i,t?1
+x
it
???x
it
? +?
i
+?
it
. (2.1.51)
Here they again assume either that (y
i0
?x
i0
???
i
) is fixed, or random with zero
mean and variance
?
2
?
1??
2
.
(2) State dependence model:
y
it
= ?y
i,t?1
+x
it
? +?
i
+?
it
. (2.1.52)
As before, there is a variety of assumptions concerning y
i0
considered - the as-
sumption correspond exactly to cases I.-IV above, except that in case of IV they
distinguish whether (y
i0
??
i
) is random with
? ? (IVa) common mean and variance
?
2
?
1??
2
? (IVb) common mean and unrestricted variance
? (IVc) heterogeneous mean and variance
?
2
?
1??
2
? (IVd) heterogeneous mean and unrestricted variance
Table 1 below summarizes the consistency findings of Anderson and Hsiao:
27
Table 1. Consistency of ML Estimation
Case Estimated Parameters N fixed, T ?? T fixed, N ??
I. ?,?,?
2
?
Consistent Consistent
?,?
2
?
Inconsistent Consistent
II.a ?,?,?
2
?
Consistent Consistent
?,?
2
?
,?
2
y
0
,E(y
i0
) Inconsistent Consistent
II.b ?,?,?
2
?
Consistent Consistent
?,?
2
?
,?
2
y
0
Inconsistent Consistent
E(y
i0
),Cov(?
it
,?
i
)
III. ?,?,?
2
?
Consistent Inconsistent
?,?
2
?
,(y
i0
??
i
) Inconsistent Inconsistent
IV.a ?,?,?
2
?
Consistent Consistent
?,?
2
?
,E(y
i0
??
i
) Inconsistent Consistent
IV.b ?,?,?
2
?
Consistent Consistent
?,?
2
?
,E(y
i0
??
i
) Inconsistent Consistent
Var(y
i0
??
i
)
IV.c ?,?,?
2
?
Consistent Inconsistent
?,?
2
?
,E
i
(y
i0
??
i
) Inconsistent Inconsistent
Var(y
i0
??
i
)
IV.d ?,?,?
2
?
Consistent Inconsistent
?,?
2
?
,E
i
(y
i0
??
i
) Inconsistent Inconsistent
Var(y
i0
??
i
)
28
Bhargava and Sargan (1983) consider the dynamic panel data model with ex-
ogenous variable of essentially the same form as (2.1.1). They derive the maxi-
mum likelihood function under the assumption that the innovations and the indi-
vidual effects are normally and independently distributed with constant variances,
i.e. ?
it
? N (0,?
2
?
) and ?
i
? N
?
0,?
2
?
?
. The likelihood is derived first treating
the initial values y
i0
as exogenous and then as endogenous by assuming that the
initial values are generated from the stationary distribution of the process. In par-
ticular, they assume that y
i0
is generated by a series of equations (2.1.1) and can
be written as
y
i0
=
?
X
k=0
?
k
(x
i,t?k
? +?
i
+?
i,t?k
) (2.1.53)
= y
i0
+
?
i
1??
+
?
X
k=0
?
k
?
i,t?k
,
where y
i0
is exogenous part of the initial values and is in fact assumed to be
stochastic with y
i0
?N
?
y
?
i0
,?
2
y
0
?
, independent of ?
it
and ?
i
.
Hsiao, Pesaran and Tahmiscioglu (2002) consider the model (2.1.1) without
exogenous variables,
7
i.e.
y
it
= ?y
i,t?1
+?
i
+?
it
t =1,..,T; i =1,...,N, (2.1.54)
7
In the second part, the authors extend the model for both strictly and weakly exogenous vari-
ables.
29
withy
i0
observable. Under the assumption that the process has started at time?m
one can express the first difference of the initial observation as
?y
i1
= ?
m
?y
i,?m+1
+?
i
, (2.1.55)
where ?
i
=
P
m?1
j=0
?
j
??
i,1?j
. Hsiao, Pesaran and Tahmiscioglu then distinguish
two assumptions for the initial values of the process:
? Case (3.i) |?| < 1 andtheprocesshasbeengoingonforalongtime
(m ?? )andE(?y
i1
)=0, Var(?y
i1
)=2
Var(?
it
)
1+?
, Cov(?
i
,??
i2
)=
?Var(?
it
) and Cov(?
i
,??
it
)=0for t =3,4,...,T.
? Case (3.ii) m is finite and E(?y
i1
)=b, Var(?y
i1
)=c ? var(?
it
),
where c>0, Cov(?
i
,??
i2
)=?Var(?
it
) and Cov(?
i
,??
it
)=0for
t =3,4,...,T.
In both cases, the maximum likelihood function is then derived for the model
in first differences under the assumption that the error terms are normally distrib-
uted with ?
it
? N (0,?
2
?
). They also show that the ML function is invariant to
the choice of transformation that is used to remove the individual effects from the
model.
Hsiao, Pesaran and Tahmiscioglu also define a minimum distance estimator
and show that if it ignores the initial conditions, it will be inconsistent when T
is fixed. They also study the relationship of the ML estimator the the GMM esti-
mators suggested by Arellano and Bond (1991), Arellano and Bover (1995), and
30
Ahn and Schmidt (1995). Conditional on ?
2
?
and the variance of the initial obser-
vations, Hsiao, Pesaran and Tahmiscioglu show that the difference between the
asymptotic variance covariance matrix of the GMM and the ML (or MD) estima-
tors will be positive definite. They conjecture that the same holds even when ?
2
?
and the variance of the initial observations is unknown and document this by a
Monte Carlo study.
Binder, Hsiao and Pesaran (2000) consider a multivariate extension of the dy-
namic panel data model. Their specification is
w
it
= ?
i
+?t+?[w
i,t?1
??
i
??(t?1)] +?
it
, (2.1.56)
wherey
it
,?
i
,? and?
it
arem?1 vectors and?is anm?mmatrix. They define
y
it
= w
it
??
i
??t and hence the model becomes
y
it
= ?y
it
+?
it
(2.1.57)
They assume that the model started as time t = ?M, M ? 0 and the initial
deviations are given by
y
i,?M
=
?
X
j=0
?
?
j
?C
?
?
i,?M?j
+C?
i
, (2.1.58)
31
where ?
it
, i =1,2,...,N; t ? T, are i.i.d. across i and over t,and?
i
are i.i.d.
across i with
E
?
?
?
?
it
?
i
?
?
?
=0 and Var
?
?
?
?
it
?
i
?
?
?
=
?
?
?
??
?
0
z
?
?
?
. (2.1.59)
The matrix C is defined recursively as C =
P
?
j=0
C
j
where C
0
= I
m
, C
1
=
??I
m
, C
j
= C
j?1
?, j ? 2. Notice that for m =1,theC canonlybezeroor
one.
Binder, Hsiao and Pesaran then derive the quasi maximum likelihood function
for the model under the assumption the disturbances are {?
it
} and {?
i
} are mu-
tually independent and identically distributed. The authors also extend the GMM
and MD estimators to the multivariate context and provide simulation evidence
that the QML estimator dominates the GMM and MD procedures even when the
underlying disturbances are not normal.
8
Binder, Hsiao, Mutl and Pesaran (2002)
discuss the same model but with higher order autocorrelation structure and pro-
vide further Monte Carlo evidence.
2.2 Modelling Cross-Sectional Dependence
When T is large and N small, one does not have to parametrically specify the
cross sectional interdependencies and can allow for arbitrary covariance structure
of the disturbances. The model can then be consistently estimated by a general-
8
The authors consider a case where the underlying disturbances are drawn from a zero mean
chi-square distribution.
32
ized least squares method. This is what Zellner (1962) refers to as the seemingly
unrelated regressions (SUR) specification. On the other hand, observe that the
dimensions of the variance covariance matrix of the dependent variable (or dis-
turbances) grows with sample size (number of cross-sections). Therefore, when
the time dimension of the data is limited or fixed, it becomes impossible to in-
fer the cross-sectional covariance structure of the model without imposing some
parametric restrictions.
Typically the interaction among the cross-sectional units is modelled as pro-
portional to some observable distance. The most widely used parameterization
are variants of the one considered by Cliff and Ord (1973 and 1981) which I re-
view below. Recent applications include Audretsch and Feldmann (1996), Bernat
(1996), Besley and Case (1995), Bollinger and Ihlanfeldt (1997), Buettner (1999),
Case (1991), Case, Hines, and Rosen (1993), Dowd and LeSage (1997), Holtz-
Eakin (1994), LeSage (1999), Kelejian and Robinson (2000, 1997), Pinkse and
Slade (1998), Pinkse, Slade, and Brett (2002), Shroder (1995), and Vigil (1998).
See also a host of other papers presented for example at the Spatial Economet-
rics Workshop in Kiel, 2005 (http://www.uni-kiel.de/ifw/konfer/spatial/spatial-
econometrics.htm).
In this thesis, I follow the spatial econometrics literature and study a first order
spatial autocorrelation model with a known spatial weighting matrix. The panel
spatial autocorrelation model is a generalization of the single cross-section mod-
els that include Cliff and Ord (1973, 1981), Whittle (1954), Anselin (1988) or
Kelejian and Prucha (1998, 1999 and 2004). See also Lee (2004) who provides
33
asymptotic properties of ML procedure for spatial models. Other recent theo-
retical developments include Baltagi and Li (2001a,b), Baltagi, Song and Koh
(2003), Conley (1999), Das, Kelejian and Prucha (2005), Kelejian and Prucha
(2001, 1997), Lee (2003, 2002, 2001a,b), LeSage (2000, 1997), Pace and Barry
(1997), Pinkse and Slade (1998), Pinkse, Slade, and Brett (2002), and Rey and
Boarnet (2004). An excellent review of the different specifications in spatial
econometrics can be found in Anselin (1988). See also Haining (1990) and refer-
ences therein.
2.2.1 Model Specifications
I will now present the basic specification of spatial dependence suggested in the
literature. The Cliff-Ord type model of spatial dependence can be written in the
following form. Suppose that we have a panel of observations in space, indexed
by i =1,...,N, and time, indexed by t =1,...,T. The disturbances
9
u
it,N
can
then be specified to follow a spatial autoregressive process in the form of:
u
it,N
= ?
N
X
j=1
w
ij,N
u
jt,N
+?
it,N
. (2.2.1)
The disturbance u
it,N
for a cross-section i at a time t consists of a weighted av-
erage of contemporaneous disturbances in other cross-sections and a mutually in-
dependent innovation term ?
it,N
. The weights w
ij,N
are assumed to be observable
quantities and, therefore, the extent of correlation in the model is a function of a
9
Of course spatial lags can also be applied to the endogenous or explanatory variables in the
same manner.
34
single parameter ?.
This model for spatial correlation wasintroducedbyCliffandOrd(1973,
1981). Anselin (1988) refers to this model as a first order spatial autoregres-
sive model or SAR(1). The weights w
ij,N
are referred to as spatial weights and
are assumed to be known, ? is called the spatial autoregressive parameter and
P
N
j=1
w
ij,N
u
jt,N
is referred to as a spatial lag. The spatial weights w
ij,N
are typ-
ically specified to be nonzero if cross sectional unit i relates to unit j in a mean-
ingful way. In such cases, units i and j aresaidtobeneighbors.Inpractice,the
spatial weights are often viewed as normalized in the sense that the summation
term in (2.2.1) is an average of neighboring observations. e.g. one postulates that
P
N
j=1
w
ij,N
=1.
A more general model can include spatial lags in the disturbances as well as
in the endogenous variable, denoted by y
it,N
,e.g.
y
it,N
= x
it,N
?+?
N
X
j=1
m
ij,N
y
jt,N
+u
it,N
, (2.2.2)
where x
it,N
is a vector of exogenous variables, ? is a vector of parameters, ? is
a scalar parameter, m
ij,N
are spatial weights, and the disturbance u
it,N
areasin
(2.2.1). The term
P
N
j=1
m
ij,N
y
jt,N
is then referred to as a spatial lag of the de-
pendent variable. The weights in the spatial lag of the dependent variable (m
ij,N
)
can, but do not necessarily have to, correspond to those in the spatial lag in the
disturbances (w
ij,N
).
Observe that all variables are indexed by the sample size N,e.g.theyform
35
triangular arrays. This also applies to situations where the spatial weight are spec-
ified as fixed constants. Observe that in many cases, it is assumed that each cross-
sectional location i has a fixed number of neighbors, say q,forwhichw
ij,N
6=0.
Hence each w
ij,N
is equal either to zero or a fixed number such as 1/q.Ob-
serve that even in such cases, the number of cross-sectional units determines the
number of units that enter into the solution of equation (2.2.1). As a result, the
disturbances u
it,N
that are solution to (2.2.1) have to be indexed by the sample
size. The fact that the disturbances u
it,N
are indexed by the sample size leads to
certain technical complications and, for example, one has tobe careful inapplying
central limit theorems and make sure that these also hold for triangular arrays.
Contiguity Weights The specifications where each units is, only affected by
its neighbors are sometimes referred to as contiguity weights. These could be
specifiedasw
ij,N
=1, whenthetwounitsareneighbors, andw
ij,N
=0otherwise.
DenotingW
N
theN?N matrixoftheweightsw
ij,N
, therow-normalizedweights
are then given by
W
?
N
= W
N
./(e
0
N
?W
N
e
N
), (2.2.3)
where e
N
is anN ?1 vector of ones and ./ denotes element-by-element division.
In practical applications, the definition of a neighbor often follows a nat-
ural geographical interpretation. Thus if the space in question is a geographical
space and the units of analysis are regions, two regions are classified as neighbors
when they share a common border. Other popular specifications of the contigu-
36
ity weights are rook, queen and related configurations. Suppose that the space
is divided in equally sized rectangular units. Below, I depict the rook and queen
configuration using one to indicate the units that are neighbors to the unit x and
zero to indicate other units that are not direct neighbors (these then correspond to
entries on the x?th row of the spatial weighting matrix W
N
):
rook :
00000
00100
01x 10
00100
00000
queen :
00000
01110
01x 10
01110
00000
(2.2.4)
An alternative is to assume that the spatial process has higher order components
and use so-called double-rook or double-queen specification, which could be:
double?rook :
00
1
2
00
0
1
2
1
1
2
0
1
2
1 x 1
1
2
0
1
2
1
1
2
0
00
1
2
00
double?queen :
1
2
1
2
1
2
1
2
1
2
1
2
111
1
2
1
2
1 x 1
1
2
1
2
111
1
2
1
2
1
2
1
2
1
2
1
2
(2.2.5)
Of course the choice of entries 1 and
1
2
is arbitrary and these can be replaced by
37
some other constants.
Another possibility is to assume that the cross-sectional units can be ordered
linearly in space (as an analogy to the linear ordering of observations in time).
The specification that is often referred to as q?ahead, r?behind (in terminology
of Kelejian and Prucha, 1999) uses the weights matrix W
(q,r)
N
consisting of zeros
except for entries of ones on the first q subdiagonals below the main diagonal and
entriesofonesonthefirst r subdiagonals above the main diagonal. For example,
the 2?ahead, 2?behind matrix is:
W
(2,2)
N
=
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
01 10??? 0
10 1
.
.
.
.
.
.
.
.
.
11
.
.
.
.
.
.
.
.
.
0
0
.
.
.
.
.
.
.
.
.
11
.
.
.
.
.
.
.
.
.
101
0 ??? 0110
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
. (2.2.6)
An alternative is to assume a circular ordering of the observation in space. In
this case, the q?ahead, r?behind weights matrices are as above but with added
nonzero entries in positions (i,N ?j) where i,j =0,..,q? 1 and (N ?k,l)
where k,l =0,..,r ? 1.Forthe2?ahead, 2?behind matrix, circularity implies
that the first unit is also a neighbor of unitsN andN?1, hence the added entry of
one in positions (N,1), (N ?1,1), (1,N),and(2,N). Additionally the second
and last unit (N)aswellasthefirst and (N ?1) ?th units are neighbors, and
hence the entries of one in positions (N,2) and (1,N?1).
38
Distance Based Weights When one views the cross-sectional observations as
being located in a space, the extent of direct correlation between the disturbances
at two locations can be interpreted as relatedtotheirdistancein the space under
consideration. Hence the weights can be interpreted as being (inversely) related
to some measure of distance among the observations. In practical applications
the space does not necessarily have to be a geographical space. The observations
can be located in an abstract space in which their proximity is a known function
of some of their observable characteristics. For example, two industries can be
considered to be ?close? to each other if they use a similar set of inputs, or two
countries can be ?close?iftheyhavereceivedfinancial flows from the same inter-
national lenders.
Under the interpretation of the weights w
ij,N
being inversely related to a dis-
tance measure, one is making an implicit assumption that the weights are sym-
metric in the sense that w
ij,N
= w
ji,N
. This is an artefact of the symmetry of
distance measures, i.e. the distance from i to j hastobeequaltothedistance
from j to i.
10
Observe, however, that the model considered here is more general.
In particular, I do not require the weights to be symmetric andw
ij,N
does not have
to be equal tow
ji,N
. This can be advantageous in situations where the spillover of
shocks is not necessarily symmetric. An example is the international transmission
of shocks, where a shock originating in a very small country cannot be plausibly
assumed to affect a large country in a same way as a shock originating in a large
10
Observe that the distance based weights can be adjusted (premultiplied) by a factor that ac-
counts for the differences in the direction of the influence. In this case the weights can become
asymmetric. Note that the specification in this thesis allows for such asymmetries.
39
country affects a small country (e.g. US shocks affect say Ecuador much more
than Ecuador?s shocks can affect the US).
The problem of symmetry of the spatial weights that are based on a distance
measure is related to a more general issue of aggregation. Suppose that the data
was generated for a larger (disaggregated) sample but is only observed for ag-
gregated spatial units. Mutl (2006) considers such data generating designs in a
Monte Carlo study and concludes that only specifications that adjust the spatial
weights for the relative size of the units deliver estimates that do not change with
the increases in the number of units observed in the sample. The appropriate mea-
sure of the size depends on the units of measurement of the endogenous variable.
For example, when the dependent variable is expressed as GDP per population,
then the spatial weights w
ij,N
should be a postmultiplied by the population of the
region i relative to the entire population of all regions in the sample. Construct-
ing the distance based spatial weights in this fashion takes automatically account
of the asymmetrical effects considered above. See also Giacomini and Granger
(2004) for related issue of forecasting an aggregate of spatially interrelated obser-
vations, and LeSage and Pace (2004) for dealing with missing values in models
with spatial dependence.
2.2.2 Estimation
TheestimationmethodformodelswithspatialautocorrelationsuggestedbyAnselin
(1988) or Anselin and Hudak (1992) was maximum likelihood (ML). The asymp-
totic properties of the ML estimator of a model such as (2.2.1) have been derived
40
only recently by Lee (2004) for one specific Cliff-Ord model. Furthermore, the
maximum likelihood function contains a Jacobian term that is a determinant of a
matrix that increases with the sample size N. Hence for moderate and large sam-
ple sizes, the ML estimation might become infeasible. As an alternative, Kelejian
and Prucha (1998) introduced spatial generalized moments (spatial GM) estima-
tor and proved its consistency. The asymptotic distribution of the spatial GM
estimator is derived in Kelejian and Prucha (2005). The spatial GM estimator is
computationally much simpler and, as a result, is feasible also for large sample
sizes.
The OLS estimation of a model with SAR disturbances is inefficient but re-
mains consistent. However, when spatial lags of the dependent variable are in-
cluded, as in (2.2.2), OLS estimation becomes biased since the stochastic regres-
sor
P
N
j=1
w
ij,N
y
jt,N
on the left hand side is correlated with the error term (endo-
geneity bias). However, an instrumental variable estimation with spatial lags of
the explanatory variable as instruments, will be consistent (Kelejian and Prucha,
1998). Alternative instrument sets are considered in Lee (2003) and Kelejian,
Prucha and Yuzefovich (2004).
The stacked version of the model given in (2.2.1) and (2.2.2) is
y
N
= X
N
? +?M
N
y
N
+u
N
, (2.2.7)
u
N
= ?W
N
u
N
+?
N
,
where y
N
is the N ? 1 vector of the dependent variable, X
N
is the N ?p matrix
41
of exogenous variables, M
N
and W
N
are N ?N spatial weighting matrices, u
N
and?
N
are the N ? 1 vectors of disturbances and innovations. Under appropriate
regularity conditions, the model can be solved as (see, for example, Das, Kelejian
and Prucha, 2003, page 4):
y
N
=(I
N
??M
N
)
?1
X
N
? +(I
N
??M
N
)
?1
(I
N
??W
N
)
?1
?
N
. (2.2.8)
Under the assumption that the vector?
N
is normally distributed with
?
N
?N (0
N?1
,?
2
I
N
), the likelihood function is:
ln(L)=?
N
2
ln(2?)?
1
2
ln|?
N
| (2.2.9)
?
1
2
?
y
N
?(I
N
??M
N
)
?1
X
N
?
?
0
?
?1
N
?
y
N
?(I
N
??M
N
)
?1
X
N
?
?
,
where?
N
is the variance covariance matrix of the disturbancesu
N
given by
?
N
= ?
2
(I
N
??M
N
)
?1
(I
N
??W
N
)
?1
(I
N
??M
0
N
)
?1
(I
N
??W
0
N
)
?1
.
(2.2.10)
The least squares procedure applied directly to equation (2.2.7) is inconsistent
due to correlation of y
it,N
and u
it,N
. However, there are instrumental variables
(IV) procedures that are consistent. Observe that for the current model (see Das,
Kelejian and Prucha, 2003, page 7):
E(y
N
)=(I
N
??M
N
)
?1
X
N
? =
?
X
k=0
?
k
W
k
N
X
N
?, (2.2.11)
42
andhenceidealinstrumentsarecombinationsofmatricesX
N
?,W
N
X
N
?,W
2
N
X
N
?,
etc. KelejianandPrucha(1998)showthatanIVestimatorthatusesatleastthelin-
early independent columns ofX
N
,W
N
X
N
,W
2
N
X
N
asinstrumentsisconsistent
and asymptotically normal.
The spatial autocorrelation parameter ? can then be estimated with the spatial
generalized moments (spatial GM) procedure, suggested by Kelejian and Prucha
(1999). Denote bu
N
the estimated disturbances based on an initial consistent esti-
mator. Let
v
1,N
?
?,?
2
?
= N
?1
?
I
N
??W
N
bu
N
?
0
?
I
N
??W
N
bu
N
?
??
2
, (2.2.12)
v
2,N
?
?,?
2
?
= N
?1
?
I
N
??W
2
N
bu
N
?
0
?
I
N
??W
2
N
bu
N
?
??
2
N
?1
tr(W
0
N
W
N
),
v
3,N
?
?,?
2
?
= N
?1
?
I
N
??W
2
N
bu
N
?
0
?
I
N
??W
N
bu
N
?
.
The spatial GM estimator is then defined as
?
b?,b?
2
?
=argmin
(
3
X
k=1
v
0
k,N
?
?,?
2
?
v
k,N
?
?,?
2
?
:
?
?,?
2
?
? [?a,a] ?
?
0,s
2
?
)
,
(2.2.13)
where a ? 1 and s
2
is the upper limit considered for ?
2
. Kelejian and Prucha
(1999) show that the spatial GM estimator is consistent. Kelejian and Prucha
(1998) also provide a proof that the spatial autoregressive parameter ? isa?nui-
sance? parameter in the sense that the feasible generalized spatial two stage least
squares (FGS2SLS) estimator has the same asymptotic distribution when it is
basedonaconsistent estimator of ? as when it is based on the true value. Ini-
43
tially, the asymptotic distribution of the spatial GM estimator was not determined.
As a result, tests for spatial autocorrelation had to be based on statistics such as
the Moran I. Kelejian and Prucha (2001) and Pinske and Slade (1998) provide as-
ymptotic distribution of the Moran I test statistics. The asymptotic distribution of
the spatial GM estimator was then derived for a more general model that includes
heteroscedastic disturbances in Kelejian and Prucha (2005).
2.3 Space-Time Models
Time and space is a key feature of almost all human activities. Their interaction
hasbeenstudiedinmanydisciplinesandhasreceivedsomeattentionineconomics
as well. Studies outside economics include many applications in geostatistics (see
e.g. Kyriakidis and Journel, 1999 for a review), geography but also in epidemi-
ology, medicine, crime prevention and others. Short overviews can be found in
Cressie (1991: 449-452) and Robinson (1998: 319-328).
In economics and econometrics, some interesting cases complementary to the
specification in the present thesis are, for example, generalized least squares test
to test for unit roots in panel data (although without deriving any asymptotic prop-
erties of the estimator) in O?Connell (1998), a two-step sieve least squares proce-
dure to estimate a panel vector autoregression (VAR) model with a nondiagonal
cross-sectional covariance matrix that is proportional to an observed economic
distance measure in Chen and Conley (2001) who look at asymptotics in the less
complicated case when the cross-sectional dimension is fixed, and, finally, Chang
(2002) who derives asymptotic properties of a univariate panel model with a gen-
44
eral unrestricted form of cross-sectional heterogeneity when the cross-sectional
dimension of the panel is also fixed.
In this thesis, I will analyze dynamic model that includes a spatial lag in the
disturbance process. This is a special case of the class of stochastic models known
as space-time autoregressive (space-time AR) models introduced by Cliff et al.
(1975) and generalized by Pfeifer and Deutsch (1980). More recent discussions
and applications of the space-time AR model in econometrics are Elhorst (2001),
while a generalization of the model to continuous space is proposed by Brown et
al. (2000).
Below I review papers that deal with this class of models in more detail. Note
that if contemporaneous correlation is present, the observable data become a non-
trivial transformation of the underlying random field, resulting in some technical
difficulties. Hence I first focus on specifications that do not allow for contempora-
neous correlation in the data but instead assume that spatial interactions act with
a time lag. In the second subsection I therefore present models that allow for such
complications.
45
2.3.1 Space-Time Autoregressive Moving Average
(STARMA) Models
Pfeifer and Deutsch (1980) were the first to propose a STARMA model. Their
general STARMA(p,q;?
1
,...,?
p
,m
1
,...,m
q
) model is:
y
it
=
p
X
k=1
?
k
X
l=0
?
kl
N
X
j=1
w
ij,l
y
j,t?k
(2.3.1)
?
q
X
k=1
m
k
X
l=0
?
kl
N
X
j=1
w
ij,l
?
j,t?k
+?
it
,
wherepis the autoregressive order,q is the moving average order,?
k
is the spatial
order of thek?thautoregressiveterm,m
k
isthespatialorderofthek?thmoving
average term, ?
kl
and ?
kl
are parameters and the errors are normally distributed
with E(?
it
)=0, E(?
it
?
j,s
)=?
2
for i = j and t = s, and E(?
it
?
js
)=0
otherwise.
The spatial weights have the usual interpretation (see the previous subsection)
and are assumed to be observable and the authors do not impose any restrictions
on their structure. Observe that in contrast to Cliff-Ord type model considered in
this thesis, their STARMA model does not allow for contemporaneous correlation
between spatial units, i.e. for example ?
it
depends on ?
j,t?1
but not on ?
jt
.Asa
result, the likelihood function does not involveaJacobianterminaformofade-
terminant of anN?N and, as a result, ML estimation is considerably simpler and
it is the estimation method suggested by Pfeifer and Deutsch. The authors derive
the likelihood function conditional on initial values of the process and note that
46
it is only appropriate for moderate or large T. However, the restrictions implied
by the model on the initial observations are not explicitly derived. The paper also
does not provide formal consistency or asymptotic normality results. Abraham
(1983) derives the likelihood function for the STARMA model.
Stoffer(1986) outlines different estimation procedure for a spatial STAR model
with missing values (spatial ARX in his terminology). The model combines the
time series parametrization of an autoregressive moving average process for miss-
ing and noisy data with a Cliff and Ord type spatial structure. The data generating
process is assumed to be a q-th order autoregressive process where the current
observation is influenced by q time lags of its spatial neighbors:
y
it
=
q
X
k=1
N
X
j=1
w
ij,k
?
kj
y
j,t?k
+x
0
it
? +?
it
, (2.3.2)
where the autoregressive parameters?
kj
are allowed to vary with spatial location.
The spatial weights w
ij,k
have the usual interpretation (e.g. they are inversely
related to a distance) and are allowed to be different at different time lags. The
p explanatory variables in x
it
are modelled as a stochastic process independent
of the innovations ?
it
and the data sample is observed for i =1,...,N and t =
1,...,T.
The estimates are solutions to approximated Yule-Walker equations. For ex-
ample, withno data problems,q =1and without explanatory variables, the model
47
can be written as
y
t
= W?y
t?1
+?
t
, (2.3.3)
wherey
t
=(y
1t
,...,y
Nt
)
0
,?
t
=(?
1t
,...,?
Nt
)
0
,Wis aN ?N matrix of the spatial
weights w
ij
and? = diag(?
1
,...,?
N
). The proposed estimator of?is then:
b
? = diag
?
W
?1
b
?
?1
b
?
?1
0
?
, (2.3.4)
where the estimated moments of the data are
b
?
0
=
T
X
t=2
y
t
y
0
t
, (2.3.5)
and
b
?
?1
=
T
X
t=1
y
t
y
0
t?1
. (2.3.6)
There are no formal asymptotic claims made for the procedure. The method is
illustrated with an application to fish catch data at five locations for 240 time peri-
ods suggesting that the implicit asymptotic consistency and normality claims are
for fixed spatial dimension N and increasing time dimension of the observations.
Pace et al. (1998) model spatial and temporal dependence in housing price data
in Fairfax County Virginia between 1961 and 1991. Unlike in standard STAR
models, it is not assumed that the autocorrelation in the dependent variable is
linearly separable in space and time. Instead an interaction of the space and time
48
lags is considered. In particular, the model is:
y
it
=
T
X
s=1
N
X
j=1
w
ij,ts
y
js
+x
0
it
?+
T
X
s=1
N
X
j=1
w
ij,ts
x
0
js
? +?
it
, (2.3.7)
where the observable weights w
ij,ts
relate observation across time and space si-
multaneously. It is assumed thatw
ij,ts
=0fors ?t, meaning that the current and
future values of y
js
and x
js
do not influence the process for y
it
.
Stacking w
ij,ts
into a NT ?NT matrix W, Pace et al. assume that
W = ?
s
S??
T
T+?
ST
ST+?
TS
TS, (2.3.8)
wheretheSandTmatricesareinterpretedas filtersinspaceandtimerespectively.
Their entries are related to the distance of the of the observation in space and time
respectively.
The main limitation of their approach is that it is assumed that there are no
concurrent observations and that only past observations have an effect. If the
matrix W is stacked so that the observations are sorted according to time, this
assumption implies that both T and S are strictly lower (or upper) diagonal. As
a result the model can be estimated by OLS. The paper does not provide formal
results and does not spell out assumptions on the disturbance process.
Giacomini and Granger (2004) show that the STARMA class of models can
bederivedasatransformationofvectorautoregressivemovingaverage(VARMA)
model, where the transformation is a restriction involving spatial weighting ma-
49
trices. When the number of locations is small, the model can be estimated by an
overparametrized VARMA specification. With increasing number of location, the
overparameterizedVARMAmodelhasalargenumberofinsignificantparameters.
Therefore, estimation can be improved in a Bayesian framework by incorporating
these as priors. Hence LeSage and Krivelyova (1999) propose a class of prior
distributions for a Bayesian VAR model that will approximately constrain the in-
significant parameters to zero.
2.3.2 Models with Contemporaneous Spatial Correlation
The papers cited in the above subsection did not allow for contemporaneous de-
pendence of the observations. When such interactions are included, the observa-
tion become a nonlinear transformation of the innovations and, as a result, maxi-
mum likelihood estimation is more difficult. We next review papers that allow for
such complications.
Congdon (1994) considers the spatiotemporal model of the following form:
y
it
= x
0
it
? +?
i
+u
it
, (2.3.9)
where t =1,...,T and i =1,...,N and the error term is both spatially and tem-
porally autocorrelated:
u
it
= ?u
i,t?1
+?
N
X
j=1
w
ij
u
jt
+?
it
. (2.3.10)
50
It is assumed thaty
i0
andx
i0
are known exogenous constants. The first step of the
proposed estimation procedure eliminates the individual effects ?
i
by subtracting
individual means y
i
and x
i
and estimating the slope coefficients? by OLS on
(y
it
?y
i
)=(x
it
?x
i
)
0
? +(v
it
?v
i
). (2.3.11)
In the second step, ? and ? are estimated by minimizing
g(?,?)=
N
X
i=1
T
X
t=1
?
y
?
it
?x
?0
it
b
?
OLS
?
2
, (2.3.12)
where
y
?
it
=(y
it
?y
i
)??(y
i,t?1
?y
i
)??
N
X
j=1
w
ij
?
y
jt
?y
j
?
, (2.3.13)
x
?
it
=(x
it
?x
i
)??(x
i,t?1
?x
i
)??
N
X
j=1
w
ij
(x
jt
?x
j
).
Based on Hordijk (1979), the transformation for the first time period is
y
?
1
=
?
(I??W)
0
(I??W)??
2
I
N
?
1/2
(y
1
?y), (2.3.14)
X
?
1
=
?
(I??W)
0
(I??W)??
2
I
N
?
1/2
?
X
1
?X
?
,
wherey
1
=(y
11
,...,y
1N
)
0
,y =(y
1
,...,y
N
)
0
,X
1
=(x
0
11
,...,x
0
1N
)
0
,X =(x
0
1
,...,x
0
N
)
0
and W is an N ?N matrix with elements w
ij
. The slope coefficients ? are esti-
51
matedbyOLSfrom
y
?
it
?
b
?,b?
?
= x
?
it
?
b
?,b?
?
0
?+?
it
. (2.3.15)
In the third step, the variance components ?
2
?
= Var(?
it
) and ?
2
?
= Var(?
i
)
are estimated, e.g.
b
?
2
?
=
1
NT
N
X
i=1
T
X
t=1
?
y
?
it
?x
?0
it
b
?
?
2
, (2.3.16)
where
b
? is from step 2.
11
The final step is a generalized least squares (GLS)
procedure to re-estimate?.
The paper contains outline and an application of the estimation procedure to
mortality rates in London but offers no formal proofs that would support the con-
sistency claims. The estimated GLS procedure is based on suggestion in Anselin
(1988), p.111.
Driscoll and Kraay (1995, 1998) Provide a proof of consistency and asymp-
totic normality of a GMM procedure based on a panel Newey and West (1987)
nonparametric heteroscedasticity and autocorrelation consistent (HAC) covari-
ance matrix estimator.
12
The limit is taken with respect to the time dimension
11
The expression for
c
?
2
?
in the paper is
c
?
2
?
=
1
N
N
X
i=1
n?
y
i
?
b
?y
i,?1
?b?Wy
?
?
b
?
?
x
i
?
b
?x
i,?1
?b?Wx
?o
2
?
c
?
2
?
T
.
This does not seem to have the correct dimensions.
12
The cross-sectional dimension of the data is collapsed by taking cross-sectional averages.
Hence this is not a complete generalization of the HAC estimation to a panel setting.
52
of the data. Their specification requires that the data is an ?-mixing random field
of the same size as the number of moment restrictions and hence places only weak
restrictions on the form of spatial and temporal correlations.
They consider r orthogonality conditions E[h
it
(z
it
,?)] = 0,wherez
it
, i =
1,...,N, t =1,...,T is data and ? is a vector of parameters. The restrictions are
assumedtoidentifytheparameters. Their GMM estimator is
b
?
T
=argmin
?
("
1
T
T
X
t=1
h
t
(?,z
t
)
#
0
b
S
?1
T
"
1
T
T
X
t=1
h
t
(?,z
t
)
#)
, (2.3.17)
where z
t
=(z
1t
,...,z
Nt
)
0
, h
t
(?,z
t
)=N
?1
P
N
i=1
h
it
(z
it
,?),and
b
S
T
is the stan-
dardHACestimatorappliedtothesequenceofcross-sectionalaveragesofh
it
(z
it
,?).
BronnenbergandMahajan(2001) Estimateamodelofretailersbehaviorwhere
the market shares are related to marketing variables. Their model is
y
it
= ?
0
+x
0
it
? +?
i
+u
it
, (2.3.18)
where the disturbances are composed of innovations autocorrelated in time and
individual effects autocorrelated in space:
?
i
= ?
N
X
j=1
w
ij
?
j
+?
i
, (2.3.19)
u
it
= ?
1
u
i,t?1
+v
it
.
53
The explanatory variables are also modelled as a stochastic process based on the
same individual effects ?
i
,withthej ?th explanatory variable x
j,it
specified as
x
j,it
= ?
jt
+?
j
?
i
+?
j,it
, (2.3.20)
where
?
j,it
= ?
2j
?
j,i,t?1
+?
j
?
t
+?
j,it
. (2.3.21)
The model is estimated by Maximum Likelihood under the assumption that
the innovations ?
i
,?
i
,v
it
,?
j,it
are all jointly normally distributed.
Elhorst (2001) derives a likelihood function for a STAR(1,1) model where he
also allows for contemporaneous spatial lags. His general model is
y
it
= ?y
i,t?1
+?
0
N
X
j=1
w
ij
y
jt
+?
1
N
X
j=1
w
ij
y
jt,t?1
(2.3.22)
+?
1
x
it
+?
2
x
i,t?1
+?
3
N
X
j=1
w
ij
x
it
+?
4
N
X
j=1
w
ij
x
i,t?1
+u
it
.
The likelihood is derived under the assumption that the disturbances u
it
are nor-
mally distributed with E(u
it
)=0, E(u
2
it
)=?
2
and E(u
it
u
sj
)=0if t 6= s or
i 6= j. The paper assumes that the matrix of the spatial weights W =(w
ij
) has
zeros on the diagonal and that the spatial autoregressive parameter ? is bounded
by the inverse of the largest and smallest eigenvalue of W. It is also implicitly
assumed that the matrixWis symmetric and that the model is dynamically stable
54
(this places a nontrivial condition on the parameters ? and ?
0
).
13
The likelihood
is not conditionalized on the initial values but instead it is assumed that the initial
observations are draws from the stationary distribution of the process.
Kapoor et al. (2005) extend the GM estimator of Kelejian and Prucha to a
panel data. The contribution of thesis relative to Kapoor et al. (2005) is to allow
for autocorrelation in the time dimension as well. Their specification is
y
it,N
= x
0
it,N
?+u
it,N
, (2.3.23)
where the disturbances are an SAR(1) process with individual effects:
u
it
= ?
N
X
j=1
w
ij
u
ij,t
+?
i
+?
it
. (2.3.24)
ThepaperprovidesformalconsistencyproofofthespatialGMestimator(with
alternative weighting schemes) of ?, as well as asymptotic normality of a gener-
alized least squares (GLS) estimator of?.
Baltagi et al. (2003) derive formulae for various Lagrange multiplier tests in
a model that includes spatially correlated disturbances. The paper also provides
experimental evidence of their performance in small samples. They consider the
following model:
y
it
= x
0
it
? +?
i
+u
it
, (2.3.25)
13
Such condition could be, for example |?| + |?
0
|??
max
(W) < 1,where?
max
is the largest
(in abolute value) eigenvalue of the matrixW that consists of the spatial weights w
ij
.
55
with the disturbances being an SAR(1) process:
u
it
= ?
N
X
j=1
w
ij
u
jt
+?
it
. (2.3.26)
Observe that when the spatial lag does not operate on the individual effects,
this specification implies that the covariance between y
it
and y
js
is zero for i 6= j
andt 6= s.ThisisincontrasttothespecificationinKapooretal. (2005), wherethe
individual effects are spatially correlated and, as a result, the covariance among
y
it
and y
js
is nonzero for all values of i,j,t and s.
Korniotis (2005) Building on work of Hahn and Kurstiener (2002), Korniotis
(2005) considers a bias corrected OLS estimator in a dynamic panel data model
that also includes spatial lag of the dependent variable. The specification is
y
it
= ?y
i,t?1
+?
1
N
X
j=1
w
ij
y
jt
+?
2
N
X
j=1
w
ij
y
j,t?1
+x
0
it
? +?
i
+?
it
. (2.3.27)
where the disturbances are independent in the time dimension but are allowed to
havearbitrarycovariancematrix(constantovertime)inthecross-sectionaldimen-
sion. ThepapergivestheasymptoticformulasforthebiasesoftheOLSestimators
when both N and T simultaneously approach infinity.
56
Yang(2005) extendstheproofsofasymptoticnormalityinLee(2004)toastatic
panel data model with random individual and fixed time effects. His model is
y
it
= x
it
? +?
t
+?
i
+u
it
, (2.3.28)
where the disturbances u
it
are an SAR(1) process, i.e.:
u
it
= ?
N
X
j=1
w
ij
u
ij,t
+?
it
. (2.3.29)
The QML function is derived under the assumption that {?
it
} and {?
i
} are mu-
tually independent and identically distributed random variables with finite 4+?
moments for some ?>0.
57
3Model
In this chapter I specify the model and provide a discussion of the maintained
assumptions. It proves to be useful to restate the following notational conventions
and definitions: I use bold letters for matrices and vectors, and regular font letters
to denote scalars. Furthermore, I use lower case letters for vectors and upper case
letters for matrices. Let (A
N
)
NnullN
be some sequence of Np?Npmatrices where
p ? 1 is some fixed positive integer. I denote the (i,j)-th element as a
ij,N
.I
say that the row and column sums of the sequence of matrices A
N
are uniformly
bounded in absolute value if there exists a positive finite constant c independent
of N such that
max
1?i?Np
Np
X
j=1
|a
ij,N
| ?cand max
1?i?Np
Np
X
i=1
|a
ij,N
| ?c. (3.0.1)
For future reference, I note that any finite sum and/or product of matrices with
row and column sums uniformly bounded in absolute value will also have row
and column sums uniformly bounded in absolute value; see Kelejian and Prucha
(2004). As a consequence, ifBis a matrix of constants with fixed dimensions and
A
N
isasequenceofmatriceswithrowand column sums uniformly bounded in
absolute value, then the sequence of matrices (B?A
N
) will also have row and
column sums uniformly bounded in absolute value.
58
3.1 Model Specification
Consider the following dynamic panel data model (1 ?i?N, 1 ?t?T):
y
it,N
= ?y
i,t?1,N
+x
it,N
? +u
it,N
, (3.1.1)
where y
it,N
and x
it,N
denote the (scalar) dependent variable and the 1 ?p vector
of exogenous variables corresponding to cross sectional unitiin periodt,?and?
represent corresponding 1?1 andp?1 parameters, andu
it,N
denotes the overall
disturbance term.
In contrast to the existing dynamic panel data literature I do not assume that
the disturbancesu
it,N
are cross-sectionally uncorrelated and I consider potentially
heteroscedasticerrors. GiventhefactthatIwillderiveasymptoticpropertiesofthe
model whenthe cross-sectional dimension tends to infinity, the cross-sectional co-
variance structure will be parametrized with a finite number of parameters. In par-
ticular, I assume that the disturbancesu
it,N
follow a spatial autoregressive process
in the form of:
u
it,N
= ?
N
X
j=1
w
ij,N
u
jt,N
+?
it,N
, (3.1.2)
where the overall disturbance u
it,N
consists of a spatial lag of contemporaneous
disturbances in other cross-sections and an innovation ?
it,N
.
Anselin (1988) refersto this model as a first order spatial autoregressivemodel
or SAR(1). See the previous chapter for more detailed discussion of such spec-
ification. The process for the disturbances contains one parameter ? and N
2
ob-
59
servable spatial weights w
ij,N
.The?
it,N
are the innovations that enter the spatial
process. They are allowed to be correlated over time and I assume that they have
the following error component structure:
?
it,N
= ?
i,N
+?
it,N
, (3.1.3)
where ?
i,N
are unit specific error components, and ?
it,N
are the error components
that vary both over cross-sectional units and time periods.
The spatial weights, as well as the endogenous, exogenous and disturbance
processes are all allowed to depend on the sample size, i.e., to depend on N.
Observe that even if the innovations ?
it,N
did not depend on the sample size, the
disturbances u
it,N
would still have to be indexed by the sample size due to the
presence of the spatial lag ?
P
N
j=1
w
ij,N
u
jt,N
in (3.1.2).
14
Stacking across units the model becomes (1 ?t?T)
y
t,N
N?1
= ?y
t?1,N
N?1
+X
t,N
N?p
?
p?1
+u
t,N
N?1
, (3.1.4)
u
t,N
N?1
= ?W
N
N?N
u
t,N
N?1
+?
t,N
N?1
,
where
?
t,N
N?1
= ?
N
N?1
+?
t,N
N?1
, (3.1.5)
14
TheN?1vectorofdisturbancesu
t,N
isgivenbyu
t,N
=(I
N
??W
N
)
?1
?
t,N
(seeequation
3.2.1). Note that the elements of (I
N
??W
N
)
?1
must depend on the sample size N. This would
be true even if the elements w
ij,N
did not depend on the sample size.
60
and
y
t,N
=
?
?
?
?
?
?
y
1t,N
.
.
.
y
Nt,N
?
?
?
?
?
?
N?1
, X
t,N
=
?
?
?
?
?
?
x
1t,N
.
.
.
x
Nt,N
?
?
?
?
?
?
N?p
, (3.1.6)
u
t,N
=
?
?
?
?
?
?
u
1t,N
.
.
.
u
Nt,N
?
?
?
?
?
?
N?1
, ?
N
=
?
?
?
?
?
?
?
1,N
.
.
.
?
N,N
?
?
?
?
?
?
N?1
,
?
t,N
=
?
?
?
?
?
?
?
1t,N
.
.
.
?
Nt,N
?
?
?
?
?
?
N?1
, W
N
=
?
?
?
?
?
?
w
11,N
??? w
1N,N
.
.
.
.
.
.
.
.
.
w
N1,N
??? w
NN,N
?
?
?
?
?
?
N?N
.
In all of the ensuing discussionT is fixed and N ?? . I maintain the follow-
ing assumptions:
Assumption 1 For each N>1 the innovations {?
it,N
:1?i?N,t?T} are
independentlydistributed, withzeromean, constantvariance?
2
?,N
with 0 <?
2
?,N
<
b
?
< ? . Furthermore, the innovations have finite absolute moments of order
4+?
?
for some ?
?
> 0 and those moments are uniformly bounded by some finite
constant.
Assumption 2 For each N>1 the individual effects
?
?
i,N
:1?i?N
?
are
independently distributed, with zero mean, and are independent of the innova-
tions {?
it,N
:1?i?N,t?T}. Furthermore, the individual effects have con-
stant variance ?
2
?,N
with 0 <?
2
?,N
<b
?
< ? and finite absolute moments of
61
order 4+?
?
for some ?
?
> 0 and those moments are uniformly bounded by some
finite constant.
Assumption 3 The nonstochastic matrix W
N
has the following properties:
(a) All diagonal elements of W
N
are zero.
(b) The true parameter?satisfies |?| < 1; the matrixI
N
?rW
N
is nonsingular
for all |r| < 1.
(c) TherowandcolumnsumsofW
N
andP
N
(?)=(I
N
??W
N
)
?1
arebounded
uniformly in absolute value by, respectively, k
W
< ? and k
P
< ? where
k
P
may depend on ?.
It will be shown in the next section that the following assumption will guaran-
tee that the variances of the disturbances u
it,N
are bounded away from zero:
Assumption 4
?
min
(P
N
P
0
N
) ?c
P
> 0
for some c
P
where c
P
may depend on ?.
The analysis is conditionalized on the realized values of the exogenous vari-
ables and I henceforth view them as constants. I make the following assumptions
on the exogenous variables:
Assumption 5 (a) The matrix of exogenous (nonstochastic) regressors X
t,N
,
t?T, has a full column rank (for N sufficiently large).
62
(b) The elements of X
t,N
are uniformly bounded in absolute value.
I complete the model by specifying a process that generates the initial obser-
vation of the dependent variable:
Assumption 6 The model defined in (3.1.4) is dynamically stable, i.e., |?| < 1,
and has been in operation for an infinite period of time.
15
The error specification adopted in this thesis corresponds to that of a classi-
cal one-way error component model, see e.g. Baltagi (1995, pp. 9). It is also a
generalization of the literature on dynamic panel data models with independent
innovations. Notice that with ? =0,myspecification becomes, for example,
that of Arellano and Bond (1991), Ahn and Schmidt (1995), Arellano and Bover
(1995), Blundell and Bond (1998),
16
or Anderson and Hsiao (1981 and 1981),
case IVb.
17
Finally, note that the same error component specification of the dis-
turbanceprocesswasadoptedinKapooretal. (2005), whoconsiderrandomeffect
specification in the context of a static panel data model.
3.2 Model Implications
I examine the asymptotic properties of the proposed estimation procedure when
the time dimension of the panel is fixed. I assume slope homogeneity of the
autoregressive parameters (? does not have an i subscript)
18
and I also assume
15
Note that Assumptions 1 and 2 have been consistently specified to hold for ?? <t? T.
16
In these papers the exogenous variables are allowed to be stochastic and either strictly exoge-
nous or predetermined while in this thesis I treat the exogenous variables as nonstochastic.
17
Anderson and Hsiao do not include exogenous variables in their specification.
18
Note that heterogenous slope coefficients cannot be consistently estimated with a fixed num-
ber of observations in the time dimension of the panel.
63
that the spatial weighting matrices are constant over time.
19
In the rest of this
section I explore some implications of the maintained assumptions. Proofs of the
claims made in this section are in the Appendix B.
Assumption1isastandardrestrictionfor asymptotic results. I do not assume
that the innovations are identically distributed and hence a stronger requirements
on the existence of moments is necessary. Assumption 2 is a random effects as-
sumptionthatwillbeusedtoproveexistenceofasymptoticdistributionofmoment
conditionsthatinvolvelevelsoflaggedendogenousvariables. Iconjecturethatthe
estimationproceduresuggestedinthisthesis remains valid also when the individ-
ual effects are fixed (?
2
?
=0). However, the proofs would have to be modified
20
and hence I choose to concentrate on the random effects case.
Assumption 3(a) is a normalization of the model that also implies that no
cross-section is viewed as its own neighbor. Assumption 3(b) implies that the
system in (3.1.4) is complete in that it defines endogenous variables in terms of
exogenous variables and innovations. In particular, from Assumption 3(b) it fol-
lows that
u
t,N
N?1
= P
N
N?N
?
t,N
N?1
. (3.2.1)
Furthermore, we can eliminate lagged dependent variables by backward substitu-
tion and express the model as a function of lagged disturbance terms and lagged
19
If the spatial weighting matrices were not constant over time, then first differencing would not
remove the individual effects.
20
I apply central limits theorems to a vector of random variables that includes the individual
effects. Hence it is required that ?
2
?
> 0.Inthefixed effects case, the central limit theorems
would be applied to a vector of random variables that excludes ?
N
. Observe that the sequence of
vectors ?
N
would in this case be required to satisfy some regularity condition such as Assumption
A2 in Appendix A.
64
explanatory variables. From (3.1.4), we have that for 1 ?t?T
y
t,N
= ?y
t?1,N
+X
t,N
?+u
t,N
(3.2.2)
= ?[?y
t?2,N
+X
t?1,N
?+u
t?1,N
]+X
t,N
? +u
t,N
.
.
.
=
t?1
X
j=0
?
j
[X
t?j,N
? +u
t?j,N
]+?
t
y
0,N
=
t?1
X
j=0
?
j
[X
t?j,N
? +P
N
?
t?j,N
]+?
t
y
0,N
,
and hencey
t,N
isawelldefined transformation of the innovations?
t,N
, the initial
values of the processy
0,N
, and the exogenous variables X
t,N
.
Assumption 3(c) restricts the degree of permissible cross-sectional correlation
in the sample. Note that some restriction on the correlation is necessary for any
largesampleresultstohold.Inpractice in the spatial literature, with T fixed and
N ?? , it is often assumed that each cross-sectional unit has a finite number of
neighbors, or that the rows of the weight matrices are normalized to sum to unity.
It is also often the case that although the matrices may not be sparse, the weights
are proportional to an inverse of some distance measure. Therefore, under reason-
able conditions, the weight matrices will have row and column sums uniformly
bounded in absolute value.
Assumption 4 rules out degenerate weighting matrices that would imply zero
variance of the disturbances u
t,N
. Observe that from Assumption 3, we have
u
t,N
= P
N
(?
N
+?
t,N
) and hence the variance covariance matrix of the distur-
65
bances u
t,N
is
VC(u
t,N
)=
?
?
2
?,N
+?
2
?,N
?
P
N
P
0
N
. (3.2.3)
In particular, notice that each diagonal element of VC(u
t,N
) is bounded from
below by the smallest eigenvalue
21
and hence the assumption implies that each
u
it,N
hasvarianceboundedawayfromzero. Inamodelwithoutspatialcorrelation,
P
N
= I
N
and this Assumption is trivially satisfied.
Assumption 5 is an exogeneity assumption of explanatory variables. Finally,
under Assumption 6, together with the assumptions on the exogenous variables
and the spatial weighting matrix, we have by backward substitution:
y
0,N
=
?
X
j=0
?
j
?
X
?j,N
? +u
?j,N
?
(3.2.4)
=
?
X
j=0
?
j
?
X
?j,N
?+P
N
?
?j,N
?
+(1??)
?1
P
N
?
N
.
Hence y
0,N
is a random variable that in general depends on N with mean that
is not necessarily equal to zero. Notice that {u
it,N
:1?i?N,?? <t? 0} is
a transformation of {?
it,N
:1?i?N,?? <t? 0} and
?
?
i,N
:1?i?N
?
.
Therefore, by Assumptions 1 and 2, the array {y
i0,N
:1?i?N} is independent
of {?
it,N
:1?i?N,1 ?t?T}. Furthermore, given Assumptions 5 and 6 it
also has finite absolute moments of order 4+?
y
o
for some ?
y
o
> 0 and those
moments are uniformly bounded by some finite constant (see the appendix for a
21
See e.g. Lemma 2 in Kelejian and Prucha (2003).
66
proof).
22
For future reference, I note that the variance of y
0,N
is
VC(y
0,N
)=
?
?
2
?,N
1??
2
+
?
2
?,N
(1??)
2
?
P
N
P
0
N
. (3.2.5)
22
Similarly, it can be shown that the stochastic process y
it,N
has finite absolute moments of
order 4+?
y
for some ?
y
> 0 and that those moments are uniformly bounded by some finite
constant. The proof of this claim is also in the appendix.
67
4 Estimation and Inference
This chapter will present the key results of the thesis. I present a procedure to es-
timate the parameters of the model outlinedinChapter3andderiveitsasymptotic
properties. The proposed estimation method consists of three steps. In the first
step, I propose to use an instrumental variables (IV) estimator of the slope coef-
ficients ? and ? without efficiently accounting for the spatial correlation of the
disturbances.
23
In the second step of the estimation, the estimated disturbances
from the first stage are utilized in a spatial generalized moments (GM) estimator
to estimate the degree of spatial autocorrelation in the disturbances (?). In the last
step of the procedure, I propose a GMM estimator of ? and ? with an optimal
weighting of the moments that is based on the initial estimators.
For expositional purposes, I choose to present for the first stage an IV estima-
tor that uses a simple set of instruments due to Anderson and Hsiao (1981). Ob-
serve, however, that the results on the third stage generalized method of moments
(GMM) estimators presented subsequently are sufficiently general to guarantee
consistency of IV estimators that use an extended set of instruments, such as the
one in Arellano and Bond (1991).
4.1 Initial IV Estimation
In this section I propose a simple estimation procedure to estimate the parameters
? =[?,?
0
]
0
ofthe model(3.1.1) anddemonstrate that themethodisconsistent and
23
I do not account for the spatial correlation in formulating the initial IV estimator. However, it
is taken into account in the analysis of its properties.
68
asymptotically normal. Since the model contains individual effects, these cannot
be consistently estimated with fixed T. Hence the model is considered after a
transformation that removes the individual effects from the dependent variable. I
followtheliteratureondynamicpanelsandusefirstdifferences. Notethatitwould
also be possible to use other transformations such as central differences. I use
moment conditions based on the fact that the first difference of the disturbances is
uncorrelatedwiththeleveloftheendogenousvariablelaggedtwice(ormore).
24
In
particular, the estimator corresponds to the one suggested by Anderson and Hsiao
(1982). Inspection of the proofs reveals that the random effects Assumption 2 is
not strictly necessary for the initial estimator to work.
25
I write the model in first differences as (t =2,...,T):
?y
t,N
N?1
= ?
1?1
?y
t?1,N
N?1
+?X
t,N
N?p
?
p?1
+?u
t,N
N?1
, (4.1.1)
where?is the first difference operator and, in particular,?y
t,N
= y
t,N
?y
t?1,N
,
?X
t,N
= X
t,N
?X
t?1,N
and?u
t,N
= u
t,N
?u
t?1,N
.
Stacking the observationsovertimeyields
?y
N
(T?1)N?1
= ?Z
N
(T?1)N?(1+p)
?
(1+p)?1
+ ?u
N
(T?1)N?1
, (4.1.2)
24
This claim is formally proved in Lemma 2.
25
Note that it is not the case that no assumption has to be made on the individual effects, as is
often claimed in the literature. Since the lagged endogenous variable is used as an instrument, one
stillneedtomaintainthattheindividualeffectsareuncorrelatedwiththeidiosyncraticdisturbances
and satisfy certain moment restrictions as well. This of would of course be satisfied if we view the
individual effects as constants.
69
where
?Z
N
(T?1)N?p+1
=
"
?y
?1,N
(T?1)N?1
, ?X
N
(T?1)N?p
#
(4.1.3)
and
26
?y
N
=
?
?
?
?
?
?
?y
2,N
.
.
.
?y
T,N
?
?
?
?
?
?
(T?1)N?1
, ?y
?1,N
=
?
?
?
?
?
?
?y
1,N
.
.
.
?y
T?1,N
?
?
?
?
?
?
(T?1)N?1
,(4.1.4)
?X
N
=
?
?
?
?
?
?
?X
2,N
.
.
.
?X
T,N
?
?
?
?
?
?
(T?1)N?p
, ?u
N
=
?
?
?
?
?
?
?u
2,N
.
.
.
?u
T,N
?
?
?
?
?
?
(T?1)N?1
.
Since ?y
t?1,N
is correlated with ?u
t,N
the ordinary least squares estimator
for ? from the above model will generally be inconsistent. However, the level
of the dependent variable lagged twice (or more) will not be correlated with the
disturbances?u
t,N
. Motivated by this, I define an instrument matrix
H
t,N
N?(1+p)
=[y
t?2,N
N?1
,?X
t,N
N?p
]. (4.1.5)
Given the model assumptions we have, as demonstrated in Lemma 2 below:
E
?
H
0
t,N
(1+p)?N
?u
t,N
N?1
!
= 0
(1+p)?1
,t=2,...,T. (4.1.6)
26
Note that most of the dynamic panel data literature stacks the data by first collecting the T
observations of each unit in a vector and then stacks those N vectors. The grouping used in this
paper is more convenient for modelling spatial correlation via (3.1.2).
70
The initial IV estimator of ? utilizes H
t,N
as instruments
27
for ?y
t?1,N
and is
defined as
b
?
N
(1+p)?1
=
h
?
b
Z
0
N
?Z
N
i
?1
(1+p)?(1+p)
?
b
Z
0
N
(1+p)?(T?1)N
?y
N
(T?1)N?1
, (4.1.7)
where
?
b
Z
N
(T?1)N?(1+p)
= H
N
(H
0
N
H
N
)
?1
H
0
N
(T?1)N?(T?1)N
? ?Z
N
(T?1)N?(1+p)
, (4.1.8)
and
H
N
(T?1)N?(1+p)
=
?
?
?
?
?
?
H
2,N
.
.
.
H
T,N
?
?
?
?
?
?
. (4.1.9)
is a (T ?1)N ? (p+1)matrix of instruments.
28
TheinitialAndersonandHsiaoIVestimatorisaspecialcaseofamoregeneral
GMM estimator discussed in Section 4.3. However, for expositional purposes
I derive its asymptotic properties here. Substituting in the definition of the IV
27
We note that it is possible to use additional lags and/or levels of the dependent variable as
instruments and obtain a consistent initial estimator as well. For example, we could use the instru-
ments suggested in Section 4.3, i.e. H
t
=[y
t?2,N
,...,y
0,N
,X
t,N
,...,X
1,N
].
28
Writing the instruments in this fashion leads to an estimator that is based on moment con-
ditions that are averaged both over N and T. It is also possible to define the H
N
matrix as
H
N
= diag(H
2
,..,H
T
), and the moment conditions are then only averaged over N.Inthiscase
the expressions in Lemmas 1 and 2 have to be modified. Note that these two specifications of the
instrument matrix lead to different estimators. The projection matrixH
N
(H
0
N
H
N
)
?1
H
0
N
in the
first case has elements in the form H
t,N
?
P
T
s=2
H
0
s,N
H
s,N
?
?1
H
0
t,N
while in the second case
they are H
t,N
?
H
0
t,N
H
t,N
?
?1
H
0
t,N
. The case of estimators based on moments averaged only
over T will be considered in Section 4.2 below.
71
estimator in equation (4.1.7) yields
b
?
N
= ?+
h
?
b
Z
0
N
?Z
N
i
?1
?
b
Z
0
N
?u
N
(4.1.10)
= ?+
h
?Z
0
N
H
N
(H
0
N
H
N
)
?1
H
0
N
?Z
N
i
?1
?Z
0
N
H
N
(H
0
N
H
N
)
?1
H
0
N
?u
N
.
For the instruments to be valid, I make the following assumption.
Assumption IV1 The matrix
M
H?Z
(1+p)?(1+p)
= plim
1
(T ?1)N
H
0
N
(1+p)?(T?1)N
?Z
N
(T?1)N?(1+p)
,
exist and is finite with full column rank. The matrix
M
HH
(1+p)?(1+p)
= plim
1
(T ?1)N
H
0
N
(1+p)?(T?1)N
H
N
(T?1)N?(1+p)
,
exists and is nonsingular.
We can also define
M
?Z
(1+p)?(1+p)
= plim
1
(T ?1)N
?
b
Z
N
(1+p)?(T?1)N
?Z
N
(T?1)N?(1+p)
, (4.1.11)
Observe that ?
b
Z
0
N
?Z
N
=?Z
0
N
H
N
(H
0
N
H
N
)
?1
H
N
?Z
N
and hence M
?Z
=
M
H?Z
M
?1
HH
M
H?Z
. Assumption IV1 thus implies that M
?Z
exists and is fi-
nite. Also note that the assumption that the M matrices are finite can be de-
72
rived from earlier restrictions
29
. However, the existence and invertability ofM
?Z
and M
HH
is not guaranteed by Assumptions 1-6.
30
Observe that one could de-
rive Assumption IV1 from existence and nonsingularity of the limits such as
lim(TN)
?1
P
T
t=j+1
X
0
t?j,N
X
t,N
.
To derive the asymptotic distribution of
b
?
N
, I note that given Assumption IV1,
it remains be to shown that the term H
0
N
?u
N
converges in distribution (when
appropriately normalized). It will prove convenient to introduce the following
additional notation for lagged exogenous variables
X
?2,N
(T?1)N?p
=
?
0
p?N
,X
0
1,N
,..,X
0
T?2,N
?
0
, (4.1.12)
the vector collecting all of the model orthogonal innovations
?
N
(T+2)N?1
=
?
?
0
N
,?
0
N
,?
0
1,N
,...,?
0
T,N
?
0
, (4.1.13)
with?
N
=
P
?
j=0
?
j
?
?j,N
,anda(T ?1)?T differenceoperatorDanda(T ?1)?
29
For example, the elements of M
HH
consist of first and second moments of the stochastic
process y
it
interacted with the exogenous variables. These are bounded by Assumptions 1-6.
30
For example, Arrelano (1989) examines a univariate AR(1) model with first-order autoregres-
sive exogenous variables, and finds that when the first differences of endogenous variables lagged
twice are used as instruments, there exists a significant range of parameters for which there is
a singularity point in the estimator. The paper also suggests that the estimator that uses second
lags of the levels of the endogenous variables does not have the singularity problem for a reason-
able range of parameters. However, this conclusion does not readily generalize for all possible
exogenous variables.
73
(T ?1) matrix?
D
(T?1)?T
=
?
?
?
?
?
?
?
?
?
?11 0??? 0
0 ?11
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0
0 ??? 0 ?11
?
?
?
?
?
?
?
?
?
, ?
(T?1)?(T?1)
=
?
?
?
?
?
?
?
?
?
1 ? ??? ?
T?2
01
.
.
.
.
.
.
.
.
.
?
0 ??? 01
?
?
?
?
?
?
?
?
?
.
(4.1.14)
Observe that given Assumptions 1, 2, and 6, the variance covariance matrix of
?
N
is
E(?
N
?
0
N
)
(T+2)N?(T+2)N
=
?
?
?,N
(T+2)?(T+2)
?I
N
!
, (4.1.15)
where the (T +2)? (T +2)diagonal matrix?
?,N
is
?
?,N
(T+2)?(T+2)
= diag
?
?
2
?,N
,
?
2
?,N
1??
,?
2
?,N
,...,?
2
?,N
?
. (4.1.16)
IfirstexpresstheelementsofH
0
N
?u
N
(whicharey
0
?2,N
?u
N
and?X
0
N
?u
N
)
in terms of lagged model disturbances and dependent variables:
Lemma 1 Under the specification (3.1.4) with Assumptions 1-6 and IV1 we have
that
y
0
?2,N
?u
N
= f
0
N
(I
T+2
?P
N
)?
N
+?
0
N
(F?P
0
N
P
N
)?
N
,
74
where the (T +2)N ? 1 vector f
N
is given by
f
N
=
?
?
?
?
?
?
0
2?(T?1)
D
0
T?(T?1)
?
?
?
?I
N
?
?
?
(?
0
?I
N
)
(T?1)N?(T?1)N
?
?
?
?
?
?
?
?
?
?
X
?2,N
(T?1)N?p
?
p?1
+
?
?
?
E(y
0,N
)
N?1
0
(T?2)N?1
?
?
?
(T?1)N?1
?
?
?
?
?
?
?
?
?
?
and the T +2?T +2matrix Fis
F
(T+2)?(T+2)
=
?
?
?
?
?
?
?
?
?
1
1??
1
1?(T?2)
1
1?1
0
1?(T?2)
0
(T?2)?1
I
T?2
0
2?1
0
2?(T?2)
?
?
?
?
?
?
?
?
?
(T+2)?(T?1)
?
(T?1)?(T?1)
?
0
(T?1)?2
,D
?
(T?1)?(T+2)
.
Furthermore?X
0
N
?u
N
can also be expressed as a linear function of?
N
:
?X
0
N
?u
N
= ?X
0
N
??
0
(T?1)?2
,D
?
?P
N
?
?
N
.
Proof. See the Appendix C.1.
Notice that as indicated by the subscript, the size of the f
N
vector depends on
the sample size. Since T is fixed, I do not use subscripts for matrices F and D
whose size and elements only depend on T and not on N.
To determine the asymptotic variance of the estimator, I will make use of
the following Lemma that gives an expression for expected value and variance
covariance matrix of the moment conditions:
75
Lemma 2 Suppose Assumptions 1-6 hold. The expected value of the vector of
quadratic forms H
0
N
?u
N
is zero. Its variance covariance matrix is given by
V
N
(1+p)?(1+p)
= E(H
0
N
?u
N
?u
0
N
H
N
)
= S
0
N
(1+p)?(T+2)N
(?
?,N
?P
N
P
0
N
)
(T+2)N?(T+2)N
S
N
(T+2)N?(1+p)
+
?
?
?
?
N
0
1?p
0
p?1
0
p?p
?
?
?
,
where
S
N
=
?
?
f
N
(T+2)N?1
,
"
?
0
(T?1)?2
,D
?
0
(T?1)?(T+2)
?I
N
#
(T?1)N?(T+2)N
?X
N
(T?1)N?p
?
?
(T+2)N?(1+p)
,
and
?
N
=2tr
?
F
S
?
?,N
F
S
?
?,N
?
?tr(P
N
P
0
N
P
N
P
0
N
),
with F
S
=
1
2
(F+F
0
).
Proof. See the Appendix C.1.
To rule out cases where the moment conditions have zero asymptotic variance,
I make the following assumption:
Assumption IV2 The smallest eigenvalue of [(T ?1)N]
?1
S
0
N
S
N
is uniformly
bounded away from zero for T ? 2.
Although S
N
depends on the sample size, the dimensions of S
0
N
S
N
do not
changewithN.Furthermore,noticethattheassumptionalsoimpliesthatE(H
0
N
H
N
)
76
has eigenvalues uniformly bounded away from zero and, therefore, also implies
the invertability of M
HH
in Assumption IV1.
31
The above Assumption together
with Assumption 4 allows us to prove the following Lemma:
Lemma 3 Suppose Assumptions 1-4 and IV2 hold. The smallest eigenvalue of
[(T ?1)N]
?1
V
N
is uniformly bounded away from zero for T ? 2.
Proof. See the Appendix C.1.
The representation of y
0
?2,N
?u
N
and?X
0
N
?u
N
as linear-quadratic forms in
?
N
, lets us apply a central limit theorem for quadratic forms of triangular arrays
and derive the asymptotic distribution of the IV estimator. The central limit theo-
rem (CLT) I use is given in Appendix A. It is based on a result from Kelejian and
Prucha (2005) and is an extension of a CLT in Kelejian and Prucha (2001).
Proposition 1 Under Assumptions 1-6, IV1 and IV2, we have that
V
?1/2
N
?H
0
N
?u
N
d
? N (0,I
p+1
),
where
?
V
1/2
N
??
V
1/2
N
?
0
= V
N
.
Proof. See the Appendix C.1.
31
However, it does not guarantee the existence of the limit in Assumption IV1.
77
To be able to write down explicit asymptotic distribution of the estimator, I
make the following assumption.
Assumption IV3 lim
N??
1
(T?1)N
V
N
= V,where V is finite.
We then have the following Theorem:
Theorem 1 Under Assumptions 1-6, and IV1-IV3, we have that
p
(T ?1)N?
?
b
?
N
??
?
d
? N (0,?),
with
?
(1+p)?(1+p)
= M
?1
?Z
(1+p)?(1+p)
M
0
H?Z
(1+p)?(1+p)
M
?1
HH
(1+p)?(1+p)
?
V
(1+p)?(1+p)
M
?1
HH
(1+p)?(1+p)
M
H?Z
(1+p)?(1+p)
M
?1
?Z
(1+p)?(1+p)
Proof. See the Appendix C.1.
Idonotprovideanestimateof? sinceitwoulddependonanestimateof
the P
N
=(I
N
??W
N
)
?1
matrix which includes an unknown parameter ?.I
will provide small sample guidance for the second stage estimator in Section 4.3.
Note that by Theorem 17 in P?tcher and Prucha (2001), the result in the above
Theorem implies that
p
(T ?1)N
b
?
N
isO
p
(1) and the initial estimator IV satis-
fies the conditions required in the following section and hence can be used in the
subsequent estimation steps.
78
4.2 Estimation of the Degree of Spatial Autocorrelation
The specification in this thesis reduces to that of Kapoor et al. (2005) in the static
case (? =0) which is in turn a generalization of the single cross-section case in
KelejianandPrucha (1999). Inthis section, Iwill showthatthe procedureadopted
in Kapoor et al. (2005) provides a consistent estimate of the spatial autoregres-
sive parameter in a dynamic panel data model as well. To do that, I define the
generalized moments (GM) estimator following Kapoor et al. (2005) and then
extend their proofs for the dynamic case. For simplicity, I only consider one of
the weighting schemes for the moment condition in Kapoor et al. (2005).
Observe that the spatial GM estimator in this section is essentially the same as
the estimator in Kapoor et al. (2005). However, the presence of stochastic regres-
sors (lagged dependent variable) renders the proofs in that paper inapplicable to
the specification considered in this thesis. Nevertheless, the proofs in this section,
with small exceptions (most notably Lemmas C4 and C6 in the Appendix C.2),
are a direct analogy of those in Kapoor et al. (2005).
I take an initial consistent estimate of the spatially correlated errors and use it
to estimate the spatial autocorrelation parameter based on a set of moment condi-
tions. The initial consistent estimate of the errors can be, for example based on
theIVestimatorintheprevioussection. The moment conditions are chosen so
that the estimator will have an Analysis of Variance interpretation.
Consider an estimator
b
?
N
p+1?1
of the parameter vector ?
p+1?1
such that
79
p
(T ?1)N
b
?
N
= O
p
(1) and denote the predictors of u
t
by bu
t
:
bu
t,N
N?1
= y
t,N
N?1
?(y
t?1,N
,X
t,N
)
N?p+1
?
b
?
N
p+1?1
, 1 ?t?T. (4.2.1)
The model implies that (see equation 3.1.2 in Chapter 3)
u
t,N
N?1
= ?W
N
N?N
u
t,N
N?1
+?
t,N
N?1
, 1 ?t?T, (4.2.2)
where?
t,N
= ?
t,N
+?
N
.Inastackednotationthisbecomes
u
N
NT?1
=?(I
T
?W
N
)
NT?NT
u
N
NT?1
+ ?
N
NT?1
, (4.2.3)
where u
N
=
?
u
0
1,N
,...,u
0
T,N
?
0
and?
N
= ?
N
+(e
T
??
N
),with
?
N
=
?
?
0
1,N
,...,?
0
T,N
?
0
,e
T
being aT ?1 vector of unit elements, and?
N
theN?1
vector of individual effects. It will prove convenient to introduce the following
notation:
u
N
=(I
T
?W
N
)u
N
, (4.2.4)
u
N
=(I
T
?W
N
)u
N
,
?
N
=(I
T
?W
N
)?
N
.
I will also use the following transformation matrices that are utilized in the error
80
component literature:
Q
0,N
NT?NT
=
?
I
T
?
J
T
T
?
?I
N
, (4.2.5)
Q
1,N
NT?NT
=
J
T
T
?I
N
,
where J
T
= e
T
e
0
T
is a T ? T matrix of unit elements.
32
Note that using the
transformation matrices, we can express the variance-covariance matrix of the
innovations as
E(?
N
?
0
N
)
NT?NT
= ?
2
?,N
I
NT
+?
2
?,N
(J
T
?I
N
) (4.2.6)
= ?
2
?,N
Q
0,N
+?
2
1,N
Q
1,N
,
where ?
2
1,N
= ?
2
?,N
+T ??
2
?,N
.
The spatial GM estimator is based on the following moment conditions:
E(?
0
N
Q
0,N
?
N
)=N (T ?1)?
2
?,N
, (4.2.7)
E(?
0
N
Q
0,N
?
N
)=(T ?1)?
2
?,N
?tr(W
0
N
W
N
),
E(?
0
N
Q
0,N
?
N
)=0,
E(?
0
N
Q
1,N
?
N
)=N?
2
1,N
,
E(?
0
N
Q
1,N
?
N
)=?
2
1,N
?tr(W
0
N
W
N
),
E(?
0
N
Q
1,N
?
N
)=0.
32
The Q
1
transformation calculates unit specific sample means while the Q
0
transformation
substracts them from the original variable.
81
For derivation of the moment conditions see Kapoor et al. (2005). Notice that
based on (4.2.3), the moment conditions can be rewritten in terms of the trans-
formed (by Q
j,N
) disturbance vectors u
N
, u
N
and u
N
:
?
N
= ?
N
?, (4.2.8)
where? =
?
?,?
2
,?
2
?,N
,?
2
1,N
?
0
, and
?
N
6?4
= E
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
0
11,N
?
0
12,N
?
0
13,N
0
?
0
21,N
?
0
22,N
?
0
23,N
0
?
0
31,N
?
0
32,N
?
0
33,N
0
?
1
11,N
?
1
12,N
0 ?
1
13,N
?
1
21,N
?
1
22,N
0 ?
1
23,N
?
1
31,N
?
1
32,N
0 ?
1
33,N
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
, ?
N
6?1
= E
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
0
1,N
?
0
2,N
?
0
3,N
?
1
1,N
?
1
2,N
?
1
3,N
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
, (4.2.9)
82
with (j =0,1)
?
j
11,N
=
2
N (T ?1)
1?j
u
0
N
Q
j,N
u
N
,?
j
12
=
?1
N (T ?1)
1?j
u
0
N
Q
j,N
u
N
,
?
j
21,N
=
2
N (T ?1)
1?j
u
0
N
Q
j,N
u
N
,?
j
22
=
?1
N (T ?1)
1?j
u
0
N
Q
j,N
u
N
,
?
j
31,N
=
1
N (T ?1)
1?j
?
u
0
N
Q
j,N
u
N
+u
0
N
Q
j,N
u
N
?
,
?
j
32,N
=
?1
N (T ?1)
1?j
u
0
N
Q
j,N
u
N
, (4.2.10)
?
j
13,N
=1,?
j
1
=
1
N (T ?1)
1?j
u
0
N
Q
j,N
u
N
,
?
j
23,N
=
1
N
tr(W
0
N
W
N
),?
j
2
=
1
N (T ?1)
1?j
u
0
N
Q
j,N
u
N
,
?
j
33,N
=0,?
j
3
=
1
N (T ?1)
1?j
u
0
N
Q
j,N
u
N
.
The sample counterparts of the six equations in (4.2.9) replace u
N
with
bu
N
=
?
bu
0
1,N
,...,bu
0
T,N
?
0
basedon(4.2.1)withthe implied notation
b
u
N
=(I
T
?W
N
)bu
N
and
b
u
N
=(I
T
?W
N
)
b
u
N
:
g
N
6?1
= G
N
6?4
?
4?1
+?
N
6?1
, (4.2.11)
83
where?
N
can be viewed as a vector of regression residuals and
G
N
6?4
=
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
g
0
11,N
g
0
12,N
g
0
13,N
0
g
0
21,N
g
0
22,N
g
0
23,N
0
g
0
31,N
g
0
32,N
g
0
33,N
0
g
1
11,N
g
1
12,N
0 g
1
13,N
g
1
21,N
g
1
22,N
0 g
1
23,N
g
1
31,N
g
1
32,N
0 g
1
33,N
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
, g
N
6?1
=
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
g
0
1,N
g
0
2,N
g
0
3,N
g
1
1,N
g
1
2,N
g
1
3,N
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
, (4.2.12)
with (j =0,1)
g
j
11,N
=
2
N (T ?1)
1?j
bu
0
N
Q
j,N
b
u
N
,g
j
12
=
?1
N (T ?1)
1?j
b
u
0
N
Q
j,N
b
u
N
,
g
j
21,N
=
2
N (T ?1)
1?j
b
u
0
N
Q
j,N
b
u
N
,g
j
22
=
?1
N (T ?1)
1?j
b
u
0
N
Q
j,N
b
u
N
,
g
j
31,N
=
1
N (T ?1)
1?j
?
bu
0
N
Q
j,N
b
u
N
+
b
u
0
N
Q
j,N
b
u
N
?
,
g
j
32,N
=
?1
N (T ?1)
1?j
b
u
0
N
Q
j,N
b
u
N
, (4.2.13)
g
j
13,N
=1,g
j
1
=
1
N (T ?1)
1?j
bu
0
N
Q
j,N
bu
N
,
g
j
23,N
=
1
N
tr(W
0
N
W
N
),g
j
2
=
1
N (T ?1)
1?j
b
u
0
N
Q
j,N
b
u
N
,
g
j
33,N
=0,g
j
3
=
1
N (T ?1)
1?j
bu
0
N
Q
j,N
b
u
N
.
The generalized moments (GM) estimator of ? =
?
?,?
2
?,N
,?
2
1,N
?
0
say
b
?
N
can
84
be written as
b
?
N
3?1
=argmin
???
??
g
N
6?1
?G
N
6?4
?
4?1
?
0
A
N
6?6
?
g
N
6?1
?G
N
6?4
?
4?1
??
, (4.2.14)
where?is the admissible optimization space; in particular it is assumed that? =
?
?,?
2
?,N
,?
2
1,N
?
?
?? [0,b
1
],?
2
?,N
? [0,b
2
],?
2
1,N
? [0,b
3
]
?
with b
1
, b
2
and b
3
being
predetermined constants. The moments are weighted by a sequence of weighting
matricesA
N
. Following Kapoor et al. (2005), two choices forA
N
areconsidered.
An initial unweighted spatial GM estimators uses A
N
= I
6
. The second choice
is to use an approximation to variance covariance matrix of the moments. In
particular, Kapoor et al. (2005) show that under normality the variance covariance
matrix of the six moment conditions in (4.2.7) is given by
?
N
6?6
=
?
?
?
1
T?1
?
2
?N
0
0 ?
2
1N
?
?
?
?T
W,N
2?2
, (4.2.15)
where
T
W,N
2?2
=
?
?
?
?
?
?
?
22tr
?
W
0
N
W
N
N
?
0
2tr
?
W
0
N
W
N
N
?
2tr
?
W
0
N
W
N
W
0
N
W
N
N
?
tr
?
W
0
N
W
N(
W
N
+W
0
N
)
N
?
0 tr
?
W
0
N
W
N(
W
N
+W
0
N
)
N
?
tr
?
W
N
W
N
+W
0
N
W
N
N
?
?
?
?
?
?
?
?
.
(4.2.16)
The weighted spatial GM estimator then replaces ?
2
?N
and ?
2
1N
by their initial
85
estimators and utilizes the weighting matrices A
N
=
b
?
?1
N
?
b
?
2
?N
,
b
?
2
1N
?
where
b
?
N
?
b
?
2
?N
,
b
?
2
1N
?
6?6
=
?
?
?
1
T?1
b
?
2
?N
0
0
b
?
2
1N
?
?
?
?T
W,N
2?2
, (4.2.17)
and the estimators
b
?
2
?N
,
b
?
2
1N
are based on the initial unweighted spatial GM esti-
mator.
The following additional assumption is required in order to establish consis-
tency of
b
?
GM,N
(the assumption is used to demonstrate that the estimator is iden-
tifiably unique):
Assumption GM1 The smallest eigenvalue of?
0
N
?
N
is uniformly bounded away
from zero. Furthermore, 0 <???
min
?
?
?1
N
?
??
max
?
?
?1
N
?
??<? .
The following theorem establishes the consistency of the GM estimator.
Theorem 2 Suppose Assumptions 1-6 and GM1 hold.
If
b
?
N
is a consistent estimator of? with
p
(T ?1)N
b
?
N
= O
p
(1),then
b
?
N
P
? ? as N ?? .
Proof. See the Appendix C.2.
86
4.3 Second Stage GMM Estimation
In this section I define a second stage generalized method of moments (GMM)
estimator of the slope coefficients ? =(?,?
0
)
0
and derive its asymptotic distrib-
ution. I base the estimator on a set of weighted moment conditions. In the first
part of this section, I consider a general case of stochastic instruments of a cer-
tain form and show that the normalized GMM estimator based on these moment
conditions converges (under the assumptions maintained in this thesis and under
additional assumptions spelled out in this section) in distribution. Next, I consider
the choice of an optimal weighting matrix for a given set of instruments. I close
the section with an application of these results to a feasible GMM estimator based
on moment conditions utilized in the literature (see Chapter 2 for a review).
Consider again the model (4.1.2)
?y
N
(T?1)N?1
= ?Z
N
(T?1)N?(1+p)
?
(1+p)?1
+ ?u
N
(T?1)N?1
, (4.3.1)
where the explanatory variable?Z
N
=(?y
?1,N
,?X
N
) contains lagged endoge-
nous variables. Let H
N
be a (T ?1)N ?k set of instruments (to be determined
later) such that
E
?
H
0
N
k?(T?1)N
?u
N
(T?1)N?1
!
= 0
k?1
. (4.3.2)
Also, let A
N
be a sequence of nonsingular symmetric k ? k matrices with non-
singular limit
p lim
N??
A
N
k?k
= A
k?k
. (4.3.3)
87
Consider the GMM estimator
e
?
N
basedoninstrumentsH
N
and weights A
N
defined as a minimizer of
(?y
N
??Z
N
?)
0
1?(T?1)N
H
N
(T?1)N?k
A
?1
N
k?k
H
0
N
k?(T?1)N
(?y
N
??Z
N
?)
(T?1)N
,
(4.3.4)
i.e.,
e
?
N
(1+p)?1
=
"
?Z
0
N
(1+p)?(T?1)N
H
N
(T?1)N?k
A
?1
N
k?k
H
0
N
k?(T?1)N
?Z
N
(T?1)N?(1+p)
#
?1
?
?Z
0
N
(1+p)?(T?1)N
H
N
(T?1)N?k
A
?1
N
k?k
H
0
N
k?(T?1)N
?y
N
(T?1)N?1
. (4.3.5)
Note that it is possible to define an initial IV estimator is of this form, withA
N
=
[(T ?1)N]
?1
H
0
N
H
N
. The initial IV estimator in Section 4.1 utilized lagged
levels of the endogenous variable as instruments and the instrument matrix H
N
was given in (4.1.5) and (4.1.9).
In the literature (e.g. Arellano and Bond, 1991; see Chapter 2 for a review) the
instrumentsetattimetis expanded to include all available lags of the endogenous
(and possibly also the exogenous) variable. As a result the number of the moment
condition is different at different time periods and the instrument matrix H
N
can
be, for example, as in (4.3.20) below. Observe that under the specification con-
sidered in this thesis, the endogenous variable can be expressed as linear forms of
the (mutually independent) innovations of the model:
88
Lemma 4 Under Assumptions 1-6 we can express the dependent variable as
y
t,N
N?1
= a
t,N
N?1
+
?
b
t
1?(T+2)
? P
N
N?N
!
?
N
(T+2)N?1
,
where the sequence of nonstochastic N ?1 vectorsa
t,N
and the sequence of non-
stochastic 1 ? (T +2)vectors b
t
have elements uniformly bounded in absolute
value.
Proof. See the Appendix C.3.
Motivated by the expression in the above Lemma, I consider a set of k
t
sto-
chastic instruments in each time period H
t,N
N?k
t
=(h
1,t,N
,...,h
k
t
,t,N
) and assume
that each instrument can be expressed as a linear combination of the model dis-
turbances,
h
r,t,N
N?1
= a
rt,N
N?1
+
?
b
rt
1?(T+2)
? P
N
N?N
!
?
N
(T+2)N?1
,r=1,...,k
t
, (4.3.6)
where the sequence of nonstochastic N ?1 vectors a
t,N
and the sequence of non-
stochastic 1 ? (T +2)vectors b
t
have elements uniformly bounded in absolute
value. The total number of instruments is k = k
2
+ ... + k
T
and the instruments
89
can be collected in a (T ?1)N ?k block-diagonal matrix
33
H
N
(T?1)N?k
=
?
?
?
?
?
?
?
?
H
2,N
N?k
2
0
.
.
.
0H
T,N
N?k
T
?
?
?
?
?
?
?
?
. (4.3.7)
Observe that the disturbances ?u
N
can also be expressed as a linear form of
the innovations?
N
,
?u
N
(T?1)N?1
=
"
?
0
(T?1)?2
, D
(T?1)?T
?
(T?1)?(T+2)
? P
N
N?N
#
?
N
(T+2)N?1
, (4.3.8)
where D is the first difference operator matrix defined in e.g. (4.1.14). Further-
more, the t-th period disturbances can be expressed as
?u
t,N
N?1
=
?
d
t
1?(T+2)
? P
N
N?N
!
?
N
(T+2)N?1
, (4.3.9)
with d
t
consisting of (t?1)-th row of
?
0
(T?1)?2
,D
?
.
As a result, the moment conditions collected in H
0
N
?u
N
are quadratic forms
33
This definitionof the instrument matrixis based onmoment condititons that are only averaged
over N andnotoverT. An alternative is to average over bothN andT as in the initial IVestimator
in Section 4.1.
90
in?
N
:
h
0
rt,N
1?N
?u
t,N
N?1
= a
0
rt,N
1?N
?
d
t
1?(T+2)
? P
N
N?N
!
?
N
(T+2)N?1
(4.3.10)
+ ?
0
N
1?(T+2)N
?
b
0
rt
(T+2)?1
d
t
1?(T+2)
?P
0
N
P
N
N?N
!
?
N
(T+2)N?1
.
Below, I will apply the central limit theorem for triangular arrays of quadratic
forms stated in Theorem A1 in Appendix A.
34
From Assumptions 1 and 2, it
follows that the random variables ?
N
satisfy Assumptions A1 and A3. Observe
that when the instruments are chosen to be lagged levels of the endogenous vari-
able (i.e. h
r,t,N
= y
t?s,N
, s>1), Lemma 4 and Assumption 3 guarantee that
(d
0
t
?P
0
N
)a
rt,N
and (b
0
rt
d
t
?P
0
N
P
N
) satisfy Assumption A2.
The condition
E
?
H
0
N
k?(T?1)N
?u
N
(T?1)N?1
!
= 0
k?1
, (4.3.11)
then implies that the matrix (b
0
rt
d
t
?P
0
N
P
N
) has zeros on the main diagonal and,
therefore, the quadratic forms satisfy conditions of Lemma A1. In particular, their
variances and covariances can be derived using the expressions in that Lemma.
The following Lemma shows that under regularity conditions the quadratic forms
h
0
rt,N
?u
t,N
converge in distribution when normalized by their standard errors.
Lemma 5 Consider a set of k instrumentsH
N
given in (4.3.7), with the diagonal
blocks H
t,N
=(h
1t,N
,...,h
k
t
t,N
) being N ?k
t
matrices (k = k
2
+ .. + k
T
)with
34
Observe that (d
0
t
?P
0
N
)a
rt,N
then corresponds to the sequence of vectors b
t
,while
(b
0
rt
d
t
?P
0
N
P
N
) corresponds to the sequence of matrices A
n
,and?
N
corresponds to the se-
quence of vectors of random variables?
n
in Theorem A1 in Appendix A.
91
columns h
rt,N
= a
rt,N
+(b
rt
?P
N
)?
N
,where the sequence of nonstochastic
N?1 vectorsa
rt,N
andthesequenceofnonstochastic 1?(T +2)vectorsb
rt
have
elements uniformly bounded in absolute value. Under Assumptions 1-6, and given
that the instruments are such thatE(H
0
N
?u
N
)=0
k?1
,E(H
0
N
?u
N
?u
0
N
H
N
)=
V
N
and
[(T ?1)N]
?1
?
min
(V
N
) ?c>0,wehavethat
V
?1/2
N
H
0
N
?u
N
d
? N (0,I
k
),
where V
N
= E(H
0
N
?u
N
?u
0
N
H
N
)=V
1/2
N
V
1/2
N
.
Proof. See the Appendix C.3.
Given that the moment conditions converge in distribution, the GMM estima-
tor defined in (4.3.5) will under appropriate regularity conditions also converge in
distribution:
Lemma 6 Consider a set of stochastic instruments H
N
such that
V
?1/2
N
H
0
N
?u
N
d
? N (0,I
k
),
where V
N
= E(H
0
N
?u
N
?u
0
N
H
N
)=V
1/2
N
V
1/2
N
, with
p lim
N??
[(T ?1)N]
?1
V
N
= V,
where V is finite. Furthermore, consider a sequence of weighting (possibly sto-
92
chastic) matrices A
N
with nonsingular (probability) limit
p lim
N??
A
N
= A.
Under Assumptions 1-6 and given that
M
H?Z
= p lim
N??
[(T ?1)N]
?1
H
0
N
?Z,
existsandhasfullcolumnrank, wehavethattheGMMestimatordefinedin(4.3.5)
converges in distribution and
p
(T ?1)N
?
e
?
N
??
?
d
? N (0,?),
where
? =
?
M
?ZH
A
?1
M
0
?ZH
?
?1
M
?ZH
A
?1
VA
?1
M
?ZH
?
M
?ZH
A
?1
M
0
?ZH
?
?1
.
Proof. See the Appendix C.3.
I give a small sample approximation for ? for the specific GMM estima-
tor considered below. Note that given Lemmas 4 and 5, the asymptotic result
in the above Lemma 6 applies to a general class of GMM estimators which in-
cludes the initial IV estimator discussed in Section 4.1,
35
as well as the different
35
The lemma is directly applicable when the moment conditions in the initial IV estimator are
averaged only over the cross-sectional units. Note that in Section 4.1, the moment conditions are
averaged over both cross-sectional units and time. I have provided the asymptotic results for this
93
variants of the GMM estimators in Arellano and Bond (1991) and, in particular,
the feasible GMM estimator discussed below. Note that in applying the above
Lemma to these estimators it remains to be checked whether in the particular
application the instruments satisfy the stipulated regularity conditions, e.g. that
plim
N??
[(T ?1)N]
?1
H
0
N
?Z
N
exists and has full column rank and that the
variance covariance matrix of the moment conditions has the smallest eigenvalue
uniformly bounded away from zero.
I now consider the issue of an optimal choice of the sequence of the weighting
matrices, given a set of instruments. I close this section with proving consistency,
asymptotic normality and providing a small sample guidance for a feasible second
stage GMM estimator based on moment conditions considered in the literature.
4.3.1 Optimal Weighting Matrix
ConsidernowtheoptimalchoiceofthesequenceoftheweightingmatricesA
N
.It
can be shown
36
that given a set of instruments, the asymptotic variance covariance
matrix of an estimator defined as a minimizer of (4.3.4) is minimized
37
when
p lim
N??
[(T ?1)N]
?1
A
N
= V. (4.3.12)
initial IV estimator in Theorem 1 above.
36
See e.g. Hansen (1982), Bates and White (1993), Newey and McFadden (1994), or
Wooldridge (2002), Ch. 8 and 14.
37
In the sense that the difference with respect to any other VC matrix of an estimator that is a
minimizer of (4.3.4) is positive semi-definite.
94
Given that plim
N??
[(T ?1)N]
?1
V
N
= V, the small sample weighting matri-
ces A
N
can be chosen to be estimators of the small sample variance covariance
matrix V
N
= E(H
0
N
?u
N
?u
0
N
H
N
). Observe that the matrix V
N
can be parti-
tioned as
V
N
=
?
?
?
?
?
?
V
22,N
V
2T,N
.
.
.
V
T2,N
V
TT,N
?
?
?
?
?
?
, (4.3.13)
where V
ts,N
= E
?
H
0
t,N
?u
t,N
?u
0
s,N
H
s,N
?
. I denote the ij-th element of V
ts,N
as v
ij,ts,N
= E
?
h
0
it,N
?u
t,N
?u
0
s,N
h
js,N
?
. Given the structure of the instruments
assumed in this section, the moment conditions are quadratic forms in ?
N
and
satisfy conditions of Lemma A1 in Appendix A - see the discussion preceding
Lemma 5. In particular, we have as in (4.3.10) above:
h
0
it,N
?u
t,N
= a
0
it,N
(d
t
?P
N
)?
N
+?
0
N
(b
0
it
d
t
?P
0
N
P
N
)?
N
, (4.3.14)
and hence from Lemma A1 in Appendix A, the covariance of h
0
it,N
?u
t,N
and
h
0
js,N
?u
s,N
denoted as v
ij,ts,N
is given by:
v
ij,ts,N
= a
0
it,N
(d
t
?
?,N
d
0
s
?P
N
P
0
N
)a
js,N
(4.3.15)
+2tr(b
0
it
d
t
?
?,N
d
0
s
b
js
?
?,N
?P
0
N
P
N
P
0
N
P
N
),
where?
?,N
is defined in (4.1.16).
Observe that for |s?t| > 1,wehaved
t
?
?,N
d
0
s
=0and hence the above
95
covariance is zero. An expectations based estimator, say
b
V
E
N
,ofV
N
would then
replace the true values of the parameters in the above expression by their initial
consistent estimates. Note that in addition to ?
?,N
and P
N
,thetermsa
it,N
and
b
it
alsopotentiallydependontheparameters of the model (compare e.g. the
expressions for a
t,N
andb
t
in the proof of Lemma 4 in Appendix C.3). The exact
form depends on the choice of the instruments. In Section 4.3.3 below, I consider
a set of instruments utilized in the literature (e.g. Arellano and Bond, 1991) and I
also provide an expression for such expectation based variance covariance matrix
estimator given such choice of instruments. Note that the instruments considered
in Section 4.1 are also of the form assumed here; see the proof of Lemma 1. The
expression for V
N
isthengivenbyLemma2.
As an alternative to
b
V
E
N
, the small sample weighting matrices can be con-
structed based on approximations to H
0
N
E(?u
N
?u
0
N
)H
N
. For stochastic in-
struments, such estimator will not in general be consistent estimator of
E(H
0
N
?u
N
?u
0
N
H
N
). Nevertheless, based on Lemma 6, the resultant second
stage GMM estimator is consistent. It is also computationally simpler and has
reasonable small sample properties (see Chapter 5).
This estimator denoted by
b
V
mix
N
ignores the fact that the instruments collected
in H
N
are stochastic and replaces the disturbances ?u
N
?u
0
N
by an estimate of
their expected value:
b
V
mix
N
=[(T ?1)N]
?1
H
0
N
b
?
?u,N
H
N
, (4.3.16)
96
where
b
?
?u,N
isanestimatorofthevariancecovariancematrixofthedisturbances.
In our case this could be:
b
?
?u,N
=
b
?
2
?N
?
D?
b
P
N
?
Q
0,N
?
D
0
?
b
P
0
N
?
, (4.3.17)
whereb?
N
and
b
?
2
?N
are initial estimates and
b
P
N
=(I
N
?b?
N
W
N
)
?1
. (4.3.18)
4.3.2 Feasible GMM Estimator
Consider now a GMM estimator based on the moment conditions of the form
E
h
e
H
0
N
?u
N
i
=0, (4.3.19)
where
e
H
N
=
?
?
?
?
?
?
e
H
2,N
0
.
.
.
0
e
H
T,N
?
?
?
?
?
?
N(T?1)?k
, (4.3.20)
with
e
H
t,N
=(y
t?2,N
,...,y
0,N
,X
t,N
,...,X
1,N
) being a N ? k
t
matrix of instru-
ments at time t.Notethatk
t
=(t?1) +t?p and k = k
2
+..+k
T
.Let
e
V
N
= E
?
e
H
0
N
?u
N
?u
0
N
e
H
N
?
, (4.3.21)
97
then the estimator is given by
e
?
N
=
h
?Z
0
N
e
H
N
e
V
?1
N
e
H
0
N
?Z
N
i
?1
?Z
0
N
e
H
N
e
V
?1
N
e
H
0
N
?y
N
. (4.3.22)
The instrument matrix in (4.3.20) utilizes moment conditions of the form
E[(u
t,i
?u
t?1,i
)y
t?1?s,i
]=0 s =1,..,t?1, (4.3.23)
E[(u
t,i
?u
t?1,i
)X
t?s,i
]=0
1?p
,
E[(u
t,i
?u
t?1,i
)X
t,i
]=0
1?p
.
Observe that the instruments consist of y
t?1?s,N
, X
t,N
and X
t?s,N
, s =1,...,t?
1 and hence by Lemma 4 they are linear forms in the innovations of the form
considered above, e.g. they satisfy the conditions in Lemma 5. To complete
the verification of the conditions stipulated in Lemma 5, I consider the smallest
eigenvalues of the sequence of matrices
e
V
N
= E
?
e
H
0
N
?u
N
?u
0
N
e
H
N
?
.
Note that from Lemma 4 it follows that y
t,N
= a
t,N
+(b
t
?P
N
)?
N
.Letus
denote
e
S
t,N
=[a
t?2,N
,...,a
0,N
,X
t,N
,...,X
1,N
]
N?k
t
, (4.3.24)
and
?
t,N
=
"?
(b
t?2
,...,b
0
)
1?(t?1)(T+2)
? P
N
N?N
!?
I
t?1
(t?1)?(t?1)
? ?
N
N(T+2)?1
!
,0
N?tp
#
. (4.3.25)
98
The instruments can then be expressed as
e
H
t,N
N?k
t
=
e
S
t,N
N?k
t
+?
t,N
N?k
t
, (4.3.26)
As a result the full matrix of instruments is
e
H
N
=
e
S
N
+?
N
, (4.3.27)
where the matrix
e
S
N
contains the nonstochastic elements of the instruments and
is defined as
e
S
N
=
?
?
?
?
?
?
e
S
2,N
0
.
.
.
0
e
S
T,N
?
?
?
?
?
?
N(T?1)?k
, (4.3.28)
while the stochastic components of the instrument matrix are
?
N
=
?
?
?
?
?
?
?
2,N
0
.
.
.
0 ?
T,N
?
?
?
?
?
?
N(T?1)?k
. (4.3.29)
To guarantee that the smallest eigenvalue of [(T ?1)N]
?1
e
V
N
is uniformly
bounded away from zero, I make the following assumption:
Assumption GMM1 Thesmallesteigenvalueof[(T ?1)N]
?1
e
S
0
N
e
S
N
isuniformly
bounded away from zero.
99
Given the above Assumption, we have by Lemma 5 that the normalized mo-
ment conditions converge in distribution. I next show that the estimator
e
?
N
,where
the weighting matrix for the moment conditions is based on the true value of the
parameters is consistent and asymptotically normal. Corresponding to Assump-
tions IV1 and IV3 for the first stage estimator, I introduce the following assump-
tions. Let
f
M
H?Z
(1+p)?(1+p)
= plim
1
(T ?1)N
e
H
0
N
?Z
N
. (4.3.30)
Assumption GMM2 The matrix
f
M
H?Z
exist and is finite with full column rank.
Assumption GMM3 The matrix
e
V = plim
N??
[(T ?1)N]
?1
e
V
N
exists and is
finite and invertible.
As a consequence of Lemma 6, we now have the following Theorem.
Theorem 3 Under Assumptions 1-6, and GMM1-GMM3, we have that
p
(T ?1)N
?
e
?
N
??
?
d
? N [0,?],
where
? =
h
f
M
0
H?Z
e
V
?1
f
M
H?Z
i
?1
.
100
Proof. See the appendix.
The above estimator is based on the true value of the parameters which are
unknown and have to be estimated. I now provide an expression for the expec-
tation based estimator of the variance covariance matrix of the moment condi-
tions
e
V
N
, denoted by
b
V
N
?
b
?
N
?
where
b
?
N
is an initial consistent estimator of
? =
?
?
N
,?
2
?N
,?
2
?
,?
?
. I then show that when the feasible GMM estimator uses
h
b
V
N
?
b
?
N
?i
?1
as the moment weighting matrix, the parameters collected in the
vector ? are nuisance parameters.
Thevariancecovariancematrixofthemomentconditionscollectedin
e
H
0
N
?u
N
with
e
Hdefined in (4.3.20), can be written analogically to (4.3.13) as
e
V
N
=
?
?
?
?
?
?
e
V
22,N
e
V
2T,N
.
.
.
e
V
T2,N
e
V
TT,N
?
?
?
?
?
?
, (4.3.31)
where
e
V
ts,N
= E
?
e
H
0
t,N
?u
t,N
?u
0
s,N
e
H
s,N
?
.Since
e
H
t,N
consists of stochastic
part (y
t?2,N
,..,y
0,N
) and nonstochastic part (X
t,N
,...,X
1,N
),thematrix
e
V
ts,N
is
partitioned accordingly:
38
e
V
ts,N
=
?
?
?
e
V
y
ts,N
0
(t?1)?tp
0
sp?(s?1)
e
V
X
ts,N
?
?
?
k
t
?k
s
, (4.3.32)
38
I show that the off-diagonal blocks of
e
V
ts,N
are matrices of zeros as a part of the proof of the
Lemma 7 below.
101
where the upper block is
e
V
y
ts,N
=
?
ev
y
qr,ts,N
?
q=1,..,t?1
r=1,..,s?1
, (4.3.33)
with ev
y
qr,ts,N
= E
?
y
0
t?1?q
?u
t,N
?u
0
s,N
y
s?1?r
?
. Given expressions fory
t?1?q
and
y
s?1?r
in Lemma 4 and the expressions for ?u
t,N
and ?u
s,N
in (4.3.9), the mo-
ment conditions y
0
t?1?q
?u
t,N
and y
0
s?1?r
?u
s,N
arequadraticformsin?
N
and
their covariance is (using Lemma A1 in Appendix A) given by
ev
y
qr,ts,N
= E
?
y
t?1?q
?u
t,N
?u
0
s,N
y
s?1?r
?
(4.3.34)
= a
0
t?1?q,N
(d
t
?
?,N
d
0
s
?P
N
P
0
N
)a
s?1?r,N
+2tr
?
b
0
t?1?q,N
d
t
?
?,N
d
0
s
b
s?1?r,N
?
?,N
?P
0
N
P
N
P
0
N
P
N
?
.
Notethatby(4.3.9), thedisturbances?u
t,N
arelinearformsintheinnovations
?
N
. From Lemma A1 in Appendix it then follows that the variance covariance
matrix of ?u
t,N
and?u
s,N
is (d
t
?
?,N
d
0
s
?P
N
P
0
N
). Hence the second block of
b
V
ts,N
is:
e
V
X
ts,N
=(X
t,N
,...,X
1,N
)
0
E
?
?u
t,N
?u
0
s,N
?
(X
s,N
,...,X
1,N
) (4.3.35)
=(X
t,N
,...,X
1,N
)
0
(d
t
?
?,N
d
0
s
?P
N
P
0
N
)(X
s,N
,...,X
1,N
).
The estimator
b
V
N
?
b
?
N
?
replaces the true values in the expressions (4.3.31)-
(4.3.35) by their initial estimates collected in the vector
b
?
N
=
?
b?
N
,
b
?
2
?N
,
c
?
2
?
,
b
?
?
0
.
102
In particular, it replaces?
?,N
,P
N
, a
t,N
,andb
t,N
with
b
?
?,N
= diag
?
b?
2
?,N
,
b?
2
?,N
1?
b
?
,b?
2
?,N
,...,b?
2
?,N
!
, (4.3.36)
b
P
N
=(I
N
?b?
N
W
N
)
?1
,
ba
t,N
=
t?1
X
j=0
b
?
j
N
X
t?j,N
b
?
N
,
b
b
t,N
=
?
1
1?
b
?
N
,1,
b
?
t?1
N
,..,
b
?
0
N
,0
1?(T?t)
!
.
Note that in order to for the estimator ofthevariancecovariancematrixof
the moment conditions to be feasible, this implicitly assumes that the past values
of the exogenous variables are so that
P
?
j=0
?
j
X
?j,N
? =0,i.e. thereareno
individual effects other than those contained in?
i
.
39
The following Lemma shows
that the estimator
b
V
N
is consistent.
Lemma 7 Under Assumptions 1-6, and GMM1-GMM3, and given that
b
?
N
p
? ?
as N ?? , the row and column sums of the matrix rW
N
are uniformly bounded
in absolute value for some r with |?| <r<1,andthat
P
?
j=0
?
j
X
?j,N
? =0,we
have that
1
(T?1)N
b
V
N
?
b
?
N
?
p
?
e
V.
Proof. See the Appendix C.3.
Consider now the feasible GMM estimator that uses
h
b
V
N
?
b
?
N
?i
?1
as the
39
This will not be satisfied when the model contains a deterministic constant terms. In this case,
it is necessary to assume that the past values of the exogenous variables are observable and replace
the expression for ba
t,N
with ba
t,N
=
P
?
j=0
b
?
j
N
X
t?j,N
b
?
N
.
103
moment weighting matrix and is defined as
?
?
N
?
b
?
N
?
=
?
?Z
0
N
e
H
N
?
b
V
N
?
b
?
N
??
?1
e
H
0
N
?Z
N
?
?1
?
?Z
0
N
e
H
N
?
b
V
N
?
b
?
N
??
?1
e
H
0
N
?y
N
. (4.3.37)
The following Theorem establishes that the parameters collected in the vector ?
are nuisance parameters.
Theorem 4 Under Assumptions 1-6, and GMM1-GMM3, and given that
b
?
N
p
? ?
as N ?? , the row and column sums of the matrix rW
N
are uniformly bounded
in absolute value for some r with |?| <r<1,andthat
P
?
j=0
?
j
X
?j,N
? =0,we
have that
p
N (T ?1)
?
?
?
N
?
b
?
N
?
?
e
?
N
?
p
? 0,
and hence
p
N (T ?1)
?
?
?
N
?
b
?
N
?
??
?
d
? N (0,?).
Proof. See the Appendix C.3.
The small sample approximation to the variance covariance matrix ? can be
basedonthefollowingLemma.
Lemma 8 Under Assumptions 1-6, and GMM1-GMM3, and given that
b
?
N
p
? ?
as N ?? , the row and column sums of the matrix rW
N
are uniformly bounded
for some r with |?| <r<1, and that
P
?
j=0
?
j
X
?j,N
? =0,wehavethat
b
?
N
?
b
?
N
?
p
? ?,
104
as N ?? ,where
b
?
N
?
b
?
N
?
=
1
(T ?1)N
h
?Z
0
N
e
H
N
b
V
?1
N
e
H
0
N
?Z
N
i
?1
.
Proof. See the Appendix C.3.
105
5MonteCarloStudy
I consider the same dynamic panel data model as specified in Chapter 3. Here
I will first definetheestimatorsthatIexamineintheMonteCarlostudy.Ithen
describe how I generated the artificial data samples, briefly describe the range of
the parameters I considered and finally present the results of the experiments.
5.1 Estimators Considered
I consider the following estimators in my simulations. The first group of esti-
mators, labeled ?Initial Estimators?, ignores the spatial autocorrelation of the dis-
turbances and estimates only the slope coefficients of the model (i.e. ? and ?).
The second group of estimators uses some initial estimator of the slope coeffi-
cients (and the projected disturbances it implies) and provides an estimate of the
spatial autocorrelation parameter (?) and the variances of the innovations and the
individual effects (?
2
?
and ?
2
?
). Finally, the third group, labeled as ?Second Stage
GMM Estimators?, are estimators that use different weighting schemes to weight
the same moment conditions as the initial estimators. The weights are based on
initial estimators of ?, ?
2
?
and ?
2
?
. For comparison, I also include results a for two
stage GMM estimator that ignores spatial correlation. The rest of this section will
introduce the different estimators. For clarity of the exposition, I will drop the
sample size subscript in this section.
106
5.1.1 Initial Estimators
I consider the instrumental variable (IV) estimators suggested by Anderson and
Hsiao (1981) as well as IV estimators that use a larger instrument set, correspond-
ing to the initial estimators suggested by Arellano and Bond (1991) and others.
All these estimators can be written as IV estimators but with a different instru-
ment matrix. In particular, they are of the form
b
?
IV
=
?
b
?
IV
,
b
?
0
IV
?
0
=
h
?Z
0
H(HH
0
)
?1
H
0
?Z
i
?1
?Z
0
H(HH
0
)
?1
H
0
?y,
(5.1.1)
where the right-hand side variables of the first differenced model (3.1.1 or 4.1.1)
are stacked in a matrix ?Z as in, e.g. (4.1.3), the dependent variable is ?y as in
(4.1.2) and the matrix H collects the instruments used. The instrument matrix is
block diagonal with each block containing the set of instruments for the relevant
time period:
H =
?
?
?
?
?
?
H
2
0
.
.
.
0H
T
?
?
?
?
?
?
.
Different choices of H will lead to different initial estimators. In particular,
the following estimators are considered in the experiments: the IV estimators
suggestedbyAndersonandHsiao (1981), the initial IV estimators suggested by
Arellano and Bond (1991), as well as the IV estimator with the instrument set
discussed in this thesis in Chapter 4, Section 4.3.2.
The two Anderson and Hsiao (AH) estimators use respectively lagged first
107
difference of the endogenous variable (y
t?2
?y
t?3
) and level of the endogenous
variable lagged twice (y
t?2
) as instruments for the lagged difference of the en-
dogenous variable (y
t?1
?y
t?2
). The instrument matrices hence have the follow-
ing form:
H
AH1
=
?
?
?
?
?
?
?
?
?
(X
2
?X
1
)
(y
1
?y
0
,X
3
?X
2
)
(y
T?2
?y
T?3
,X
T
?X
T?1
)
?
?
?
?
?
?
?
?
?
,
(5.1.2)
and
H
AH2
=
?
?
?
?
?
?
?
?
?
(y
0
,X
2
?X
1
)
(y
1
,X
3
?X
2
)
(y
T?2
,X
T
?X
T?1
)
?
?
?
?
?
?
?
?
?
.
(5.1.3)
In addition to the moment condition (i =1,...,N)
E[(u
it
?u
i,t?1
)(x
it
?x
i,t?1
)] = 0
p?1
t =1,...,T, (5.1.4)
the AH estimators each utilize at each time period one additional moment condi-
tion:
E[(u
it
?u
i,t?1
)(y
i,t?2
?y
i,t?3
)] = 0 t =2,...,T (5.1.5)
108
and
E[(u
it
?u
i,t?1
)y
i,t?1
]=0 t =1,...,T (5.1.6)
respectively. However, as pointed out by Arellano and Bond (1991), there are
additional moment conditions, not utilized by the AH estimators. In particular,
for the observation at a time t,wehavethefollowing additional moments:
E[(u
it
?u
i,t?1
)y
i,t?1?k
]=0 k =1,..t?1.
Similarly, there are additional moment conditions involving lags of the exoge-
nous variables in addition to the condition utilized by the AH estimators. There-
fore, based on Arellano and Bond (AB), I consider an instrument matrix discussed
in Section 4.3:
H
AB
=
?
?
?
?
?
?
?
?
?
(y
0
,X
1
,X
2
)
(y
0
,y
1
,X
1
,X
2
,X
3
)
.
.
.
(y
0
,..,y
T?2
,X
1
,..,X
T
)
?
?
?
?
?
?
?
?
?
.
(5.1.7)
The table below summarizes the initial estimators, their instrument matrices
and the moment conditions that the instruments are based on.
109
Table 2. Estimators Considered
Estimator
(Instrument Matrix)
Moment Conditions
i =1,..,N and t =1,..,T
AH difference (H
AH1
)
E[(u
it
?u
i,t?1
)(y
i,t?2
?y
i,t?3
)] = 0,
t =1not considered
E[(u
it
?u
i,t?1
)(x
it
?x
i,t?1
)] = 0
p?1
AH level (H
AH2
)
E[(u
it
?u
i,t?1
)y
i,t?2
]=0
E[(u
it
?u
i,t?1
)(x
it
?x
i,t?1
)] = 0
p?1
AB (H
AB
)
E[(u
it
?u
i,t?1
)y
i,t?2?k
]=0,
k =0,...,t?1
E[(u
it
?u
i,t?1
)x
is
]=0
p?1
,
s =1,..,t
5.1.2 Spatial Parameter Estimators
I consider the spatial generalized moments (GM) estimators of the spatial autore-
gressive parameter ? suggested by Kapoor et al. (2005) and discussed in Chapter
4. The spatial GM estimator was defined in (4.2.14).
The estimators differ along two dimensions. First, they differ with respect
to how the estimated disturbances were calculated. I consider the three initial
estimators from the previous section as well as the true value of the disturbances.
Secondly,theestimatorsdifferwithrespecttohowthemomentsareweighted. The
first estimator is referred to as ?Unweighted Spatial GM Estimator? and weights
the moment conditions equally, e.g. by setting A
N
= I
6
in (4.2.14). The second
110
estimator I consider is based on the full set of weighted moments and utilizes the
weighting matrix A
N
=
b
?
N
defined in (4.2.17). I refer to this estimator as the
?Weighted Spatial GM Estimator?.
Altogether, there are four different possibilities to calculate the estimated dis-
turbances (three initial estimators and the true values) and two types of GM esti-
mator (unweighted and weighted moment conditions), i.e. altogether eight possi-
ble combinations.
5.1.3 Second Stage GMM Estimators
The second stage GMM estimators utilize the same moment conditions as the
initial AB estimator but with a weighting matrix. The estimators are of the form:
b
?
GMM
=
?
b
?
GMM
,
b
?
0
GMM
?
0
=
?
?Z
0
HA
?1
k
H
0
?Z
?
?1
?Z
0
HA
?1
k
H
0
?y,
(5.1.8)
where the weighting matrix A
k
, k =1,2,3 is calculated in three different ways.
The first case is a weighting matrix that ignores the spatial autocorrelation of the
disturbances but uses an estimators for ?
2
?
and ?
2
1
that are consistent even for
nonzero values of ?. In particular, the first weighting scheme uses:
A
1
= H
0
e
?H, (5.1.9)
111
with
e
? =(D?I
N
)
?
e
?
2
?
Q
0
+
e
?
2
1
Q
1
?
(D
0
?I
N
), (5.1.10)
where the estimators
e
?
2
?
,and
e
?
2
1
are the spatial GM estimators (with weighted
moment conditions) described above and, basedonaninitialIVestimatorwith
H
AB
as the instrument matrix.
The second weighting scheme uses
b
V
mix
as an estimate of the variance co-
variance matrix of the moment conditions (see Section 4.3), i.e. it employs A
2
=
b
V
mix
,where
b
V
mix
= H
0
b
?H, (5.1.11)
with
b
? =
?
D?
e
P
??
e
?
2
?
Q
0
+
e
?
2
1
Q
1
??
D
0
?
e
P
0
?
, (5.1.12)
where
e
P =(I
N
?e?W)
?1
, (5.1.13)
Dis the (T ?1) ? (T ?1) first difference transformation matrix: D:
D =
?
?
?
?
?
?
?
?
?
?11 0??? 0
0 ?11
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0
0 ??? 0 ?11
?
?
?
?
?
?
?
?
?
, (5.1.14)
and estimators
e
?
2
?
,
e
?
2
1
ande?are the spatial GM estimators (with weighted moment
conditions) based on the initial IV estimator withH
AB
as the instrument matrix.
112
Finally, the last weighting scheme uses a consistent estimate of the variance
covariance matrix of the moment conditions
b
V
E
,i.e.itemploysA
3
=
b
V
E
,where
b
V
E
is definedinSection 4.3.2(equations 4.3.31 - 4.3.35) andis basedonthesame
initial estimators
e
?
2
?
,
e
?
2
1
ande?, as well as on the initial IV estimators
b
? and
b
?.
5.2 Data Generation
I first generate the exogenous variables so that these are common across the dif-
ferent replications. The exogenous variables consist of a nonstochastic intercept
(equal to unity) and a second stochastic exogenous variable. I generate the sec-
ond exogenous variable as independent (of all the other random variables in the
model) jointly normally distributed random variables, i.e.
?
(vecX
1
)
0
,...,(vecX
T
)
0
?
0
?N (0,I
NT
). (5.2.1)
The exogenous variables are generated once and are used in all replications of the
model.
In each replication, I then draw (T +2)N independent jointly normally dis-
tributed random numbers that are used to construct draws of the vector ?.The
first N draws are scaled by ?
?
=
p
?
?
?
2
?
and are used for creating the N ? 1
vector of individual effects ? =(?
1
,...,?
N
)
0
.
40
The next N draws are scaled by
40
We find that altering the ratio of the variance of the individual effects and model disturbances
does not qualitativelly affect our results and hence we only consider ?
?
=1.
113
q
?
2
?
(1??)
2
and are used for creating the N ? 1 vector of the initial disturbances
? =
?
X
j=0
?
j
?
?j
. (5.2.2)
Finally, the last TNdraws are scaled by ?
?
and are used for creating the NT ? 1
vector of disturbances (?
0
1
,...,?
0
T
)
0
IconstructtheN ? 1 vector of initial observations as
41
y
0
=(I
N
??W)
?1
?
?+(1??)
?1
?
?
. (5.2.3)
The subsequent observations for t =1,..,T are then generated according to the
our model as
y
t
= ?y
t?1
+X
t
?+(I
N
??W)
?1
(?
t
+?). (5.2.4)
5.3 Designs Considered
In all experiments I set N =100and T =5. I consider three specifications for
the spatial weighting matrix, as in Kelejian and Prucha (1999) and Kapoor et al.
(2005). The matrices differ in the degree of their sparseness. The first matrix has
in its i-th row, 1 <i<N, nonzero elements in positions i ? 1 and i +1,so
that the i-th unit is directly related to its immediate neighbors. I define this matrix
in a circular world so that the nonzero elements in the first and last rows are at
41
This specification implicitly assumes that the contribution of initial values of the exogenous
variables is zero. i.e. that
P
?
j=0
X
?j
? =0.
114
positions (1,2), (1,N), (N,1) and (N,N ?1). This matrix is row normalized
and hence all the nonzero elements are equal to 1/2. As in Kelejian and Prucha
(1999), I refer to this matrix as ?1 ahead and 1 behind?. The next weighting
matrices are defined in a corresponding way as ?3 ahead and 3 behind? and ?5
ahead and 5 behind? with nonzero entries 1/6 and 1/10 respectively. In the tables
of results below I reference the matrices by W =1,2,3.
The exogenous variables were generated once prior to the Monte Carlo ex-
periments and the process is described above in the Data Generation section. For
simplicity I always set ? =(1,1)
0
. The rest of the coefficients of the model take
on the following values:
?? {?0.9,?0.75,?0.25,0,0.25,0.75,0.9}, (5.3.1)
and
?? {?0.9,?0.5,?0.25,0,0.25,0.5,0.9}. (5.3.2)
I find that the results do not qualitatively change with theratioofthevariances
of the independent innovations and the individual effects and hence I always set
?
?
=
?
2
?
?
2
?
=1. The variance of the independent innovations is always set to one.
As a result the different specifications will have different overall average R
2
of
the data. The variance of the dependent variable conditional on the explanatory
variables (equal to variance of the disturbances) is given by
VC(y
t
|X
t
,y
t?1
)=
?
?
2
?
+?
2
?
?
?PP
0
, (5.3.3)
115
where as before I define P =(I
N
??W)
?1
. Furthermore, the unconditional
variance of the dependent variable is
42
VC(y
t
)=
?
?
2
?
1??
2
+
?
2
?
(1??)
2
?
?PP
0
. (5.3.4)
The expected R
2
of the data is then equal to the ratio of the conditional and un-
conditional variance of the dependent variable and hence is a function of the true
values of the parameters ?, W and ? (as well as ?
2
?
and ?
2
?
).
In particular, consider the vector of observation of the dependent variable as
y =(y
0
0
,...,y
0
T
)
0
and its mean vector denoted as
y = E
?
(y
0
1
,...,y
0
T
)
0
?
=
"
?
0
X
0
1
,...,
T
X
t=1
?
T?t
?
0
X
0
t
#
0
, (5.3.5)
The sample correlation coefficient betweeny and y is then defined as
r =
(y?e
TN
?y
0
y)
0
(y?e
TN
?y
0
y)
p
(y?e
TN
?y
0
y)
0
(y?e
TN
?y
0
y)(y?e
TN
?y
0
y)
0
(y?e
TN
?y
0
y)
,
(5.3.6)
where e
TN
is a TN ? 1 vector of unit elements. The designs considered in the
Monte Carlo experiments are such that average (over the replications of a partic-
ular design) r is between 0.54 and 0.78.
Tosummarize,wehave7valuesfor?,7valuesfor?and 3 different weighting
matrices W, that is 147 different parameter designs.
42
Note that the expression is derived analogously to the variance-covariance matrix of the initial
observations, given in equation (3.2.5)
116
5.4 Tables of Results
The tables of results D1-D4 contain bias and a measure of the root mean square
errors of the different estimators for the 147 designs considered. For each con-
stellation of parameters, the random numbers were generated 1000 times and the
estimators calculated and their values saved. For each estimator, I report the me-
dian and root mean square error calculated as in Kapoor (2005); that is using the
interquantile based measure:
RMSE =
"
bias
2
+
?
IQ
1.35
?
2
#
1/2
, (5.4.1)
where bias is the true value of the parameter minus the median of the estimators,
and IQis the difference between .75 and .25 quantiles.
Observe that the comparison of the different estimation procedures in Tables
D1-D4 is only based on comparing the .25, .50 and .75 quantiles of their distrib-
utions. Note that hypothesis tests are often based on the .05 and the .95 quantiles
and hence it might be of interest to consider quantiles other than those used in
constructing the bias and RMSE measure.
Tomakesuchcomparisonfeasible,IpresentinFigures1through6thequantile-
to-quantile plots that compare the small sample distribution of the estimated slope
coefficient ? with the Gaussian normal distribution. The plots depict the sam-
ple cumulative distribution of the estimator (over the 1,000 replications of each
design). Thelefthandsideaxisoftheplotshasanonlinearscalesothatifthe
data was exactly normally distributed, the plot would be linear. Therefore, any
117
nonlinearity in the plot represents deviations from normality at the appropriate
quantiles.
I superimpose the 147 design on top of each other
43
in each Figure, so that
the deviations from the straight line represent the worst-case scenarios over the
entire parameter space. For illustration purposes, Figure 7 shows the quantile-to-
quantile plot of 1,000 replications of N (0,1) distribution, and Figure 8 show the
sameplotwherethesamplewasdrawnfromastudent-tdistributionwith5degrees
of freedom. Observe that the quantile-to-quantile plot allows an easy detection of
even such small deviations from normality.
5.5 Conclusions and Comparison with Other Studies
The results of the experiments confirm the finding in the literature that for some
parameter values the performance of the Anderson-Hsiao estimator AH1 is not
very satisfactory (see Table D1). However, the second initial estimator AH2 (us-
ing the twice lagged level of the endogenous variable as an instrument) performs
quite well and in fact for most parameter values it is better (in terms of lower
bias and/or lower RMSE) than the estimator AB that uses a larger instrument set.
Note that if the model did not contain individual effects, the instruments used by
the estimator AH2 would be the conditional expectations of the right-hand side
variables. This might explain its relativelly good performance. Note that the AH1
and AH2 estimators are exactly identified and hence their performance cannot be
43
To maintain compatibility over different designs, the small sample distributions were normal-
ized by their medians and the difference between the .25 and .75 quantiles.
118
improved by weighting the moment conditions.
In contrast, once the moment conditions are weighted, the performance of the
ABestimatorimproves(seeTableD2)andbecomesbetterthanthatoftheAH2es-
timator. Observe that ignoring spatial autocorrelation in constructing the weights
involves a moderate loss of efficiency relative to the other weighting schemes es-
pecially when the spatial autocorrelation is high and positive. On the other hand
for low or negative values of ?, this weighting scheme performs as well as the
more computationally involved alternativesandhenceisaviableoptionincase
where the calculation of the inverse in (I
N
??W
N
)
?1
is computationally pro-
hibitive.
The second weighting scheme (labeled mix) uses an inconsistent estimate of
the variance covariance matrix of the moment conditions. However, this does not
negatively affects the small sample performance of the GMM estimator and the
performance is for most parameter values in fact better than that of the other two
alternatives.
The last weighting scheme has for many parameter values clearly the smallest
biasbutitsRMSEisabout thesameasthat ofthe alternatives. Overallthereseems
to be no clear best choice of the weighting scheme and all of the weights lead to
a second stage GMM estimator that performs satisfactory over the entire range of
parameter (which is not true for any of the initial estimators).
Examining other quantiles of the small sample distributions of the estimators
in Figures 1-6 shows that the distributions of the initial IV estimators are not are
not well approximated by the normal distribution. The Anderson Hsiao estimators
119
(AH1 and AH2) exhibit large deviations after the .20 quantiles and although the
extended instrument set employed by the AB estimator alleviates this, there are
still deviations from normality at the .10 quantile.
On the other hand, the second stage GMM estimators show no dramatic devi-
ationsfromnormalityuptotheir.10quantile. The weighting scheme that ignores
the spatial correlation shows some deviations from normality at the .05 quantile
and hence the resultant estimator might not perform well in the usual hypothesis
tests. The weights based on
b
V
mix
and
b
V
E
perform better at the .05 quantile, with
the estimator based on
b
V
E
being marginally better than the one based on
b
V
mix
.
Nevertheless, for both weighting schemes there is still some sizedistortion of tests
based on the .05 and .95 quantiles. This is in line with finding of other studies that
looked at the performance of GMM estimators and found that often the use of
asymptotic distributions of the GMM estimators as a small sample guidance was
not satisfactory, suggesting the use of ML estimation (e.g. Binder et al. 2000).
Turning to the estimator of the spatial autocorrelation parameter ? in Tables
D3 and D4, it is remarkable that the spatial GM procedure works well even when
basedoninefficient initial estimators. The loss of efficiency in terms of RMSE is
for many parameter values negligible. Observe that with? =0, the simulations in
this study are comparable to those in Kapoor et al. (2005). To check whether this
is indeed the case, Figures 9-11 present the comparison of the values of RMSE for
the unweighted spatial GM estimator based on the true values of the disturbances
obtainedinthis simulationstudywiththe comparableRMSEvaluesreported inan
earlier draft of the Kapoor et al. paper. The value for W in the labels corresponds
120
to the type of the weighting matrix used and is the same as in the Tabled D1-D4.
121
6 Directions for Future Research
In this thesis I have concentrated on studying a specific model and deriving formal
results on the properties of the suggested estimation procedure under a particular
set of maintained assumptions. In the future this approach can obviously be ex-
tended along several dimensions.
Firstly, the model under consideration can be extended to include other ele-
ments. In particular it would be of interest to consider a spatial lag in the depen-
dent variable in addition to the spatial lag in the disturbance process.
Secondly, the estimation procedure under consideration can be altered. In this
respect it could be interesting to consider potentially more efficient estimation
procedures such as GMM estimators based on an extended set of moment con-
ditions as suggested by, for example Ahn and Schmidt (1995), or some form of
continuously updating GMM estimator.
Finally, the set of maintained assumptions can be made more general. Here
the first extension that can be tackled is to allow for the exogenous variables to be
stochastic.
122
A Appendix: Central Limit Theorem for Vectors of
Linear Quadratic Forms
For the convenience of the reader I firstgiveexplicitformulaeforthemeanand
covariances of linear quadratic forms. I focus on the case where the diagonal
elements of the quadratic forms are zero and the innovations have zero mean.
44
The following lemma is a special case of a Lemma A.1 in Kelejian and Prucha
(2005).
Lemma A1 Let?
N
=(?
1
,...,?
n
)
0
? (0,?
n
) where?
n
is diagonal and positive
definite, and let A
n
=(a
ij,n
) and B
n
=(b
ij,n
) be n?n nonstochastic symmetric
matrices where a
ii,n
= b
ii,n
=0.Leta
n
and b
n
be n ? 1 nonstochastic vec-
tors. Consider the decomposition ?
n
= P
n
P
0
n
,andlet?
n
=(?
1,n
,...,?
n,n
)
0
=
P
?1
n
?
n
. Then assuming that the elements of?
n
are independently distributed with
zero mean, variance one fourth moments E(?
4
i,n
)=?
(4)
?
i
we have
E(a
0
n
?
n
+?
0
n
A
n
?
n
)=0,
VC(a
0
n
?
n
+?
0
n
A
n
?
n
)=2tr(A
n
?
n
A
n
?
n
)+a
0
n
?
n
a
n
,
Cov(a
0
n
?
n
+?
0
n
A
n
?
n
,b
0
n
?
n
+?
0
n
B
n
?
n
)=2tr(A
n
?
n
B
n
?
n
)+a
0
n
?
n
b
n
.
44
In general the variance and covariance of quadratic forms will depend on the second, third and
fourth moments of the innovations. However, since we specialize to the case where the diagonal
elements of the quadratic forms are zero, the variance and covariance of the quadratic forms will
only depend on the second moments of the innovations.
123
For proof see Kelejian and Prucha (2005). The expressions also correspond to
those given in Kelejian and Prucha (2001). Obviously, in case A
n
and B
n
are not
symmetric the above formulae apply with A
n
and B
n
replaced by (A
n
+ A
0
n
)/2
and (B
n
+B
0
n
)/2.
For convenience of the reader, I next state a Central Limit Theorem (CLT) for
vectors of quadratic forms of triangular arrays based on Theorem A.1 in Kelejian
and Prucha (2005).
Let?
n
=(?
1,n
,...,?
n,n
)
0
be an n? 1 random vector, let
A
r,n
=(a
ij,r,n
)
i,j=1,...,n
, (A.1)
be nonstochastic matrices, and let b
r,n
=(b
1,r,n
,...b
n,r,n
)
0
be nonstochastic vec-
tors (r =1,...,m). Consider the following assumptions:
Assumption A1 The real valued random variables of the array {?
i,n
:1?i?n,
n ? 1} satisfy E?
i,n
=0. Furthermore, for each n ? 1 the random variables
?
1,n
,...,?
n,n
are totally independent.
Assumption A2 For r =1,...,mthe elements of the array of real numbers
{a
ij,r,n
:1?i,j ?n, n? 1} satisfy a
ij,r,n
= a
ji,r,n
and
45
sup
1?j?n,n?1
n
X
i=1
|a
ij,r,n
| <? .
45
The assumption of symmetry of the elements of A
n
is maintained w.l.o.g. since ?
0
n
A
n
?
n
=
?
0
n
[(A
n
+ A
0
n
)/2]?
n
.
124
The elements of the array of real numbers {b
i,r,n
:1?i?n, n? 1} satisfy
sup
n?1
n
?1
n
X
i=1
|b
i,r,n
|
2+?
1
<?
for some ?
1
> 0.
Note that a sufficient condition for Assumption A2 is that the row and column
sums of A
n
and the elements of b
n
are uniformly bounded in absolute value.
Assumption A3 For r =1,...,mwe assume that one of the following two con-
ditions holds.
(a) sup
1?i?n,n?1
E|?
i,n
|
2+?
2
<? for some ?
2
> 0 and a
ii,r,n
=0.
(b) sup
1?i?n,n?1
E|?
i,n
|
4+?
2
<? for some ?
2
> 0 (but possibly a
ii,r,n
6=0).
Consider the quadratic forms
q
r,n
= ?
0
n
A
r,n
?
n
+b
0
r,n
?
n
(A.2)
and definethevectoroflinearquadraticforms
q
n
=
?
?
?
?
?
?
q
1,n
.
.
.
q
m,n
?
?
?
?
?
?
, (A.3)
125
and let
?
q
n
= Eq
n
, (A.4)
?
q
n
= E(q
n
?Eq
n
)(q
n
?Eq
n
)
0
denote the mean vector and the variance covariance matrix ofq
n
.Then
?
q
n
=
?
?
?
?
?
?
?
q
1,n
.
.
.
?
q
m,n
?
?
?
?
?
?
, ?
q
n
=
?
?
?
?
?
?
?
q
11,n
??? ?
q
1m,n
.
.
.
.
.
.
.
.
.
?
q
m1,n
??? ?
q
mm,n
?
?
?
?
?
?
. (A.5)
where?
q
r,n
and?
q
rs,n
denote the mean ofq
r,n
and the covariance betweenq
r,n
and
q
s,n
, respectively, for r,s =1,...,m.WenowhavethefollowingCLT.
Theorem A1 Suppose Assumptions A1-A3 hold and n
?1
?
min
(?
q
n
) ?c for some
c>0.Let?
q
n
=
?
?
1/2
q
n
??
?
1/2
q
n
?
0
,then
?
?1/2
q
n
?
q
n
??
q
n
?
d
? N (0,I
m
).
Of course, the theorem remains valid, if all assumptions are assumed to hold
forn>n
0
where n
0
is finite. The above theorem can also be applied to situations
wheren = TNwithT finite andN ?? ; see footnote 13 in Kelejian and Prucha
(2001).
I now illustrate this in more detail. Suppose, we have sample sizes
T,2T,3T,...,NT,...,? as N ?? and the random variables are triangular ar-
126
rays is
?
1
=(?
11,1
,...,?
T1,1
)
0
(A.6)
?
2
=(?
11,2
,?
12,2
,...,?
T1,2
,?
T2,2
)
0
.
.
.
?
N
=(?
11,N
,...,?
1N,N
,?
21,N
,...,?
2N,N
,...,?
T1,N
,...,?
TN,N
)
0
,
Consider the sequence of vectors of linear quadratic forms and the vectors of
linear quadratic forms
v
N
=(v
1,N
,...,v
m,N
)
0
, (A.7)
with
v
r,N
= ?
0
N
A
r,TN
?
N
+b
0
r,TN
?
N
. (A.8)
As above, we denote by ?
v
N
and ?
v
N
the mean vector and variance covariance
matrix of the vector v
N
.
Supposethattherandomvariablescollectedin?
N
satisfyAssumptionsA1and
A3, and the sequences of matrices A
r,TN
and vectors b
r,TN
satisfy Assumption
A2.
127
We can define additional triangular arrays of sizes between tN and (t+1)N
to obtain a sequence
?
1
=(?
11,1
) (A.9)
?
2
=(?
11,1
,?
21,1
)
0
.
.
.
?
T
=(?
11,1
,...,?
T1,1
)
0
?
T+1
=(?
11,2
,...,?
T1,2
,?
12,2
)
0
(A.10)
?
T+2
=(?
11,2
,...,?
T1,2
,?
12,2
,?
12,2
,?
22,2
)
0
.
.
.
?
2T
=(?
11,2
,...,?
T1,2
,?
12,2
,...,?
T2,2
)
0
.
.
.
?
NT
=(?
11,N
,...,?
1N,N
,?
21,N
,...,?
2N,N
,...,?
T1,N
,...,?
TN,N
)
0
.
Observe that the new sequence ?
n
satisfies Assumptions A1 and A3 and that for
n = NT we have?
n
= ?
N
.
Similarly, we can extend the sequence of vectors of linear quadratic forms to
q
n
=(q
1,n
,...,q
m,n
)
0
, (A.11)
where
q
r,n
= ?
0
n
A
r,n
?
n
+b
0
r,n
?
n
, (A.12)
128
with
A
r,n
=
?
?
?
?
?
?
?
?
?
A
r,
[
n
T
]
T
0
[
n
T
]
T?1
??? 0
[
n
T
]
T?1
0
1?
[
n
T
]
T
a
11,r,
[
n
T
]
T+1
??? a
k1,r,
[
n
T
]
T+1
.
.
.
.
.
.
.
.
.
.
.
.
0
1?
[
n
T
]
T
a
1k,r,
[
n
T
]
T+1
??? a
kk,r,
[
n
T
]
T+1
?
?
?
?
?
?
?
?
?
, (A.13)
b
r,n
=
?
?
?
?
?
?
?
?
?
b
r,
[
n
T
]
T
b
1,r,
[
n
T
]
T+1
.
.
.
b
k,r,
[
n
T
]
T+1
?
?
?
?
?
?
?
?
?
,
andk = n?
?
n
T
?
T,whereIuse
?
r
s
?
to denote the whole part of a rational number
r
s
.
Observe that by definition forn = NT,wehaveq
n
= v
N
.Furthermore,since
A
r,n
and b
r,n
satisfy Assumption A2 for n = NT, it follows from the construc-
tion of A
r,NT
and b
r,NT
that they satisfy AssumptionA2foralln. As a result,
quadratic forms q
n
fulfill conditions of Theorem A1 and ?
?1/2
q
n
?
q
n
??
q
n
?
d
?
N (0,I
m
) as n ?? , where as before ?
q
n
and ?
q
n
denote the mean vector
and variance covariance matrix of the vector q
n
. Hence the sequence of distribu-
tion functions of?
?1/2
q
n
?
q
n
??
q
n
?
converges weakly to the distribution function
of N (0,I
m
). We now select a subsequence from the distribution functions of
?
?1/2
q
n
?
q
n
??
q
n
?
for n = NT (we treat T as a fixed constant) and observe that
theseareequivalenttothesequenceofdistributionfunctionsof?
?1/2
v
N
?
v
N
??
v
N
?
.
129
This subsequence must have the same limitand,asaconsequence,wehavethat
?
?1/2
v
N
?
v
N
??
v
N
?
d
? N (0,I
m
), (A.14)
as N ?? .
130
B Appendix: Proof of Claims in Chapter 3
Lemma B1 :Let?
j
, j ? N, be a sequence of totally independent real valued
random variables with E|?
j
|
p
? k
?
< ? for some 2 ? p<? .Leta
j
be
a sequence of real numbers such that
P
?
j=0
|a
j
| ? k
a
< ? .(a)Considerthe
random variables ?
m
=
P
m
j=0
a
j
?
j
, then there exists a random variable ?,which
we denote as
P
?
j=0
a
j
?
j
, such that ?
m
r
? ? for 0 <r? p. (b) Furthermore,
E|?|
r
?k
r
a
k
r/p
?
<? ,for0 <r?p.
Proof: To prove part (a) I first show that each?
m
has finitep-th absolute moments
and hence belongs to the L
p
space of random variables with finite absolute p-th
moments. I then demonstrate that the sequence ?
m
is a Cauchy sequence. By
invoking the completeness property of the L
p
space we will then have that the
limiting random variable ? also belongs to L
p
.Inowturntoeachofthesesteps
in detail.
Let p? 2 and 1/q +1/p =1. Then, using the triangle and H?lder?s inequali-
ties
?
?
?
?
?
m
X
i=1
a
i
?
i
?
?
?
?
?
?
m
X
i=1
|a
i
||?
i
| =
m
X
i=1
|a
i
|
1/q
|a
i
|
1/p
|?
i
| (B.1)
?
"
m
X
i=1
|a
i
|
#
1/q
"
m
X
i=1
|a
i
||?
i
|
p
#
1/p
? k
1/q
a
"
m
X
i=1
|a
i
||?
i
|
p
#
1/p
,
131
and further
E|?
m
|
p
= E
?
?
?
?
?
m
X
i=1
a
i
?
i
?
?
?
?
?
p
?k
p/q
a
m
X
i=1
|a
i
|E|?
i
|
p
(B.2)
? k
p/q
a
k
?
m
X
i=1
|a
i
|k
p/q+1
a
k
?
= k
p
a
k
?
<? ,
and hence each ?
m
belongs to L
p
.
I nowdemonstrate that the sequence?
m
is CauchyinL
p
, or in the terminology
of Shiryayev (1984, p.251) that it is fundamental in L
p
.Since
P
?
i=1
|a
i
| < ? it
follows from the Cauchy Test (Neylor and Sell, 1982, p.225) that for every ?>0
there exist and index N
?
such that
m+k
X
i=m+1
|a
i
| <?, (B.3)
for all m ? N
?
and k ? 0. Now choose some ?
?
> 0 and ? = ?
?
/(k
p
a
k
?
),thenby
argumentation analogous to above
E
?
?
?
m+k
??
m
?
?
p
= E
?
?
?
?
?
m+k
X
i=1
a
i
?
i
?
m
X
i=1
a
i
?
i
?
?
?
?
?
p
(B.4)
? k
p/q
a
k
?
m+k
X
i=m+1
|a
i
| ?k
p
a
k
?
? = ?
?,
for all m ? N
?
and k ? 0. Thus under the maintained assumptions the sequence
?
m
is Cauchy in L
p
. By Theorem 7 in Shiryayev (1984, p.258) we then have
that the sequence ?
m
converges in p-thmeantoarandomvariableinL
p
,which
132
implies that ? exists as a limit in p-th mean. Of course, since for r ?p
k?
m
??k
r
? k?
m
??k
p
, (B.5)
by Lyapunov?s inequality it follows that ?
m
converges to ? also in r-th mean for
0 <r?p.
To prove part (b) observe that from the above E|?|
r
? c for some c<? .
Hence E|?|
r
? (E|?|
p
)
r
p
?c
r
p
<? .
Lemma B2 :LetA
n
beasequenceofnonstochasticmatricesof dimensionsn?n
where n ?N such that max
1?i?n
P
n
j=1
|a
ij
| ? k
A
< ? . Consider a sequence of
n?1 random vectors?
n
, with elements?
i,n
that are real valued random variables
with E|?
i,n
|
p
? k
?
< ? for some 2 ? p<? . Then the elements of the random
vector ?
n
= A
n
?
n
have finite r-th moments with E|?
i,n
|
r
? k
r
A
k
r/p
?
< ? ,for
0 <r?p.
Proof: Letp? 2 and 1/q+1/p =1. Using the triangle and H?lder?s inequalities,
we have
|?
i,n
| =
?
?
?
?
?
n
X
j=1
a
ij,n
?
j,n
?
?
?
?
?
?
"
n
X
j=1
|a
ij,n
||?
j,n
|
#
=
"
n
X
j=1
|a
ij,n
|
1
q
|a
ij,n
|
1
p
|?
j,n
|
#
?
"
n
X
j=1
|a
ij,n
|
#1
q
"
n
X
j=1
|a
ij,n
||?
j,n
|
p
#1
p
. (B.6)
133
and further
E|?
i,n
|
p
?
"
n
X
j=1
|a
ij,n
|
#
p
q
"
n
X
j=1
|a
ij,n
|E|?
j,n
|
p
#
(B.7)
? k
p/q
A
n
X
j=1
|a
ij,n
|k
?
?k
p
A
k
?
.
Observe that by Lyapunov?s inequality for 0 <r?p,
k?
i,n
k
r
? k?
i,n
k
p
=[E|?
i,n
|
p
]
1/p
?k
A
k
1/p
?
, (B.8)
and hence E|?
i,n
|
r
= k?
i,n
k
r
r
?k
r
A
k
r/p
?
.
Lemma B3 : Suppose Assumptions 1, 2, 3 and 5 hold.
(a) Let ?
t,N
= X
t,N
? + u
t,N
,andlet?
it,N
denote the i-th element of ?
t,N
,
then
E|?
it,N
|
4+?
?k
?
<? ,
where k
?
does not depend i,t,N.
(b) The random vector
y
t,N
=
?
X
j=0
?
j
?
t?j,N
,
is well definedasthelimitofthefinite sums in quadratic means and there is?>0
such thatE|y
it,N
|
r
?k
y
<? for allr ? 4+?,wherek
y
does not dependi,t,N.
134
Proof: In the following let p =4+? with ? =min{?
?
,?
?
}.Ifirst prove part
(a). Denotingx
ikt,N
thek-th element ofx
it,N
, we have from Assumption 5(b) that
|x
ikt,N
| ?k
X
<? and thus
?
?
x
0
it,N
?
?
?
p
?k
p
X
(?
0
?)
p/2
<? . (B.9)
Next, from Assumptions 1 and 2, we have
E|?
it,N
|
p
? 2
p?1
?
E|?
it,N
|
p
+E
?
?
?
i,N
?
?
p
?
? 2
p?1
(k
?
+k
?
) <? , (B.10)
by inequality (1.4.3) in Bierens (1994). Now observe that u
t,N
= P
N
?
t,N
.By
Assumption 3(c) we have
max
i
N
X
j=1
p
ij,N
?k
P
<? , (B.11)
and hence by Lemma B2 we have E|u
it,N
|
p
?k
p
P
2
p?1
(k
?
+k
?
). Hence
E|?
it,N
|
p
? 2
p?1
??
E
?
?
x
0
it,N
?
?
?
p
+E|u
it,N
|
p
??
(B.12)
? 2
p?1
n
k
p
X
(?
0
?)
p/2
+k
p
P
2
p?1
(k
?
+k
?
)
o
<? ,
i.e., the p-th absolute moment of ?
it,N
is uniformly bounded by a finite constant
that does not depend i,t,N.
To prove part (b) observe that
P
?
i=0
|?|
i
=1/(1 ?|?|) < ? . Given part (a)
of the Lemma, part (b) now follows immediately from Lemma B1.
135
Equation(3.2.4): The vector of endogenous variables is defined by a stochas-
tic difference equation:
y
t,N
= ?y
t?1,N
+?
t,N
. (B.13)
From Lemma B3 above it immediately follows that the random variables
y
p
it
=
?
X
j=0
?
j
?
i,t?j
, (B.14)
y
p
i,t?1
=
?
X
j=0
?
j
?
i,t?1?j
,
are well defined as limits of the finite sums in quadratic means.
I now show that they are a particular solution. Substituting into the RHS of the
difference equation definingy
t,N
, we have (using Theorems 2.6 and 2.7 in Prucha,
2004):
?y
p
i,t?1,N
+?
it,N
= ?
?
X
j=0
?
j
?
i,t?1?j
+?
it,N
(B.15)
=
?
X
j=1
?
j
?
i,t?j
+?
it,N
=
?
X
j=0
?
j
?
i,t?j
= y
p
it,N
,
and hence y
p
t,N
is a particular solution. The homogeneous part of the difference
equation is
y
h
t,N
??y
h
t?1,N
=0. (B.16)
136
and its solution is of the form y
h
t,N
= ??
t+m
,where? is a N ? 1 vector of
(finite) constants and ?m is the starting point of the process. Since I assume
that the process has started in an infinite past (m =+? ), we have that y
h
t,N
=
lim
m??
??
t+m
=0and, as a result, the unique solution is
y
t,N
= y
p
t,N
+y
h
t,N
=
?
X
j=0
?
j
?
t?j,N
.
Substituting for the definition of ?
t?j,N
and utilizing Theorem 2.6 in Prucha
(2004) yields
y
t,N
=
?
X
j=0
?
j
?
t?j,N
(B.17)
=
?
X
j=0
?
j
(X
t?j,N
? +P
N
?
t?j,N
+P
N
?
N
)
=
?
X
j=0
?
j
(X
t?j,N
? +P
N
?
t?j,N
)+
?
X
j=0
?
j
P
N
?
N
=
?
X
j=0
?
j
(X
t?j,N
? +P
N
?
t?j,N
)+(1??)
?1
P
N
?
N
.
The claim in Chapter 3 then follows from specializing the above expression for
t =0.
Equation (3.2.5): By Lemma B3 we have that
y
0,N
=
?
X
j=0
?
j
?
t?j
, (B.18)
137
with E
?
?
2
it,N
?
? k
?
< ? and |?| < 1. Using Theorems 2.6 and 2.7 in Prucha
(2004) we can write
y
0,N
=
?
X
j=0
?
j
?
t?j
(B.19)
=
?
X
j=0
?
j
(X
?j,N
? +u
t,N
)
=
?
?
X
j=0
?
j
X
?j,N
?
!
+
?
?
X
j=0
?
j
u
t,N
!
=
?
?
X
j=0
?
j
X
?j,N
?
!
+
?
?
X
j=0
?
j
P
N
?
?j,N
!
+
?
?
X
j=0
?
j
P
N
?
N
!
= c
N
+P
N
?
N
+(1??)
?1
P
N
?
N
,
wherec
N
isnonstochasticandthevectorsofrandomvariablesare?
N
=
P
?
j=0
?
j
?
?j,N
and ?
N
. Notice that by Lemma B1 the random variable?
N
is well defined. From
Assumption 1 and Theorem 2.2 in Prucha (2004) we have that
E(?
N
)=
?
X
j=0
?
j
E(?
?j,N
)=0
N?1
(B.20)
and
VC(?
N
)=E(?
N
?
0
N
)=
?
X
j=0
?
?
2
?
j
?
2
?
I
N
(B.21)
= ?
2
?
?
1??
2
?
?1
I
N
.
138
Furthermore,
VC(P
N
?
N
)=E(P
N
?
N
?
0
N
P
0
N
)=P
N
E(?
N
?
0
N
)P
0
N
= ?
2
?
?
1??
2
?
?1
P
N
P
0
N
. (B.22)
Observe that by Assumption 2, the random variables P
N
?
N
and?
N
are inde-
pendent. Thus we have
VC(y
0,N
)=VC(P
N
?
N
)+VC
?
(1??)
?1
P
N
?
N
?
(B.23)
= ?
2
?
?
1??
2
?
?1
P
N
P
0
N
+(1??)
?2
P
N
VC(?
N
)P
0
N
=
?
?
2
?
1??
2
+
?
2
?
(1??)
2
?
P
N
P
0
N
.
139
C Appendix: Proofs for Chapter 4
I will make repeated use of the following facts:
Lemma C1 Let C = A + B be square real valued symmetric matrices of same
dimensions. Then
?
min
(C) ??
min
(A)+?
min
(B).
For proof see, e.g., Rao and Rao (1998), Proposition 10.1.1.
Lemma C2 Let A and B be n?m and n?n matrices. If B is symmetric then
?
min
(A
0
BA) ??
min
(A
0
A) ??
min
(B).
Proof: By Rayleigh-Ritz Theorem (see, e.g. Proposition 4.2.2 in Horn and John-
son 1985) we have that the smallest eigenvalue of a symmetric matrix can be
obtained as:
?
min
(C)=inf
?6=0
h
(?
0
?)
?1
(?
0
C?)
i
=inf
?;?
0
?=1
(?
0
C?). (C.0.1)
SinceBissymmetric, wecandecomposeitasB = U
0
?UwhereUisorthog-
onal and ? = diag(?
1
,..,?
n
) is diagonal with eigenvalues of B on the diagonal
140
(cp. Proposition 52 in Dhrymes 1984). Hence we have
?
min
(A
0
BA)=?
min
(A
0
U
0
?UA) (C.0.2)
=inf
?;?
0
?=1
[?
0
A
0
U
0
?UA?]
? inf
j
?
j
inf
?;?
0
?=1
[?
0
A
0
U
0
UA?]
= ?
min
(B) ? inf
?;?
0
?=1
[?
0
A
0
A?]
= ?
min
(B) ??
min
(A
0
A).
Lemma C3 Let a
n
, and b
n
be sequences of n? 1 vectors and C
n
be a sequence
of n?nmatrices. Suppose that the elements ofa
n
andb
n
are uniformly bounded
in absolute value, and that the matrixC
n
has uniformly bounded absolute row (or
column) sums. Then n
?1
a
0
n
C
n
b
n
is uniformly bounded in absolute value.
Proof: Denote the uniform bounds of the elements of the vectorsa
n
andb
n
as k
a
and k
b
and the uniform bound of the absolute row sums of the matrices C
n
as k
c
.
We have by the triangle inequality
n
?1
|a
0
n
C
n
b
n
| = n
?1
?
?
?
?
?
n
X
i=1
n
X
i=1
a
i,n
c
ij,n
b
j,n
?
?
?
?
?
?n
?1
n
X
i=1
n
X
i=1
|a
i,n
||c
ij,n
||b
j,n
|
? n
?1
n
X
i=1
n
X
i=1
k
a
|c
ij,n
|k
b
= k
a
k
b
n
?1
n
X
i=1
n
X
i=1
|c
ij,n
| (C.0.3)
? k
a
k
b
n
?1
n
X
i=1
k
c
= k
a
k
b
k
c
<? .
141
C.1 Proofs for Section 4.1
ProofofLemma1:By backward substitution we can eliminate lagged dependent
variables and express y
?2
as a function of lagged disturbance terms and lagged
explanatory variables. From (3.2.2), we have that y
?2,N
is
y
?2,N
=
?
?
?
?
?
?
?
?
?
y
0,N
y
1,N
.
.
.
y
T?2,N
?
?
?
?
?
?
?
?
?
=
?
?
?
?
?
?
?
?
?
y
0,N
X
1,N
?+u
1,N
+?y
0,N
.
.
.
P
T?3
j=0
?
j
[X
T?2?j,N
?+u
T?2?j,N
]+?
T?2
y
0,N
?
?
?
?
?
?
?
?
?
=(?
0
?I
N
)
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
0
N?p
X
1,N
.
.
.
X
T?2,N
?
?
?
?
?
?
?
?
?
?+
?
?
?
?
?
?
?
?
?
y
0,N
u
1,N
.
.
.
u
T?2,N
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
, (C.1.4)
where
? =
?
?
?
?
?
?
?
?
?
1 ? ??? ?
T?2
01
.
.
.
.
.
.
.
.
.
?
0 ??? 01
?
?
?
?
?
?
?
?
?
. (C.1.5)
142
Next I express
?
y
0
0,N
,u
0
1,N
,...,u
0
T?2,N
?
0
as a linear form of
?
N
=
?
?
N
,?
N
,?
0
1,N
,...,?
0
T,N
?
0
. (C.1.6)
Observethatu
t,N
= P
N
(?
t,N
+?
N
)andfromequation(3.2.4),y
0,N
= E(y
0,N
)+
P
N
h
?
N
+
?
N
1??
i
, with?
N
=
P
?
j=0
?
j
?
?j,N
well defined by Lemma B1. Therefore,
?
y
0
0,N
,u
0
1,N
,...,u
0
T?2,N
?
0
(C.1.7)
=
?
?
?
?
?
?
[1??]
?1
1 0
1?T?2
0
1?2
1
T?2?1
0
T?2?1
I
T?2
0
T?2?2
?
?
?
T?1?T+2
?P
N
?
?
?
?
N
+
?
E
?
y
0
0,N
?
,0
1?(T?2)N
?
0
.
Hence with the notationX
?2,N
=
?
0
0
N?p
,X
0
1,N
,...,X
0
T?2,N
?
0
we have
y
?2,N
=(?
0
?I
N
)
n
X
?2,N
? +
?
E
?
y
0
0,N
?
,0
T?2?1
?
0
o
(C.1.8)
+
?
?
?
?
0
?
?
?
[1??]
?1
1 0
1?T?2
0
1?2
1
T?2?1
0
T?2?1
I
T?2
0
T?2
?
?
?
?P
N
?
?
?
?
N
.
Therefore, given that
?u
N
=
??
0
(T?1)?2
,D
?
?P
N
?
?
N
, (C.1.9)
we can express y
0
?2,N
?u
N
as a function of the model disturbances and explana-
143
tory variables:
y
0
?2,N
?u
N
(C.1.10)
=
?
?
0
X
0
?2,N
+
?
E
?
y
0
0,N
?
,0
T?2?1
??
(??I
N
)
??
0
(T?1)?2
,D
?
?P
N
?
?
N
+?
0
N
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
1
1??
1
1?T?2
1
1?1
0
1?T?2
0
T?2?1
I
T?2
0
2?1
0
2?T?2
?
?
?
?
?
?
?
?
?
?
?
0
(T?1)?2
,D
?
?P
0
N
P
N
?
?
?
?
?
?
?
?
?
?
N
= f
0
N
(I
T+2
?P
N
)?
N
+?
0
N
(F?P
0
N
P
N
)?
N
,
where
f
0
N
=
?
?
0
X
0
?2,N
+
?
E
?
y
0
0,N
?
,0
T?2?1
??
(??I
N
)
??
0
(T?1)?2
,D
?
?I
N
?
(C.1.11)
and
F
T+2?T+2
=
?
?
?
?
?
?
?
?
?
1
1??
1
1?T?2
1
1?1
0
1?T?2
0
T?2?1
I
T?2
0
2?1
0
2?T?2
?
?
?
?
?
?
?
?
?
?
?
0
(T?1)?2
,D
?
(C.1.12)
The expression for ?X
0
N
?u
N
follows from a trivial substitution of ?u
N
=
??
0
(T?1)?2
,D
?
?P
N
?
?
N
.
ProofofLemma2: Toobtaintheexpectedvalueandvarianceofthetwoquadratic
144
forms, I use the expression from Lemma 1:
y
0
?2,N
?u
N
= f
0
N
[I
T+2
?P
N
]?
N
+?
0
N
(F?P
0
N
P
N
)?
N
, (C.1.13)
and
?X
0
N
?u
N
=?X
0
N
??
0
(T?1)?2
,D
?
?P
N
?
?
N
, (C.1.14)
where ?
N
=
?
?
0
N
,?
0
N
,?
0
1,N
,...,?
0
T,N
?
0
is a vector of independent zero mean ran-
dom variables with uniformly bounded fourth moments. Next I verify that as-
sumptions of Lemma A1 in Appendix A are satisfied. Given Assumption 1 and 2,
it remains to be verified that diagonal elements of (F?P
0
N
P
N
) are zero. Observe
that from Lemma 1 we have F = A?B,where
A
(T+2)?(T?1)
=
?
?
?
?
?
?
?
?
?
1
1??
1
1?(T?2)
1
1?1
0
1?(T?2)
0
(T?2)?1
I
T?2
0
2?1
0
2?(T?2)
?
?
?
?
?
?
?
?
?
, B
(T?1)?(T+2)
=
?
0
(T?1)?2
,D
?
.
(C.1.15)
The diagonal elements of F are then
F
ii
= {A?B}
ii
=
T?1
X
j=1
A
ij
{?B}
ji
(C.1.16)
=
T?1
X
j=1
A
ij
T?1
X
k=1
?
jk
B
ki
,
145
where A
ij
andB
ij
denote the ij?thelements of matricesAandBrespectively.
Note thatB
ki
=0fork<i+2and?
jk
=0fork<j, and, therefore, {?B}
ji
=
P
T?1
k=1
?
jk
B
ki
=0for i<j+2.
46
Furthermore, the elements A
ij
are zero for
i>j+1and hence F
ii
=
P
T?1
j=1
A
ij
{?B}
ji
=0.
Hence I can use Lemma A1 to derive the mean and variances and covariances
of y
0
?2,N
?u
N
and?X
0
N
?u
N
. In particular, we have that
E
?
y
0
?2,N
?u
N
?
= E(?X
0
N
?u
N
)=0, (C.1.17)
and
VC
?
y
0
?2,N
?u
N
?
= f
0
N
(?
?,N
?P
N
P
0
N
)f
N
(C.1.18)
+2tr
?
F
S
?
?,N
F
S
?
?,N
?P
0
N
P
N
P
0
N
P
N
?
= f
0
N
(?
?,N
?P
N
P
0
N
)f
N
+?
N
,
VC(?X
0
N
?u
N
)=?X
0
N
??
0
(T?1)?2
,D
?
?I
N
?
(?
?,N
?P
N
P
0
N
)?
h
?
0
(T?1)?2
,D
?
0
?I
N
i
?X
N
. (C.1.19)
46
Notethatthebothmatrices?andDareupperdiagonal(inthesensethattheirij?thelements
are zero for i<j) and hence their?Dproduct also has the same property. As a result, the matrix
?B =
?
0
(T?1)?2
,?D
?
will have its ij ?th elements equal to zero for i<j+2.
146
and finally
Cov
?
y
0
?2,N
?u
N
,?X
0
N
?u
N
?
= f
0
N
h
?
?,N
?
0
(T?1)?2
,D
?
0
?P
0
N
i
?X
N
,
(C.1.20)
where I defined
?
N
=2tr
?
F
S
?
?,N
F
S
?
?,N
?P
0
N
P
N
P
0
N
P
N
?
(C.1.21)
=2tr
?
F
S
?
?,N
F
S
?
?,N
?
?tr(P
0
N
P
N
P
0
N
P
N
).
Together we have that
V
N
=
?
?
?
VC
?
y
0
?2,N
?u
N
?
Cov
?
y
0
?2,N
?u
N
,?X
0
N
?u
N
?
Cov
?
y
0
?2,N
?u
N
,?X
0
N
?u
N
?
0
VC(?X
0
N
?u
N
)
?
?
?
= S
0
N
(?
?,N
?P
N
P
0
N
)S
N
+
?
?
?
?
N
0
1?p
0
p?1
0
p?p
?
?
?
, (C.1.22)
where S
N
=
?
f
N
,
h
?
0
(T?1)?2
,D
?
0
?I
N
i
?X
N
?
.
Proof of Lemma 3: From Lemma C1, we have that
?
min
(V
N
) ??
min
[S
0
N
(?
?,N
?P
N
P
0
N
)S
N
]+min(?
N
,0). (C.1.23)
Note that since?
?
issymmetric,byProposition52inDhrymes(1984)wecan
147
express it as?
?
= ?
0
?.Hence
tr
?
F
S
?
?
F
S
?
?
?
= tr
?
F
S
?
0
?F
S
?
0
?
?
= tr
?
?F
S
?
0
?F
S
?
0
?
= tr(A
0
A) ? 0
with A = ?F
S
?
0
,sinceF
S
is also symmetric. Therefore,
tr
?
F
S
?
?,N
F
S
?
?,N
?
? 0. (C.1.24)
Furthermore,
tr(P
0
N
P
N
P
0
N
P
N
)=tr[(P
0
N
P
N
)(P
0
N
P
N
)] ? 0, (C.1.25)
and, therefore, ?
N
? 0.
By Lemma C2 the smallest eigenvalue of V
N
is then
?
min
(V
N
) ? ?
min
[S
N
(?
?,N
?P
N
P
0
N
)S
0
N
] (C.1.26)
? ?
min
(S
0
N
S
N
) ??
min
(?
?,N
?P
N
P
0
N
).
From Theorem 4.2.12 in Horn and Johnson (1991) we have
?
min
(?
?,N
?P
N
P
0
N
)=?
min
(?
?,N
) ??
min
(P
N
P
0
N
), (C.1.27)
148
and hence
[(T ?1)N]
?1
?
min
(V
N
) ? [(T ?1)N]
?1
?
min
(S
0
N
S
N
) ? (C.1.28)
??
min
(?
?,N
) ??
min
(P
N
P
0
N
)
= ?
min
?
[(T ?1)N]
?1
S
0
N
S
N
?
?
??
min
(?
?,N
) ??
min
(P
N
P
0
N
).
By Assumptions 4 we have that ?
min
(P
N
P
0
N
) ?c
P
> 0,byAssumptionIV2
we have that ?
min
?
[(T ?1)N]
?1
S
0
N
S
N
?
? c
S
> 0.Since?
?,N
is diagonal, we
have ?
min
(?
?,N
)=min
?
?
2
?
,var
?
?
i,N
?
,?
2
?
?
=min
h
?
2
?
,
?
2
?
1??
2
,?
2
?
i
? c
?
> 0
and hence [(T ?1)N]
?1
?
min
(V
N
) ?c
S
c
?
c
P
> 0.
ProofofProposition1:The result in the Proposition is a special case of the
general result in Lemma 5 in Section 4.3,
47
whichisinturnbasedontheCLTin
TheoremA1inAppendixA.HereIverifydirectlythattheconditionsofTheorem
A1 hold.
47
The conditions of that Lemma are satisfiedsincebyLemma1(andalsoLemma4inSection
4.3), the instruments y
?2,N
and ?X
N
are linear forms in the innovations of the form assumed
in Lemma 5. Furthermore, by Lemma 3, the smallest eigenvalue V
N
is uniformly bounded away
from zero. Finally, the moment conditions are valid since by Lemma 2, we have E (H
0
N
?u
N
)=
0. Therefore, conditions of Lemma 5 are satisfied and we have thatV
?1/2
N
H
0
N
?u
N
d
? N (0,I).
149
The moment conditions are
H
0
N
?u
N
=
?
?
?
?
?
?
H
2,N
.
.
.
H
T,N
?
?
?
?
?
?
0
?u
N
=
?
?
?
?
?
?
(y
0,N
,?X
2,N
)
.
.
.
(y
T?2,N
,?X
T,N
)
?
?
?
?
?
?
0
?u
N
(C.1.29)
=
?
?
?
?
?
?
?
?
?
?
?
?
y
0,N
.
.
.
y
T?2,N
?
?
?
?
?
?
,
?
?
?
?
?
?
?X
2,N
.
.
.
?X
T,N
?
?
?
?
?
?
?
?
?
?
?
?
0
?u
N
=
?
?
?
y
0
?2,N
?u
N
?X
0
N
?u
N
?
?
?
Observe that by Lemma 1, the instruments y
?2,N
and ?X
N
are linear forms
in the innovations and, as a result, the moment conditions collected in H
0
N
?u
N
are linear quadratic form in the innovations
?
N
=
?
?
0
N
,?
0
N
,?
0
1,N
,...,?
0
T,N
?
, (C.1.30)
where ?
N
=
P
?
j=0
?
j
?
?j,N
. By Assumptions 1 and 6 it follows from Lemma B1
in Appendix B that the random variable?
N
satisfiesconditionA3inAppendixA.
Therefore, by Assumptions 1 and 2, the elements of the innovations ?
N
satisfy
conditions A1 and A3 in Appendix A.
By Lemma 2, the variance covariance matrix of the moment conditions col-
lectedinH
0
N
?u
N
isV
N
andbyLemma3,thesmallesteigenvalueof[(T ?1)N]
?1
V
N
is uniformly bounded away from zero. Hence it remains to be shown that the lin-
ear quadratic forms collected in H
0
N
?u
N
satisfy condition A2 in Appendix A.
150
Note that from Lemma 1 we have that the elements ofH
0
N
?u
N
are
y
0
?2,N
?u
N
= f
0
N
(I
T+2
?P
N
)?
N
+?
0
N
(F?P
0
N
P
N
)?
N
, (C.1.31)
and
?X
0
N
?u
N
=?X
0
N
??
0
(T?1)?2
,D
?
?P
N
?
?
N
. (C.1.32)
Observe that any finite sum, product or Kronecker product of matrices with row
and column sums uniformly bounded in absolute value will also have row andcol-
umn sums uniformly bounded in absolute value; see Kelejian and Prucha (2001d)
for details.
From Lemma 1, we have that
f
0
N
=
?
?
0
X
0
?2,N
+
?
E
?
y
0
0,N
?
,0
1?(T?2)N
???
?
?
0
(T?1)?2
,D
?
?I
N
?
. (C.1.33)
Elements and dimensions of?
?
0
(T?1)?2
,D
?
do not depend onN and hence triv-
ially
?
?
?
0
(T?1)?2
,D
?
?I
N
?
hasrowandcolumn sumsuniformlyboundedinab-
solute value. Elements of the vector ?
0
X
0
?2,N
are uniformly bounded in absolute
valuebyAssumption5andelementsof
?
E
?
y
0
0,N
?
,0
1?(T?2)N
?
are uniformly
bounded in absolute value since, as demonstrated by Lemma B3 in Appendix
B, y
it
has uniformly bounded 4+? moments for some ?>0. Together we then
have that f
N
haselementsuniformly bounded in absolute value. The sequence
of matrices P
N
has row and column sums uniformly bounded in absolute value
(Assumption 3) and hence elements off
0
N
(I
T+2
?P
N
) are uniformly bounded in
151
absolute value. Similarly, by Assumptions 5 and 3, ?X
0
N
??
0
(T?1)?2
,D
?
?P
N
?
has row and column sums uniformly bounded in absolute value. Finally, since
dimensions of F do not change with N and its elements are also independent
of N, the matrix (F?P
0
N
P
N
) has row and column sums uniformly bounded in
absolute value.
This completes the verification of conditions of Theorem A1 and, therefore,
we have that V
?1/2
N
H
0
N
?u
N
d
? N (0,I).
Proof of Theorem 1: From equation (4.1.10) we have
p
(T ?1)N
?
b
?
N
??
?
(C.1.34)
=
p
(T ?1)N
h
?Z
0
N
H
N
(H
0
N
H
N
)
?1
H
0
N
?Z
N
i
?1
?
?Z
0
N
H
N
(H
0
N
H
N
)
?1
H
0
N
?u
N
=
p
(T ?1)N
"
?Z
0
N
H
N
(T ?1)N
?
H
0
N
H
N
(T ?1)N
?
?1
H
0
N
?Z
N
(T ?1)N
#
?1
?
?Z
0
N
H
N
(T ?1)N
?
H
0
N
H
N
(T ?1)N
?
?1
H
0
N
?u
N
(T ?1)N
=
"
?Z
0
N
H
N
(T ?1)N
?
H
0
N
H
N
(T ?1)N
?
?1
H
0
N
?Z
N
(T ?1)N
#
?1
?
?Z
0
N
H
N
(T ?1)N
?
H
0
N
H
N
(T ?1)N
?
?1
H
0
N
?u
N
p
(T ?1)N
.
Given Assumptions IV1 and IV3, our result follows from Proposition 1 in this
thesisandCorollary5inP?tcherandPrucha(2001).
152
C.2 Proofs for Section 4.2
I now give a sequence of Lemmas that will be used to prove Theorem 2. I use the
notation k.k to denote the matrix norm kMk := [tr(M
0
M)]
1/2
.
Lemma C4 Let bu
N
be based on a N
1/2
consistent estimate of ?. Then under
Assumptions 1-6 we can write
u
N
?bu
N
= D
N
?
N
.
where the random matrix D
N
has elements d
ij,N
that have uniformly bounded
absolute 4+? moments for some?>0,i.e.E|d
ij,N
|
4+?
?c
d
<? wherec
d
does
not depend on N, and the random vector?is such that N
1/2
k?
N
k = O
p
(1).
Proof: Note that from (4.2.1) we can write u
t,N
?bu
t,N
as
u
t,N
?bu
t,N
=(y
t?1,N
,X
t,N
)
?
??
b
?
N
?
, (C.2.1)
Idefine D
t,N
=(y
t?1,N
,X
t,N
) and?
N
=
?
??
b
?
N
?
. Hence we have
u
N
?bu
N
= D
N
?
N
, (C.2.2)
where D
N
=
?
D
0
1,N
,...,D
0
T,N
?
0
.
Since
b
?
N
is
?
N consistent, it follows that N
1/2
k?
N
k = O
p
(1). By Lemma
B3, elements of y
t?1,N
have finite 4+? absolute moments for some ?>0.
153
The nonstochastic elements of D
N
are uniformly bounded in absolute value by
Assumption 5 and hence also their 4+? power is uniformly bounded in absolute
value. ThusD
N
hasuniformlyboundedabsolute 4+?momentsforsome?>0.
Note that the claim in the above lemma also holds for 2+? moments since by
Lyapunov?s inequality,
E|y
i,t?1,N
|
2+?
?
h
E|y
i,t?1,N
|
(4+?)
i
(2+?)/(4+?)
?k
(2+?)/(4+?)
y
<? . (C.2.3)
Lemma C5 Given Assumptions 1-6, the moment conditions converge to their ex-
pectations in probability, i.e.
?
j
kl,N
?E
?
?
j
kl,N
?
p
? 0 and ?
j
k,N
?E
?
?
j
k,N
?
p
? 0
as N ?? for j =0,1, k,l =1,2,3.
Proof: The moment conditions correspond to those considered in Kapoor et al.
(2005) and, in particular, Assumptions 1,2 and 4 of their paper are satisfied,
48
and
hence the lemma is their Lemma A2.
Lemma C6 The sample counterparts of the moment conditions converge in prob-
ability to the true moments, i.e.
g
j
kl,N
?E
?
?
j
kl,N
?
p
? 0 and g
j
k,N
?E
?
?
j
k,N
?
p
? 0
48
Assumption 1 is directly implied by our Assumptions 1 and 2. Assumptions 2 and 4 are
contained in our Assumption 3.
154
as N ?? for j =0,1, k,l =1,2,3.
Proof: InlightofLemmaC5,itsuffices to show that g
j
kl,N
? ?
j
kl,N
p
? 0 and
g
j
k,N
??
j
k,N
p
? 0. These can be expressed as quadratic forms:
g
j
kl,N
??
j
kl,N
=
1
N
?
bu
0
N
C
j
kl,N
bu
N
?u
0
N
C
j
kl,N
u
N
?
, (C.2.4)
g
j
k,N
??
j
k,N
=
1
N
?
bu
0
N
C
j
k,N
bu
N
?u
0
N
C
j
k,N
u
N
?
,
where the the NT ? NT matrices C
j
kl,N
and C
j
k,N
are defined for j =0,1, k =
1,2,3 and l =1,2. Explicit expressions are given below. Note that for l =3we
have (see 4.2.10 and 4.2.13):
g
j
13,N
= ?
j
13,N
=1, (C.2.5)
g
j
23,N
= ?
j
23,N
= N
?1
tr(W
0
N
W
N
),
g
j
33,N
= ?
j
33,N
=0,
and hence trivially g
j
k3,N
??
j
k3,N
p
? 0 for j =0,1 and k =1,2,3.
Forj =0,1,k =1,2,3andl =1,2,theC
j
kl,N
andC
j
k,N
matricesareproducts
of (some of) the matrices (I
T
?W
0
N
), Q
j,N
,and(I
T
?W
N
). In particular, from
155
(4.2.10) and (4.2.13), j =0,1:
C
j
11,N
=2(T ?1)
j?1
Q
j,N
(I
T
?W
N
), (C.2.6)
C
j
12,N
= ?(T ?1)
j?1
(I
T
?W
0
N
)Q
j,N
(I
T
?W
N
),
C
j
21,N
=2(T ?1)
j?1
(I
T
?W
0
N
)(I
T
?W
0
N
)Q
j,N
(I
T
?W
N
),
C
j
22,N
= ?(T ?1)
j?1
(I
T
?W
0
N
)(I
T
?W
0
N
)Q
j,N
(I
T
?W
N
)(I
T
?W
N
),
C
j
31,N
=(T ?1)
j?1
Q
j,N
(I
T
?W
N
)(I
T
?W
N
)
+(T ?1)
j?1
(I
T
?W
0
N
)Q
j,N
(I
T
?W
N
),
C
j
32,N
= ?(T ?1)
j?1
(I
T
?W
0
N
)Q
j,N
(I
T
?W
N
)(I
T
?W
N
),
C
j
1,N
=(T ?1)
j?1
Q
j,N
,
C
j
2,N
=(T ?1)
j?1
(I
T
?W
0
N
)Q
j,N
(I
T
?W
N
),
C
j
3,N
=(T ?1)
j?1
Q
j,N
(I
T
?W
N
).
By their definition (see equation 4.2.5), the row and column sums of the Q
j,N
matrices (j =0,1) are less than two in absolute value.
49
The row and column
sums of (I
T
?W
N
) and (I
T
?W
0
N
) are uniformly bounded in absolute value by
Assumption 3. Therefore, for j =0,1, k =1,2,3 and l =1,2,eachC
j
kl,N
and
C
j
k,N
matrix has row and column sums uniformly bounded in absolute value.
49
The row and column sums of |Q
0,N
| are equal to 2
T?1
T
, while the row and column sums of
|Q
1,N
| are equal to one.
156
By Lemma C4 we have u
N
?bu
N
= D
N
?
N
. Utilizing this expression I can
write for j =0,1, k =1,2,3 and l =1,2:
g
j
kl,N
??
j
kl,N
= ?
j
kl,N
+?
j
kl,N
, (C.2.7)
g
j
k,N
??
j
k,N
= ?
j
k,N
+?
j
k,N
,
with
?
j
kl,N
=
1
N
?
0
N
1?(p+1)
D
0
N
(p+1)?NT
?
C
j
kl,N
+C
j0
kl,N
?
NT?NT
u
N
NT?1
, (C.2.8)
?
j
k,N
=
1
N
?
0
N
1?(p+1)
D
0
N
(p+1)?NT
?
C
j
k,N
+C
j0
k,N
?
NT?NT
u
N
NT?1
,
?
j
kl,N
=
1
N
?
0
N
1?(p+1)
D
0
N
(p+1)?NT
C
j
kl,N
NT?NT
D
N
NT?(p+1)
?
N
(p+1)?1
,
?
j
k,N
=
1
N
?
0
N
1?(p+1)
D
0
N
(p+1)?NT
C
j
k,N
NT?NT
D
N
NT?(p+1)
?
N
(p+1)?1
.
To prove the claim, I show that all the terms ?
j
kl,N
, ?
j
k,N
, ?
j
kl,N
and ?
j
k,N
are
all o
p
(1). To simplify notation, I consider a sequence of NT ?NT matrices C
N
that have row and column sums uniformly bounded in absolute value. I define
?
N
=
1
N
?
0
N
1?(p+1)
D
0
N
(p+1)?NT
?
C
j
N
+C
0
N
?
NT?NT
u
N
NT?1
(C.2.9)
=
1
N
?
0
N
1?(p+1)
D
0
N
(p+1)?NT
?
C
j
N
+C
0
N
?
NT?NT
(I
T
?P
N
)
NT?NT
v
N
NT?1
,
?
N
=
1
N
?
0
N
1?(p+1)
D
0
N
(p+1)?NT
C
N
NT?NT
D
N
NT?(p+1)
?
N
(p+1)?1
,
157
and show that both ?
N
,and?
N
are o
p
(1). By substituting C
N
= C
j
kl,N
for
j =0,1, k =1,2,3 andl =1,2,andC
N
= C
j
k,N
fork =1,2,3 andj =0,1,we
then obtain that ?
j
kl,N
, ?
j
k,N
, ?
j
kl,N
and ?
j
k,N
are all o
p
(1).
Observe that ?
N
and ?
N
correspond to ?
N
and ?
N
in the proof of Lemma
C.1 in Kelejian and Prucha (2005), with C
n
= C
N
, A
n
= C
N
(I
T
?P
N
) and
?
n
= v
N
. Inspection of their proof of ?
N
= o
p
(1) and ?
N
= o
p
(1) reveals that
it only utilizes Assumption 4 of that paper, the fact that the matrices C
n
and A
n
have uniformly bounded absolute row and column sums and that n
?1
P
n
i=1
?
i,n
=
O
p
(1).
I assume that the row and column sums of C
N
are uniformly bounded in ab-
solute value. Given Lemma C4, Assumption 4 in that paper holds and hence ?
N
is by their proof o
p
(1). Note that by Assumption 3,C
N
(I
T
?P
N
) has uniformly
bounded absolute row and column sums. Instead of ?
i,n
, I consider the random
variables v
it,N
= ?
it,N
+?
i,N
. By the triangle inequality
(NT)
?1
N
X
i=1
T
X
t=1
|v
ij,N
| ? (NT)
?1
N
X
i=1
T
X
t=1
|?
it,N
| +(NT)
?1
N
X
i=1
T
X
t=1
?
?
?
i,N
?
?
=(NT)
?1
N
X
i=1
T
X
t=1
|?
it,N
| +N
?1
N
X
i=1
?
?
?
i,N
?
?
.(C.2.10)
SincebyAssumption1,therandomvariables?
it,N
areindependentwithuniformly
bounded second moments, it follows that (NT)
?1
P
N
i=1
P
T
t=1
|?
it,N
| = O
p
(1).
Similarly, by Assumption 2, the random variables ?
i,N
are independent with uni-
formly bounded second moments, and hence it follows that N
?1
P
N
i=1
?
?
?
i,N
?
?
=
158
O
p
(1). Asaresult(NT)
?1
P
N
i=1
P
T
t=1
|v
ij,N
| = O
p
(1). Hence the proof that
?
N
= o
p
(1) in Kelejian and Prucha (2005) also applies for the structure consid-
ered in this thesis and ?
N
= o
p
(1).
Proof of Theorem 2: Given Lemma C6, the proof is identical to the proof of
Theorem 2 in Kapoor et al. (2001).
C.3 Proofs for Section 4.3
Proof of Lemma 4: The dependent variable can be expressed as in equation
(3.2.4):
y
t,N
=
?
X
j=0
?
j
[X
t?j,N
?+u
t?j,N
] (C.3.1)
=
?
X
j=0
?
j
X
t?j,N
? +
t?1
X
j=0
?
j
u
t?j,N
+
?
X
j=0
?
t+j
u
?j,N
=
?
?
X
j=0
?
j
X
t?j,N
?
!
+P
N
?
t?1
X
j=0
?
j
?
t?j,N
+
?
X
j=0
?
t+j
?
?j,N
!
+
P
N
?
1??
=
?
?
X
j=0
?
j
X
t?j,N
?
!
+
??
1
1??
,1,?
t?1
,..,?
0
,0
1?(T?t)
?
?P
N
?
?
N
.
Hencewecandefine
a
t,N
=
?
X
j=0
?
j
X
t?j,N
?, (C.3.2)
and
b
t
=
?
1
1??
,1,?
t?1
,..,?
0
,0
1?(T?t)
?
. (C.3.3)
159
Given Assumptions 5 and 6, we have that
P
?
j=0
?
j
X
t?j,N
? is uniformly bounded
in absolute value and hence the sequence of vectors a
t,N
has elements uniformly
bounded in absolute value. Note that the elements (as well as dimensions) of b
t
do not depend on N, and hence they are trivially uniformly bounded in absolute
value.
ProofofLemma5:The claim is a consequence of TheoremA1inAppendixA.
I now verify that its conditions are met. As in equation (4.3.10), we have that the
elements of H
0
N
?u
N
are quadratic forms in the innovations:
h
0
rt,N
?u
t,N
= a
0
rt,N
(d
t
?P
N
)?
N
+?
0
N
(b
0
rt
d
t
?P
0
N
P
N
)?
N
, (C.3.4)
where ?
N
=
?
?
0
N
,?
0
N
,?
0
1,N
,...,?
0
T,N
?
,with?
N
=
P
?
j=0
?
j
?
?j,N
.ByAssump-
tions 1 and 6 it follows from Lemma B1 in Appendix B that the random variable
?
N
satisfies condition A3 in Appendix A. Therefore, by Assumptions 1 and 2, the
innovations ?
N
satisfyconditionsA1andA3inAppendixA.TheLemmastip-
ulates that the vectors a
rt,N
have elements uniformly bounded in absolute value.
Observe that by Assumption 3, the matrix (d
t
?P
N
) has row sums uniformly
bounded in absolute value and hence the vectora
0
rt,N
(d
t
?P
N
) has elements uni-
formly bounded in absolute value and thus satisfies condition A2 in Appendix A.
Furthermore, given that the dimensions and elements ofb
0
rt
d
t
do not change with
N, we have that Assumption 3 implies that the matrix (b
0
rt
d
t
?P
0
N
P
N
) fulfills
condition A2 as well. Finally, [(T ?1)N]
?1
?
min
(V
N
) ? c>0 is a condition
stipulated in the Lemma.
160
ProofofLemma6:Substituting the model (equation 4.3.1) into the definition of
the GMM estimator in (4.3.5) leads to:
p
(T ?1)N
?
e
?
N
??
?
(C.3.5)
=
p
(T ?1)N
?
?Z
0
N
H
N
A
?1
N
H
0
N
?Z
N
?
?1
?
?Z
0
N
H
N
A
?1
N
H
0
N
?u
N
=
"
?Z
0
N
H
N
(T ?1)N
?
A
N
(T ?1)N
?
?1
H
0
N
?Z
N
(T ?1)N
#
?1
?
?Z
0
N
H
N
(T ?1)N
?
A
N
(T ?1)N
?
?1
H
0
N
?u
N
p
(T ?1)N
.
By assumption in the lemma we have that V
?1/2
N
H
0
N
?u
N
d
? N (0,I
k
) with
[(T ?1)N]
?1
V
N
p
? V finite. Hence by Corollary 5 in P?tcher and Prucha
(2001), we have
?
V
N
(T ?1)N
?
1/2
V
?1/2
N
H
0
N
?u
N
=
H
0
N
?u
N
p
(T ?1)N
d
? N (0,V). (C.3.6)
Furthermore, the lemma assumes that
?Z
0
N
H
N
(T ?1)N
p
? M
?ZH
, (C.3.7)
A
N
(T ?1)N
p
? A,
whereM
?ZH
is finite with full column rank andAis finite and invertible. Hence,
by Corollary 5 in P?tcher and Prucha (2001), we have the desired result.
161
Proof of Theorem 3: Observe that the instruments collected in
e
H
N
consist of
y
t,N
and columns of X
t,N
and hence by Lemma 4 are linear forms of the innova-
tions of the form assumed in Lemma 5 and satisfy its conditions. Below I verify
that [(T ?1)N]
?1
e
V
N
has the smallest eigenvalue uniformly bounded away from
zero. This will complete verification of conditions of Lemma 5 and hence we will
have that
e
V
?1/2
N
e
H
0
N
?u
N
d
? N
?
0,
e
V
?
.
Observe that using the expression
e
H
N
=
e
S
N
+?
N
,where
e
S
N
is the nonsto-
chastic part of the instruments (see Section 4.3.3), we have
[(T ?1)N]
?1
e
V
N
=[(T ?1)N]
?1
E
?
e
H
0
N
?u
N
?u
0
N
e
H
N
?
(C.3.8)
=[(T ?1)N]
?1
E
h?
e
S
0
N
+?
0
N
?
?u
N
?u
0
N
?
e
S
N
+?
N
?i
=[(T ?1)N]
?1
?
e
V
1,N
+
e
V
2,N
+
e
V
3,N
+
e
V
4,N
?
,
where
e
V
1,N
=
e
S
0
N
E(?u
N
?u
0
N
)
e
S
N
(C.3.9)
e
V
2,N
=
e
S
0
N
E(?u
N
?u
0
N
?
N
)
e
V
3,N
= E(?
0
N
?u
N
?u
0
N
)
e
S
N
e
V
4,N
= E(?
0
N
?u
N
?u
0
N
?
N
).
In the following I show that the smallest eigenvalue of [(T ?1)N]
?1
e
V
1,N
is
uniformly bounded away from zero. I also show that
e
V
2,N
= 0,and
e
V
3,N
= 0.
Sincetheeigenvaluesof
e
V
4,N
arenonnegativeitthenfollowsfromLemmaC1that
162
the smallest eigenvalue of [(T ?1)N]
?1
e
V
N
is uniformly bounded away from
zero.
Using
?u
N
=
??
0
(T?1)?2
,D
?
?P
N
?
?
N
, (C.3.10)
whereasin(4.1.15)E(?
N
?
0
N
)=(?
?,N
?I
N
), it follows that
e
V
1,N
=
e
S
0
N
h
?
0
(T?1)?2
,D
?
?
?,N
?
0
(T?1)?2
,D
?
0
?P
N
P
0
N
i
e
S
N
. (C.3.11)
By Lemma C2 the smallest eigenvalue of
e
V
1,N
is then
?
min
?
e
V
1,N
?
? ?
min
?
e
S
0
N
e
S
N
?
? (C.3.12)
?
min
h
?
0
(T?1)?2
,D
?
?
?,N
?
0
(T?1)?2
,D
?
0
?(P
N
P
0
N
)
i
= ?
min
?
e
S
0
N
e
S
N
?
??
min
[(D?
?,N
D
0
)?(P
N
P
0
N
)]
= ?
min
?
e
S
0
N
e
S
N
?
??
min
(D?
?,N
D
0
) ??
min
(P
N
P
0
N
)
= ?
min
?
e
S
0
N
e
S
N
?
??
min
(DD
0
) ??
min
(?
?,N
) ??
min
(P
N
P
0
N
),,
where I also used Theorem 4.2.12 in Horn and Johnson (1991). Observe that from
the definition of the first difference operator matrix D (see 4.1.14), it follows that
DD
0
=2I
T?1
and hence ?
min
(DD
0
)=2.Since?
?,N
is diagonal, we have
?
min
(?
?,N
)=min
?
?
2
?
,var
?
?
i,N
?
,?
2
?
?
=min
h
?
2
?
,
?
2
?
1??
2
,?
2
?
i
? c
?
> 0.By
Assumption 4 we have that ?
min
(P
N
P
0
N
) ?c
P
> 0 and, therefore
?
min
?
e
V
1,N
?
? 2c
?
c
P
?
min
?
e
S
0
N
e
S
N
?
. (C.3.13)
163
From Assumption GMM1 we have that?
min
?
[(T ?1)N]
?1
e
S
0
t,N
e
S
t,N
?
?c
S
> 0
and hence
[(T ?1)N]
?1
?
min
?
e
V
1,N
?
? 2c
?
c
P
c
S
> 0. (C.3.14)
Next, I show that
e
V
2,N
and
e
V
3,N
are matrices of zeros. Recall that ?
N
con-
sistsofblocks?
t,N
onthemaindiagonalandzeroselsewhere. Thus?
0
N
?u
N
?u
0
N
consists of blocks ?
0
t,N
?u
t,N
?u
0
t,N
on the main diagonal and zeros elsewhere.
Observe that
?
t,N
=[((b
t?2
,...,b
0
)?P
N
)(I
t?1
??
N
),0
N?tp
], (C.3.15)
and thus
?
0
t,N
?u
t,N
?u
0
t,N
=
?
?
?
?
?
?
?
?
?
?
0
N
(b
0
0
?P
0
N
)
.
.
.
?
0
N
?
b
0
t?2
?P
0
N
?
0
tp?N
?
?
?
?
?
?
?
?
?
?u
t,N
?u
0
t,N
. (C.3.16)
Observe that?u
t,N
=(d
t
?P
N
)?
N
(as in 4.3.9) and thus
?
0
N
?
b
0
t?s
?P
0
N
?
?u
t,N
= ?
0
N
?
b
0
t?s
d
t
?P
0
N
P
N
?
?
N
, (C.3.17)
where d
t
is a (t+1)? th row of
?
0
(T?1)?2
,D
?
,withthe(T ?1) ? T matrix
D is defined in (4.1.14). Hence the 1 ? (T +2)vector d
t
is a row vector with
zeros in the first t positions. Furthermore, the 1?(T +2)vector b
t?s
(defined in
164
the proof of Lemma 4 above) has zero entries starting from position (t?2+s).
Asaresult,fors>1, the product b
0
t?s
d
t
is a (T +2)? (T +2)matrix with
zeros on the main diagonal. Hence?
0
N
?
b
0
t?s
?P
0
N
?
?u
t,N
is a quadratic form in
the innovations ?
N
with zeros on the main diagonal (and no linear component).
Each element of ?u
t,N
is a linear form in innovations ?
N
and hence can also be
treated as a linear-quadratic form in ?
N
where the matrix defining the quadratic
component consists of zeros. As a result, we can apply Lemma A1 in Appendix A
to obtain that the covariance of?
0
N
?
b
0
t?s
?P
0
N
?
?u
t,N
and?u
it,N
is zero. Thus
it follows that
E
?
?
0
N
?
b
0
t?s
?P
0
N
?
?u
t,N
?u
0
t,N
?
=0, (C.3.18)
where s>1, implying that E(?
0
N
?u
N
?u
0
N
) isamatrixofzeros.Asaconse-
quence
e
V
2,N
= E(?
0
N
?u
N
?u
0
N
)
e
S
N
= 0
k?k
. (C.3.19)
The same argument implies that
e
V
3,N
is a matrix of zeros. Finally, observe that
the matrix
e
V
4,N
is itself a variance covariance matrix (i.e. symmetric positive
semidefinite) and thus it has non-negative eigenvalues.
This completes the verification of the conditions of Lemma 5 and hence we
have that
e
V
?1/2
N
e
H
0
N
?u
N
d
? N
?
0,
e
V
?
. We can now write the estimator as
e
?
N
= ?+
h
?Z
0
N
e
H
N
e
V
?1
N
e
H
0
N
?Z
N
i
?1
?Z
0
N
e
H
N
e
V
?1
N
e
H
0
N
?u
N
, (C.3.20)
165
where by Assumptions GMM2 and GMM3,
p lim
N??
1
(T ?1)N
?Z
0
N
e
H
N
=
f
M
H?Z
, (C.3.21)
and
p lim
N??
1
(T ?1)N
e
V
N
=
e
V. (C.3.22)
Therefore by Lemma 6, the estimator converges in distribution with
p
(T ?1)N
?
e
?
N
??
?
d
? N (0,?), (C.3.23)
where
? =
?
f
M
?ZH
e
V
?1
f
M
0
?ZH
?
?1
f
M
?ZH
e
V
?1
e
V
e
V
?1
f
M
?ZH
?
?
f
M
?ZH
e
V
?1
f
M
0
?ZH
?
?1
(C.3.24)
=
?
f
M
?ZH
e
V
?1
f
M
0
?ZH
?
?1
= ?,
which is the claim in the Theorem.
To prove Lemma 7, I will use Lemma C.6 in Kelejian and Prucha (2005). For
convenience of the reader, I restate a simplified version of that lemma:
Lemma C7 Let a
n
and b
n
be sequences of n ? 1 vectors and let W
n
be a se-
quence of n ? n matrices. Assume that the vectors a
n
and b
n
have elements
uniformly bounded in absolute value and that the matrices (rW
n
) have row and
166
columnsumsuniformlyboundedinabsolutevalueforr<1byoneandsome finite
constant respectively. Consider a sequence of random variablese?
n
converging in
probability to ? as n ?? ,where|?| <r.DenoteP
n
(r)=(I
n
?rW
n
)
?1
.
Then
n
?1
a
0
n
P
n
(?)
0
P
n
(?)b
n
?n
?1
a
0
n
P
n
(e?
n
)
0
P
n
(e?
n
)b
n
= o
p
(1), (C.3.25)
and
n
?1
tr
?
P
n
(?)
0
P
n
(?)P
n
(?)
0
P
n
(?)
?
?n
?1
tr
?
P
n
(e?
n
)
0
P
n
(e?
n
)P
n
(e?
n
)
0
P
n
(e?
n
)
?
= o
p
(1).
(C.3.26)
Proof: The proof of the first claim follows from Lemma C.6 in Kelejian and
Prucha (2005) by choosing (in their notation)?
n
=
e
?
n
= I
n
andH
n
=(a
n
,b
n
).
The second claim is not a direct consequence of the Lemma C.6, however, its
proof follows the same structure. Denote
v
n
= n
?1
tr
?
P
n
(?)
0
P
n
(?)P
n
(?)
0
P
n
(?)
?
(C.3.27)
?n
?1
tr
?
P
n
(e?
n
)
0
P
n
(e?
n
)P
n
(e?
n
)
0
P
n
(e?
n
)
?
.
Using the same argument as on p.39 in Kelejian and Prucha (2005), it follows that
for every subsequence (n
m
) there exists a subsequence (n
0
m
) such that for ? ?A,
P (A)=1, there is critical indexN
?
such that for alln
0
m
?N
?
:
?
?
e?
n
0
m
(?)
?
?
?r
?
,
where r
?
=(r + |?|)/2. Furthermore, it also follows from the argument on the
167
same page that for n
0
m
? N
?
the row sums of e?
n
0
m
(?)W
n
are less than unity in
absolute value and that
?
I
n
0
m
?e?
n
0
m
(?)W
n
?
and
?
I
n
0
m
??
n
0
m
W
n
?
are invertible
with
?
I
n
0
m
?e?
n
0
m
(?)W
n
?
?1
=
?
X
l=1
?
e?
n
0
m
(?)
?
l
W
l
n
, (C.3.28)
?
I
n
0
m
??
n
0
m
W
n
?
?1
=
?
X
l=1
?
?
n
0
m
?
l
W
l
n
.
Hencewehavethat
v
n
0
m
(?)=
?
X
k=1
?
X
l=1
?
X
p=1
?
X
q=1
?
?
n
0
m
?
k+l+p+q
(W
0
n
)
k
(W
n
)
l
(W
0
n
)
p
(W
n
)
q
(C.3.29)
?
?
X
k=1
?
X
l=1
?
X
p=1
?
X
q=1
?
e?
n
0
m
(?)
?
k+l+p+q
(W
0
n
)
k
(W
n
)
l
(W
0
n
)
p
(W
n
)
q
=
?
X
k=1
?
X
l=1
?
X
p=1
?
X
q=1
?
(k,l,p,q)
n
0
m
(?),
where
?
(k,l,p,q)
n
0
m
(?)=
"
?
k+l+p+q
?e?
k+l+p+q
n
0
m
(?)
r
k+l+p+q
#
?
(k,l,p,q)
n
0
m
,
with
?
(k,l,p,q)
n
0
m
=(n
0
m
)
?1
tr
h
r
k+l+p+q
?
W
0
n
0
m
?
k
?
W
n
0
m
?
l
?
W
0
n
0
m
?
p
?
W
n
0
m
?
q
i
. (C.3.30)
Given that the row and column sums of the matrix rW
n
are uniformly bounded
in absolute value by one and some finite constant respectively, it follows that
168
?
(k,l,p,q)
n
0
m
= O(1). Furthermore, observe that
?
?
??
k+l+p+q
?e?
k+l+p+q
n
0
m
?
?
?
r
k+l+p+q
? 2
?
r
?
r
?
k+l+p+q
, (C.3.31)
(see Kelejian and Prucha, 2005, p. 40) and hence
?
?
??
(k,l,p,q)
n
0
m
(?)
?
?
? ? B
(k,l,p,q)
=
2c(r
?
/r)
k+l+p+q
where c is the uniform bound for
?
?
??
(k,l,p,q)
n
0
m
?
?
?.Sincer
?
/r < 1
by construction, clearly
P
?
k=1
P
?
l=1
P
?
p=1
P
?
q=1
B
(k,l,p,q)
< ? . By dominated
convergence it follows that v
n
0
m
(?) ? 0 as n
0
m
?? , and as a result v
n
?? by
the subsequence argument (Kelejian and Prucha, 2005, p. 39; G?nsler and Slute,
1977, pp. 61-62).
ProofofLemma7:Recall that based on the expression for the covariance of
the quadratic forms in
e
V
N
and
b
V
N
, the elements of the first diagonal block of
e
V
ts,N
?
b
V
ts,N
are(see4.3.34):
ev
y
qr,ts,N
?bv
y
qr,ts,N
= a
0
t?1?q,N
(d
t
?
?,N
d
0
s
?P
N
P
0
N
)a
s?1?r,N
(C.3.32)
?ba
0
t?1?q,N
?
d
t
b
?
?,N
d
0
s
?
b
P
N
b
P
0
N
?
ba
s?1?r,N
+2tr
?
b
0
t?1?q
d
t
?
?,N
d
0
s
b
s?1?r
?
?,N
?P
0
N
P
N
P
0
N
P
N
?
?2tr
?
b
b
0
t?1?q,N
d
t
b
?
?,N
d
0
s
b
b
s?1?r,N
b
?
?,N
?
b
P
0
N
b
P
N
b
P
0
N
b
P
N
?
.
Note that from (C.3.2) and since the lemma assumes
P
??
k=0
?
k
X
?k,N
? = 0,it
169
follows that
?
a
0
t?1?q,N
?ba
0
t?1?q,N
?
=
t?2?q
X
k=0
?
?
k
?
0
?
b
?
k
b
?
0
N
?
X
0
t?1?q?k,N
,(C.3.33)
(a
s?1?r,N
?ba
s?1?r,N
)=
s?2?r
X
k=0
X
s?1?r?k,N
?
?
k
??
b
?
k
b
?
N
?
.
Since d
t
?
?,N
d
0
s
is a scalar and we can then rearrange the above expression as
50
ev
y
qr,ts,N
?bv
y
qr,ts,N
=
18
X
m=1
v
y
m,N
, (C.3.34)
50
I use the following, rather tedious algebraic rule: let a,b,c,d andba,
b
b,bc,
b
d be matrices (and/or
scalars or vectors) of conformable dimensions. It is then easy to verify that:
abcd?ba
b
bbc
b
d =(a?ba)bcd + a
?
b?
b
b
?
cd + ab(c?bc)d + abc
?
d?
b
d
?
?(a?ba)
?
b?
b
b
?
cd?(a?ba)b(c?bc)d?(a?ba)bc
?
d?
b
d
?
?a
?
b?
b
b
?
(c?bc)d?a
?
b?
b
b
?
c
?
d?
b
d
?
?ab(c?bc)
?
d?
b
d
?
+(a?ba)
?
b?
b
b
?
(c?bc)d +(a?ba)
?
b?
b
b
?
c
?
d?
b
d
?
+(a?ba)b(c?bc)
?
d?
b
d
?
+ a
?
b?
b
b
?
(c?bc)
?
d?
b
d
?
?(a?ba)
?
b?
b
b
?
(c?bc)
?
d?
b
d
?
,
and
ab?ba
b
b =(a?ba)b + a
?
b?
b
b
?
?(a?ba)
?
b?
b
b
?
.
170
where
v
y
1,N
= d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
?
a
0
t?1?q,N
P
N
P
0
N
a
s?1?r,N
?
, (C.3.35)
v
y
2,N
= d
t
?
?,N
d
0
s
?
a
0
t?1?q,N
?ba
0
t?1?q,N
?
P
N
P
0
N
a
s?1?r,N
= d
t
?
?,N
d
0
s
"
t?2?q
X
k=0
?
?
k
?
0
?
b
?
k
b
?
0
N
?
X
0
t?1?q?k,N
#
P
N
P
0
N
a
s?1?r,N
,
v
y
3,N
= d
t
?
?,N
d
0
s
a
0
t?1?q,N
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
a
s?1?r,N
,
v
y
4,N
= d
t
?
?,N
d
0
s
a
0
t?1?q,N
P
N
P
0
N
(a
s?1?r,N
?ba
s?1?r,N
)
= d
t
?
?,N
d
0
s
a
0
t?1?q,N
P
N
P
0
N
"
s?2?r
X
k=0
X
s?1?r?k,N
?
?
k
??
b
?
k
b
?
N
?
#
,
v
y
5,N
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
?
a
0
t?1?q,N
?ba
0
t?1?q,N
?
P
N
P
0
N
a
s?1?r,N
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
"
s?2?r
X
k=0
X
s?1?r?k,N
?
?
k
??
b
?
k
b
?
N
?
#
?
P
N
P
0
N
a
s?1?r,N
,
v
y
6,N
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
a
0
t?1?q,N
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
a
s?1?r,N
,
v
y
7,N
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
a
0
t?1?q,N
P
N
P
0
N
(a
s?1?r,N
?ba
s?1?r,N
)
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
a
0
t?1?q,N
?
P
N
P
0
N
"
s?2?r
X
k=0
X
s?1?r?k,N
?
?
k
??
b
?
k
b
?
N
?
#
,
171
v
y
8,N
= ?d
t
?
?,N
d
0
s
?
a
0
t?1?q,N
?ba
0
t?1?q,N
?
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
a
s?1?r,N
= ?d
t
?
?,N
d
0
s
"
t?2?q
X
k=0
?
?
k
?
0
?
b
?
k
b
?
0
N
?
X
0
t?1?q?k,N
#
?
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
a
s?1?r,N
,
v
y
9,N
= ?d
t
?
?,N
d
0
s
?
a
0
t?1?q,N
?ba
0
t?1?q,N
?
P
N
P
0
N
(a
s?1?r,N
?ba
s?1?r,N
)
= ?d
t
?
?,N
d
0
s
"
t?2?q
X
k=0
?
?
k
?
0
?
b
?
k
b
?
0
N
?
X
0
t?1?q?k,N
#
?
P
N
P
0
N
"
s?2?r
X
k=0
X
s?1?r?k,N
?
?
k
??
b
?
k
b
?
N
?
#
,
v
y
10,N
= ?d
t
?
?,N
d
0
s
a
0
t?1?q,N
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
(a
s?1?r,N
?ba
s?1?r,N
)
= ?d
t
?
?,N
d
0
s
a
0
t?1?q,N
?
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
"
s?2?r
X
k=0
X
s?1?r?k,N
?
?
k
??
b
?
k
b
?
N
?
#
,
v
y
11,N
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
?
a
0
t?1?q,N
?ba
0
t?1?q,N
?
?
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
a
s?1?r,N
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
"
t?2?q
X
k=0
?
?
k
?
0
?
b
?
k
b
?
0
N
?
X
0
t?1?q?k,N
#
?
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
a
s?1?r,N
,
172
v
y
12,N
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
?
a
0
t?1?q,N
?ba
0
t?1?q,N
?
?P
N
P
0
N
(a
s?1?r,N
?ba
s?1?r,N
)
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
"
t?2?q
X
k=0
?
?
k
?
0
?
b
?
k
b
?
0
N
?
X
0
t?1?q?k,N
#
?P
N
P
0
N
(a
s?1?r,N
?ba
s?1?r,N
),
v
y
13,N
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
a
0
t?1?q,N
?
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
(a
s?1?r,N
?ba
s?1?r,N
)
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
a
0
t?1?q,N
?
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
"
s?2?r
X
k=0
X
s?1?r?k,N
?
?
k
??
b
?
k
b
?
N
?
#
,
v
y
14,N
= ?d
t
?
?,N
d
0
s
?
a
0
t?1?q,N
?ba
0
t?1?q,N
?
?
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
(a
s?1?r,N
?ba
s?1?r,N
)
= ?d
t
?
?,N
d
0
s
"
t?2?q
X
k=0
?
?
k
?
0
?
b
?
k
b
?
0
N
?
X
0
t?1?q?k,N
#
?
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
"
s?2?r
X
k=0
X
s?1?r?k,N
?
?
k
??
b
?
k
b
?
N
?
#
,
173
v
y
15,N
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
?
a
0
t?1?q,N
?ba
0
t?1?q,N
?
?
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
(a
s?1?r,N
?ba
s?1?r,N
)
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
"
t?2?q
X
k=0
?
?
k
?
0
?
b
?
k
b
?
0
N
?
X
0
t?1?q?k,N
#
?
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
"
s?2?r
X
k=0
X
s?1?r?k,N
?
?
k
??
b
?
k
b
?
N
?
#
,
v
y
16,N
=2
?
tr
?
b
0
t?1?q
d
t
?
?,N
d
0
s
b
s?1?r
?
?,N
?
?tr
?
b
b
0
t?1?q,N
d
t
b
?
?,N
d
0
s
b
b
s?1?r,N
b
?
?,N
?i
?tr(P
0
N
P
N
P
0
N
P
N
),
v
y
17,N
=2tr
?
b
0
t?1?q
d
t
?
?,N
d
0
s
b
s?1?r
?
?,N
?
?
tr
?
P
0
N
P
N
P
0
N
P
N
?
b
P
0
N
b
P
N
b
P
0
N
b
P
N
?
,
v
y
18,N
= ?2
?
tr
?
b
0
t?1?q
d
t
?
?,N
d
0
s
b
s?1?r
?
?,N
?
?tr
?
b
b
0
t?1?q,N
d
t
b
?
?,N
d
0
s
b
b
s?1?r,N
b
?
?,N
?i
?
tr
?
P
0
N
P
N
P
0
N
P
N
?
b
P
0
N
b
P
N
b
P
0
N
b
P
N
?
.
Observe that for notational convenience I drop the dependence of the scalarsv
y
m,N
on the values of the indexes q,r,s,t.
I now examine the nonstochastic elements of the scalars v
y
m,N
. Note that the
elements and dimensions of d
t
and d
0
s
do not depend on N and hence they are
trivially uniformly bounded in absolute value. The dimensions of ?
?,N
(defined
in 4.1.16) do not depend onN and its elements are uniformly bounded in absolute
value by Assumptions 1, 2 and 6. I now show that the other nonstochastic com-
174
ponents are uniformly bounded in absolute value when scaled by N
?1
.Notethat
since |?| < 1, it follows from Assumption 5 that a
t?1?q,N
as well as a
s?1?r,N
have elements uniformly bounded in absolute value. By Assumption 3, the ma-
trix P
N
has row and column sums uniformly bounded in absolute value. As a
result, it follows from Lemma C3 that N
?1
a
0
t?1?q,N
P
N
P
0
N
a
s?1?r,N
is uniformly
bounded in absolute value. Similarly, given Assumptions 3 and 5, it follows from
Lemma C3 that N
?1
X
0
t?1?q?k,N
P
N
P
0
N
a
s?1?r,N
(where k =0,..t? 1 ?q)and
N
?1
a
0
t?1?q,N
P
N
P
0
N
X
s?1?r?k,N
(where k =0,..,s? 1 ?r)haveelementsthat
are uniformly bounded in absolute value.
Next I show that the stochastic components of v
y
m,N
with dimensions that do
not depend on N are o
p
(1). Recall that
?
?,N
= diag
?
?
2
?,N
,
?
2
?,N
1??
,?
2
?,N
,...,?
2
?,N
?
, (C.3.36)
is a (T +2)? (T +2)diagonal matrix and that
b
t?1?q
=
?
1
1??
,1,?
t?2?q
,..,?
0
,0
1?(T?t?1?q)
?
, (C.3.37)
is a 1 ? (T +2)vector. Since
b
?
N
p
? ? and |?| < 1, we then have by Theorem 14
175
in P?tscher and Prucha (2001) that
?
?
?
?
b
?
?,N
?
= o
p
(1), (C.3.38)
?
?
k
??
b
?
k
b
?
N
?
= o
p
(1),k? 0
?
b
t?1?q
?
b
b
t?1?q,N
?
= o
p
(1),
?
b
s?1?r
?
b
b
s?1?r,N
?
= o
p
(1),
and
tr
?
b
0
t?1?q
d
t
?
?
d
0
s
b
s?1?r
?
?
?
?tr
?
b
b
0
t?1?q,N
d
t
b
?
?,N
d
0
s
b
b
s?1?r,N
b
?
?,N
?
= o
p
(1).
(C.3.39)
Thus it follows that for m =1,2,4,5,7,9,11,12,13, and 14, all elements of
N
?1
v
y
m,N
areeithero
p
(1)oruniformlyboundedinabsolutevalue. Hence,N
?1
v
y
m,N
=
o
p
(1) for m =1,2,4,5,7,9,11,12,13,14 and 16.
Finally, I examine the remaining scalars v
y
m,N
that contain stochastic elements
with dimensions that depend on N. Observe that by assumption in the lemma,
the parameter ? and the matrix P
N
(?)=(I
N
??W
N
)
?1
satisfy the condition in
Lemma C7. Thus
a
0
t?1?q,N
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
a
s?1?r,N
= o
p
(1), (C.3.40)
X
0
t?1?q?k,N
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
a
s?1?r,N
= o
p
(1),
X
0
t?1?q?k,N
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
X
s?1?r?k,N
= o
p
(1),
tr
?
P
0
N
P
N
P
0
N
P
N
?
b
P
0
N
b
P
N
b
P
0
N
b
P
N
?
= o
p
(1).
176
Hence N
?1
v
y
m,N
= o
p
(1) for m =3,6,8,10,15,17, and 18.Asaresult,wehave
that
[N (T ?1)]
?1
?
ev
y
qr,ts,N
?bv
y
qr,ts,N
?
= o
p
(1). (C.3.41)
Next I consider the lower-diagonal block of
b
V
ts,N
. As above, I express the
difference between the typical element of
e
V
X
ts,N
and
b
V
X
ts,N
as
e
V
X
qr,ts,N
?
b
V
X
qr,ts,N
= X
0
t?q,N
(d
t
?
?,N
d
0
s
?P
N
P
0
N
)X
s?r,N
(C.3.42)
?X
0
t?q,N
?
d
t
b
?
?,N
d
0
s
?
b
P
N
b
P
0
N
?
X
s?r,N
= A
1,N
+A
2,N
+A
3,N
,
where
51
A
1,N
= d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
X
0
t?q,N
P
N
P
0
N
X
s?r,N
, (C.3.43)
A
2,N
= d
t
?
?,N
d
0
s
X
0
t?q,N
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
X
s?r,N
,
A
3,N
= ?d
t
?
?
?,N
?
b
?
?,N
?
d
0
s
X
0
t?q,N
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
X
s?r,N
.
I again do not explicitly denote the dependence of the p?p matrices A
1,N
, A
2,N
and A
3,N
on the value of the indexes q,r,s,t.
Asabovewehavethat
?
?
?,N
?
b
?
?,N
?
= o
p
(1). Since by Assumption
51
Similarly to the decomposition above this uses the the following algebraic rule: let a,b andba,
b
b
be matrices of conformable dimensions. Then
ab?ba
b
b =(a?ba)b + a
?
b?
b
b
?
?(a?ba)
?
b?
b
b
?
.
177
5, the elements of and X
t?q,N
and X
s?r,N
are uniformly bounded in absolute
value and from Assumption 3, the matrix P
N
P
0
N
has row and column sums
uniformly bounded in absolute value, it follows from Lemma C3 that the el-
ements of N
?1
X
0
t?q,N
P
N
P
0
N
X
s?r,N
are uniformly bounded in absolute value
and, therefore, N
?1
A
1,N
= o
p
(1). Similarly, from Lemma C7 it follows that
N
?1
X
0
t?q,N
?
P
N
P
0
N
?
b
P
N
b
P
0
N
?
X
s?r,N
= o
p
(1),andhenceN
?1
A
2,N
andN
?1
A
3,N
are o
p
(1).Asaresult,
[N (T ?1)]
?1
?
e
V
X
ts,N
?
b
V
X
ts,N
?
= o
p
(1). (C.3.44)
Finally, I show that the off-diagonal blocks in
e
V
ts,N
are matrices of zeros. Ob-
serve that from Lemma 4 it follows that the moments ?u
0
s,N
y
s?1?r,N
are linear-
quadratic forms in the innovations?
N
.SinceE
?
?u
0
s,N
y
s?1?r,N
?
=0(asr>0),
it follows that the diagonal elements of the quadratic forms are zeros. Because
elements of X
0
t?q,N
?u
t,N
are linear forms in?
N
, it follows from Lemma A1 that
E
?
X
0
t?q,N
?u
t,N
?u
0
s,N
y
s?1?r,N
?
= 0
p?1
,
and hence the off-diagonal blocks in both
e
V
ts,N
and
b
V
ts,N
are matrices of zeros.
Thus we have together that
[N (T ?1)]
?1
?
e
V
ts,N
?
b
V
ts,N
?
p
? 0
k
t
?k
s
, (C.3.45)
178
or by repeating the above arguments for other values of t and s that
[N (T ?1)]
?1
?
e
V
N
?
b
V
N
?
p
? 0
k?k
. (C.3.46)
From [(T ?1)N]
?1
e
V
N
p
?
e
V
N
(Assumption GMM3) it now follows that
[(T ?1)N]
?1
b
V
N
p
?
e
V.
Proof of Theorem 4: The feasible second stage GMM estimator is
?
?
N
?
b
?
N
?
=
h
?Z
0
N
e
H
N
b
V
?1
N
?
b
?
N
?
e
H
0
N
?Z
N
i
?1
?Z
0
N
e
H
N
b
V
?1
N
?
b
?
N
?
e
H
0
N
?y
N
.
(C.3.47)
To prove the claim it suffices to show that, see e.g. Schmidt (1976), p. 71:
?
1,N
=[N (T ?1)]
?1
?Z
0
N
e
H
N
b
V
?1
N
?
b
?
N
?
e
H
0
N
?Z
N
?[N (T ?1)]
?1
?Z
0
N
e
H
N
e
V
?1
N
e
H
0
N
?Z
N
p
? 0,
(C.3.48)
and
?
2,N
=[N (T ?1)]
?1/2
?Z
0
N
e
H
N
b
V
?1
N
?
b
?
N
?
?
e
H
0
N
?u
N
?[N (T ?1)]
?1/2
?Z
0
N
e
H
N
e
V
?1
N
e
H
0
N
?u
N
p
? 0.
(C.3.49)
Note that
?
1,N
=[N (T ?1)]
?1
?Z
0
N
e
H
N
? (C.3.50)
?
?
?
[N (T ?1)]
?1
b
V
N
?
b
?
N
??
?1
?
?
[N (T ?1)]
?1
e
V
N
?
?1
?
?[N (T ?1)]
?1
e
H
0
N
?Z
N
.
179
From Lemma 7 and Assumption GMM3, it folows that the matrices
[(T ?1)N]
?1
b
V
N
?
b
?
N
?
and [N (T ?1)]
?1
e
V
N
both converge to
e
V in probabil-
ity. Since by Assumption GMM3 the matrix
e
Vis finite and nonsingular, it follows
from Theorem 14 in P?tscher and Prucha (2001) that
?
?
[N (T ?1)]
?1
b
V
N
?
b
?
N
??
?1
?
?
[N (T ?1)]
?1
e
V
N
?
?1
?
= o
p
(1). (C.3.51)
Given Assumption GMM2, it then follows that?
1,N
p
? 0.
Similarlywehavefor?
2,N
:
?
2,N
=[N (T ?1)]
?1
?Z
0
N
e
H
N
? (C.3.52)
?
?
?
[N (T ?1)]
?1
b
V
N
?
b
?
N
??
?1
?
?
[N (T ?1)]
?1
e
V
N
?
?1
?
?[N (T ?1)]
?1/2
e
H
0
N
?u
N
,
whereasabove
[N (T ?1)]
?1
?Z
0
N
e
H
N
p
?
f
M
0
H?Z
, (C.3.53)
and
?
?
[N (T ?1)]
?1
b
V
N
?
b
?
N
??
?1
?
?
[N (T ?1)]
?1
e
V
N
?
?1
?
p
? 0
k?k
. (C.3.54)
Note that from Lemma 5, it follows that
e
V
?1/2
N
e
H
0
N
?u
d
? N (0,I
k
).GivenAs-
180
sumption GMM3, it follows from Theorem 15 in P?tscher and Prucha (2001) that
e
H
0
N
?u
N
[N (T ?1)]
1/2
=
?
e
V
N
N (T ?1)
!
1/2
?
e
V
?1/2
N
e
H
0
N
?u
N
d
? N
?
0,
e
V
?
(C.3.55)
Hence by Corollary 5, part (a), in P?tscher and Prucha (2001), we have that
?
2,N
p
? 0.
ProofofLemma8:Given Assumption GMM3, the claim follows directly from
C.3.48.
181
D Appendix: Tables of Monte Carlo Results
182
True Values
?? W RMSE Bias RMSE Bias RMSE Bias
-0.90 -0.90 1 0.0977 0.0057 0.0948 0.0088 0.0883 -0.0169
-0.75 -0.90 1 0.1213 0.0079 0.1094 0.0087 0.1030 -0.0252
-0.25 -0.90 1 0.2367 0.0139 0.1601 0.0110 0.1634 -0.0858
0.00 -0.90 1 0.3246 0.0151 0.1885 0.0110 0.2089 -0.1369
0.25 -0.90 1 0.5168 0.0140 0.2223 0.0100 0.2758 -0.2090
0.75 -0.90 1 0.9098 -0.1899 0.2126 0.0099 0.3447 -0.2852
0.90 -0.90 1 0.3725 -0.0178 0.1594 0.0050 0.2694 -0.2165
-0.90 -0.50 1 0.0452 0.0014 0.0428 0.0016 0.0411 -0.0066
-0.75 -0.50 1 0.0554 0.0021 0.0502 0.0020 0.0477 -0.0097
-0.25 -0.50 1 0.1004 0.0044 0.0784 0.0013 0.0731 -0.0284
0.00 -0.50 1 0.1386 0.0047 0.0927 0.0041 0.0862 -0.0427
0.25 -0.50 1 0.2112 0.0070 0.1098 0.0059 0.1030 -0.0638
0.75 -0.50 1 1.0751 -0.2148 0.1156 0.0055 0.1167 -0.0864
0.90 -0.50 1 0.2612 -0.0031 0.0877 0.0043 0.0988 -0.0734
-0.90 -0.25 1 0.0372 0.0011 0.0362 0.0025 0.0343 -0.0046
-0.75 -0.25 1 0.0458 0.0008 0.0421 0.0021 0.0410 -0.0068
-0.25 -0.25 1 0.0866 0.0026 0.0655 0.0014 0.0606 -0.0199
0.00 -0.25 1 0.1187 0.0022 0.0781 0.0024 0.0701 -0.0296
0.25 -0.25 1 0.1711 0.0030 0.0902 0.0037 0.0816 -0.0441
0.75 -0.25 1 1.2234 -0.3342 0.0948 0.0052 0.0914 -0.0606
0.90 -0.25 1 0.2557 -0.0095 0.0733 0.0019 0.0783 -0.0533
-0.90 0.00 1 0.0364 0.0018 0.0362 0.0011 0.0346 -0.0048
-0.75 0.00 1 0.0446 0.0015 0.0413 0.0010 0.0396 -0.0064
-0.25 0.00 1 0.0845 0.0016 0.0644 0.0020 0.0591 -0.0181
0.00 0.00 1 0.1135 0.0022 0.0736 0.0014 0.0651 -0.0257
0.25 0.00 1 0.1660 0.0028 0.0856 0.0008 0.0766 -0.0391
0.75 0.00 1 1.3519 -0.3439 0.0884 0.0058 0.0849 -0.0538
0.90 0.00 1 0.2572 -0.0050 0.0713 0.0050 0.0725 -0.0487
-0.90 0.25 1 0.0385 0.0000 0.0365 0.0018 0.0357 -0.0055
-0.75 0.25 1 0.0477 0.0012 0.0430 0.0022 0.0427 -0.0084
-0.25 0.25 1 0.0884 0.0012 0.0669 0.0030 0.0629 -0.0196
0.00 0.25 1 0.1229 0.0022 0.0804 0.0032 0.0714 -0.0295
0.25 0.25 1 0.1824 0.0022 0.0927 0.0019 0.0825 -0.0423
0.75 0.25 1 1.2421 -0.3463 0.0979 0.0032 0.0912 -0.0622
0.90 0.25 1 0.2583 -0.0029 0.0768 0.0046 0.0792 -0.0544
Table D1
Initial IV Estimators of ?
Estimator AH1 AH2 AB
183
True Values
?? W RMSE Bias RMSE Bias RMSE Bias
-0.90 0.50 1 0.0473 0.0002 0.0443 0.0013 0.0426 -0.0071
-0.75 0.50 1 0.0574 0.0010 0.0528 0.0016 0.0497 -0.0107
-0.25 0.50 1 0.1058 0.0028 0.0807 0.0047 0.0732 -0.0264
0.00 0.50 1 0.1449 0.0019 0.0992 0.0040 0.0885 -0.0428
0.25 0.50 1 0.2197 0.0029 0.1153 0.0064 0.1037 -0.0616
0.75 0.50 1 1.0355 -0.2428 0.1204 0.0041 0.1190 -0.0896
0.90 0.50 1 0.2672 -0.0110 0.0948 0.0042 0.1001 -0.0751
-0.90 0.90 1 0.0950 -0.0029 0.0960 0.0013 0.0916 -0.0229
-0.75 0.90 1 0.1178 -0.0037 0.1145 0.0013 0.1100 -0.0321
-0.25 0.90 1 0.2298 -0.0047 0.1761 0.0073 0.1691 -0.0896
0.00 0.90 1 0.3335 -0.0058 0.2008 0.0084 0.2131 -0.1363
0.25 0.90 1 0.5477 -0.0176 0.2251 0.0143 0.2764 -0.2062
0.75 0.90 1 0.9974 -0.1566 0.2144 0.0061 0.3543 -0.2889
0.90 0.90 1 0.3929 -0.0086 0.1662 -0.0005 0.2672 -0.2119
-0.90 -0.90 2 0.0408 0.0006 0.0379 0.0002 0.0372 -0.0067
-0.75 -0.90 2 0.0498 -0.0002 0.0448 0.0004 0.0434 -0.0091
-0.25 -0.90 2 0.0937 0.0001 0.0676 0.0001 0.0655 -0.0235
0.00 -0.90 2 0.1300 -0.0027 0.0821 -0.0015 0.0788 -0.0353
0.25 -0.90 2 0.1905 0.0008 0.0923 0.0003 0.0881 -0.0509
0.75 -0.90 2 1.0960 -0.2450 0.0989 0.0011 0.0985 -0.0708
0.90 -0.90 2 0.2442 -0.0024 0.0770 0.0018 0.0853 -0.0604
-0.90 -0.50 2 0.0367 0.0015 0.0356 0.0009 0.0349 -0.0052
-0.75 -0.50 2 0.0451 0.0003 0.0413 0.0010 0.0412 -0.0074
-0.25 -0.50 2 0.0864 -0.0011 0.0638 0.0004 0.0603 -0.0202
0.00 -0.50 2 0.1170 0.0020 0.0770 -0.0002 0.0696 -0.0283
0.25 -0.50 2 0.1750 0.0060 0.0859 -0.0002 0.0806 -0.0423
0.75 -0.50 2 1.1873 -0.3030 0.0921 0.0035 0.0897 -0.0599
0.90 -0.50 2 0.2555 -0.0040 0.0723 0.0025 0.0762 -0.0521
-0.90 -0.25 2 0.0364 0.0015 0.0349 0.0011 0.0347 -0.0043
-0.75 -0.25 2 0.0441 0.0010 0.0403 0.0009 0.0400 -0.0064
-0.25 -0.25 2 0.0849 -0.0002 0.0640 0.0017 0.0583 -0.0180
0.00 -0.25 2 0.1141 0.0037 0.0747 0.0000 0.0677 -0.0270
0.25 -0.25 2 0.1697 0.0049 0.0844 0.0015 0.0782 -0.0406
0.75 -0.25 2 1.3170 -0.3437 0.0901 0.0032 0.0863 -0.0561
0.90 -0.25 2 0.2563 -0.0088 0.0703 0.0040 0.0732 -0.0489
Table D1 cont.
Initial IV Estimators of ?
Estimator AH1 AH2 AB
184
True Values
?? W RMSE Bias RMSE Bias RMSE Bias
-0.90 0.00 2 0.0364 0.0018 0.0362 0.0011 0.0346 -0.0048
-0.75 0.00 2 0.0446 0.0015 0.0413 0.0010 0.0396 -0.0064
-0.25 0.00 2 0.0845 0.0016 0.0644 0.0020 0.0591 -0.0181
0.00 0.00 2 0.1135 0.0022 0.0736 0.0014 0.0651 -0.0257
0.25 0.00 2 0.1660 0.0028 0.0856 0.0008 0.0766 -0.0391
0.75 0.00 2 1.3519 -0.3439 0.0884 0.0058 0.0849 -0.0538
0.90 0.00 2 0.2572 -0.0050 0.0713 0.0050 0.0725 -0.0487
-0.90 0.25 2 0.0370 0.0011 0.0360 0.0015 0.0345 -0.0047
-0.75 0.25 2 0.0465 0.0011 0.0427 0.0017 0.0399 -0.0069
-0.25 0.25 2 0.0855 0.0013 0.0656 0.0034 0.0604 -0.0202
0.00 0.25 2 0.1192 0.0034 0.0783 0.0033 0.0693 -0.0289
0.25 0.25 2 0.1760 0.0026 0.0886 0.0029 0.0794 -0.0413
0.75 0.25 2 1.2863 -0.3508 0.0909 0.0043 0.0880 -0.0584
0.90 0.25 2 0.2585 0.0024 0.0755 0.0047 0.0759 -0.0516
-0.90 0.50 2 0.0416 0.0011 0.0409 0.0024 0.0391 -0.0063
-0.75 0.50 2 0.0507 0.0012 0.0479 0.0030 0.0460 -0.0090
-0.25 0.50 2 0.0958 0.0016 0.0721 0.0041 0.0679 -0.0250
0.00 0.50 2 0.1344 0.0075 0.0894 0.0060 0.0797 -0.0354
0.25 0.50 2 0.1997 0.0094 0.1063 0.0060 0.0935 -0.0526
0.75 0.50 2 1.2256 -0.2925 0.1091 0.0093 0.1046 -0.0755
0.90 0.50 2 0.2678 -0.0126 0.0885 0.0055 0.0907 -0.0659
-0.90 0.90 2 0.1252 -0.0041 0.1163 0.0077 0.1105 -0.0267
-0.75 0.90 2 0.1538 -0.0030 0.1380 0.0121 0.1285 -0.0365
-0.25 0.90 2 0.2908 -0.0019 0.2099 0.0137 0.2005 -0.1060
0.00 0.90 2 0.4118 -0.0013 0.2549 0.0156 0.2457 -0.1611
0.25 0.90 2 0.6497 -0.0186 0.2939 0.0166 0.3227 -0.2403
0.75 0.90 2 1.2519 -0.3148 0.2742 0.0071 0.4062 -0.3263
0.90 0.90 2 0.5361 -0.0507 0.2255 0.0068 0.3408 -0.2655
-0.90 -0.90 3 0.0392 0.0016 0.0370 0.0020 0.0364 -0.0052
-0.75 -0.90 3 0.0474 0.0021 0.0431 0.0023 0.0419 -0.0075
-0.25 -0.90 3 0.0900 0.0035 0.0664 0.0026 0.0635 -0.0184
0.00 -0.90 3 0.1228 0.0058 0.0790 0.0015 0.0728 -0.0291
0.25 -0.90 3 0.1857 0.0068 0.0916 0.0021 0.0834 -0.0442
0.75 -0.90 3 1.1327 -0.3200 0.0931 0.0023 0.0901 -0.0631
0.90 -0.90 3 0.2562 -0.0151 0.0741 0.0041 0.0777 -0.0534
Estimator AH1 AH2 AB
Table D1 cont.
Initial IV Estimators of ?
185
True Values
?? W RMSE Bias RMSE Bias RMSE Bias
-0.90 -0.50 3 0.0372 0.0014 0.0359 0.0012 0.0344 -0.0042
-0.75 -0.50 3 0.0455 0.0021 0.0417 0.0007 0.0411 -0.0067
-0.25 -0.50 3 0.0886 0.0012 0.0642 0.0015 0.0605 -0.0181
0.00 -0.50 3 0.1181 0.0045 0.0764 0.0016 0.0685 -0.0278
0.25 -0.50 3 0.1757 0.0043 0.0863 0.0009 0.0791 -0.0404
0.75 -0.50 3 1.2685 -0.3750 0.0893 0.0046 0.0855 -0.0572
0.90 -0.50 3 0.2589 -0.0104 0.0714 0.0056 0.0736 -0.0499
-0.90 -0.25 3 0.0363 0.0019 0.0358 0.0005 0.0348 -0.0038
-0.75 -0.25 3 0.0454 0.0014 0.0413 0.0008 0.0397 -0.0062
-0.25 -0.25 3 0.0858 0.0008 0.0641 0.0015 0.0596 -0.0183
0.00 -0.25 3 0.1163 0.0041 0.0744 0.0018 0.0668 -0.0270
0.25 -0.25 3 0.1672 0.0051 0.0857 0.0020 0.0764 -0.0384
0.75 -0.25 3 1.3017 -0.3856 0.0886 0.0052 0.0843 -0.0555
0.90 -0.25 3 0.2542 -0.0107 0.0702 0.0051 0.0720 -0.0484
-0.90 0.00 3 0.0364 0.0018 0.0362 0.0011 0.0346 -0.0048
-0.75 0.00 3 0.0446 0.0015 0.0413 0.0010 0.0396 -0.0064
-0.25 0.00 3 0.0845 0.0016 0.0644 0.0020 0.0591 -0.0181
0.00 0.00 3 0.1135 0.0022 0.0736 0.0014 0.0651 -0.0257
0.25 0.00 3 0.1660 0.0028 0.0856 0.0008 0.0766 -0.0391
0.75 0.00 3 1.3519 -0.3439 0.0884 0.0058 0.0849 -0.0538
0.90 0.00 3 0.2572 -0.0050 0.0713 0.0050 0.0725 -0.0487
-0.90 0.25 3 0.0374 0.0009 0.0364 0.0015 0.0344 -0.0045
-0.75 0.25 3 0.0456 0.0013 0.0417 0.0016 0.0395 -0.0054
-0.25 0.25 3 0.0869 0.0003 0.0651 0.0021 0.0597 -0.0193
0.00 0.25 3 0.1154 0.0028 0.0772 0.0018 0.0693 -0.0276
0.25 0.25 3 0.1709 0.0037 0.0889 0.0025 0.0795 -0.0396
0.75 0.25 3 1.3535 -0.3464 0.0878 0.0023 0.0864 -0.0568
0.90 0.25 3 0.2564 0.0049 0.0739 0.0034 0.0743 -0.0494
-0.90 0.50 3 0.0405 0.0006 0.0397 0.0020 0.0379 -0.0052
-0.75 0.50 3 0.0493 0.0014 0.0472 0.0022 0.0447 -0.0081
-0.25 0.50 3 0.0940 0.0036 0.0720 0.0021 0.0674 -0.0227
0.00 0.50 3 0.1316 0.0046 0.0855 0.0039 0.0774 -0.0328
0.25 0.50 3 0.1919 0.0072 0.1013 0.0039 0.0889 -0.0471
0.75 0.50 3 1.2627 -0.3083 0.1014 0.0098 0.0954 -0.0670
0.90 0.50 3 0.2714 -0.0080 0.0802 0.0042 0.0847 -0.0588
Estimator AH1 AH2 AB
Table D1 cont.
Initial IV Estimators of ?
186
True Values
?? W RMSE Bias RMSE Bias RMSE Bias
-0.90 0.90 3 0.1371 -0.0009 0.1252 0.0076 0.1180 -0.0288
-0.75 0.90 3 0.1699 -0.0027 0.1503 0.0126 0.1369 -0.0393
-0.25 0.90 3 0.3219 0.0020 0.2256 0.0187 0.2058 -0.1003
0.00 0.90 3 0.4354 0.0032 0.2673 0.0204 0.2551 -0.1572
0.25 0.90 3 0.6794 0.0100 0.3249 0.0148 0.3273 -0.2381
0.75 0.90 3 1.3234 -0.3345 0.3122 0.0082 0.4046 -0.3253
0.90 0.90 3 0.5756 -0.0721 0.2463 0.0068 0.3432 -0.2676
Table D1 cont.
Initial IV Estimators of ?
Estimator AH1 AH2 AB
187
True Values
?? W RMSE Bias RMSE Bias RMSE Bias
-0.90 -0.90 1 0.0853 -0.0065 0.0713 -0.0082 0.0850 -0.0016
-0.75 -0.90 1 0.0987 -0.0093 0.0845 -0.0147 0.0987 -0.0070
-0.25 -0.90 1 0.1419 -0.0425 0.1333 -0.0536 0.1468 -0.0411
0.00 -0.90 1 0.1676 -0.0678 0.1616 -0.0822 0.1736 -0.0735
0.25 -0.90 1 0.1989 -0.0998 0.1934 -0.1158 0.2113 -0.1165
0.75 -0.90 1 0.1773 -0.0866 0.1757 -0.1082 0.2431 -0.1575
0.90 -0.90 1 0.1279 -0.0562 0.1291 -0.0787 0.1783 -0.0911
-0.90 -0.50 1 0.0417 -0.0030 0.0404 -0.0027 0.0426 0.0003
-0.75 -0.50 1 0.0490 -0.0039 0.0462 -0.0040 0.0499 -0.0022
-0.25 -0.50 1 0.0703 -0.0140 0.0681 -0.0138 0.0693 -0.0113
0.00 -0.50 1 0.0845 -0.0224 0.0783 -0.0186 0.0773 -0.0162
0.25 -0.50 1 0.0983 -0.0317 0.0927 -0.0306 0.0883 -0.0236
0.75 -0.50 1 0.0901 -0.0319 0.0868 -0.0378 0.0928 -0.0410
0.90 -0.50 1 0.0668 -0.0221 0.0688 -0.0274 0.0777 -0.0279
-0.90 -0.25 1 0.0347 -0.0017 0.0349 -0.0017 0.0367 0.0005
-0.75 -0.25 1 0.0408 -0.0027 0.0406 -0.0025 0.0424 -0.0008
-0.25 -0.25 1 0.0585 -0.0094 0.0589 -0.0085 0.0589 -0.0078
0.00 -0.25 1 0.0688 -0.0147 0.0667 -0.0146 0.0658 -0.0118
0.25 -0.25 1 0.0796 -0.0232 0.0765 -0.0210 0.0737 -0.0150
0.75 -0.25 1 0.0744 -0.0260 0.0771 -0.0281 0.0784 -0.0309
0.90 -0.25 1 0.0584 -0.0180 0.0594 -0.0188 0.0683 -0.0202
-0.90 0.00 1 0.0338 -0.0012 0.0338 -0.0013 0.0349 0.0012
-0.75 0.00 1 0.0386 -0.0021 0.0388 -0.0024 0.0403 -0.0003
-0.25 0.00 1 0.0572 -0.0091 0.0568 -0.0090 0.0556 -0.0058
0.00 0.00 1 0.0649 -0.0127 0.0646 -0.0126 0.0634 -0.0086
0.25 0.00 1 0.0738 -0.0189 0.0734 -0.0191 0.0707 -0.0133
0.75 0.00 1 0.0744 -0.0252 0.0742 -0.0252 0.0764 -0.0284
0.90 0.00 1 0.0572 -0.0167 0.0573 -0.0168 0.0657 -0.0205
-0.90 0.25 1 0.0345 -0.0024 0.0344 -0.0024 0.0378 -0.0005
-0.75 0.25 1 0.0410 -0.0029 0.0403 -0.0035 0.0422 -0.0008
-0.25 0.25 1 0.0594 -0.0099 0.0585 -0.0099 0.0572 -0.0072
0.00 0.25 1 0.0696 -0.0143 0.0688 -0.0149 0.0641 -0.0086
0.25 0.25 1 0.0770 -0.0228 0.0776 -0.0213 0.0739 -0.0118
0.75 0.25 1 0.0765 -0.0271 0.0787 -0.0291 0.0790 -0.0319
0.90 0.25 1 0.0619 -0.0197 0.0611 -0.0206 0.0696 -0.0236
Table D2
Second Stage GMM Estimators of ?
Estimator ignoring mix exp
188
True Values
?? W RMSE Bias RMSE Bias RMSE Bias
-0.90 0.50 1 0.0400 -0.0016 0.0390 -0.0034 0.0432 -0.0021
-0.75 0.50 1 0.0465 -0.0032 0.0471 -0.0047 0.0502 -0.0021
-0.25 0.50 1 0.0699 -0.0115 0.0678 -0.0131 0.0642 -0.0105
0.00 0.50 1 0.0792 -0.0191 0.0779 -0.0191 0.0736 -0.0131
0.25 0.50 1 0.0928 -0.0306 0.0890 -0.0314 0.0821 -0.0203
0.75 0.50 1 0.0938 -0.0340 0.0962 -0.0396 0.0923 -0.0426
0.90 0.50 1 0.0710 -0.0237 0.0723 -0.0284 0.0788 -0.0329
-0.90 0.90 1 0.0899 -0.0101 0.0765 -0.0138 0.0863 -0.0055
-0.75 0.90 1 0.1042 -0.0139 0.0882 -0.0171 0.1009 -0.0072
-0.25 0.90 1 0.1466 -0.0369 0.1290 -0.0453 0.1445 -0.0418
0.00 0.90 1 0.1733 -0.0592 0.1529 -0.0719 0.1709 -0.0700
0.25 0.90 1 0.1917 -0.0890 0.1819 -0.1069 0.2045 -0.1085
0.75 0.90 1 0.1767 -0.0865 0.1862 -0.1129 0.2410 -0.1530
0.90 0.90 1 0.1372 -0.0623 0.1417 -0.0823 0.1722 -0.0912
-0.90 -0.90 2 0.0367 -0.0028 0.0372 -0.0028 0.0399 -0.0005
-0.75 -0.90 2 0.0421 -0.0038 0.0439 -0.0040 0.0456 -0.0018
-0.25 -0.90 2 0.0611 -0.0108 0.0604 -0.0112 0.0595 -0.0069
0.00 -0.90 2 0.0713 -0.0175 0.0696 -0.0163 0.0681 -0.0126
0.25 -0.90 2 0.0834 -0.0265 0.0828 -0.0262 0.0780 -0.0185
0.75 -0.90 2 0.0812 -0.0278 0.0828 -0.0304 0.0871 -0.0365
0.90 -0.90 2 0.0615 -0.0175 0.0631 -0.0196 0.0703 -0.0230
-0.90 -0.50 2 0.0342 -0.0024 0.0345 -0.0021 0.0373 0.0007
-0.75 -0.50 2 0.0400 -0.0031 0.0407 -0.0033 0.0432 -0.0014
-0.25 -0.50 2 0.0579 -0.0093 0.0579 -0.0085 0.0561 -0.0060
0.00 -0.50 2 0.0655 -0.0131 0.0658 -0.0137 0.0628 -0.0088
0.25 -0.50 2 0.0763 -0.0211 0.0767 -0.0211 0.0731 -0.0148
0.75 -0.50 2 0.0752 -0.0246 0.0766 -0.0261 0.0802 -0.0308
0.90 -0.50 2 0.0586 -0.0171 0.0596 -0.0181 0.0666 -0.0208
-0.90 -0.25 2 0.0339 -0.0016 0.0340 -0.0013 0.0347 0.0011
-0.75 -0.25 2 0.0399 -0.0027 0.0400 -0.0025 0.0408 -0.0008
-0.25 -0.25 2 0.0563 -0.0086 0.0574 -0.0093 0.0560 -0.0057
0.00 -0.25 2 0.0645 -0.0123 0.0649 -0.0130 0.0615 -0.0093
0.25 -0.25 2 0.0762 -0.0188 0.0763 -0.0185 0.0693 -0.0140
0.75 -0.25 2 0.0756 -0.0255 0.0753 -0.0256 0.0779 -0.0303
0.90 -0.25 2 0.0577 -0.0168 0.0585 -0.0171 0.0651 -0.0206
Table D2 cont.
Second Stage GMM Estimators of ?
Estimator ignoring mix exp
189
True Values
?? W RMSE Bias RMSE Bias RMSE Bias
-0.90 0.00 2 0.0338 -0.0012 0.0337 -0.0016 0.0349 0.0013
-0.75 0.00 2 0.0386 -0.0021 0.0387 -0.0021 0.0401 -0.0005
-0.25 0.00 2 0.0572 -0.0091 0.0565 -0.0090 0.0549 -0.0060
0.00 0.00 2 0.0649 -0.0127 0.0650 -0.0127 0.0621 -0.0084
0.25 0.00 2 0.0738 -0.0189 0.0734 -0.0192 0.0704 -0.0133
0.75 0.00 2 0.0744 -0.0252 0.0745 -0.0253 0.0766 -0.0284
0.90 0.00 2 0.0572 -0.0167 0.0570 -0.0166 0.0655 -0.0203
-0.90 0.25 2 0.0341 -0.0007 0.0346 -0.0013 0.0362 0.0001
-0.75 0.25 2 0.0401 -0.0028 0.0397 -0.0026 0.0411 -0.0009
-0.25 0.25 2 0.0581 -0.0084 0.0573 -0.0085 0.0571 -0.0059
0.00 0.25 2 0.0674 -0.0137 0.0680 -0.0137 0.0635 -0.0089
0.25 0.25 2 0.0754 -0.0203 0.0766 -0.0208 0.0723 -0.0138
0.75 0.25 2 0.0748 -0.0275 0.0741 -0.0277 0.0773 -0.0299
0.90 0.25 2 0.0589 -0.0191 0.0584 -0.0191 0.0673 -0.0213
-0.90 0.50 2 0.0384 -0.0023 0.0379 -0.0029 0.0383 -0.0016
-0.75 0.50 2 0.0453 -0.0030 0.0447 -0.0030 0.0449 -0.0023
-0.25 0.50 2 0.0661 -0.0106 0.0614 -0.0109 0.0611 -0.0080
0.00 0.50 2 0.0736 -0.0155 0.0708 -0.0173 0.0673 -0.0121
0.25 0.50 2 0.0850 -0.0258 0.0814 -0.0235 0.0752 -0.0156
0.75 0.50 2 0.0859 -0.0305 0.0858 -0.0350 0.0830 -0.0357
0.90 0.50 2 0.0676 -0.0223 0.0684 -0.0258 0.0724 -0.0271
-0.90 0.90 2 0.1070 -0.0068 0.0657 -0.0109 0.0730 -0.0073
-0.75 0.90 2 0.1243 -0.0118 0.0759 -0.0150 0.0837 -0.0103
-0.25 0.90 2 0.1726 -0.0500 0.1142 -0.0420 0.1173 -0.0326
0.00 0.90 2 0.2035 -0.0835 0.1381 -0.0656 0.1349 -0.0479
0.25 0.90 2 0.2406 -0.1138 0.1691 -0.0929 0.1492 -0.0674
0.75 0.90 2 0.2403 -0.1234 0.2314 -0.1464 0.1812 -0.1127
0.90 0.90 2 0.1807 -0.0827 0.1811 -0.1088 0.1585 -0.0905
-0.90 -0.90 3 0.0356 -0.0014 0.0359 -0.0015 0.0390 0.0004
-0.75 -0.90 3 0.0413 -0.0038 0.0417 -0.0028 0.0443 -0.0011
-0.25 -0.90 3 0.0589 -0.0095 0.0603 -0.0088 0.0595 -0.0063
0.00 -0.90 3 0.0688 -0.0148 0.0691 -0.0138 0.0679 -0.0116
0.25 -0.90 3 0.0821 -0.0215 0.0819 -0.0206 0.0762 -0.0152
0.75 -0.90 3 0.0779 -0.0252 0.0791 -0.0270 0.0840 -0.0339
0.90 -0.90 3 0.0610 -0.0172 0.0613 -0.0177 0.0712 -0.0232
Table D2 cont.
Second Stage GMM Estimators of ?
Estimator ignoring mix exp
190
True Values
?? W RMSE Bias RMSE Bias RMSE Bias
-0.90 -0.50 3 0.0344 -0.0019 0.0342 -0.0018 0.0370 0.0011
-0.75 -0.50 3 0.0395 -0.0034 0.0397 -0.0027 0.0418 0.0000
-0.25 -0.50 3 0.0574 -0.0088 0.0569 -0.0078 0.0574 -0.0066
0.00 -0.50 3 0.0658 -0.0128 0.0664 -0.0121 0.0643 -0.0098
0.25 -0.50 3 0.0783 -0.0189 0.0783 -0.0187 0.0711 -0.0138
0.75 -0.50 3 0.0742 -0.0249 0.0747 -0.0249 0.0781 -0.0307
0.90 -0.50 3 0.0587 -0.0170 0.0603 -0.0174 0.0670 -0.0217
-0.90 -0.25 3 0.0338 -0.0016 0.0339 -0.0015 0.0357 0.0015
-0.75 -0.25 3 0.0391 -0.0023 0.0396 -0.0021 0.0407 -0.0004
-0.25 -0.25 3 0.0564 -0.0085 0.0569 -0.0083 0.0562 -0.0064
0.00 -0.25 3 0.0659 -0.0130 0.0665 -0.0135 0.0626 -0.0094
0.25 -0.25 3 0.0756 -0.0179 0.0745 -0.0182 0.0700 -0.0140
0.75 -0.25 3 0.0744 -0.0247 0.0746 -0.0244 0.0752 -0.0292
0.90 -0.25 3 0.0582 -0.0172 0.0588 -0.0173 0.0662 -0.0208
-0.90 0.00 3 0.0338 -0.0012 0.0336 -0.0012 0.0349 0.0013
-0.75 0.00 3 0.0386 -0.0021 0.0387 -0.0022 0.0401 -0.0006
-0.25 0.00 3 0.0572 -0.0091 0.0569 -0.0089 0.0554 -0.0059
0.00 0.00 3 0.0649 -0.0127 0.0651 -0.0129 0.0621 -0.0091
0.25 0.00 3 0.0738 -0.0189 0.0738 -0.0190 0.0706 -0.0136
0.75 0.00 3 0.0744 -0.0252 0.0741 -0.0253 0.0763 -0.0286
0.90 0.00 3 0.0572 -0.0167 0.0573 -0.0167 0.0658 -0.0204
-0.90 0.25 3 0.0347 -0.0009 0.0349 -0.0012 0.0354 0.0000
-0.75 0.25 3 0.0395 -0.0023 0.0398 -0.0023 0.0405 -0.0008
-0.25 0.25 3 0.0574 -0.0094 0.0582 -0.0090 0.0551 -0.0058
0.00 0.25 3 0.0658 -0.0137 0.0665 -0.0137 0.0614 -0.0091
0.25 0.25 3 0.0758 -0.0201 0.0777 -0.0208 0.0713 -0.0136
0.75 0.25 3 0.0741 -0.0270 0.0741 -0.0274 0.0776 -0.0284
0.90 0.25 3 0.0588 -0.0186 0.0591 -0.0186 0.0666 -0.0203
-0.90 0.50 3 0.0381 -0.0015 0.0364 -0.0029 0.0378 -0.0015
-0.75 0.50 3 0.0449 -0.0027 0.0439 -0.0039 0.0429 -0.0021
-0.25 0.50 3 0.0633 -0.0097 0.0604 -0.0100 0.0580 -0.0079
0.00 0.50 3 0.0720 -0.0162 0.0688 -0.0156 0.0646 -0.0117
0.25 0.50 3 0.0809 -0.0219 0.0805 -0.0232 0.0729 -0.0157
0.75 0.50 3 0.0816 -0.0309 0.0850 -0.0340 0.0808 -0.0325
0.90 0.50 3 0.0650 -0.0225 0.0677 -0.0253 0.0698 -0.0244
Table D2 cont.
Second Stage GMM Estimators of ?
Estimator ignoring mix exp
191
True Values
?? W RMSE Bias RMSE Bias RMSE Bias
-0.90 0.90 3 0.1131 -0.0056 0.0570 -0.0086 0.0666 -0.0091
-0.75 0.90 3 0.1307 -0.0109 0.0674 -0.0131 0.0740 -0.0130
-0.25 0.90 3 0.1833 -0.0499 0.0986 -0.0335 0.1008 -0.0252
0.00 0.90 3 0.2080 -0.0836 0.1128 -0.0459 0.1135 -0.0371
0.25 0.90 3 0.2467 -0.1224 0.1455 -0.0736 0.1227 -0.0489
0.75 0.90 3 0.2560 -0.1356 0.2234 -0.1442 0.1462 -0.0799
0.90 0.90 3 0.1988 -0.0946 0.1803 -0.1092 0.1385 -0.0702
Table D2 cont.
Second Stage GMM Estimators of ?
Estimator ignoring mix exp
192
True Values
?? W
RMSE Bias RMSE Bias RMSE Bias RMSE Bias
-0.90 -0.90 1 0.028 0.013 0.028 0.013 0.033 0.016 0.018 -0.001
-0.75 -0.90 1 0.028 0.014 0.028 0.013 0.033 0.016 0.018 -0.001
-0.25 -0.90 1 0.033 0.019 0.030 0.014 0.036 0.019 0.018 -0.001
0.00 -0.90 1 0.039 0.024 0.030 0.016 0.040 0.024 0.018 -0.001
0.25 -0.90 1 0.052 0.032 0.030 0.017 0.048 0.030 0.018 -0.001
0.75 -0.90 1 0.069 0.046 0.029 0.017 0.055 0.037 0.018 -0.001
0.90 -0.90 1 0.039 0.024 0.029 0.015 0.045 0.030 0.018 -0.001
-0.90 -0.50 1 0.047 0.003 0.047 0.003 0.047 0.005 0.048 -0.001
-0.75 -0.50 1 0.047 0.004 0.047 0.004 0.048 0.005 0.048 -0.001
-0.25 -0.50 1 0.048 0.006 0.047 0.004 0.047 0.007 0.048 -0.001
0.00 -0.50 1 0.047 0.008 0.048 0.005 0.048 0.007 0.048 -0.001
0.25 -0.50 1 0.052 0.014 0.047 0.005 0.050 0.011 0.048 -0.001
0.75 -0.50 1 0.115 0.067 0.047 0.006 0.055 0.018 0.048 -0.001
0.90 -0.50 1 0.057 0.020 0.047 0.005 0.052 0.014 0.048 -0.001
-0.90 -0.25 1 0.057 0.001 0.056 0.000 0.057 0.001 0.057 -0.001
-0.75 -0.25 1 0.056 0.001 0.056 0.000 0.057 0.001 0.057 -0.001
-0.25 -0.25 1 0.058 0.001 0.057 0.001 0.056 0.002 0.057 -0.001
0.00 -0.25 1 0.057 0.002 0.057 0.001 0.058 0.003 0.057 -0.001
0.25 -0.25 1 0.057 0.005 0.057 0.001 0.057 0.004 0.057 -0.001
0.75 -0.25 1 0.088 0.041 0.056 0.002 0.057 0.007 0.057 -0.001
0.90 -0.25 1 0.061 0.011 0.057 0.001 0.058 0.006 0.057 -0.001
-0.90 0.00 1 0.061 -0.001 0.061 -0.001 0.060 -0.001 0.061 -0.001
-0.75 0.00 1 0.061 -0.001 0.061 -0.001 0.060 -0.001 0.061 -0.001
-0.25 0.00 1 0.061 -0.001 0.060 -0.002 0.061 -0.001 0.061 -0.001
0.00 0.00 1 0.062 -0.001 0.060 -0.002 0.061 -0.001 0.061 -0.001
0.25 0.00 1 0.061 -0.001 0.060 -0.002 0.061 -0.001 0.061 -0.001
0.75 0.00 1 0.075 -0.001 0.062 -0.001 0.060 -0.001 0.061 -0.001
0.90 0.00 1 0.063 0.001 0.061 -0.001 0.060 0.000 0.061 -0.001
-0.90 0.25 1 0.059 -0.003 0.060 -0.003 0.058 -0.003 0.058 -0.001
-0.75 0.25 1 0.059 -0.003 0.059 -0.003 0.057 -0.003 0.058 -0.001
-0.25 0.25 1 0.059 -0.004 0.059 -0.004 0.057 -0.004 0.058 -0.001
0.00 0.25 1 0.058 -0.005 0.058 -0.005 0.058 -0.005 0.058 -0.001
0.25 0.25 1 0.061 -0.007 0.059 -0.005 0.060 -0.006 0.058 -0.001
0.75 0.25 1 0.100 -0.049 0.060 -0.005 0.061 -0.008 0.058 -0.001
0.90 0.25 1 0.064 -0.011 0.060 -0.005 0.061 -0.006 0.058 -0.001
Table D3
Unweighted Spatial GM Estimators of ?
True Initial Estimator AH1 AH2 AB
193
True Values
?? W
RMSE Bias RMSE Bias RMSE Bias RMSE Bias
-0.90 0.50 1 0.051 -0.006 0.051 -0.006 0.051 -0.006 0.049 -0.001
-0.75 0.50 1 0.051 -0.006 0.050 -0.005 0.051 -0.007 0.049 -0.001
-0.25 0.50 1 0.050 -0.007 0.051 -0.007 0.051 -0.008 0.049 -0.001
0.00 0.50 1 0.052 -0.009 0.050 -0.007 0.052 -0.010 0.049 -0.001
0.25 0.50 1 0.055 -0.013 0.051 -0.008 0.054 -0.013 0.049 -0.001
0.75 0.50 1 0.120 -0.075 0.051 -0.009 0.058 -0.020 0.049 -0.001
0.90 0.50 1 0.059 -0.018 0.051 -0.008 0.054 -0.014 0.049 -0.001
-0.90 0.90 1 0.028 -0.013 0.027 -0.013 0.031 -0.015 0.019 0.000
-0.75 0.90 1 0.028 -0.014 0.028 -0.013 0.031 -0.016 0.019 0.000
-0.25 0.90 1 0.034 -0.019 0.029 -0.016 0.035 -0.019 0.019 0.000
0.00 0.90 1 0.039 -0.023 0.031 -0.017 0.038 -0.023 0.019 0.000
0.25 0.90 1 0.051 -0.032 0.031 -0.018 0.049 -0.030 0.019 0.000
0.75 0.90 1 0.070 -0.048 0.031 -0.017 0.059 -0.040 0.019 0.000
0.90 0.90 1 0.040 -0.025 0.028 -0.015 0.045 -0.029 0.019 0.000
-0.90 -0.90 2 0.132 0.005 0.132 0.006 0.131 0.007 0.132 -0.002
-0.75 -0.90 2 0.132 0.005 0.132 0.006 0.130 0.007 0.132 -0.002
-0.25 -0.90 2 0.131 0.011 0.133 0.007 0.134 0.012 0.132 -0.002
0.00 -0.90 2 0.136 0.016 0.133 0.008 0.134 0.015 0.132 -0.002
0.25 -0.90 2 0.143 0.032 0.136 0.009 0.140 0.025 0.132 -0.002
0.75 -0.90 2 0.325 0.223 0.132 0.012 0.149 0.044 0.132 -0.002
0.90 -0.90 2 0.154 0.046 0.133 0.009 0.135 0.028 0.132 -0.002
-0.90 -0.50 2 0.128 -0.001 0.127 0.000 0.127 0.001 0.126 -0.003
-0.75 -0.50 2 0.127 -0.001 0.126 0.001 0.126 0.001 0.126 -0.003
-0.25 -0.50 2 0.124 0.002 0.126 0.000 0.123 0.002 0.126 -0.003
0.00 -0.50 2 0.125 0.005 0.125 0.001 0.124 0.003 0.126 -0.003
0.25 -0.50 2 0.126 0.012 0.125 0.002 0.127 0.007 0.126 -0.003
0.75 -0.50 2 0.209 0.118 0.128 0.001 0.131 0.016 0.126 -0.003
0.90 -0.50 2 0.136 0.020 0.127 0.000 0.127 0.008 0.126 -0.003
-0.90 -0.25 2 0.120 -0.004 0.119 -0.003 0.119 -0.002 0.116 -0.003
-0.75 -0.25 2 0.120 -0.003 0.118 -0.002 0.118 -0.002 0.116 -0.003
-0.25 -0.25 2 0.117 -0.001 0.117 -0.002 0.115 -0.002 0.116 -0.003
0.00 -0.25 2 0.118 -0.001 0.116 -0.003 0.118 -0.001 0.116 -0.003
0.25 -0.25 2 0.115 0.004 0.117 -0.002 0.118 0.000 0.116 -0.003
0.75 -0.25 2 0.161 0.055 0.119 -0.003 0.123 0.003 0.116 -0.003
0.90 -0.25 2 0.129 0.005 0.118 -0.004 0.120 0.001 0.116 -0.003
True Initial Estimator AH1 AH2 AB
Unweighted Spatial GM Estimators of ?
Table D3 cont.
194
True Values
?? W
RMSE Bias RMSE Bias RMSE Bias RMSE Bias
-0.90 0.00 2 0.106 -0.005 0.107 -0.004 0.106 -0.004 0.103 -0.003
-0.75 0.00 2 0.107 -0.004 0.106 -0.005 0.104 -0.003 0.103 -0.003
-0.25 0.00 2 0.107 -0.003 0.107 -0.006 0.104 -0.004 0.103 -0.003
0.00 0.00 2 0.106 -0.004 0.105 -0.005 0.104 -0.004 0.103 -0.003
0.25 0.00 2 0.105 -0.005 0.105 -0.005 0.107 -0.004 0.103 -0.003
0.75 0.00 2 0.139 0.004 0.107 -0.005 0.108 -0.005 0.103 -0.003
0.90 0.00 2 0.116 -0.006 0.106 -0.006 0.108 -0.005 0.103 -0.003
-0.90 0.25 2 0.089 -0.006 0.090 -0.006 0.090 -0.005 0.087 -0.002
-0.75 0.25 2 0.088 -0.006 0.090 -0.005 0.089 -0.005 0.087 -0.002
-0.25 0.25 2 0.089 -0.006 0.090 -0.007 0.088 -0.006 0.087 -0.002
0.00 0.25 2 0.089 -0.006 0.088 -0.006 0.087 -0.006 0.087 -0.002
0.25 0.25 2 0.090 -0.008 0.088 -0.007 0.089 -0.008 0.087 -0.002
0.75 0.25 2 0.130 -0.037 0.090 -0.007 0.095 -0.010 0.087 -0.002
0.90 0.25 2 0.099 -0.013 0.089 -0.006 0.093 -0.009 0.087 -0.002
-0.90 0.50 2 0.068 -0.007 0.068 -0.006 0.069 -0.006 0.068 -0.002
-0.75 0.50 2 0.068 -0.006 0.068 -0.007 0.068 -0.006 0.068 -0.002
-0.25 0.50 2 0.069 -0.007 0.069 -0.007 0.069 -0.007 0.068 -0.002
0.00 0.50 2 0.069 -0.008 0.068 -0.007 0.070 -0.009 0.068 -0.002
0.25 0.50 2 0.067 -0.009 0.069 -0.007 0.071 -0.010 0.068 -0.002
0.75 0.50 2 0.124 -0.058 0.070 -0.009 0.077 -0.016 0.068 -0.002
0.90 0.50 2 0.080 -0.020 0.070 -0.009 0.073 -0.013 0.068 -0.002
-0.90 0.90 2 0.028 -0.007 0.027 -0.007 0.027 -0.008 0.025 0.000
-0.75 0.90 2 0.028 -0.008 0.027 -0.007 0.027 -0.008 0.025 0.000
-0.25 0.90 2 0.031 -0.012 0.028 -0.009 0.030 -0.011 0.025 0.000
0.00 0.90 2 0.033 -0.016 0.029 -0.010 0.033 -0.014 0.025 0.000
0.25 0.90 2 0.040 -0.022 0.030 -0.012 0.039 -0.020 0.025 0.000
0.75 0.90 2 0.056 -0.031 0.030 -0.011 0.050 -0.027 0.025 0.000
0.90 0.90 2 0.041 -0.018 0.029 -0.010 0.044 -0.022 0.025 0.000
-0.90 -0.90 3 0.194 0.003 0.193 0.001 0.189 0.001 0.187 -0.011
-0.75 -0.90 3 0.193 0.003 0.193 0.000 0.187 0.002 0.187 -0.011
-0.25 -0.90 3 0.189 0.006 0.189 0.000 0.185 0.005 0.187 -0.011
0.00 -0.90 3 0.187 0.015 0.187 0.000 0.183 0.007 0.187 -0.011
0.25 -0.90 3 0.191 0.034 0.185 0.001 0.179 0.013 0.187 -0.011
0.75 -0.90 3 0.395 0.255 0.183 0.004 0.189 0.039 0.187 -0.011
0.90 -0.90 3 0.204 0.047 0.188 0.002 0.187 0.023 0.187 -0.011
Table D3 cont.
Unweighted Spatial GM Estimators of ?
Initial Estimator AH1 AH2 AB True 
195
True Values
?? W
RMSE Bias RMSE Bias RMSE Bias RMSE Bias
-0.90 -0.50 3 0.170 -0.005 0.171 -0.005 0.170 -0.004 0.168 -0.010
-0.75 -0.50 3 0.171 -0.004 0.171 -0.005 0.168 -0.002 0.168 -0.010
-0.25 -0.50 3 0.169 -0.002 0.170 -0.006 0.166 -0.003 0.168 -0.010
0.00 -0.50 3 0.167 0.002 0.170 -0.006 0.165 -0.003 0.168 -0.010
0.25 -0.50 3 0.167 0.009 0.167 -0.007 0.164 -0.002 0.168 -0.010
0.75 -0.50 3 0.261 0.127 0.167 -0.003 0.170 0.011 0.168 -0.010
0.90 -0.50 3 0.172 0.020 0.171 -0.006 0.167 0.006 0.168 -0.010
-0.90 -0.25 3 0.154 -0.006 0.155 -0.007 0.154 -0.006 0.152 -0.009
-0.75 -0.25 3 0.155 -0.006 0.155 -0.006 0.152 -0.006 0.152 -0.009
-0.25 -0.25 3 0.154 -0.005 0.155 -0.007 0.150 -0.007 0.152 -0.009
0.00 -0.25 3 0.152 -0.005 0.154 -0.008 0.149 -0.007 0.152 -0.009
0.25 -0.25 3 0.150 -0.001 0.152 -0.009 0.150 -0.006 0.152 -0.009
0.75 -0.25 3 0.199 0.054 0.150 -0.007 0.151 0.001 0.152 -0.009
0.90 -0.25 3 0.152 0.001 0.153 -0.007 0.151 0.001 0.152 -0.009
-0.90 0.00 3 0.136 -0.007 0.135 -0.008 0.133 -0.009 0.132 -0.009
-0.75 0.00 3 0.136 -0.007 0.136 -0.007 0.133 -0.007 0.132 -0.009
-0.25 0.00 3 0.135 -0.007 0.134 -0.009 0.131 -0.008 0.132 -0.009
0.00 0.00 3 0.132 -0.008 0.134 -0.009 0.131 -0.009 0.132 -0.009
0.25 0.00 3 0.134 -0.007 0.133 -0.010 0.132 -0.008 0.132 -0.009
0.75 0.00 3 0.166 0.001 0.134 -0.009 0.132 -0.005 0.132 -0.009
0.90 0.00 3 0.136 -0.009 0.132 -0.010 0.132 -0.005 0.132 -0.009
-0.90 0.25 3 0.111 -0.007 0.112 -0.008 0.109 -0.008 0.109 -0.007
-0.75 0.25 3 0.111 -0.008 0.113 -0.008 0.109 -0.008 0.109 -0.007
-0.25 0.25 3 0.112 -0.009 0.111 -0.009 0.109 -0.010 0.109 -0.007
0.00 0.25 3 0.111 -0.008 0.111 -0.009 0.110 -0.011 0.109 -0.007
0.25 0.25 3 0.111 -0.011 0.110 -0.009 0.111 -0.011 0.109 -0.007
0.75 0.25 3 0.148 -0.040 0.111 -0.011 0.111 -0.013 0.109 -0.007
0.90 0.25 3 0.115 -0.019 0.112 -0.012 0.109 -0.011 0.109 -0.007
-0.90 0.50 3 0.083 -0.007 0.084 -0.007 0.083 -0.007 0.082 -0.006
-0.75 0.50 3 0.083 -0.007 0.085 -0.007 0.083 -0.008 0.082 -0.006
-0.25 0.50 3 0.085 -0.009 0.084 -0.009 0.084 -0.010 0.082 -0.006
0.00 0.50 3 0.085 -0.010 0.085 -0.009 0.085 -0.011 0.082 -0.006
0.25 0.50 3 0.084 -0.013 0.084 -0.009 0.085 -0.011 0.082 -0.006
0.75 0.50 3 0.139 -0.059 0.087 -0.012 0.088 -0.017 0.082 -0.006
0.90 0.50 3 0.094 -0.022 0.087 -0.012 0.086 -0.016 0.082 -0.006
Initial Estimator AH1 AH2 AB True 
Table D3 cont.
Unweighted Spatial GM Estimators of ?
196
True Values
?? W
RMSE Bias RMSE Bias RMSE Bias RMSE Bias
-0.90 0.90 3 0.033 -0.007 0.033 -0.006 0.032 -0.006 0.031 0.001
-0.75 0.90 3 0.034 -0.007 0.033 -0.007 0.032 -0.007 0.031 0.001
-0.25 0.90 3 0.035 -0.012 0.033 -0.009 0.035 -0.010 0.031 0.001
0.00 0.90 3 0.038 -0.016 0.035 -0.011 0.037 -0.015 0.031 0.001
0.25 0.90 3 0.044 -0.022 0.036 -0.012 0.044 -0.020 0.031 0.001
0.75 0.90 3 0.066 -0.035 0.037 -0.012 0.054 -0.029 0.031 0.001
0.90 0.90 3 0.052 -0.020 0.037 -0.010 0.053 -0.023 0.031 0.001
True 
Unweighted Spatial GM Estimators of ?
Initial Estimator AH1 AH2 AB
Table D3 cont.
197
True Values
?? W
RMSE Bias RMSE Bias RMSE Bias RMSE Bias
-0.90 -0.90 1 0.036 0.017 0.036 0.017 0.041 0.021 0.023 0.000
-0.75 -0.90 1 0.037 0.018 0.036 0.017 0.041 0.022 0.023 0.000
-0.25 -0.90 1 0.044 0.026 0.039 0.020 0.047 0.027 0.023 0.000
0.00 -0.90 1 0.051 0.032 0.039 0.022 0.053 0.032 0.023 0.000
0.25 -0.90 1 0.069 0.044 0.041 0.024 0.063 0.041 0.023 0.000
0.75 -0.90 1 0.087 0.060 0.039 0.021 0.073 0.051 0.023 0.000
0.90 -0.90 1 0.052 0.032 0.037 0.019 0.059 0.039 0.023 0.000
-0.90 -0.50 1 0.048 0.002 0.049 0.003 0.048 0.004 0.047 -0.001
-0.75 -0.50 1 0.048 0.003 0.049 0.003 0.049 0.004 0.047 -0.001
-0.25 -0.50 1 0.049 0.006 0.049 0.004 0.049 0.006 0.047 -0.001
0.00 -0.50 1 0.049 0.009 0.048 0.004 0.048 0.007 0.047 -0.001
0.25 -0.50 1 0.052 0.015 0.048 0.006 0.050 0.010 0.047 -0.001
0.75 -0.50 1 0.117 0.070 0.049 0.006 0.054 0.018 0.047 -0.001
0.90 -0.50 1 0.058 0.020 0.048 0.004 0.051 0.013 0.047 -0.001
-0.90 -0.25 1 0.054 0.000 0.054 0.000 0.054 0.000 0.054 -0.002
-0.75 -0.25 1 0.054 0.000 0.055 0.000 0.053 0.000 0.054 -0.002
-0.25 -0.25 1 0.054 0.001 0.054 0.000 0.053 0.000 0.054 -0.002
0.00 -0.25 1 0.053 0.002 0.053 0.001 0.053 0.001 0.054 -0.002
0.25 -0.25 1 0.054 0.006 0.053 0.001 0.053 0.002 0.054 -0.002
0.75 -0.25 1 0.089 0.042 0.054 0.002 0.055 0.007 0.054 -0.002
0.90 -0.25 1 0.058 0.012 0.055 0.000 0.052 0.004 0.054 -0.002
-0.90 0.00 1 0.057 -0.002 0.057 -0.002 0.057 -0.002 0.056 -0.003
-0.75 0.00 1 0.057 -0.001 0.057 -0.002 0.056 -0.002 0.056 -0.003
-0.25 0.00 1 0.057 -0.002 0.057 -0.002 0.056 -0.003 0.056 -0.003
0.00 0.00 1 0.055 -0.003 0.056 -0.003 0.056 -0.003 0.056 -0.003
0.25 0.00 1 0.056 -0.002 0.055 -0.002 0.055 -0.003 0.056 -0.003
0.75 0.00 1 0.074 -0.001 0.057 -0.002 0.057 -0.002 0.056 -0.003
0.90 0.00 1 0.062 -0.001 0.057 -0.001 0.056 -0.001 0.056 -0.003
-0.90 0.25 1 0.056 -0.003 0.056 -0.003 0.055 -0.003 0.054 -0.002
-0.75 0.25 1 0.056 -0.003 0.056 -0.004 0.056 -0.004 0.054 -0.002
-0.25 0.25 1 0.056 -0.004 0.055 -0.004 0.055 -0.005 0.054 -0.002
0.00 0.25 1 0.055 -0.006 0.056 -0.005 0.055 -0.006 0.054 -0.002
0.25 0.25 1 0.057 -0.008 0.055 -0.005 0.056 -0.007 0.054 -0.002
0.75 0.25 1 0.095 -0.050 0.056 -0.006 0.057 -0.010 0.054 -0.002
0.90 0.25 1 0.061 -0.011 0.056 -0.004 0.057 -0.007 0.054 -0.002
Table D4
Initial Estimator AH1 AH2 AB True 
Weighted Spatial GM Estimators of ?
198
True Values
?? W
RMSE Bias RMSE Bias RMSE Bias RMSE Bias
-0.90 0.50 1 0.049 -0.006 0.049 -0.006 0.050 -0.007 0.047 -0.002
-0.75 0.50 1 0.050 -0.006 0.049 -0.006 0.050 -0.007 0.047 -0.002
-0.25 0.50 1 0.050 -0.008 0.050 -0.007 0.050 -0.009 0.047 -0.002
0.00 0.50 1 0.053 -0.011 0.051 -0.008 0.051 -0.010 0.047 -0.002
0.25 0.50 1 0.056 -0.015 0.051 -0.008 0.053 -0.013 0.047 -0.002
0.75 0.50 1 0.120 -0.076 0.050 -0.010 0.057 -0.021 0.047 -0.002
0.90 0.50 1 0.059 -0.018 0.050 -0.007 0.053 -0.015 0.047 -0.002
-0.90 0.90 1 0.037 -0.018 0.037 -0.017 0.041 -0.020 0.022 -0.001
-0.75 0.90 1 0.038 -0.019 0.038 -0.018 0.042 -0.021 0.022 -0.001
-0.25 0.90 1 0.044 -0.025 0.039 -0.021 0.047 -0.026 0.022 -0.001
0.00 0.90 1 0.053 -0.032 0.041 -0.023 0.052 -0.031 0.022 -0.001
0.25 0.90 1 0.071 -0.047 0.041 -0.024 0.065 -0.041 0.022 -0.001
0.75 0.90 1 0.094 -0.066 0.040 -0.022 0.078 -0.054 0.022 -0.001
0.90 0.90 1 0.055 -0.035 0.037 -0.020 0.061 -0.041 0.022 -0.001
-0.90 -0.90 2 0.115 0.005 0.115 0.005 0.118 0.007 0.118 -0.001
-0.75 -0.90 2 0.117 0.007 0.114 0.006 0.116 0.009 0.118 -0.001
-0.25 -0.90 2 0.116 0.011 0.115 0.009 0.118 0.014 0.118 -0.001
0.00 -0.90 2 0.120 0.019 0.116 0.009 0.121 0.018 0.118 -0.001
0.25 -0.90 2 0.126 0.034 0.117 0.012 0.128 0.025 0.118 -0.001
0.75 -0.90 2 0.307 0.210 0.119 0.016 0.134 0.041 0.118 -0.001
0.90 -0.90 2 0.140 0.046 0.117 0.011 0.125 0.026 0.118 -0.001
-0.90 -0.50 2 0.111 0.002 0.110 0.001 0.110 0.002 0.110 -0.001
-0.75 -0.50 2 0.110 0.002 0.110 0.001 0.110 0.002 0.110 -0.001
-0.25 -0.50 2 0.109 0.003 0.108 0.003 0.109 0.004 0.110 -0.001
0.00 -0.50 2 0.110 0.006 0.110 0.003 0.110 0.006 0.110 -0.001
0.25 -0.50 2 0.111 0.014 0.110 0.006 0.114 0.008 0.110 -0.001
0.75 -0.50 2 0.191 0.108 0.112 0.005 0.115 0.013 0.110 -0.001
0.90 -0.50 2 0.123 0.021 0.110 0.005 0.113 0.010 0.110 -0.001
-0.90 -0.25 2 0.102 -0.001 0.103 -0.001 0.103 0.001 0.102 -0.002
-0.75 -0.25 2 0.102 -0.001 0.102 -0.001 0.102 0.001 0.102 -0.002
-0.25 -0.25 2 0.101 0.000 0.102 -0.001 0.101 0.000 0.102 -0.002
0.00 -0.25 2 0.100 0.001 0.102 0.001 0.102 0.000 0.102 -0.002
0.25 -0.25 2 0.101 0.005 0.104 0.002 0.103 0.002 0.102 -0.002
0.75 -0.25 2 0.145 0.051 0.103 0.001 0.105 0.006 0.102 -0.002
0.90 -0.25 2 0.111 0.006 0.103 0.001 0.105 0.003 0.102 -0.002
Initial Estimator AH1 AH2 AB True 
Table D4 cont.
Weighted Spatial GM Estimators of ?
199
True Values
?? W
RMSE Bias RMSE Bias RMSE Bias RMSE Bias
-0.90 0.00 2 0.092 -0.002 0.092 -0.002 0.093 -0.002 0.091 -0.002
-0.75 0.00 2 0.091 -0.002 0.092 -0.002 0.093 -0.002 0.091 -0.002
-0.25 0.00 2 0.091 -0.002 0.092 -0.001 0.094 -0.002 0.091 -0.002
0.00 0.00 2 0.091 -0.002 0.092 -0.001 0.094 -0.001 0.091 -0.002
0.25 0.00 2 0.090 0.000 0.092 -0.001 0.093 -0.001 0.091 -0.002
0.75 0.00 2 0.129 0.005 0.091 -0.003 0.094 -0.003 0.091 -0.002
0.90 0.00 2 0.104 -0.003 0.093 -0.003 0.096 -0.003 0.091 -0.002
-0.90 0.25 2 0.080 -0.003 0.080 -0.003 0.082 -0.003 0.079 -0.002
-0.75 0.25 2 0.080 -0.003 0.080 -0.002 0.081 -0.004 0.079 -0.002
-0.25 0.25 2 0.078 -0.005 0.080 -0.003 0.080 -0.004 0.079 -0.002
0.00 0.25 2 0.078 -0.004 0.080 -0.003 0.081 -0.004 0.079 -0.002
0.25 0.25 2 0.079 -0.004 0.080 -0.004 0.080 -0.005 0.079 -0.002
0.75 0.25 2 0.122 -0.033 0.079 -0.005 0.083 -0.009 0.079 -0.002
0.90 0.25 2 0.086 -0.010 0.081 -0.005 0.084 -0.006 0.079 -0.002
-0.90 0.50 2 0.065 -0.005 0.064 -0.005 0.066 -0.004 0.065 -0.001
-0.75 0.50 2 0.065 -0.005 0.064 -0.004 0.065 -0.004 0.065 -0.001
-0.25 0.50 2 0.064 -0.005 0.066 -0.005 0.066 -0.007 0.065 -0.001
0.00 0.50 2 0.064 -0.006 0.065 -0.005 0.067 -0.008 0.065 -0.001
0.25 0.50 2 0.064 -0.009 0.065 -0.007 0.067 -0.009 0.065 -0.001
0.75 0.50 2 0.123 -0.060 0.065 -0.008 0.071 -0.014 0.065 -0.001
0.90 0.50 2 0.072 -0.016 0.067 -0.007 0.069 -0.010 0.065 -0.001
-0.90 0.90 2 0.033 -0.009 0.032 -0.009 0.033 -0.010 0.029 0.000
-0.75 0.90 2 0.034 -0.011 0.033 -0.010 0.033 -0.010 0.029 0.000
-0.25 0.90 2 0.037 -0.016 0.034 -0.012 0.037 -0.014 0.029 0.000
0.00 0.90 2 0.040 -0.020 0.035 -0.013 0.040 -0.017 0.029 0.000
0.25 0.90 2 0.049 -0.028 0.037 -0.015 0.048 -0.025 0.029 0.000
0.75 0.90 2 0.068 -0.040 0.039 -0.014 0.059 -0.035 0.029 0.000
0.90 0.90 2 0.048 -0.023 0.036 -0.013 0.053 -0.028 0.029 0.000
-0.90 -0.90 3 0.166 0.013 0.165 0.014 0.167 0.016 0.165 0.003
-0.75 -0.90 3 0.167 0.013 0.164 0.014 0.167 0.015 0.165 0.003
-0.25 -0.90 3 0.164 0.018 0.160 0.012 0.163 0.016 0.165 0.003
0.00 -0.90 3 0.165 0.021 0.162 0.014 0.161 0.019 0.165 0.003
0.25 -0.90 3 0.173 0.033 0.162 0.016 0.167 0.027 0.165 0.003
0.75 -0.90 3 0.366 0.239 0.167 0.018 0.177 0.043 0.165 0.003
0.90 -0.90 3 0.182 0.049 0.167 0.010 0.164 0.027 0.165 0.003
Initial Estimator AH1 AH2 AB
Table D4 cont.
True 
Weighted Spatial GM Estimators of ?
200
True Values
?? W
RMSE Bias RMSE Bias RMSE Bias RMSE Bias
-0.90 -0.50 3 0.148 0.005 0.148 0.007 0.146 0.007 0.147 0.002
-0.75 -0.50 3 0.147 0.006 0.145 0.006 0.146 0.006 0.147 0.002
-0.25 -0.50 3 0.148 0.009 0.144 0.006 0.146 0.007 0.147 0.002
0.00 -0.50 3 0.146 0.012 0.142 0.007 0.145 0.007 0.147 0.002
0.25 -0.50 3 0.144 0.017 0.143 0.008 0.144 0.013 0.147 0.002
0.75 -0.50 3 0.240 0.122 0.147 0.006 0.144 0.018 0.147 0.002
0.90 -0.50 3 0.157 0.022 0.145 0.002 0.144 0.012 0.147 0.002
-0.90 -0.25 3 0.135 0.004 0.133 0.004 0.133 0.004 0.134 0.002
-0.75 -0.25 3 0.135 0.003 0.134 0.004 0.132 0.003 0.134 0.002
-0.25 -0.25 3 0.133 0.004 0.130 0.003 0.131 0.003 0.134 0.002
0.00 -0.25 3 0.132 0.005 0.129 0.004 0.128 0.004 0.134 0.002
0.25 -0.25 3 0.131 0.011 0.128 0.003 0.130 0.007 0.134 0.002
0.75 -0.25 3 0.177 0.058 0.132 0.002 0.131 0.005 0.134 0.002
0.90 -0.25 3 0.140 0.011 0.133 -0.001 0.132 0.005 0.134 0.002
-0.90 0.00 3 0.118 0.002 0.117 0.002 0.117 0.001 0.117 0.000
-0.75 0.00 3 0.118 0.001 0.117 0.001 0.116 0.001 0.117 0.000
-0.25 0.00 3 0.116 0.001 0.114 0.002 0.114 0.001 0.117 0.000
0.00 0.00 3 0.115 0.001 0.115 0.001 0.114 0.001 0.117 0.000
0.25 0.00 3 0.113 0.002 0.113 0.000 0.114 0.001 0.117 0.000
0.75 0.00 3 0.145 0.006 0.115 -0.002 0.117 -0.001 0.117 0.000
0.90 0.00 3 0.123 -0.003 0.119 -0.004 0.119 -0.002 0.117 0.000
-0.90 0.25 3 0.099 -0.001 0.098 -0.002 0.098 -0.003 0.098 -0.001
-0.75 0.25 3 0.098 -0.001 0.098 -0.001 0.098 -0.003 0.098 -0.001
-0.25 0.25 3 0.095 -0.002 0.096 -0.002 0.097 -0.003 0.098 -0.001
0.00 0.25 3 0.097 -0.002 0.096 -0.002 0.096 -0.004 0.098 -0.001
0.25 0.25 3 0.097 -0.005 0.097 -0.003 0.097 -0.003 0.098 -0.001
0.75 0.25 3 0.133 -0.038 0.097 -0.006 0.099 -0.006 0.098 -0.001
0.90 0.25 3 0.105 -0.011 0.103 -0.006 0.100 -0.008 0.098 -0.001
-0.90 0.50 3 0.076 -0.004 0.075 -0.004 0.075 -0.004 0.076 -0.002
-0.75 0.50 3 0.075 -0.004 0.075 -0.004 0.076 -0.005 0.076 -0.002
-0.25 0.50 3 0.075 -0.005 0.075 -0.004 0.076 -0.006 0.076 -0.002
0.00 0.50 3 0.075 -0.006 0.076 -0.004 0.077 -0.007 0.076 -0.002
0.25 0.50 3 0.078 -0.009 0.076 -0.005 0.078 -0.008 0.076 -0.002
0.75 0.50 3 0.137 -0.064 0.078 -0.008 0.081 -0.012 0.076 -0.002
0.90 0.50 3 0.088 -0.016 0.078 -0.008 0.079 -0.010 0.076 -0.002
Table D4 cont.
AH1 AH2 AB True 
Weighted Spatial GM Estimators of ?
Initial Estimator
201
True Values
?? W
RMSE Bias RMSE Bias RMSE Bias RMSE Bias
-0.90 0.90 3 0.038 -0.011 0.038 -0.010 0.037 -0.011 0.035 -0.001
-0.75 0.90 3 0.038 -0.011 0.039 -0.010 0.037 -0.011 0.035 -0.001
-0.25 0.90 3 0.042 -0.015 0.039 -0.012 0.040 -0.014 0.035 -0.001
0.00 0.90 3 0.046 -0.021 0.040 -0.014 0.044 -0.019 0.035 -0.001
0.25 0.90 3 0.055 -0.029 0.042 -0.016 0.053 -0.027 0.035 -0.001
0.75 0.90 3 0.077 -0.045 0.041 -0.016 0.066 -0.036 0.035 -0.001
0.90 0.90 3 0.057 -0.027 0.041 -0.014 0.059 -0.030 0.035 -0.001
True Initial Estimator AH1 AH2 AB
Table D4 cont.
Weighted Spatial GM Estimators of ?
202
Figure1: QQ Plot of IV Estimator AH1
Figure 2: QQ Plot of IV Estimator AH2
203
Figure 3: QQ Plot of IV Estimator AB
Figure 4: QQ Plot of GMM Estimator AB Ignoring Spatial Correlation
204
Figure 5: QQ Plot of GMM Estimator AB based on
b
V
mix
Figure 6: QQ Plot of GMM Estimator AB based on
b
V
E
205
Figure 7: Normal Probability QQ Plot
Figure 8: Student t Probability QQ Plot
206
E Appendix: Symbols and Notation Used
In this Appendix, I provide a brief explanation of the different (standard) symbols
used throughout the thesis.
N cross-sectional dimension of the data under consideration
T time dimension of the data under consideration
I
N
N ?N identity matrix
e
T
T ? 1 vector of ones
J
T
T ?T matrix of ones
Q
0
transformation matrix that subtracts location specific sample means
Q
1
transformation matrix that calculates location specificsamplemeans
? first difference operator (in time dimension)
D first difference transformation matrix
? for all (logical predicate)
? exists (logical predicate)
? relation operator ?belongs to a set?
? infinity
R setofrealnumbers
N set of natural numbers
210
?(x) neighborhood of a real number x
sup supremum
inf infimum
min minimum
argmin
???
{} argument that maximizes a maximization problem in brackets
with parameters ? restricted to a set?
lim
n??
a
n
limes superior of the sequence a
n
? Kronecker product operator
kMk matrix norm [tr(M
0
M)]
1/2
?
min
(?) smallest eigenvalue of a matrix?
diag(d
1
,...,d
N
) diagonal matrix with d
1
,...,d
N
on the main diagonal
E(y) expected value of a vector/scalar y
VC(y) variance covariance matrix of a vectory
Cov(z
1
,z
2
) covariance of a two scalar random variables
d
? convergence in distribution
p
? convergence in probability
r
? convergence in r-th mean
211
N (x,?) multivariate normal distribution with mean x and variance covari-
ance matrix?
L
p
space of random variables with finite p-th absolute moments
|x| absolute value of a number/random variable
k?k
r
[E(?
r
)]
1/r
O
p
(k) sequence random variables is of order in probability of at most N
k
O(k) deterministic sequenceisoforderofatmostN
k
2SLS two stage least squares
3SLS three stage least squares
CV covariance (estimator)
GLS generalized least squares
GM generalized moments
GMM generalized method of moments
HAC heteroscedasticity and autocorrelation consistent
IV instrumental variable
LIML limited information maximum likelihood
LSDV least-squares dummy variable (estimator)
MD minimum distance
ML maximum likelihood
OLS ordinary least squares
SAR spatial autoregressive
STAR space-time autoregressive
212
STARMA space-time autoregressive moving average
SUR seemingly unrelated regressions
VAR vector autoregressive
VARMA vector autoregressive moving average
WG within group
213
F Appendix: Inequalities
In this Appendix, I provide a list of inequalities used throughout the thesis. The
following is based on, e.g. Bierens (1994), Section 1.4.
F.1 Deterministic Inequalities
(Bernoulli)Letx?R,x>1 and n?N.Then
(1 +x)
n
? 1+nx, (C.1.1)
with the inequality being sharp for x 6=0and n>1.
(Triangle)Letx,y ?C.Then
|x|?|y| ? |x?y| ? |x| + |y|. (C.1.2)
F.2 Stochastic Inequalities
(Chebyshev)LetX be a non-negative random variable with a finite mean?
X
and
finite variance ?
2
X
. Then for any ??R,?>0
P
?
|X ??
X
| >
r
?
2
X
?
!
??. (C.2.3)
(Holder)LetX andY be random variables and letp,q ?R,p>1,
1
p
+
1
q
=1.
214
Then
E(|XY|) ? [E(|X|
p
)]
1
p
[E(|Y|
q
)]
1
q
. (C.2.4)
(Cauchy-Schwartz)Forp = q =2,wehave
E(|XY|) ?
q
E
?
|X|
2
?
q
E
?
|Y|
2
?
. (C.2.5)
(Lyapunov)ForY =1we have for p>1
E(|X|) ? [E(|X|
p
)]
1
p
. (C.2.6)
(Minkowski)Ifforsomep? 1,E(|X|
p
) <? and E(|Y|
p
) <? ,then
E(|X +Y|) ? [E(|X|
p
)]
1
p
[E(|Y|
p
)]
1
p
. (C.2.7)
(Jensen)LetX be a random variable and f : D ? R ? R beaconvexreal
function. Then
f [E(X)] ?E[f (X)]. (C.2.8)
Observe that by selecting the random variables to be constants, the above in-
equalities can be applied in the deterministic case as well.
Since the mean of a finite number of non-random variables in R may be con-
sidered as mathematical expectations, it follows from H?lder?s inequality that for
215
real numbers x
i
, y
i
, p>1,
1
p
+
1
q
=1:
?
?
?
?
?
m
X
i=1
x
i
y
i
?
?
?
?
?
?
?
m
X
i=1
|x
i
|
p
!1
p
?
m
X
i=1
|y
i
|
q
!1
q
. (C.2.9)
Similarly from Lyapunov?s inequality (or by selecting y
i
=1in the above):
?
?
?
?
?
m
X
i=1
x
i
?
?
?
?
?
p
?m
p?1
m
X
i=1
|x
i
|
p
,p? 1. (C.2.10)
Finally, by Minkowski?s inequality
?
?
?
?
?
m
X
i=1
x
i
+y
i
?
?
?
?
?
1
p
?
?
m
X
i=1
|x
i
|
p
!1
p
+
?
m
X
i=1
|y
i
|
q
!1
q
. (C.2.11)
Note if x
i
and y
i
are random variables, then the last three inequalities hold for
all their realizations. As a result, we can apply these inequalities also in cases
where x
i
and y
i
are stochastic. The same holds for the triangle inequality.
216
References
[1] Abraham, B., 1983, The Exact Likelihood for a Space Time Model,
Metrika, 30, 239-243.
[2] Ahn, S.C., and P. Schmidt, 1995, Efficient Estimation of Models for Dy-
namic Panel Data, Journal of Econometrics, 68, 5-27.
[3] Alonso-Borrego, C. and M. Arellano, 1999, Symmetrically Normalized
Instrumental-Variable Estimation Using Panel Data, Journal of Business
& Economic Statistics, 17, 36-49.
[4] Alvarez, J. and M. Arellano, 2003, The Time-Series and Cross-Section As-
ymptotics of Dynamic Panel Data Estimators, Econometrica, 71(4), 1121-
59.
[5] Anderson, T.W., and C. Hsiao, 1981, Estimation of Dynamic Models with
Error Components, Journal of American Statistical Association, 76, 598-
606.
[6] Anderson, T.W., and C. Hsiao, 1982, Formulation and estimation of dy-
namic models using panel data, Journal of Econometrics, 18, 47-82.
[7] Anselin, L., 1988, Spatial Econometrics: Methods and Models (Kluwer
Academic Publishers, Boston).
[8] AnselinL.andS.Hudak, 1992,SpatialEconometricsinPractice; AReview
ofSoftwareOptions, RegionalScienceandUrbanEconomics, 22,509-536.
217
[9] Arellano, M., 1989, A Note on the Anderson-Hsiao Estimator for Panel
Data, Economics Letters, 31, 337-341.
[10] Arellano, M., and S.R. Bond, 1991, Some Tests of Specification for Panel
Data: Monte Evidence and an Application to Employment Equations, Re-
view of Economic Studies, 58, 277-297.
[11] Arellano, M., and O. Bover, 1995, Another Look at the Instrumental Vari-
able Estimation of Error-Components Models, JournalofEconometrics,
68, 29-51.
[12] Audretsch, D.B., Feldmann, M.P., 1996.R&Dspilloversandthegeography
of innovation and production, American Economic Review, 86, 630-640.
[13] Baltagi, B.H., 1995 and 2002, Econometric Analysis of Panel Data Wiley,
New York.
[14] Baltagi, B.H., Li, D., 2001a. Double length artificial regressions for testing
spatial dependence, Econometric Reviews, 20, 31-40.
[15] Baltagi, B.H., Li, D., 2001b. LM test for functional form and spatial error
correlation, International Regional Science Review, 24, 194-225.
[16] Baltagi, B.H., S.H. Song and W. Koh, 2003, Testing Panel Data
Regression Models with Spatial Error Correlation, Journal of
Econometrics,117(1),123-50.
218
[17] Bartsch, H.J., 1987, Mathematische Formeln (VEB Fachbuchverlag,
Leipzig, DDR).
[18] Bernat Jr., G., 1996, Does Manufacturing matter? A spatial econometric
view of Kaldor?s laws, JournalofRegionalScience, 36, 463-477.
[19]Besley,T.,Case,A.,1995. Incumbent behavior: Vote-seeking, taxsetting,
and yardstick competition, American Economic Review, 85, 25-45.
[20] Bhargava,A.andJ.D.Sargan,1983, Estimating Dynamic Random Effects
Models from Panel Models Covering Short Time Periods, Econometrica,
51, 1635-1659.
[21] Bierens, H.J., 1994, Topics in Advanced Econometrics (Cambridge Univer-
sity Press, Cambridge, U.K.).
[22] Binder, M., C. Hsiao, and M.H. Pesaran, 2002, Estimation and Inference
in Short Panel Vector Autoregressions with Unit Roots and Cointegration,
Working Paper, University of Maryland, College Park.
[23] Binder, M., C. Hsiao, J. Mutl, and M.H. Pesaran, 2002, Computational
Issues in the Estimation of Higher-Order Panel Vector Autoregressions,
mimeo, University of Maryland, College Park.
[24] Blundell, R.W., and S.R. Bond, 1998, Initial Conditions and Moment Re-
strictions in Dynamic Panel Data Models, Journal of Econometrics,82,
135-156.
219
[25] Blundell, R.W., Smith, R.J., 1991, Initial conditions and efficient estima-
tion in dynamic panel data models, Annales d?Economie et de Statistique,
20/21, 109?123.
[26] Bollinger, C., Ihlanfeldt, K., 1997. The impact of rapid rail transit on eco-
nomic development: The case of Atlanta?s Marta, Journal of Urban Eco-
nomics, 42, 179-204.
[27] Bronnenberg, B.J. and V. Mahajan, 2001, Unobserved Retailer Behavior in
Multimarket Data: Joint Spatial Dependence in Market Shares and Promo-
tion Variables, Marketing Science, 20, 284-299.
[28] Brown, P.E., K.F. Karesen, G.O. Roberts and S. Tonellato, 2000, Blur-
Generated Non-Separable Space-Time Models, Journal of the Royal Sta-
tistical Society, Series B (Statistical Methodology), 62(4), 847-860.
[29] Buettner, T., 1999. The effect of unemployment, aggregate wages, and spa-
tial contiguity on local wages: An investigation with German district level
data, Papers in Regional Science, 78, 47-67.
[30] Case, A., 1991. Spatial patterns in household demand, Econometrica,59,
953-966.
[31] Case, A., Hines Jr., J., Rosen, H., 1993. Budget spillovers and fiscal policy
independence: Evidence from the States, Journal of Public Economics,52,
285-307.
220
[32] Chamberlain, G., 1982, Multivariate Regression Models for Panel Data,
Journal of Econometrics, 18, 5-46.
[33] Chamberlain, G., 1984, Panel data, Ch. 22 in: Z. Griliches and M. Intriliga-
tor, eds., Handbook of Econometrics, Vol. II (North-Holland, Amsterdam).
[34] Chang, Y., 2002, Nonlinear IV Unit Root Tests in Panels with Cross-
Sectional Dependency, Journal of Econometrics, 110(2), 261-92.
[35] Chen, X. and T. Conley, 2001, A New Semiparametric Spatial Model for
Panel Time Series, JournalofEconometrics, 105(1), 59-83.
[36] Cliff, A. and J. Ord, 1973, Spatial Autocorrelation (Pion, London).
[37] Cliff, A. and J. Ord, 1981, Spatial Processes,Models and Applications
(Pion, London).
[38] Conley, T., 1999, GMM estimation with cross sectional dependence, Jour-
nalofEconometrics92, 1-45.
[39] Cressie, N., 1993, Statistics of Spatial Data (Wiley, New York).
[40] Das, D., Kelejian, H.H., Prucha, I.R., 2003. Small sample properties of es-
timators of spatial autoregressive models with autoregressive disturbances,
Papers in Regional Science, 82, 1-26.
[41] Dhrymes, P.J., 1984, Mathematics forEconometrics(Springer-Verlag, New
York).
221
[42] Do?l?,Z.andO.Do?l?,1999,Differenci?ln?po?cetfunkc?v?cepromn?enn?ch
(Masarykova univerzita v Brn?e, Czech Republic).
[43] Dowd,M.R.,LeSage,J.P.,1997. Analysis of spatial contiguity influences
on state price level formation, International Journal of Forecasting,13,
245-253.
[44] Driscoll, J. and A. Kraay, 1995, Spatial Correlations on Panel Data, Policy
Research Working Paper, 1553, The World Bank.
[45] Driscoll, J. and A. Kraay, 1998, Consistent covariance matrix estimation
with spatially dependent panel data, The Review of Economics and Statis-
tics,80, 549-560.
[46] Elhorst, J.P., 2001, Dynamic models in space and time, Geographical
Analysis, 33, 119?140.
[47] G?nsler, P. andW. Slute, Wahrscheinlichkeitstheorie (SpringerVerlag, New
York).
[48] Giacomini, R. and C.W.J. Granger, 2004, Aggregation of space-time
processes, Journal of Econometrics, 118, 7-26.
[49] Hahn, J., 1999, How informative is the initial condition in the dynamic
panel model with fixed effects, Journal of Econometrics, 93, 309?326.
222
[50] Hahn, J. and G. Kuersteiner, 2002, Asymptotically Unbiased Inference for
a Dynamic Panel Model with Fixed Effects when both n, and T are Large,
Econometrica, 70(4), 1639-1657.
[51] Haining, R., 1990, Spatial data Analysis in the Social and Environmental
Sciences (Cambridge University Press, Cambridge).
[52] Hansen, L.P., 1982, Large Sample Properties of Generalized Method of
Moments Estimators, Econometrica, 50, 1029-1054.
[53] Hansen, L.P., J. Heaton and A. Yaron, 1996, Finite Sample Properties of
Some Alternative GMM Estimators, Journal of Business & Economic Sta-
tistics, 14, 262-280.
[54] Harris, R.D.F and E. Tzavalis, 1999, Inference for Unit Roots in Dynamic
Panels where the Time Dimension is Fixed, Journal of Econometrics,91,
201-226.
[55] Horn, R.H. and C.H. Johnson, 1985, Matrix Analysis (Cambridge Univer-
sity Press, Cambridge).
[56] Horn, R.H. and C.H. Johnson, 1991, Topics in Matrix Analysis (Cambridge
University Press, Cambridge).
[57] Holtz-Eakin, D., 1994. Public sector capital and the productivity puzzle,
Review of Economics and Statistics, 76, 12-21.
223
[58] Holtz-Eakin, D., W. Newey and H. Rosen, 1988, EstimatingVector Autore-
gressions with Panel Data, Econometrica, 56, 1371-1395.
[59] Hordijk, L., 1979, Problems in estimating econometric relations in space,
Papers of the Regional Science Association, 42, 99-115.
[60] Hsiao, C., M.H. Pesaran and A.K. Tahmiscioglu, 2002, Maximum Like-
lihood Estimation of Fixed Effects Dynamic Panel Data Models Covering
Short Time Periods, Journal of Econometrics, 109(1), 107-50.
[61] Judson, R.A., Owen, A.L., 1999, Estimating dynamic panel data models: a
guide for macroeconomists, Economic Letters, 65, 9?15.
[62] Karr, Alan F., 1993, Probability (Springer-Verlag, New York).
[63] Kapoor, M., H.H. Kelejian and I.R. Prucha, 2005, Panel Data Models with
Spatially Correlated Error Components, Journal of Econometrics,forth-
coming.
[64] Keane, M.P. and D.E. Runkle, 1992, On the Estimation of Panel-Data Mod-
els with Serial Correlation When Instruments Are Not Strictly Exogenous,
Journal of Business & Economic Statistics, 10(1), 1-9.
[65] Kelejian, H.H., Prucha., I.R., 1997. Estimation of spatialregressionmod-
els with autoregressive errors by Two-Stage Least Squares procedures: A
serious problem, International Regional Science Review, 20, 103-111.
224
[66] Kelejian, H.H. and I.R. Prucha, 1998, A generalized spatial two-stage least
squares procedure for estimating a spatial autoregressive model with au-
toregressive disturbances, Journal of Real Estate Finance and Economics
17, 99-121.
[67] Kelejian, H.H. and I.R. Prucha, 1999, A generalized moments estimator
forthe autoregressive parameter ina spatialmodel,International Economic
Review 40, 509-533.
[68] Kelejian, H.H. and I.R. Prucha, 2001, On the asymptotic distribution of
the Moran I test statistic with applications, Journal of Econometrics,104,
219-257.
[69] Kelejian, H.H. and I.R. Prucha, 2005, HAC Estimation in a Spatial Frame-
work, Department of Economics, University of Maryland, College Park,
mimeo.
[70] Kelejian, H.H. and I.R. Prucha, 2004, Estimation of simultaneous systems
of spatially interelated cross sectional equations, JournalofEconometrics,
118, 27-50.
[71] Kelejian, H.H., I.R. Prucha and Y. Yuzefovich, 2004, Instrumental Variable
Estimation of a Spatial Autoregressive Model with Autoregressive Distur-
bances: Large and Small Sample Results, Department of Economics, Uni-
versity of Maryland, forthcoming in J. LeSage and R.K. Pace, Advances in
Econometrics, New York: Elsevier.
225
[72] Kelejian, H.H., Robinson, D., 1997. Infrastructure productivity estimation
and its underlying econometric specifications, Papers in Regional Science,
76, 115-131.
[73] Kelejian, H.H., Robinson, D., 2000. Returns to investment in navigation
infrastructure: An equilibrium approach, Annals of Regional Science,34,
83-108.
[74] Kiviet, J.F., 1995, On bias, inconsistency and efficiency in various estima-
tors of dynamic panel data models, Journal of Econometrics, 68, 53?78.
[75] Korniotis, G.M.,2005, A Dynamic Panel Estimator with Both Fixed and
Spatial Effects, mimeo, University of Notre Dame.
[76] Kyriakidis, P.CandA.G.Journel, 1999, Geostatistical Space-Time Models:
AReview,Mathematical Geology, 31(6), 651-84.
[77] Lee, L.-F., 2001a, Generalized method of moments estimation of spatial
autoregressiveprocesses, Department ofEconomics, OhioStateUniversity,
mimeo.
[78] Lee, L.-F., 2001b, GMM and 2SLS estimation of mixed regressive, spatial
autoregressive models, Department of Economics, Ohio State University,
mimeo.
[79] Lee, L.F., 2002. Consistency and efficiency of least squares estimation for
mixed regressive, spatial autoregessive models, Econometric Theory,18,
252-277.
226
[80] Lee, L.F., 2003. Best spatial Two-Stage Least Squares estimators for a spa-
tial autoregressive model with autoregressive disturbances, Econometric
Reviews, 22, 307-335.
[81] Lee, L.-F., 2004, Asymptotic distributions of maximum likelihood estima-
tors for spatial autoregressive models, Econometrica, 72(6), 1899-1925.
[82] LeSage, J. P., 1997. Bayesian estimation of spatial autoregressive models,
International Regional Science Review, 20, 113-129.
[83] LeSage, J. P., 1999. A spatial econometric analysis of China?s economic
growth, Journal of Geographic Information Sciences, 5, 143-153.
[84] LeSage, J. P., 2000. Bayesian estimation of limited dependent variable spa-
tial autoregressive models, Geographic Analysis, 32, 19-35.
[85] LeSage, J.P and A. Krivelyova, 1999, A spatial prior for Bayesian vector
autoregressive models, Journal of Regional Science, 39(2), 297-317.
[86] LeSage,J.P. and R.K. Pace, 2004, Models for Spatially Dependent Missing
Data, Journal of Real Estate Finance and Economics, 29(2), 233-254.
[87] Mutl, J., 2006, Misspecification of Space: An Illustration Using Growth
Convergence Regressions, Workshop on Spatial Econometrics and Statis-
tics,Rome,Italy.
227
[88] Nerlove, M., 1967, Experimental Evidence on the Estimation of Dynamic
Economic Relations from a Time Series of Cross-Sections, Economic Stud-
ies Quarterly, 18, 42-74.
[89] Nerlove, M., 1967, Further Evidence on the Estimation of Dynamic Eco-
nomic Relations from a Time Series of Cross Sections, Econometrica,39,
359-387.
[90] Newey, W.K. and K.D. West, 1987, A Simple, Positive Semi-Definite Het-
eroscedasticity and Autocorrelation Consistent Covariance Matrix, Econo-
metrica, 55(3), 703-708.
[91] Nickell, S., 1981, Biases in Dynamic Models with Fixed Effects, Econo-
metrica, 16, 1-32.
[92] Pace, R., Barry, R., 1997. Sparse spatial autoregressions, Statistics and
Probability Letters, 33, 291-297.
[93] Pace, R.K., R. Barry, J. Clapp and M. Rodriquez, 1998, Spatiotemporal
Autoregressive Models of Neighborhood Effects, Journal of Real Estate
Finance and Economics,17, 15-33.
[94] Pfeifer, P.E., Deutsch, S.J., 1980, A three-stage iterative procedure for
space-time modeling, Technometrics, 22, 35?47.
[95] Pinkse, J., Slade, M.E., 1998. Contracting in space: An application of spa-
tial statistics to discrete-choice models, Journal of Econometrics, 85, 125-
154.
228
[96] Pinkse, J., Slade, M. E., Brett, C., 2002. Spatial price competition: A semi-
parametric approach, Econometrica, 70, 1111-53.
[97] P?tscher, B.M. and I.R. Prucha, 1994, Generic uniform convergence and
equicontinuity concepts for random functions: An exploration of the basic
structure, Journal of Econometrics, 60, 23-63.
[98] P?tscher, B.M. and I.R. Prucha, 1997, Dynamic Nonlinear Econometric
Models, Asymptotic Theory (Springer Verlag, New York).
[99] P?tscher, B.M. and I.R. Prucha, 2001, Basic elements of asymptotic theory.
InB.H.Baltagi, ed., ACompaniontoTheoreticalEconometrics(Blackwell,
New York), 201-229.
[100] Prucha, I.R., 1985, Maximum likelihood and instrumental variable estima-
tioninsimultaneousequationsystemswitherrorcomponents, International
Economic Review 26, 491-506.
[101] Prucha, I.R., 2004, Econ 721 Handout: Lag Operators and Difference
Equations, mimeo, University of Maryland.
[102] Prucha, I.R., 2005, Econ 624 Handout: Classical Nonlinear Econometrics
Models, mimeo, University of Maryland.
[103] Rao, C.R., 1973, Linear Statistical Inference and Its Applications (Wiley,
New York).
229
[104] Rey, S. and M. Boarnet, 1999, A taxonomy of spatial econometric models
for simultaneous systems. In L. Anselin and R. Florax, eds., Advances in
Spatial Econometrics (Springer Verlag, New York).
[105] Sargan, J.D., 1958, The Estimation of Economic Relationships Using In-
strumental Variables, Econometrica, 26, 393-415.
[106] Schmidt, P., 1976, Econometrics (Marcel Dekker, New York).
[107] Sevestre, P. and A. Trognon, 1985, A note on autoregressive error-
component models, Journal of Econometrics, 29, 231-245.
[108] Shroder, M., 1995. Games the states don?t play: Welfare benefits and the
theory of fiscal federalism, Review of Economics and Statistics, 77, 183-
191.
[109] Stoffer, D.S., 1986, Estimation and Identification of Space-Time ARMAX
ModelsinthePresenceofMissingData,JournaloftheAmericanStatistical
Association, 81(395), 762-62.
[110] Trognon, A., 1978, MiscellaneousAsymptoticPropertiesofOrdinaryLeast
Squares and Maximum Likelihood Estimators in Dynamic Error Compo-
nent Models, Annales de l?INSEE, 30, 632-657.
[111] Vigil, R., 1998.Interactionsamongmunicipalitiesintheprovisionofpolice
services: A spatial econometric approach, University of Maryland, Ph.D.
Thesis.
230
[112] Whittle, P., 1954, On stationary processes in the plane, Biometrica 41, 434-
449.
[113] Yang, Z., 2005, Quasi-Maximum Likelihood Estimation for Spatial Panel
Data Regressions, mimeo, Singapore Management University.
[114] Zellner, A., 1962, An efficient method of estimating seemingly unrelated
regressions and tests for aggregation bias, Journal of the American Statis-
tical Society, 57, 348-368.
231