ABSTRACT

Title of Dissertation: LIVE AND FIVE ESTIMATION OF SIMULTANEOUS EQUATIONS MODELS WITH HIGHER-ORDER SPATIAL AND SOCIAL INTERACTIONS

Jiankun Chen
Doctor of Philosophy, 2022

Dissertation Directed by: Professors Ingmar Prucha and Andrew Sweeting, Department of Economics

The first part of the dissertation introduces a new class of limited and full information GMM estimators for simultaneous equation systems (SEM) with network interdependence modeled by Cliff-Ord type spatial lags (Cliff and Ord (1973, 1981)). We consider the same model specification as that in Drukker, Egger, and Prucha (2022) and allow for higher-order spatial lags in the dependent variables, the exogenous variables and the disturbances. The network is defined in terms of a measure of proximity and can accommodate a wide class of dependence structures that may appear in both micro and macro economic settings. We show that the scores of the log-likelihood function can be viewed as a weighted sum of linear and quadratic components that motivate valid moment conditions. One contribution of this dissertation is to show that the linear moments can be written so as to permit an instrumental variable (IV) interpretation, extending existing results in the context of classical SEMs. In constructing the linear moments, the instruments exploit the nonlinear structure of the parameters implied by the reduced form model, while those utilized by the existing 2SLS- and 3SLS-type estimators do not. From this perspective, the new estimation methodology incorporates the ideas underlying the LIVE and the FIVE estimators in Brundy and Jorgenson (1971) for classical SEMs, as well as the IV estimators using optimal instruments for spatial autoregressive (SAR) models. In addition to the linear IV estimators, we also consider one-step GMM estimators that utilize both the linear and quadratic moments implied by the scores.
Our new LIVE and FIVE estimators for the network SEMs remain computationally feasible even in large samples and are robust against heteroskedasticity of unknown form. Monte Carlo simulations show that the new estimators in general outperform the existing 2SLS- and 3SLS-type estimators for this class of models when the instruments are weak.

In the second part of the dissertation, we estimate the consumer demand for gasoline in the market of Vancouver, Canada. We employ a demand system with a spatial network component, utilizing the model and the estimation methods considered in the first part. Demand elasticities for gasoline at the aggregate level are well documented in the literature, while estimates at the station level are relatively scarce. We estimate the station-level demand elasticities as well as the (spatial) elasticity of substitution under a variety of network structures based on different proximity measures. We collected station-level data on retail prices, sales volume and station characteristics of the 151 stations, as well as the characteristics of local markets, for September 2019 and March 2020. To deal with the endogeneity of prices, existing works typically exploit variation in the characteristics of each station's direct competitors. We argue that in a geographically continuous market, this strategy may not be sufficient. In the spirit of Fan (2013), our instruments also exploit variation in the characteristics of the competitors of each station's competitors (indirect competitors). We find that the own-price demand elasticity is between −12 and −4 while the cross-station price elasticity is in general between 0.6 and 6, depending on the construction of the network matrices that govern the degree of competition. We also report the impact measures that provide interpretations of the estimated coefficients of the exogenous variables in the context of spatial network models.
We find that the availability of service stations has in general contributed positively to the sales volume at a station. In general, a station located within a neighborhood with more drivers faces stronger demand.

LIVE AND FIVE ESTIMATION OF SIMULTANEOUS EQUATIONS MODELS WITH HIGHER-ORDER SPATIAL AND SOCIAL INTERACTIONS

by

Jiankun Chen

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2022

Advisory Committee:
Professors Ingmar Prucha and Andrew Sweeting, Co-Chairs
Professor John Chao
Professor Guido Kuersteiner
Professor Haluk Ünal

© Copyright by Jiankun Chen 2022

Dedication

To my mother and father, Shumei Li and Fengyu Chen, for their endless love and encouragement. To my aunt and uncle, Shuling Li and Feng Yan, for their unconditional support since my childhood. To Alan Taylor and Kenneth Elzinga, who inspired and encouraged me to embark on the journey of economic research.

Acknowledgments

I would like to express my sincerest gratitude to my advisors, Professors Ingmar Prucha and Andrew Sweeting. I am deeply indebted to them for their invaluable support, encouragement and patience during my research. This dissertation would not have been possible without their guidance. I would also like to thank Professors John Chao, Guido Kuersteiner, and Haluk Ünal for serving on my committee and offering their helpful comments. My gratitude also goes to all the faculty, staff and students in the economics department at the University of Maryland. They make the department a friendly and supportive place.

Table of Contents

Dedication
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: LIVE and FIVE Estimation of the Simultaneous Equation Models with Higher Order Spatial/Social Interactions
 2.1 Introduction
 2.2 Model
  2.2.1 Structural Form Model
  2.2.2 Reduced Form and Structural Model with Exclusion Restrictions
  2.2.3 Model Assumptions
 2.3 Maximum Likelihood Estimation and Estimator Generating Equations
  2.3.1 Scores of the Log-likelihood Function
  2.3.2 IV Interpretation and Estimator Generating Equations
  2.3.3 Connection to LIVE/FIVE and Optimal IV Estimation
 2.4 Moment Conditions
  2.4.1 Heteroskedasticity-robust Moment Conditions
  2.4.2 Approximated Moments
 2.5 LIVE and FIVE Estimators for Network SEM
  2.5.1 Limited Information Estimators
  2.5.2 Full Information Estimators
 2.6 Identification Condition
  2.6.1 Scenario I
  2.6.2 Scenario II
 2.7 Monte Carlo Simulations
  2.7.1 Data Generation Process
  2.7.2 Implemented Estimators
  2.7.3 Performance of Estimators under Strong and Weak Identifications, Scenario I
  2.7.4 Performance of Estimators under Strong and Weak Identifications, Scenario II
  2.7.5 Heteroskedasticity
  2.7.6 Remarks
 2.8 Concluding Remarks
Chapter 3: Empirical Application: Demand Estimation for Retail Gasoline Market with Network Dependence
 3.1 Introduction
 3.2 Related Literature
 3.3 Model
  3.3.1 Theoretical Motivation
  3.3.2 Econometric Specification
 3.4 Data
 3.5 Instruments and Identification
 3.6 Estimation Results and Impact Measures
  3.6.1 Main Estimation Results
  3.6.2 Impact Measures
 3.7 Concluding Remarks
Chapter A: Appendix to Chapter 2
 A.1 Appendix: Example expression of EYn
 A.2 Proofs of Chapter 2
  A.2.1 Preliminary Results
  A.2.2 Proof of Proposition 1
  A.2.3 Proof of Proposition 2
  A.2.4 Proof of Proposition 3
  A.2.5 Proof of Lemma 1
 A.3 Explicit Expressions of VCV matrices
  A.3.1 Explicit Expression of ??qgg,n(g)
  A.3.2 Explicit Expression of ??qn
 A.4 Additional Monte Carlo Results
  A.4.1 Results with Alternative Wn's
  A.4.2 Correlated x.k's
Chapter B: Appendix to Chapter 3
 B.1 Theoretical Motivation for the Demand Equation
 B.2 Edgeworth Cycle
  B.2.1 Retail Margins
  B.2.2 Markov Switching Regression (MSR)
 B.3 Test for IV power
 B.4 Additional Empirical Results
 B.5 Impact Measures
 B.6 Test for Network Dependence
Bibliography

List of Tables

2.1 Median and RMSE of Scenario I, homoskedasticity, Parameter Constellation 1
2.2 Median and RMSE of Scenario I, homoskedasticity, Parameter Constellation 2
2.3 Median and RMSE of Scenario I, homoskedasticity, Parameter Constellation 3
2.4 Median and RMSE of Scenario II, homoskedasticity, Parameter Constellation 1
2.5 Median and RMSE of Scenario II, homoskedasticity, Parameter Constellation 2
2.6 Median and RMSE of Scenario II, homoskedasticity, Parameter Constellation 3
2.7 Median and RMSE of Scenario I under Heteroskedasticity, Parameter Constellations 1-4
3.1 Summary Statistics of Retail Prices, Sales Volume
3.2 Summary Statistics of Station Characteristics
3.3 Correlation of Station Characteristics in Neighboring Markets
3.4 First-stage OLS Regression and tests for IV power
3.5 Estimation Results with W based on common boundaries
3.6 Impact Measures
A.1 Median and RMSE of Scenario I, homoskedasticity, alternative Wn's
A.2 Median and RMSE of Scenario II, homoskedasticity, alternative Wn's
A.3 Median and RMSE of Scenario I, homoskedasticity, correlated Xn
A.4 Median and RMSE of Scenario II, homoskedasticity, correlated Xn
B.1 Within-Regime Estimates and Expected Duration in Days, September 2019
B.2 Within-Regime Estimates and Expected Duration in Days, September 2020
B.3 Switching Probabilities
B.4a First-stage OLS Regression and tests for IV power
B.4b First-stage OLS Regression and tests for IV power (continued)
B.6 Estimation Results with W2,n (2-miles radius)
B.7 Estimation Results with W3,n (common boundary and reciprocal of the travel distance)
B.8 Estimation Results with W4,n (nearest neighbor)
B.9 Estimation Results with W5,n (common street)
B.10 Estimation Results with W6,n (hybrid measure of common street and travel distance)
B.11 Definition of Regression Variables
B.12 Impact Measures
B.13 Test for Network Dependence

List of Figures

3.1 Market Area based on Common Boundaries
3.2 Dynamics of Average Retail Price
B.1 Average Retail Margin Computed with Spot Rack Price, Rack Price of 5 and 10 Days' Lead, Aug - Nov, 2019
B.2 Average Retail Margin Computed with Spot Rack Price, Rack Price of 5 and 10 Days' Lead, Feb - Apr, 2020

Chapter 1: Introduction

The empirical literature has documented substantial evidence of cross-sectional interdependence among observations both at the micro and at the macro level. At the micro level, cross-sectional units are often individuals, firms or plants, whereas at the macro level the units could be states or countries. These works often utilize data with spatial features and/or employ models that allow for network structures among cross-sectional units. Given the large size of the relevant literature, we mention only a few works here. Using county-level data, Gallardo, Whitacre, Kumar, and Upendram (2021) study the impact of broadband access on job productivity. Nakamura and Avner (2021) analyze the spatial distributions of job accessibility and housing rents in Nairobi, Kenya. Fingleton and Szumilo (2019) estimate the impact of investment in high-speed rail infrastructure on wage levels in England and Wales.
In their analysis of the causal effects of democracy on economic growth, Acemoglu, Naidu, Restrepo, and Robinson (2019) explicitly model spatial correlation among variables (e.g., GDP and shocks) to control for regionally correlated omitted factors. At the micro level, Pinkse, Slade, and Brett (2002) estimate spatial price competition among wholesale gasoline terminals. Using bank-level data for euro area countries, Gibson, Hall, Petroulas, Spiliotopoulos, and Tavlas (2020) analyze the spillover effects on other banks of providing emergency liquidity assistance (ELA) to one bank during the euro area crisis. There are also works that apply spatial models to study social networks, including Ballester, Calvó-Armengol, and Zenou (2006), Lee (2007), Calvó-Armengol, Patacchini, and Zenou (2009), Blume, Brock, Durlauf, and Ioannides (2011), Liu (2014) and Cohen-Cole, Liu, and Zenou (2018).1

The first part of the dissertation considers the estimation of simultaneous equation models (SEM) with network interdependence. An important class of models for spatial networks originates from the single-equation model introduced by Cliff and Ord (1973, 1981). This model can be viewed as a variant of the model introduced by Whittle (1954) and is often referred to as a spatial autoregressive (SAR) model; see, e.g., Anselin (1988). In SAR models, cross-sectional interactions are modeled through Cliff-Ord type spatial lags.2 We consider the same model specification as that in Drukker, Egger, and Prucha (2022) and allow for higher-order spatial lags in the dependent variables, the exogenous variables and the disturbances. Following their terminology, we will refer to this class of models as the simultaneous spatial autoregressive model with spatially autoregressive disturbances (the SE-SARAR model).
1 In addition, Baltagi and Bresson (2011) and Jeanty, Partridge, and Irwin (2010) study housing price spillovers at the county level. Hauptmeier, Mittermaier, and Rincke (2012) focus on fiscal competition over taxes and public inputs. Agrawal (2015) studies spatial fiscal competition in local tax rates near district borders. Conley and Ligon (2002) and Ertur and Koch (2007) document spatial spillovers in economic growth. Behrens, Ertur, and Koch (2012) suggest that bilateral trade flows exhibit spatial interdependence, and Baltagi, Egger, and Pfaffermayr (2007), Baltagi, Egger, and Pfaffermayr (2008) and Blonigen, Davies, Waddell, and Naughton (2007) find similar patterns in bilateral foreign direct investment. Ertur and Musolesi (2017) examine technological knowledge spillovers among countries. Some recent works address the relation between COVID-19 and economic activities. For example, Ascani, Faggian, and Montresor (2021) use Italian data to show that the geographical concentration of economic activity in specific areas of the country acts as a vehicle of disease transmission and thus generates a core-periphery pattern in the geography of COVID-19. Lee and Huang (2022) analyze the shifting housing preferences during the pandemic in the United States.

2 In the introduction part of Chapter 2, we explain the term spatial lags in detail.

In this dissertation, we consider a new class of generalized method of moments (GMM) estimators that utilize approximations to the optimal instruments in constructing the linear moments. We also consider GMM estimators that utilize both the linear moments and the quadratic moments that originate from the scores of the log-likelihood function. More specifically, the new estimators build on three lines of the literature. First, we show that the linear parts of the ML scores can be viewed as a set of estimator generating equations from which a generic form of instrumental variable (IV) estimators can be derived.
This result extends the relevant counterparts in Hausman (1975), Hendry (1976) and Prucha and Kelejian (1984) in the context of classical SEMs. Second, the new estimators incorporate the underlying ideas of the LIVE and the FIVE estimators proposed by Brundy and Jorgenson (1971) in the context of classical SEMs. The LIVE and the FIVE estimators can be viewed as specific forms of the limited information and the full information IV estimators, respectively. Compared to the 2SLS and the 3SLS estimators, the LIVE and the FIVE differ in how the expected values of the endogenous variables are estimated when constructing instruments, and they make better use of the nonlinear parameter restrictions implied by the reduced form model. The new estimators considered in this dissertation share this feature of the LIVE and the FIVE estimators. Finally, the estimation methodology considered in this dissertation also relates to the instrumental variable estimators with optimal instruments in the context of single-equation spatial autoregressive (SAR) models. Early contributions to this line of the literature include Lee (2003) and Kelejian, Prucha, and Yuzefovich (2004). Furthermore, our new GMM estimators that utilize both the linear and the quadratic moments remain robust to heteroskedasticity of unknown form and computationally feasible even when the sample size (i.e., the size of the network) becomes large. The Monte Carlo results show that the new GMM estimators outperform their existing counterparts, e.g., the 2SLS-type and 3SLS-type estimators considered in Drukker, Egger, and Prucha (2022), when the instruments are weak.

In the second part of this dissertation, we estimate the consumer demand for retail gasoline in the market of Vancouver, Canada. We employ a demand system with a spatial network component, utilizing the model and the estimation methods considered in the first part.
Demand elasticities for gasoline at the aggregate level are well documented in the literature, while estimates at the station level are relatively scarce. We estimate the station-level demand elasticities as well as the (spatial) elasticity of substitution under a variety of network structures based on different proximity measures. We collected station-level data on retail prices, sales volume and station characteristics of the 151 stations, as well as the characteristics of local markets, for September 2019 and March 2020. We obtained the sales volume data from Kalibrate. The price data were collected from Gasbuddy.com at daily frequency and then aggregated to monthly frequency.3 The data on the station and regional characteristics are provided by Kalibrate's survey and Census Canada 2016. To construct the network matrices, we also collected traffic and geographical information using the Google Maps API. To deal with the endogeneity of prices, existing works typically exploit variation in the characteristics of each station's direct competitors. We argue that in a geographically continuous market, this strategy may not be sufficient. Following Fan (2013)'s arguments, we additionally exploit variation in the characteristics of the competitors of each station's competitors (indirect competitors). We find that the own-price demand elasticity is between −12 and −6 while the cross-station price elasticity is in general between 0.6 and 6, depending on the specific construction of the network matrices that govern the degree of competition. These estimates are largely consistent with economic theory, but of smaller magnitude than those reported in some recent works, e.g., Houde (2012). One possible explanation is that we build networks based on different sources of information.
Houde (2012) relies heavily on local traffic patterns and road structure and thus allows a station to compete with another located far away if the two are located on a main commute route or a segment of highway. Our approach to constructing the spatial network matrices is in line with the majority of the existing literature.4 Our proximity measures exploit information on stations' neighborhoods or the pairwise travel distance between neighboring stations. Thus, we implicitly assume that competition is largely local. We also compute the impact measures that provide interpretations of the estimated coefficients of the exogenous variables in the context of spatial network models. We find that the availability of service stations has in general contributed positively to sales at the station level. In general, a station located within a neighborhood with more drivers faces stronger demand.

The organization of the dissertation is as follows. Chapter 2 proposes the new class of GMM estimators for the simultaneous equations model with network dependence. Chapter 3 illustrates the empirical relevance of the first chapter by estimating the demand system with network dependence for the retail gasoline market in Vancouver, Canada. We relegate the proofs and technical details of Chapter 2 to Chapter A. Additional results for Chapter 3 are documented in Chapter B.

3 In Chapter 3, we show that there is no significant cyclical pattern in prices during the sample periods.

4 See, e.g., Pennerstorfer (2009), Pennerstorfer and Weiss (2013), Pinkse, Slade, and Brett (2002), among others.

Chapter 2: LIVE and FIVE Estimation of the Simultaneous Equation Models with Higher Order Spatial/Social Interactions

In this chapter, we introduce a new class of GMM estimators for simultaneous equation models (SEM) of cross-sectional data with network interactions in the dependent variables, the exogenous variables and the disturbances.
The model and the proposed estimators are applicable to a wide class of networks, including spatial networks and social networks.

2.1 Introduction

In spatial autoregressive (SAR) models, cross-sectional network interactions are modeled through Cliff-Ord type spatial lags, which are weighted averages of the model variables. For a network consisting of n cross-sectional units, a (first-order) spatial lag in the endogenous variable yn (n × 1) is Wnyn, where Wn (n × n) is often referred to as the spatial weights matrix.1 The weights reflect the relative importance of the links between cross-sectional units in generating network interactions. In the context of geographical networks, the elements of Wn are typically based on some measure of distance between units, e.g., the inverse of the Euclidean distance. We emphasize, however, that the measure of proximity is not necessarily indexed by the locations of the units, and thus the notion of distance is not confined to geographical distance. The flexibility in constructing the measures of proximity makes the model capable of describing a wide variety of networks. For example, in the context of social networks, proximity measures could depend on the number of friends that each person has in the sample.

1 In the (social) network literature, this Wn matrix is sometimes referred to as the adjacency matrix.

Spatial econometrics has a long history in regional science, urban economics and development; see, e.g., Anselin (1988). The development of econometric methods of estimation and inference for Cliff-Ord type models has also attracted considerable attention.2 While most theoretical works in the spatial econometrics literature have focused on single-equation models, the literature on the estimation of simultaneous systems of spatially interrelated cross-sectional equations is relatively scarce. In economics, different outcome variables are frequently determined jointly within a system of equations.
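To make the notion of a spatial lag concrete, the following sketch constructs a small weights matrix Wn from hypothetical locations, using inverse Euclidean distance with row normalization (one common convention among several), and computes the spatial lag Wnyn; the coordinates and outcomes are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
coords = rng.uniform(0, 10, size=(n, 2))  # hypothetical locations of n units

# Inverse Euclidean distance between distinct units; the diagonal stays zero
# (a unit is not its own neighbor).
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
W = np.zeros((n, n))
off = ~np.eye(n, dtype=bool)
W[off] = 1.0 / dist[off]

# Row-normalize so each row sums to one (a common convention).
W = W / W.sum(axis=1, keepdims=True)

y = rng.normal(size=n)
spatial_lag = W @ y  # i-th entry: weighted average of the other units' outcomes
```

Because each row of W is nonnegative, sums to one, and has a zero diagonal, each entry of the spatial lag is a convex combination of the other units' outcomes.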
In the context of a network SEM, the simultaneous nature of the outcomes can stem both from the interactions between different endogenous economic variables (i.e., the classical simultaneity effect) and from the interactions between cross-sectional units (i.e., the network-generated interdependence). Extending the methodology developed in Kelejian and Prucha (1998, 1999) for single-equation SAR models, Kelejian and Prucha (2004) provide an early development of GMM estimators for a SE-SARAR model with first-order spatial lags (i.e., a model that allows for one spatial weights matrix Wn). Liu (2014, 2019, 2020), Cohen-Cole, Liu, and Zenou (2018) and Liu and Saraiva (2019) build and extend on Kelejian and Prucha (2004) in estimation methodology and identification conditions within the context of social interaction models with first-order spatial lags and cross-sectionally independent disturbances. More specifically, Liu (2014) focuses on IV-based linear estimators and shows that the Bonacich centrality (Bonacich (1987)) provides additional information to identify peer effects and can be used as an instrumental variable. Liu (2020) extends on Liu (2014) and considers one-step GMM estimation methods that utilize both the linear and the quadratic moment conditions. Liu (2014, 2020) also provide bias correction procedures with many instruments. Liu and Saraiva (2019) consider one-step GMM estimation based on both the linear and quadratic moments and allow for heteroskedasticity of unknown form. Cohen-Cole, Liu, and Zenou (2018) consider the identification and estimation of social interaction effects in the context of multivariate choices. Liu (2019) proposes an estimation methodology for a simultaneous system of equations with binary outcomes generated from an incomplete information network game.

2 See, e.g., Anselin (2010) for a review of the development of spatial econometric methods.
Drukker, Egger, and Prucha (2022) further extend the methodology of Kelejian and Prucha (2004) to a more general model specification that allows for higher-order spatial lags in the dependent variables, the exogenous variables and the disturbances. In addition to 2SLS- and 3SLS-type estimators, they also consider one-step limited and full information GMM estimators which utilize both linear and quadratic moments and remain robust against heteroskedasticity. The model specification in the current chapter is the same as that in Drukker, Egger, and Prucha (2022), which, to the best of our knowledge, is more general than the existing cross-sectional SEMs with network interactions in the literature. Other recent contributions to the literature on spatial simultaneous equation models include Baltagi and Deng (2015), who consider an extension of a two-equation system with first-order spatial lags to panels. Wang, Li, and Wang (2014) analyze the quasi maximum likelihood (QML) estimator for a two-equation system with first-order spatial lags in the cross section. Yang and Lee (2017) consider the quasi maximum likelihood estimator for a multi-equation system with a first-order spatial lag in the dependent variable. Yang and Lee (2019) provide an extension to dynamic panel data models allowing for multiple weight matrices. In contrast to this dissertation, these papers either consider only first-order spatial lags in the dependent variable, and/or do not allow for spatial spillovers in the disturbance process. In Section 2.2, we explain that allowing for higher-order spatial lags brings considerable flexibility and robustness to the model. As will also be explained below, this dissertation differs from the above-cited literature in terms of estimation methodology. In particular, we consider an alternative way of constructing the IVs and the moment conditions, and consider both two-step estimation procedures and one-step GMM estimators.
The new class of GMM estimators considered in this chapter builds on three lines of the literature. First, in the context of the classical SEM, Hausman (1975) shows that the ML estimator can be written so as to carry an IV interpretation in which the instruments embody all the a priori restrictions. Hendry (1976) and Prucha and Kelejian (1984) demonstrate that the normal equations of ML estimators can be viewed as estimator generating equations and that IV estimators can be viewed as numerical approximations to the solutions of this nonlinear system. These insights carry over to the present simultaneous equation system with network dependence. In Section 2.3, we extend these insights and show that the linear parts of the ML scores can also be viewed as a set of estimator generating equations from which IV estimators can be derived.

The new estimators also incorporate the underlying ideas of the LIVE and the FIVE estimators proposed by Brundy and Jorgenson (1971) in the context of the classical SEM, which can be viewed as specific forms of the limited information and the full information IV estimators, respectively. Compared to the 2SLS and the 3SLS estimators, the LIVE and the FIVE differ in how the expected values of the endogenous variables are estimated when constructing instruments. In the first stage, the 2SLS (and 3SLS) estimator estimates the expected values of the endogenous variables implied by the reduced form model by OLS, without imposing parameter restrictions. Alternatively, Brundy and Jorgenson (1971) utilize the specific form of the reduced form model and compute the reduced form parameters from some consistent initial estimates. In this way, their instruments exploit the underlying nonlinear structure of the parameters in the reduced form while the 2SLS and the 3SLS do not. In finite samples, the LIVE and the FIVE estimators may be more efficient than the 2SLS and the 3SLS, respectively.
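The contrast between the two first stages can be sketched in a single-equation SAR setting, which serves as a simplified stand-in for the system treated in this chapter. In the sketch below, the data-generating process, the weights matrix, and the "initial estimates" (true values perturbed slightly, standing in for consistent first-round estimates) are all illustrative assumptions, not the chapter's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam, beta = 200, 0.4, 1.5

# Hypothetical row-normalized weights matrix: circular network, two neighbors each.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

X = rng.normal(size=(n, 1))
u = rng.normal(size=n)
# DGP: y = lam*W y + X beta + u  =>  y = (I - lam W)^{-1} (X beta + u)
y = np.linalg.solve(np.eye(n) - lam * W, X[:, 0] * beta + u)

# Unrestricted first stage (2SLS-style): project the spatial lag Wy
# onto low-order spatial powers of X, ignoring parameter restrictions.
H = np.column_stack([X, W @ X, W @ (W @ X)])
Wy_hat_unres = H @ np.linalg.lstsq(H, W @ y, rcond=None)[0]

# Restricted first stage (LIVE-style): impose the reduced-form mean
# W E[y] = W (I - lam W)^{-1} X beta, evaluated at initial estimates.
lam0, beta0 = 0.38, 1.45
Wy_hat_res = W @ np.linalg.solve(np.eye(n) - lam0 * W, X[:, 0] * beta0)
```

Both fitted values can serve as instruments for the spatial lag; the restricted version uses the nonlinear dependence of the reduced form on (lam, beta), which is the feature the text attributes to the LIVE/FIVE approach.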
The 2SLS- and 3SLS-type estimators considered in Kelejian and Prucha (2004) and Drukker, Egger, and Prucha (2022) also estimate the expected values of the endogenous variables in an unrestricted fashion. While numerically convenient, this procedure ignores the underlying nonlinear structure of the parameters in the reduced form and hence loses efficiency. This motivates the exploration of estimators with more carefully constructed instruments for this class of network systems. To indicate the links to the LIVE and the FIVE estimators in the context of classical SEMs, we refer to the new limited and full information IV estimators for SEMs with network dependence as the Generalized Spatial LIVE (GSLIVE) estimator and the Generalized Spatial FIVE (GSFIVE) estimator, respectively. Finally, the estimation methodology considered in this chapter relates to instrumental variable estimators with optimal instruments in the context of single-equation spatial autoregressive (SAR) models. As an early contribution to the estimation methodology for this class of models, Kelejian and Prucha (1998) introduce 2SLS- and 3SLS-type estimators, noting that the mean of the endogenous variable $y_n$ depends on $X_n$, $W_nX_n$, $W_n^2X_n$, and so on. To obtain the instruments, they regress the right-hand side variables on the collection of linearly independent columns in $\left[X_n, W_nX_n, W_n^2X_n, \ldots, W_n^SX_n\right]$ for some finite order $S$. This resembles the first stage of the classical 2SLS estimator and thus, in light of the above discussion, also ignores the nonlinear structure of the reduced form model. Alternatively, Lee (2003) constructs the IVs by estimating the optimal instruments for the endogenous variables (i.e., $Ey_n$) implied by the reduced form model with some consistent initial estimates.³ However, these instruments involve inversions of $n \times n$ matrices and thus become computationally infeasible in large samples. To cope with this challenge, Kelejian, Prucha, and Yuzefovich (2004) approximate the inverses embedded in $Ey_n$ with corresponding geometric sums of finite order.⁴ The estimators considered in this chapter share the same spirit as Kelejian, Prucha, and Yuzefovich (2004) in constructing the instruments for the endogenous regressors, i.e., they approximate $Ey_n$ implied by the reduced form with consistent initial estimates and geometric series of finite order. On a more general level, the IV estimators considered in Brundy and Jorgenson (1971), Lee (2003), and Kelejian, Prucha, and Yuzefovich (2004) all utilize the nonlinear structure of the parameters in $Ey_n$ implied by the reduced form when constructing the instruments. In addition to the GSLIVE and the GSFIVE estimators that utilize the linear moments, we also consider one-step GMM estimators that utilize both the linear and the quadratic moments that originate from the scores of the log-likelihood function. The spatial literature has long recognized the value of quadratic moments in identifying spatial parameters. As suggested in Kelejian and Prucha (1998), the linear moment conditions may fail to identify the regression parameters.⁵ An extreme case of such a scenario is when the true parameter values on the exogenous variables are all zeros. In these cases, the ML estimator, whose scores consist of both linear and quadratic components, may still remain consistent in estimating the regression parameters. We note that the quadratic components of the scores represent valid quadratic moment conditions.⁶
³Specifically, they considered initial estimates obtained by the GS2SLS estimator in Kelejian and Prucha (1998).
⁴Note that Kelejian, Prucha, and Yuzefovich (2004) allow the order of approximation in the geometric sums to go to infinity, and thus their estimator is asymptotically equivalent to that considered in Lee (2003).
Building on Kelejian and Prucha (1999), Kelejian and Prucha (2004)
suggest a GMM procedure with quadratic moments to estimate the spatial autoregressive parameters in the disturbance process when the disturbances are correlated across equations.⁷ In this chapter, we also extend the GSFIVE estimator to the Linear-Quadratic Generalized Spatial FIVE (LQ-GSFIVE) estimator by complementing the linear moments with the moments implied by the quadratic components of the scores. Analogously, we extend the GSLIVE estimator to the Linear-Quadratic Generalized Spatial LIVE (LQ-GSLIVE) estimator by incorporating quadratic moments while ignoring the cross-equation error structure. One major difference relative to the existing estimators is that the quadratic moments utilized in this dissertation explicitly take into account the nonlinear structure of the structural parameters implied by the quadratic components of the scores of the log-likelihood function. However, while ML is asymptotically efficient under normality, it is in general inconsistent under heteroskedasticity. This drawback passes on to the quadratic moments implied by the quadratic scores. To cope with this drawback, we further modify the quadratic moments and derive their heteroskedasticity-robust counterparts. As such, our GMM estimators that utilize both the linear and the quadratic moments remain consistent under heteroskedastic disturbances.
⁵For example, Kelejian and Prucha (1998) suggest a case in which the weight matrix is row normalized and the only exogenous regressor is a constant.
⁶Specifically, the quadratic components of the scores can be shown to be of the form $\varepsilon_n' A_n \varepsilon_n$, where $\varepsilon_n$ is an $n \times 1$ vector of disturbances and $A_n$ is nonstochastic. Under homoskedasticity ($E\varepsilon_n\varepsilon_n' = \sigma^2 I_n$), imposing $\operatorname{tr}(A_n) = 0$ implies $E\varepsilon_n' A_n \varepsilon_n = \sigma^2 \operatorname{tr}(A_n) = 0$. Under heteroskedasticity, one needs to further assume $A_n$ to have a zero diagonal for these quadratic moments to be valid. See Proposition 3 and Lemma 1 for more discussion.
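The zero-diagonal requirement in footnote 6 can be spelled out with a short derivation (our own elaboration, not part of the original text; $\Sigma_n$ denotes a hypothetical diagonal matrix of unit-specific variances):

```latex
% Validity of a quadratic moment E[eps' A eps] under heteroskedasticity.
% With E[eps_n] = 0 and E[eps_n eps_n'] = Sigma_n = diag(sigma_{1,n}^2, ..., sigma_{n,n}^2):
\mathbb{E}\,\varepsilon_n' A_n \varepsilon_n
  = \operatorname{tr}\!\left(A_n \Sigma_n\right)
  = \sum_{i=1}^{n} a_{ii,n}\,\sigma_{i,n}^2 .
% Under homoskedasticity (sigma_{i,n}^2 = sigma^2 for all i), tr(A_n) = 0 suffices.
% Under heteroskedasticity the sigma_{i,n}^2 are arbitrary, so the sum vanishes
% for every variance profile only if each a_{ii,n} = 0, i.e., A_n must have a
% zero diagonal.
```

This is why the heteroskedasticity-robust moments discussed below replace trace restrictions with zero-diagonal weighting matrices.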
Relative to the ML estimator, another advantage of the new estimators is that they remain feasible when the sample size gets large. As the scores of the log-likelihood function depend on the inverses of $n \times n$ matrices, where $n$ is the sample size, ML procedures are not feasible when $n$ becomes large. Note that $n$ also determines the size of the network in spatial/social network models. For instance, there are over 3,000 counties and 70,000 census tracts in the U.S. To address this difficulty, we approximate the inverses embedded in the instruments (and moment conditions) with geometric series of finite order. In the context of single-equation SAR models, this idea is utilized by Kelejian, Prucha, and Yuzefovich (2004) to cope with the large-$n$ problem inherent in the 2SLS-type estimator with optimal instruments considered in Lee (2003). The rest of this chapter is organized as follows. In Section 2.2, we specify the considered simultaneous equation system with cross-sectional network interactions and discuss the model assumptions. In Section 2.3, we derive the scores of the log-likelihood function and show that the linear and quadratic parts of the scores represent valid moment conditions.
⁷Acknowledging that the quadratic moments can help identify spatial autoregressive parameters, Lee (2007) and Lin and Lee (2010) both consider GMM estimators with a single vector consisting of both the linear and quadratic moments for single-equation SAR models. Kuersteiner and Prucha (2020) build on this idea and show that it can help with the weak identification problem of linear moments. For a simultaneous system, Liu and Saraiva (2019) and Drukker, Egger, and Prucha (2022) consider GMM estimators utilizing both linear and quadratic moments that remain consistent when the linear moments are not sufficient for identification. Their quadratic moments exploit the covariance structure of the model disturbances both within and across equations.
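The geometric-series device can be illustrated numerically. The following sketch (our own illustration, with an arbitrary row-normalized weight matrix and arbitrary parameter values) approximates an inverse of the kind $(I_n - \lambda W_n)^{-1}$ that appears in $Ey_n$ by a finite Neumann sum, which requires only matrix-vector style multiplications rather than an $n \times n$ inversion:

```python
import numpy as np

# Illustrative sketch (not from the dissertation): approximating the inverse
# (I_n - lambda * W_n)^{-1} that enters E[y_n] by a finite geometric (Neumann)
# series, valid when the norm of lambda * W_n is below one.
n, lam, S = 200, 0.4, 25

# A simple row-normalized "two nearest neighbours" weight matrix, zero diagonal.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

exact = np.linalg.inv(np.eye(n) - lam * W)

# Finite-order geometric series: sum_{s=0}^{S} (lam * W)^s,
# built by repeated multiplication (no n x n inversion needed).
approx = np.eye(n)
term = np.eye(n)
for _ in range(S):
    term = lam * (term @ W)
    approx += term

err = np.max(np.abs(exact - approx))
print(err)  # truncation error decays like lam^(S+1); here below 1e-9
```

The approximation error is controlled by the spatial parameter: since the row sums of $\lambda W_n$ equal $0.4$ here, the omitted tail is of order $0.4^{S+1}$, which is why a modest order $S$ already suffices.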
We also show that the linear moments carry an IV interpretation, which in turn motivates the generic form of the linear IV estimators. In Section 2.4, we present the heteroskedasticity-robust limited and full information linear and quadratic moment conditions and their approximated versions with the inverse matrices replaced by the corresponding geometric series. In Section 2.5, we define and discuss the implementation details of the GSLIVE and the GSFIVE estimators, as well as the LQ-GSLIVE and the LQ-GSFIVE estimators. We also provide a brief discussion of the identification of the model in Section 2.6, which motivates the strong and the weak identification scenarios that appear in our Monte Carlo simulations. Section 2.7 documents the Monte Carlo results and Section 2.8 concludes. All proofs and auxiliary discussions are relegated to Chapter A.

Notations: We adopt the following notations throughout the dissertation. Let $(A_n)_{n \in \mathbb{N}}$ be some sequence of $n \times m$ matrices; then we denote the $(i,j)$-th element of $A_n$ by $a_{ij,n}$, and the $i$-th row and $j$-th column by $a_{i.,n}$ and $a_{.j,n}$, respectively. The $j$-th column of the $n \times n$ identity matrix $I_n$ will be denoted as $i_{j,n}$. If $A_n$ is nonsingular, we denote the $(i,j)$-th element of $A_n^{-1}$ by $a^{ij}_n$. More generally, if $A_n$ is a blocked matrix of size $nG \times nP$ with $G \times P$ blocks, we denote the $(g,h)$-th $n \times n$ block as $A_{gh,n}$ with $g = 1, \ldots, G$ and $h = 1, \ldots, P$. Borrowing from standard matrix notations, we let $A_{g.,n}$ denote the $g$-th $n \times nP$ block of $A_n$ and let $A_{.h,n}$ denote the $h$-th $nG \times n$ block of $A_n$. Moreover, if $G = P$, we denote the $(g,h)$-th block of $A_n^{-1}$ as $A^{gh}_n$ with $g, h = 1, \ldots, G$. We further let $A^{g.}_n$ denote the $g$-th $n \times nG$ block of $A_n^{-1}$ and let $A^{.h}_n$ denote the $h$-th $nG \times n$ block of $A_n^{-1}$. For any $n \times n$ matrix $A_n = (a_{ij,n})$ we denote with $\operatorname{diag}_{i=1}^{n}\{a_{ii,n}\}$ the $n \times n$ diagonal matrix with the $(i,i)$-th element being $a_{ii,n}$. Analogously, for some $nG \times nG$ matrix $A_n = (A_{gh,n})$ we denote with $\operatorname{diag}_{g=1}^{G}\{A_{gg,n}\}$ the block diagonal matrix with the $(g,g)$-th matrix block being $A_{gg,n}$. Let $A_n$ be of dimension $m_n \times m_n$ with elements that potentially depend on the sample size $n$; then the maximum column sum and row sum matrix norms of $A_n$ are, respectively, defined as
$$\|A_n\|_1 = \max_{1 \le j \le m_n} \sum_{i=1}^{m_n} |a_{ij,n}| \quad \text{and} \quad \|A_n\|_\infty = \max_{1 \le i \le m_n} \sum_{j=1}^{m_n} |a_{ij,n}|.$$
If $\|A_n\|_1 \le c$ and $\|A_n\|_\infty \le c$ for some finite constant $c$ which does not depend on $n$, then we say that the row and column sums of the sequence of matrices $A_n$ are uniformly bounded in absolute value.

2.2 Model

In this section we specify our simultaneous system of $G$ equations for $G$ endogenous variables observed for $n$ cross-sectional units. We consider the same model as Drukker, Egger, and Prucha (2022), but will develop new estimation methods for that model. We note that our presentation and discussion of the model follows very closely and/or duplicates that of Drukker, Egger, and Prucha (2022). In our model, simultaneity can stem from two sources. The first source is the classical simultaneity across equations that captures the dependence of the $g$-th endogenous variable for the $i$-th unit on the other endogenous variables for the $i$-th unit. The other source stems from Cliff and Ord (1973, 1981) type cross-sectional network interactions between cross-sectional units $i$ and $j$, which are modeled in the form of "spatial lags". The model specification is fairly general and allows for multiple network structures, in the form of higher-order spatial lags in the endogenous variables, in the exogenous variables, as well as in the disturbances.

2.2.1 Structural Form Model

We assume that the cross-sectional data of $n$ units are generated by the following system ($g = 1, \ldots, G$):
$$y_{g,n} = \sum_{l=1}^{G} b_{lg}\, y_{l,n} + \sum_{k=1}^{K} c_{kg}\, x_{k,n} + \sum_{l=1}^{G} \sum_{p=1}^{P} \lambda_{lg,p}\, W_{p,n}\, y_{l,n} + u_{g,n}, \tag{2.1}$$
$$u_{g,n} = \sum_{q=1}^{Q} \rho_{g,q}\, M_{q,n}\, u_{g,n} + \varepsilon_{g,n},$$
where $y_{g,n}$ is the $n \times 1$ vector of cross-sectional observations on the dependent variable in the $g$-th equation, $x_{k,n}$ is the $n \times 1$ vector of cross-sectional observations on the $k$-th exogenous variable, $u_{g,n}$ is the $n \times 1$ disturbance vector in the $g$-th equation, and $\varepsilon_{g,n}$ is the $n \times 1$ vector of innovations entering the disturbance process for the $g$-th equation. With $b_{lg}$ and $c_{kg}$ we denote the scalar parameters corresponding to the $l$-th endogenous and $k$-th exogenous variables, respectively, of the $g$-th equation. In general, the structural model parameters are not identified without certain restrictions. Those restrictions will be introduced in the next subsection.⁸ With the above notations, recall that a classical SEM can be formulated as ($g = 1, \ldots, G$)
$$y_{g,n} = \sum_{l=1}^{G} b_{lg}\, y_{l,n} + \sum_{k=1}^{K} c_{kg}\, x_{k,n} + u_{g,n}, \tag{2.2}$$
where the $b_{lg}$ represent the classical simultaneity effects. The difference between the network SEM (2.1) and the classical SEM (2.2) is that the former contains the additional explanatory regressors $W_{p,n} y_{l,n}$ with associated parameters $\lambda_{lg,p}$. The terms $\lambda_{lg,p} W_{p,n} y_{l,n}$ capture the spatial/cross-sectional dependence among agents. Note that in the classical SEM, the equilibrium outcome of the $i$-th agent depends only on the other outcome variables, the exogenous characteristics, and the unobservables of the agent itself. In contrast, in a simultaneous equation model with network dependence, the equilibrium outcome of the $i$-th agent also depends on the outcomes of the other agents. In other words, the equilibrium outcomes are determined simultaneously by all agents. Consistent with the usual terminology for Cliff-Ord type network interactions, we refer to the nonstochastic $n \times n$ matrices $W_{p,n}$ and $M_{q,n}$ as spatial weights matrices and to $y_{l,p,n} = W_{p,n}\, y_{l,n}$ and $u_{g,q,n} = M_{q,n}\, u_{g,n}$ as spatial lags of $y_{l,n}$ and of $u_{g,n}$. Correspondingly, the (scalar) parameters $\lambda_{lg,p}$ and $\rho_{g,q}$ are referred to as the spatial autoregressive parameters.
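The simultaneity described above can be made concrete with a small simulation. The following sketch (our own illustration, not from the dissertation) generates data from the $G = 1$, $P = Q = 1$ special case of (2.1), i.e., $y = \lambda W y + Xc + u$ with $u = \rho M u + \varepsilon$, by solving the reduced form; the weight matrix and parameter values are arbitrary choices for illustration:

```python
import numpy as np

# Minimal simulation sketch of the G = 1, P = Q = 1 special case of (2.1):
#   y = lam * W y + X c + u,   u = rho * M u + eps,
# generated from its reduced form. W = M is a row-normalized circulant
# "two nearest neighbours" matrix; all parameter values are illustrative.
rng = np.random.default_rng(0)
n, lam, rho = 100, 0.3, 0.2

W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
M = W

X = np.column_stack([np.ones(n), rng.standard_normal(n)])
c = np.array([1.0, 0.5])
eps = rng.standard_normal(n)

# Reduced form: u = (I - rho*M)^{-1} eps,  y = (I - lam*W)^{-1} (X c + u).
u = np.linalg.solve(np.eye(n) - rho * M, eps)
y = np.linalg.solve(np.eye(n) - lam * W, X @ c + u)

# Through the inverse, each y_i depends on the shocks of all other units:
# equilibrium outcomes are determined simultaneously by all agents.
print(y.shape)  # (100,)
```

Note that the simulation solves the linear systems directly; for large $n$ this is exactly the step that the geometric-series approximation discussed in the introduction is designed to avoid.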
Observe that the $i$-th elements of $y_{l,p,n}$ and $u_{g,q,n}$ are given by
$$y_{il,p,n} = \sum_{j=1}^{n} w_{ij,p,n}\, y_{jl,n} \quad \text{and} \quad u_{ig,q,n} = \sum_{j=1}^{n} m_{ij,q,n}\, u_{jg,n}.$$
⁸At minimum, $b_{gg}$ for $g = 1, \ldots, G$ are restricted to zeros in the model.
The weights matrices contain the information on the links between units and on the relative weight of those links. For example, in spatial settings, the elements of the weights matrices are often some function of the inverse distance between units. The spatial autoregressive parameters describe the strength of the spillovers. It is worth noting that Cliff-Ord type network models do not require indexing the observations by locations. They only rely on some measure of proximity between units in the formation of the spatial weights. Although originally introduced for spatial networks, the notion of network in these models goes beyond geographic ones and thus can accommodate a wide class of networks, as remarked in the introduction. For example, in the context of social network models, one common specification has been to assign to each of the $i$-th individual's friends a weight of $1/n_i$, where $n_i$ denotes the total number of friends of $i$, while assigning zero weights to individuals who are not friends of $i$.⁹ As is often the case in the literature, the elements of the spatial weight matrices are allowed to depend on the sample size. This permits normalizations of these matrices where the normalization factor(s) depend on the sample size, which in turn implies that the model parameters depend on the sample size.¹⁰ As seen from the above, the $i$-th element of, say, $y_{l,p,n}$ is given by $\sum_{j=1}^{n} w_{ij,p,n}\, y_{jl,n}$. In light of this, even if the elements of the spatial weights matrices do not depend on the sample size, the elements of the spatial lag $y_{l,p,n}$ and, analogously, the elements of $u_{g,q,n}$ will generally depend on the sample size. This in turn implies that the elements of $y_{g,n}$ and $u_{g,n}$ will generally depend on the sample size, or in other words, form triangular arrays.
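The $1/n_i$ friendship weighting can be sketched for a toy network (our own illustration; the friendship lists are made up, and nominations need not be mutual):

```python
import numpy as np

# Illustrative sketch (not from the dissertation): the 1/n_i friendship
# weighting for a toy social network of 4 individuals. friends[i] lists the
# friends nominated by individual i; the resulting W is row normalized with
# a zero diagonal, matching the weights-matrix conventions in the text.
friends = {0: [1, 2], 1: [0], 2: [0, 1, 3], 3: [2]}
n = len(friends)

W = np.zeros((n, n))
for i, fr in friends.items():
    for j in fr:
        W[i, j] = 1.0 / len(fr)  # weight 1/n_i for each friend of i

print(W.sum(axis=1))  # each row sums to 1
print(np.diag(W))     # zero diagonal: no self-links
```

Because nominations can be one-sided (here individual 2 names 1, but not vice versa), such matrices are generally asymmetric, which the Cliff-Ord framework accommodates.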
In allowing the elements of $x_{k,n}$ to depend on the sample size, we implicitly also allow for some of the exogenous variables to be spatial lags of exogenous variables. For example, the elements of $x_{k,n}$ could be of the form $x_{ik,n} = \sum_{j=1}^{n} w_{ij,p,n}\,\xi_j$, where $\xi_j$ is some basic exogenous variable. Thus the model allows for, as remarked above, cross-sectional interactions in the endogenous variables, the exogenous variables, and the disturbances.
⁹See, e.g., Cohen-Cole, Liu, and Zenou (2018), among others.
¹⁰See Kelejian and Prucha (2010) for further discussions regarding normalizations.
The above model generalizes the spatial simultaneous equation model considered in Kelejian and Prucha (2004) in allowing for higher-order spatial lags. One attraction of this feature is that higher-order lags can capture different forms of proximity between units. For example, in the context of social networks, different matrices may refer to different circles of friends, e.g., one matrix may contain the very close friends, and a second matrix the other friends. As remarked in Drukker, Egger, and Prucha (2022), an estimation theory that allows for multiple spatial weights matrices can also be used to accommodate certain parameterizations of the spatial weights. We borrow their discussion of the motivating example in the following. Spatial weights are often specified as functions of some distance measure. Let $W = (w_{ij})$ be the basic spatial weights matrix, let $d_{ij}$ denote some distance measure between units $i$ and $j$, and let $d^*_{ij}$ be some contiguity measure taking on values of one or zero. Then, the researcher may specify the weights as the product of the contiguity measure and a polynomial in $d_{ij}$, treating the coefficients of the polynomial as unknown parameters:
$$w_{ij} = d^*_{ij}\left[\alpha_1 d_{ij} + \ldots + \alpha_P d_{ij}^P\right].$$
Now, suppose the researcher models $y_g$ as a function of, say, $\lambda_{lg} W y_l$; then it follows that
$$\lambda_{lg} W y_l = \lambda_{lg}\left[\sum_{p=1}^{P} \alpha_p W_p\right] y_l = \sum_{p=1}^{P} \lambda_{lg,p} W_p\, y_l,$$
with $\lambda_{lg,p} = \lambda_{lg}\alpha_p$, $W_p = (w_{ij,p})$, and $w_{ij,p} = d^*_{ij}\, d_{ij}^p$. In allowing for higher-order spatial lags, model (2.1) covers this specification as a special case. Note further that the above specification can accommodate other basis functions (e.g., alternative basis functions) and more general measures of distance and contiguity. The same ideas also apply to the modeling of the disturbance process. Following Drukker, Egger, and Prucha (2022), we refer to this model as a simultaneous spatial autoregressive model of order $P$ with spatially autoregressive disturbances of order $Q$, for short, a simultaneous SARAR($P$,$Q$) model, consistent with the terminology of Anselin and Florax (1995). In order to distinguish the true parameters from other possible values in the parameter space, we will denote in the following the true parameters as $b_{lg,0}$, $c_{kg,0}$, $\lambda_{lg,p,0}$ and $\rho_{g,q,0}$. Model (2.1) can be written more compactly as
$$Y_n = Y_n B_0 + X_n C_0 + \bar{Y}_n \Lambda_0 + U_n, \tag{2.3}$$
$$U_n = \bar{U}_n P_0 + E_n$$
with
$$Y_n = (y_{1,n}, \ldots, y_{G,n})_{n \times G}, \qquad X_n = (x_{1,n}, \ldots, x_{K,n})_{n \times K},$$
$$U_n = (u_{1,n}, \ldots, u_{G,n})_{n \times G}, \qquad E_n = (\varepsilon_{1,n}, \ldots, \varepsilon_{G,n})_{n \times G},$$
$$\bar{Y}_n = (y_{1,1,n}, \ldots, y_{1,P,n}, \ldots, y_{G,1,n}, \ldots, y_{G,P,n})_{n \times PG},$$
$$\bar{U}_n = (u_{1,1,n}, \ldots, u_{Q,1,n}, \ldots, u_{1,G,n}, \ldots, u_{Q,G,n})_{n \times QG},$$
where the parameter matrices $B_0 = (b_{lg,0})_{G \times G}$, $C_0 = (c_{kg,0})_{K \times G}$, $\Lambda_0 = (\lambda_{lg,p,0})_{PG \times G}$, and $P_0 = (\rho_{g,q,0})_{QG \times G}$ are defined conformably.¹¹

2.2.2 Reduced Form and Structural Model with Exclusion Restrictions

To derive the reduced form of the above model, we denote the vectorized model variables as $y_n = \operatorname{vec}(Y_n)$, $x_n = \operatorname{vec}(X_n)$, $u_n = \operatorname{vec}(U_n)$, $\varepsilon_n = \operatorname{vec}(E_n)$. We also denote with $\mathbf{W}_n$ and $\mathbf{M}_n$ the stacked spatial weight matrices $\mathbf{W}_n = [W_{1,n}', \ldots, W_{P,n}']'$ and $\mathbf{M}_n = [M_{1,n}', \ldots, M_{Q,n}']'$. Observing that $\operatorname{vec}(\bar{Y}_n) = (I_G \otimes \mathbf{W}_n) y_n$ and $\operatorname{vec}(\bar{U}_n) = (I_G \otimes \mathbf{M}_n) u_n$, and that for any two conformable matrices $A_1$ and $A_2$, $\operatorname{vec}(A_1 A_2') = (A_2 \otimes I)\operatorname{vec}(A_1)$, it is readily seen that the spatial simultaneous equation system (2.1) can be re-written in stacked notation as
$$y_n = B_0^{*\prime} y_n + C_0^{*\prime} x_n + u_n, \tag{2.4}$$
$$u_n = P_0^{*\prime} u_n + \varepsilon_n,$$
where $B_0^{*\prime} = (B_0' \otimes I_n) + (\Lambda_0' \otimes I_n)(I_G \otimes \mathbf{W}_n)$, $C_0^{*\prime} = C_0' \otimes I_n$, and $P_0^{*\prime} = (P_0' \otimes I_n)(I_G \otimes \mathbf{M}_n)$.
¹¹Under this formulation, the $g$-th columns of $\Lambda_0$ and $P_0$ are, respectively, $[\lambda_{1g,1,0}, \ldots, \lambda_{1g,P,0}, \ldots, \lambda_{Gg,1,0}, \ldots, \lambda_{Gg,P,0}]'$ and $[0, \ldots, 0, \rho_{g,1,0}, \ldots, \rho_{g,Q,0}, 0, \ldots, 0]'$. For normalization, the diagonal elements of $B_0$ are restricted to zeros.
To facilitate the discussion, we denote
$$S_n(\beta_0, \lambda_0) = I_{nG} - B_0^{*\prime}, \qquad R_n(\rho_0) = I_{nG} - P_0^{*\prime}.$$
The reduced form of the system is given by
$$y_n = (I_{nG} - B_0^{*\prime})^{-1}(C_0^{*\prime} x_n + u_n) = S_n^{-1}(\beta_0, \lambda_0)\,(C_0^{*\prime} x_n + u_n), \tag{2.5}$$
$$u_n = (I_{nG} - P_0^{*\prime})^{-1} \varepsilon_n = R_n^{-1}(\rho_0)\,\varepsilon_n,$$
assuming that the inverses of $I_{nG} - B_0^{*\prime}$ and $I_{nG} - P_0^{*\prime}$ exist. In general, the structural parameters of the spatial simultaneous equation system (2.1) and (2.3) are not identified without imposing exclusion restrictions. Let $\beta_{g,0}$, $\gamma_{g,0}$, $\lambda_{g,0}$ and $\rho_{g,0}$ denote the $m_{g,\beta} \times 1$, $m_{g,\gamma} \times 1$, $m_{g,\lambda} \times 1$ and $m_{g,\rho} \times 1$ vectors of non-zero elements of the $g$-th columns of $B_0$, $C_0$, $\Lambda_0$ and $P_0$, respectively. Let $Y_{g,n}$, $X_{g,n}$, $\bar{Y}_{g,n}$, $\bar{U}_{g,n}$ and $E_{g,n}$ be the corresponding matrices of observations on the endogenous variables, exogenous variables, spatially lagged endogenous variables, spatially lagged disturbances and disturbances appearing in the structural equation for the $g$-th endogenous variable. Then model (2.3) can be expressed as ($g = 1, \ldots, G$):
$$y_{g,n} = Z_{g,n}\,\delta_{g,0} + u_{g,n}, \tag{2.6}$$
$$u_{g,n} = \bar{U}_{g,n}\,\rho_{g,0} + \varepsilon_{g,n}, \tag{2.7}$$
where $Z_{g,n} = [Y_{g,n}, \bar{Y}_{g,n}, X_{g,n}]$ and $\delta_{g,0} = [\beta_{g,0}', \lambda_{g,0}', \gamma_{g,0}']'$. For purposes of estimation, it proves helpful to apply a spatial Cochrane-Orcutt transformation to the model. In particular, premultiplying (2.6) by $R_{g,n}(\rho_{g,0}) = I_n - \sum_{q \in I_{g,\rho}} \rho_{g,q,0} M_{q,n}$ yields
$$y^*_{g,n} = Z^*_{g,n}\,\delta_{g,0} + \varepsilon_{g,n}, \tag{2.8}$$
with $y^*_{g,n} = y^*_{g,n}(\rho_{g,0}) = R_{g,n}(\rho_{g,0})\, y_{g,n}$ and $Z^*_{g,n} = Z^*_{g,n}(\rho_{g,0}) = R_{g,n}(\rho_{g,0})\, Z_{g,n}$. Stacking the transformed equations yields
$$y^*_n = Z^*_n\,\delta_0 + \varepsilon_n, \tag{2.9}$$
with $y^*_n = [y^{*\prime}_{1,n}, \ldots, y^{*\prime}_{G,n}]'$, $Z^*_n = \operatorname{diag}_{g=1}^{G}\{Z^*_{g,n}\}$, and $\delta_0 = [\delta_{1,0}', \ldots, \delta_{G,0}']'$.
It then follows from (2.6) and the above transformed form of the model (2.8) that
$$\varepsilon_{g,n} = \Big(I_n - \sum_{q \in I_{g,\rho}} \rho_{g,q,0}\, M_{q,n}\Big)\big[y_{g,n} - Z_{g,n}\,\delta_{g,0}\big] = R_{g,n}(\rho_{g,0})\, u_{g,n}. \tag{2.10}$$
We can stack (2.6) and (2.10) over the $G$ equations to write the whole system as
$$y_n = Z_n\,\delta_0 + u_n, \tag{2.11}$$
$$\varepsilon_n = R_n(\rho_0)\, u_n,$$
where $Z_n = \operatorname{diag}_{g=1}^{G}\{Z_{g,n}\}$, $R_n(\rho_0) = \operatorname{diag}_{g=1}^{G}\{R_{g,n}(\rho_{g,0})\}$, $\delta_0 = [\delta_{1,0}', \ldots, \delta_{G,0}']'$ denotes the vector of all parameters in the structural equations, and $\rho_0 = [\rho_{1,0}', \ldots, \rho_{G,0}']'$ is the vector of all spatial autoregressive parameters in the disturbance process. To facilitate the expressions in the next subsection, we let $\theta_{g,0} = [\delta_{g,0}', \rho_{g,0}']'$ be the vector of parameters in the $g$-th equation of the full model and $\theta_0 = [\theta_{1,0}', \ldots, \theta_{G,0}']'$ be the vector of all model parameters.

2.2.3 Model Assumptions

Given that we consider the same model specification as in Drukker, Egger, and Prucha (2022), the following assumptions regarding the data generating process (DGP) are also the same as those maintained in their paper.

Assumption 1. For $p = 1, \ldots, P$ and $q = 1, \ldots, Q$: (a) All diagonal elements of $W_{p,n}$ and $M_{q,n}$ are zero. (b) $\|W_{p,n}\|_1 \le c$ and $\|M_{q,n}\|_1 \le c$ for some finite constant $c$ which does not depend on $n$, and $\|W_{p,n}\|_\infty = 1$, $\|M_{q,n}\|_\infty = 1$.

Assumption 2. (a) The matrix $S_n(\beta_0, \lambda_0) = I_{nG} - B_0^{*\prime}$ is nonsingular. (b) The spatial autoregressive parameters satisfy $\sup_n \sum_{q \in I_{g,\rho}} |\rho_{g,q,0}| < 1$ for $g = 1, \ldots, G$, where $I_{g,\rho} = \{q_{g,1}, \ldots, q_{g,q_g}\} \subseteq \{1, \ldots, Q\}$ denotes the set of indices associated with the elements of $\rho_{g,0}$.¹² (c) The row and column sums of $S_n^{-1}(\beta_0, \lambda_0)$ are uniformly bounded in absolute value.

The above assumptions are in line with the spatial literature. Assumption 1(a) entails a normalization rule and embodies the fact that unit $i$ is not treated as a neighbor of itself. Assumption 1(b) implies that the row and column sums of the matrices $W_{p,n}$ and $M_{q,n}$ are uniformly bounded in absolute value.
As pointed out in Kelejian and Prucha (2010), $\|W_{p,n}\|_\infty = 1$ and $\|M_{q,n}\|_\infty = 1$ imply a normalization of the parameters. This normalization can always be achieved by appropriately re-scaling the elements of the spatial weight matrices, along with correspondingly redefined spatial autoregressive parameters.¹³
¹²Note that the index set $I_{g,\rho}$ varies with $g$ and hence embodies the different exclusion restrictions imposed on the spatial autoregressive parameters in the different disturbance equations.
¹³See Kelejian and Prucha (2010) or Drukker, Egger, and Prucha (2022) for a more detailed discussion.
Assumption 2(a) ensures that the first equation of the expression for the reduced form (2.5) is well defined.¹⁴ Observe that $R_n(\rho_0) = \operatorname{diag}_{g=1}^{G}\{R_{g,n}(\rho_{g,0})\}$ with $R_{g,n}(\rho_{g,0}) = I_n - \sum_{q \in I_{g,\rho}} \rho_{g,q,0} M_{q,n}$. In light of this, it follows from Assumptions 1(b) and 2(b) that $\|I_{nG} - R_n(\rho_0)\|_\infty \le \max_g \sum_{q \in I_{g,\rho}} |\rho_{g,q,0}| < 1$, which in turn implies that $R_n(\rho_0)$ is nonsingular (Horn and Johnson (1985), p. 301). Consequently, the second equation of the reduced form in (2.5) is also well defined, and thus $y_n$ is uniquely defined by the model. Assumptions 1(b) and 2(b) even imply that $\sup_n \|P_0^{*\prime}\|_\infty < 1$, which in turn implies that the row sums of the matrices $R_{g,n}^{-1}(\rho_{g,0})$ are uniformly bounded in absolute value. To see this, observe that $\|R_{g,n}^{-1}(\rho_{g,0})\|_\infty \le 1/\big[1 - \|P_0^{*\prime}\|_\infty\big] \le 1/\big[1 - \sup_n \|P_0^{*\prime}\|_\infty\big] < \infty$; see Horn and Johnson (1985), p. 301.

Assumption 3. The matrix of (nonstochastic) exogenous regressors $X_n$ has full column rank. Furthermore, the elements of $X_n$ are uniformly bounded in absolute value by some finite constant.

The above assumption is standard in the spatial literature. In particular, in treating $X_n$ as nonstochastic, the analysis in this dissertation should be viewed as conditional on $X_n$. We next state the assumptions maintained for the disturbance $\varepsilon_n$. Let $V_n = [v_{1,n}, \ldots, v_{G,n}]$ be an $n \times G$ matrix of basic innovations and let $v_n = \operatorname{vec}(V_n)$. Then assume

Assumption 4.
The innovations $\varepsilon_n$ are generated as follows:
$$\varepsilon_n = (\Sigma_0^{*} \otimes I_n)\, v_n, \tag{2.12}$$
where $\Sigma_0^{*}$ is a nonsingular $G \times G$ matrix and the random variables $\{v_{ig,n} : i = 1, \ldots, n,\; g = 1, \ldots, G\}$ are, for each $n$, identically and independently distributed with zero mean, unit variance, and finite $4 + \eta$ moments for some $\eta > 0$, and their distribution does not depend on $n$. Furthermore, let $\Sigma_0 = \Sigma_0^{*}\Sigma_0^{*\prime}$; then the diagonal elements of $\Sigma_0$ are bounded by some finite constant.
¹⁴One common yet more restrictive condition in the literature is to assume $\|B_0^{*}\| < 1$ for some induced matrix norm.
The above assumption on the innovation process is in line with the specification of the disturbance terms for a classical simultaneous equation system. Let $\varepsilon_n(i)$ denote the $i$-th row of $E_n$; then equation (2.12) implies $E_n = V_n \Sigma_0^{*\prime}$ since $\varepsilon_n = \operatorname{vec}(E_n)$. It is then readily seen that the innovation vectors $\{\varepsilon_n(i) : 1 \le i \le n\}$ are i.i.d. with zero mean and VC matrix $\Sigma_0$. With respect to the stacked innovation vector, the above assumption implies that $E\varepsilon_n = 0$ and $E\varepsilon_n\varepsilon_n' = \Sigma_0 \otimes I_n$. For notational convenience, let $\sigma_0$ be the column vector consisting of the nonzero upper diagonal elements of $\Sigma_0$, the $G \times G$ variance-covariance matrix of the innovation vectors $\{\varepsilon_n(i) : 1 \le i \le n\}$. Following the aforementioned notations, $\sigma_{g.,0}$ and $\sigma_{.g,0}$ denote the $g$-th row and the $g$-th column of $\Sigma_0$, respectively. Similarly, $\sigma^{g.}_0$ and $\sigma^{.g}_0$ denote the $g$-th row and the $g$-th column of $\Sigma_0^{-1}$, respectively. We note that $\sigma_{g.,0}' = \sigma_{.g,0}$ and $\sigma^{g.\prime}_0 = \sigma^{.g}_0$ by the symmetry of $\Sigma_0$. From the reduced form model (2.5) and Assumption 4, we see that the VC matrices of $u_n$ and $y_n$ are given by, respectively,
$$\Omega_{u,n} = R_n^{-1}(\rho_0)\,(\Sigma_0 \otimes I_n)\, R_n^{-1}(\rho_0)', \tag{2.13}$$
$$\Omega_{y,n} = S_n^{-1}(\beta_0, \lambda_0)\,\Omega_{u,n}\, S_n^{-1}(\beta_0, \lambda_0)'. \tag{2.14}$$
Assumptions 2 and 4 imply that the row and column sums of the VC matrix of $u_n$ and that of $y_n$ are uniformly bounded in absolute value, thus limiting the degree of correlation between, respectively, the elements of $u_n$ and of $y_n$.
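The covariance structure implied by Assumption 4 rests on the mixed-product property of the Kronecker product, which can be verified numerically (our own check, with an arbitrary lower-triangular choice of $\Sigma_0^{*}$):

```python
import numpy as np

# Sketch (our own check) of the covariance structure implied by Assumption 4:
# eps_n = (Sigma0* kron I_n) v_n with E[v v'] = I_{nG} gives
# E[eps eps'] = (Sigma0* kron I_n)(Sigma0*' kron I_n)
#             = (Sigma0* Sigma0*') kron I_n = Sigma0 kron I_n.
n, G = 4, 2
S_star = np.array([[1.0, 0.0], [0.5, 1.0]])  # a nonsingular G x G "square root"
Sigma0 = S_star @ S_star.T

lhs = np.kron(S_star, np.eye(n)) @ np.kron(S_star.T, np.eye(n))
rhs = np.kron(Sigma0, np.eye(n))
print(np.allclose(lhs, rhs))  # True: mixed-product property of the Kronecker product
```

The same identity is what delivers the sandwich forms (2.13) and (2.14) once the reduced-form inverses are applied.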
2.3 Maximum Likelihood Estimation and Estimator Generating Equations

Hendry (1976) and Prucha and Kelejian (1984) show that the normal equations of the ML estimator can be utilized as a set of estimator generating equations, and that IV estimators can be viewed as numerical approximations to its solution. Hausman (1975) shows explicitly that the full information maximum likelihood estimator of a classical simultaneous system can be written in a form that permits an IV interpretation where the instruments embody all the underlying parameter restrictions. Extending these results, one contribution of the dissertation is showing that the linear components of the ML scores for the simultaneous system with network dependence can also be re-written to carry IV interpretations. In this section, we first derive the scores of the ML estimator of the network SEM model in equation (2.3). The scores are seen to be composed of weighted averages of linear and quadratic forms. We then show that both the linear and the quadratic forms represent valid moment conditions, and that the linear moments can be written in a form defining the generic IV estimator.

2.3.1 Scores of the Log-likelihood Function

In light of the reduced form model (2.5), we see that $y_n \sim N(\mu_y, \Omega_y)$ under normality, where $\mu_y = (I_{nG} - B_0^{*\prime})^{-1} C_0^{*\prime} x_n$ and $\Omega_y$ is defined in equation (2.14). In order to obtain a more elegant expression for the scores, it proves helpful to reparameterize the model with $\Sigma^{-1}$ instead of $\Sigma$, observing that there is a one-to-one correspondence between the elements of $\Sigma$ and $\Sigma^{-1}$. In the following, let $\tau$ denote the upper diagonal elements of $\Sigma^{-1}(\tau)$. Assuming now that the elements of $\varepsilon$ are distributed i.i.d. normal, the reparameterized log-likelihood function is then given by¹⁵
$$\ln L_n(\theta, \tau) = -\frac{nG}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma(\tau)| + \ln|S_n(\beta, \lambda)| + \ln|R_n(\rho)| - \frac{1}{2}\left[S_n(\beta, \lambda)y_n - C_n^{*\prime}(\gamma)x_n\right]' R_n(\rho)'\left(\Sigma^{-1}(\tau) \otimes I_n\right) R_n(\rho)\left[S_n(\beta, \lambda)y_n - C_n^{*\prime}(\gamma)x_n\right]. \tag{2.15}$$
¹⁵With $\Omega_{y,n}$ in (2.14) and $y_n^e(\theta) = S_n^{-1}(\beta, \lambda)\, C_n^{*\prime}(\gamma)\, x_n$, note that $(y_n - y_n^e(\theta))'\,\Omega_{y,n}^{-1}\,(y_n - y_n^e(\theta)) = [S_n(\beta, \lambda)y_n - C_n^{*\prime}(\gamma)x_n]'\, R_n(\rho)'\,(\Sigma^{-1}(\tau) \otimes I_n)\, R_n(\rho)\,[S_n(\beta, \lambda)y_n - C_n^{*\prime}(\gamma)x_n]$.
To help with the presentation of the scores, define $L_{\beta,g}$ as the $G \times m_{g,\beta}$ selection matrix on $B_0$ for equation $g$ such that $\beta_{g,0} = L_{\beta,g}'\, b_{.g,0}$, where $b_{.g,0}$ denotes the $g$-th column of $B_0$, and recall that $\beta_{g,0}$ denotes the vector of non-zero elements in $b_{.g,0}$. The selection matrices $L_{\gamma,g}$, $L_{\lambda,g}$ and $L_{\rho,g}$ are defined analogously. We also denote the selection matrix associated with $\tau$ as $L_\tau$.¹⁶ The proofs of the propositions in this section are given in Chapter A.2.
¹⁶To be clear, $L_{\lambda,g}$, $L_{\gamma,g}$ and $L_{\rho,g}$ are of dimension $PG \times m_{g,\lambda}$, $K \times m_{g,\gamma}$ and $Q \times m_{g,\rho}$, respectively. For the full system, e.g., $L_\beta = \operatorname{diag}_{g=1}^{G}\{L_{\beta,g}\}$, which is of dimension $G^2 \times \sum_{g=1}^{G} m_{g,\beta}$. Consistent with the definition of $\tau$, the selection matrix $L_\tau$ is of dimension $G^2 \times G(G+1)/2$.

Proposition 1. The scores of (2.15) with respect to the parameters of interest in equation $g$, for $g = 1, \ldots, G$, are:
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \beta_g} = Y_{g,n}'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) - \alpha_{\beta,g,n}(\beta, \lambda), \tag{2.16}$$
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \gamma_g} = X_{g,n}'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta), \tag{2.17}$$
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \lambda_g} = \bar{Y}_{g,n}'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) - \alpha_{\lambda,g,n}(\beta, \lambda), \tag{2.18}$$
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \rho_g} = \bar{U}_{g,n}(\delta_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) - \alpha_{\rho,g,n}(\rho_g), \tag{2.19}$$
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \tau} = \frac{1}{2}\, L_\tau'\left[n\,\operatorname{vec}(\Sigma(\tau)) - \operatorname{vec}\!\left(E_n(\theta)'\, E_n(\theta)\right)\right], \tag{2.20}$$
where
$$\alpha_{\beta,g,n}(\beta, \lambda) = L_{\beta,g}'\left[\operatorname{tr}(S_n^{1g}(\beta, \lambda)), \ldots, \operatorname{tr}(S_n^{Gg}(\beta, \lambda))\right]', \tag{2.21}$$
$$\alpha_{\lambda,g,n}(\beta, \lambda) = L_{\lambda,g}'\big[\operatorname{tr}(S_n^{1g}(\beta, \lambda) W_{1,n}), \ldots, \operatorname{tr}(S_n^{1g}(\beta, \lambda) W_{P,n}), \ldots, \operatorname{tr}(S_n^{Gg}(\beta, \lambda) W_{1,n}), \ldots, \operatorname{tr}(S_n^{Gg}(\beta, \lambda) W_{P,n})\big]', \tag{2.22}$$
$$\alpha_{\rho,g,n}(\rho_g) = L_{\rho,g}'\left[\operatorname{tr}(R_{g,n}^{-1}(\rho_g) M_{1,n}), \ldots, \operatorname{tr}(R_{g,n}^{-1}(\rho_g) M_{Q,n})\right]'. \tag{2.23}$$
Note that the endogenous variables and their spatial lags can be decomposed as
$$Y_n = Y_n^e(\theta_0) + V_n(\theta_0), \tag{2.24}$$
$$\bar{Y}_n = \bar{Y}_n^e(\theta_0) + \bar{V}_n(\theta_0), \tag{2.25}$$
with $Y_n^e(\theta_0) = EY_n$ and $EV_n(\theta_0) = 0$, as well as $\bar{Y}_n^e(\theta_0) = E\bar{Y}_n$ and $E\bar{V}_n(\theta_0) = 0$. Obviously, $Y_n^e(\theta_0)$ is the best instrument for $Y_n$ and $\bar{Y}_n^e(\theta_0)$ is the best instrument for $\bar{Y}_n$.
Explicit expressions for the columns of $Y_n^e(\theta_0)$, $\bar{Y}_n^e(\theta_0)$, $V_n(\theta_0)$ and $\bar{V}_n(\theta_0)$ are obtained from the reduced form of the model (2.5). Let $y_n^e(\theta_0) = \operatorname{vec}(Y_n^e(\theta_0))$ and $v_n(\theta_0) = \operatorname{vec}(V_n(\theta_0))$; it is readily seen that $y_n^e(\theta_0) = S_n^{-1}(\beta_0, \lambda_0)\, C_0^{*\prime} x_n$ and $v_n(\theta_0) = S_n^{-1}(\beta_0, \lambda_0)\, R_n^{-1}(\rho_0)\,\varepsilon_n$. Recalling that $\operatorname{vec}(\bar{Y}_n) = (I_G \otimes \mathbf{W}_n) y_n$, it is furthermore readily seen that $\bar{y}_n^e(\theta_0) = (I_G \otimes \mathbf{W}_n)\, y_n^e(\theta_0)$ and $\bar{v}_n(\theta_0) = (I_G \otimes \mathbf{W}_n)\, v_n(\theta_0)$. Note that both $v_n(\theta_0)$ and $\bar{v}_n(\theta_0)$ are linear in $\varepsilon_n$. For the $g$-th equation, the matrix $Y_{g,n}^e(\theta_0) = Y_n^e(\theta_0)\, L_{\beta,g}$ collects the means of the columns of $Y_{g,n}$, i.e., of the endogenous variables that appear on the right-hand side of the $g$-th equation. Similarly, $\bar{Y}_{g,n}^e(\theta_0) = \bar{Y}_n^e(\theta_0)\, L_{\lambda,g}$ collects the means of the columns of $\bar{Y}_{g,n}$, i.e., of the spatially lagged endogenous variables that appear on the right-hand side of the $g$-th equation. The stochastic components $V_{g,n}(\theta_0)$ and $\bar{V}_{g,n}(\theta_0)$ are defined analogously. We can then re-write the scores w.r.t. $\beta_g$ and $\lambda_g$ as
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \beta_g} = Y_{g,n}^e(\theta)'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) + \left[V_{g,n}(\theta)'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) - \alpha_{\beta,g,n}(\beta, \lambda)\right],$$
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \lambda_g} = \bar{Y}_{g,n}^e(\theta)'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) + \left[\bar{V}_{g,n}(\theta)'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) - \alpha_{\lambda,g,n}(\beta, \lambda)\right],$$
from which we see that the scores in (2.16)-(2.19) are composed of linear, quadratic, or linear-quadratic forms in $\varepsilon_n(\theta) = R_n(\rho)\, u_n(\delta)$. The following proposition collects the linear components of the scores and shows that they represent valid moment conditions.

Proposition 2. In light of the above decomposition, the linear parts of the scores (2.16), (2.17) and (2.18) can be collected as
$$m_{g,n}^l(\theta) = \frac{1}{n}\begin{bmatrix} Y_{g,n}^e(\theta)'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right)\varepsilon_n(\theta) \\ \bar{Y}_{g,n}^e(\theta)'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right)\varepsilon_n(\theta) \\ X_{g,n}'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right)\varepsilon_n(\theta) \end{bmatrix} = \frac{1}{n}\, Z_{g,n}^{e*}(\theta)'\left(\sigma^{g.}(\tau) \otimes I_n\right)\varepsilon_n(\theta), \tag{2.26}$$
where $Z_{g,n}^{e*}(\theta) = R_{g,n}(\rho_g)\, Z_{g,n}^e(\theta)$ and $Z_{g,n}^e(\theta) = \left[Y_{g,n}^e(\theta),\; \bar{Y}_{g,n}^e(\theta),\; X_{g,n}\right]$.
Under Assumption 4, $\mathrm{E}\, m_{g,n}^l(\theta_0,\sigma_0) = 0$, and hence the linear moments represent valid moment conditions.

In addition to the information carried by the linear parts of the scores, the ML estimator also exploits the quadratic parts of the scores when it comes to identification (and estimation). It is well known in the spatial literature that quadratic moment conditions can also help to identify spatial autoregressive parameters when linear moments are weak.$^{17}$ The following proposition collects the quadratic components of the scores and shows that they also represent valid moment conditions, in addition to the linear ones.

$^{17}$For a discussion, see, e.g., Kelejian and Prucha (1998), Lee (2007), Kuersteiner and Prucha (2020).

Proposition 3. In light of the above decomposition, the quadratic parts of the scores (2.16), (2.18) and (2.19) can be collected as
$$m_{g,n}^q(\theta,\sigma) = \begin{bmatrix} m_{g,n}^{q,\beta}(\theta,\sigma) \\ m_{g,n}^{q,\lambda}(\theta,\sigma) \\ m_{g,n}^{q,\rho}(\theta,\sigma) \end{bmatrix} = \frac{1}{n} \begin{bmatrix} V_{g,n}(\theta)' R_{g,n}(\rho_g)' (\sigma^{g\cdot}(\sigma) \otimes I_n)\varepsilon_n(\theta) - \Delta_{\beta,g,n}(\beta,\lambda) \\ \bar V_{g,n}(\theta)' R_{g,n}(\rho_g)' (\sigma^{g\cdot}(\sigma) \otimes I_n)\varepsilon_n(\theta) - \Delta_{\lambda,g,n}(\beta,\lambda) \\ \bar U_{g,n}(\rho_g)' (\sigma^{g\cdot}(\sigma) \otimes I_n)\varepsilon_n(\theta) - \Delta_{\rho,g,n}(\rho_g) \end{bmatrix} \tag{2.27}$$
$$= \frac{1}{n}\left[ \tilde V_{g,n}(\theta)' (\sigma^{g\cdot}(\sigma) \otimes I_n)\varepsilon_n(\theta) - \Delta_{g,n}(\theta) \right],$$
where $\tilde V_{g,n}(\theta) = \left[ R_{g,n}(\rho_g)V_{g,n}(\theta), R_{g,n}(\rho_g)\bar V_{g,n}(\theta), \bar U_{g,n}(\rho_g) \right]$ and $\Delta_{g,n}(\theta) = \left[ \Delta_{\beta,g,n}(\beta,\lambda)', \Delta_{\lambda,g,n}(\beta,\lambda)', \Delta_{\rho,g,n}(\rho_g)' \right]'$. Under Assumption 4, $\mathrm{E}\, m_{g,n}^q(\theta_0,\sigma_0) = 0$ for $g = 1,\dots,G$, and hence the quadratic moments represent valid moment conditions.

2.3.2 IV Interpretation and Estimator Generating Equations

The implication of Proposition 2 is twofold. First, it shows that the linear part of the scores can be used to motivate a set of valid linear moments. Second, it implies that this set of linear moment conditions can be re-written to motivate IV estimators. To see the latter, let
$$m_n^l(\theta_0,\sigma_0) = \begin{bmatrix} m_{1,n}^l(\theta_0,\sigma_0) \\ \vdots \\ m_{G,n}^l(\theta_0,\sigma_0) \end{bmatrix}$$
denote the sample moment vector obtained from stacking the $m_{g,n}^l(\theta,\sigma)$ in (2.26) over the $G$ equations, and correspondingly let $Z_n^e(\theta) = \mathrm{diag}_{g=1}^G\{Z_{g,n}^e(\theta)\}$.
Then
$$\mathrm{E}\, m_n^l(\theta_0,\sigma_0) = \frac{1}{n}\mathrm{E}\, Z_n^e(\theta_0)' R_n(\rho_0)' (\Sigma^{-1}(\sigma_0) \otimes I_n) R_n(\rho_0)(y_n - Z_n\delta_0) \tag{2.28}$$
$$= \frac{1}{n}\mathrm{E}\, \tilde Z_n^e(\theta_0)' (\Sigma^{-1}(\sigma_0) \otimes I_n)(\breve y_n(\rho_0) - \breve Z_n(\rho_0)\delta_0) = 0,$$
with $\tilde Z_n^e(\theta_0) = R_n(\rho_0) Z_n^e(\theta_0)$, and where the other Cochrane-Orcutt transformed matrices and vectors $\breve Z_n(\rho_0) = R_n(\rho_0) Z_n$ and $\breve y_n(\rho_0) = R_n(\rho_0) y_n$ are as defined in the previous section.

In the spirit of Hendry (1976) and Prucha and Kelejian (1984), the sample moment analogue of equation (2.28) can be viewed as an estimator generating equation in the following sense: given some initial estimates $\tilde\theta_n$ and $\tilde\Sigma_n$, the sample analogue of (2.28) can be solved to yield
$$\hat\delta_n = \left[ \tilde Z_n^e(\tilde\theta_n)' (\tilde\Sigma_n^{-1} \otimes I_n)\breve Z_n(\tilde\rho_n) \right]^{-1}\tilde Z_n^e(\tilde\theta_n)' (\tilde\Sigma_n^{-1} \otimes I_n)\breve y_n(\tilde\rho_n), \tag{2.29}$$
which defines the generic form of the IV estimators. Limited information estimators are obtained by setting $\tilde\Sigma_n = I_G$. This result extends the results of Hausman (1975) and Prucha and Kelejian (1984) in the context of classical SEMs. Within the context of the classical SEM, Hendry (1976) and Prucha and Kelejian (1984) observe that the 2SLS/3SLS and LIVE/FIVE estimators can be viewed as special cases of the class of estimators defined by the estimator generating equations corresponding to the ML scores. Those estimators are also special cases of the generic form of the IV estimators discussed in Hausman (1975). Analogously, our generic IV equation (2.29) also motivates the GS2SLS/GS3SLS estimators considered in Drukker, Egger, and Prucha (2022) as well as the GSLIVE/GSFIVE estimators considered in this dissertation. As will be discussed in the next section, their difference lies in the approach to approximating the instruments $Z_n^e(\theta_0)$.

2.3.3 Connection to LIVE/FIVE and Optimal IV Estimation

As remarked in the introduction, the GSLIVE and the GSFIVE estimators incorporate ideas underlying the LIVE and the FIVE estimators introduced by Brundy and Jorgenson (1971) in the context of a classical SEM.
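Looking back at the estimator generating equation (2.29), its sample analogue is a single weighted linear solve. The following minimal numpy sketch illustrates the mechanics with hypothetical ingredients (the matrices `Ze`, `Zc`, `yc` and `Sigma` stand in for $\tilde Z_n^e(\tilde\theta_n)$, $\breve Z_n(\tilde\rho_n)$, $\breve y_n(\tilde\rho_n)$ and $\tilde\Sigma_n$; all sizes and names are our own, not the dissertation's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, G, k = 200, 2, 3     # hypothetical sample size, number of equations, regressors

# Stand-ins: instruments Ze, Cochrane-Orcutt transformed regressors Zc and outcome yc
Ze = rng.normal(size=(n * G, k))
delta0 = np.array([1.0, -0.5, 0.25])
Zc = Ze + 0.1 * rng.normal(size=(n * G, k))   # regressors correlated with the instruments
yc = Zc @ delta0 + rng.normal(size=n * G)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])    # hypothetical innovation VC estimate

def iv_estimator(Ze, Zc, yc, Sigma, n):
    """Generic IV estimator of eq. (2.29):
    delta = [Ze' (Sigma^{-1} kron I_n) Zc]^{-1} Ze' (Sigma^{-1} kron I_n) yc."""
    W = np.kron(np.linalg.inv(Sigma), np.eye(n))
    return np.linalg.solve(Ze.T @ W @ Zc, Ze.T @ W @ yc)

delta_full = iv_estimator(Ze, Zc, yc, Sigma, n)      # full information version
delta_lim = iv_estimator(Ze, Zc, yc, np.eye(2), n)   # limited information: Sigma = I_G
```

For realistic $n$ the Kronecker weight matrix should of course never be formed densely; the sketch only makes the algebra of (2.29) explicit.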
To give additional background on the LIVE (FIVE) estimators, we briefly review how they differ from the 2SLS (3SLS) estimators in the construction of the instruments. The optimal instruments are given by the expected values of the endogenous variables, which are well known to be functions of the reduced form parameters. Since the reduced form parameters are unobserved, they have to be estimated. The difference between the classical IV estimators lies solely in the way the reduced form parameters are estimated. The LIVE (FIVE) estimators exploit the structure of the reduced form parameters implied by the reduced form model, while the 2SLS (3SLS) estimators do not. The LIVE (and the FIVE) estimators employ consistent initial estimates of the structural parameters to compute the (estimated) expected values of the endogenous variables, which in turn are used to construct the instruments. The 2SLS (and the 3SLS) estimators simply estimate the expected values of the endogenous variables by running an OLS regression of the endogenous variables on the exogenous variables.

As will become clear, similar differences distinguish the GS2SLS (GS3SLS) estimators considered in Drukker, Egger, and Prucha (2022) and the GSLIVE (GSFIVE) estimators considered in the current paper in the context of network SEMs. For network SEMs, the optimal instruments $Z_{g,n}^e(\theta_0)$ involve the inverse of $n \times n$ or even $nG \times nG$ matrices, depending on the structure of the model. A further difference is that, for numerical simplicity, the GS2SLS (GS3SLS) estimators do not fully exploit the structure of those inverse matrices. In that sense, the GSLIVE (GSFIVE) estimators considered in this paper are related in spirit to Lee (2003) and Kelejian, Prucha, and Yuzefovich (2004), who consider IV estimation with optimal instruments in the context of single-equation spatial autoregressive models.
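The two ways of estimating the optimal instruments can be made concrete in a small numpy sketch for a classical SEM (all parameter values and names below are our own illustrative choices): the 2SLS/3SLS route fits $\mathrm{E}Y_n$ by an unrestricted OLS regression of $Y_n$ on $X_n$, while the LIVE/FIVE route plugs initial structural estimates into the reduced form map $\Pi(B, C) = C(I_G - B)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, G, K = 500, 2, 3

# Hypothetical classical SEM: Y(I - B) = X C + U, so Y = X Pi0 + U (I - B)^{-1}
B = np.array([[0.0, 0.5],
              [0.3, 0.0]])              # simultaneity feedbacks; zero own-coefficients
C = rng.normal(size=(K, G))
X = rng.normal(size=(n, K))
U = rng.normal(size=(n, G))

Pi0 = C @ np.linalg.inv(np.eye(G) - B)  # reduced form coefficients: EY = X Pi0
Y = X @ Pi0 + U @ np.linalg.inv(np.eye(G) - B)

# 2SLS/3SLS-style instruments: unrestricted OLS of Y on X, ignoring the structure of Pi0
Pi_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
EY_ols = X @ Pi_ols

# LIVE/FIVE-style instruments: impose the reduced form structure, evaluated at
# perturbed parameters that mimic consistent first-round structural estimates
B_tilde, C_tilde = B + 0.01, C + 0.01
EY_live = X @ (C_tilde @ np.linalg.inv(np.eye(G) - B_tilde))
```

Both constructions approximate the same $\mathrm{E}Y_n = X_n\Pi_0$; they differ only in whether the nonlinear restrictions on $\Pi$ are imposed.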
Our estimators are closest in spirit to Kelejian, Prucha, and Yuzefovich (2004), in that we also use a series approximation of the inverse matrices to avoid computational issues with those matrices in large samples. In the following we provide a more formal discussion of these links between the new estimators considered in this dissertation and those previously considered in the literature.

2SLS and LIVE of Classical SEM

Recall from the reduced form (2.5) of our model that $\mathrm{E} y_n = \mathrm{vec}(\mathrm{E} Y_n) = (I_{nG} - \bar B_0)^{-1}\bar C_0 x_n$ with $\bar B_0 = B_0' \otimes I_n + (\Lambda_0' \otimes I_n)(I_G \otimes W_n)$ and $\bar C_0 = C_0' \otimes I_n$. For a classical SEM, $\Lambda_0 = 0$ and thus $\bar B_0 = B_0' \otimes I_n$. Therefore, the reduced form model becomes
$$\mathrm{E} y_n = (I_{nG} - B_0' \otimes I_n)^{-1}\bar C_0 x_n = \left[ (I_G - B_0')^{-1} \otimes I_n \right](C_0' \otimes I_n) x_n$$
and thus $\mathrm{E} Y_n = X_n \Pi_0$ with $\Pi_0 = C_0 (I_G - B_0)^{-1}$.$^{18}$

$^{18}$Recall that $\mathrm{vec}(A_{1,n}A_{2,n}) = (A_{2,n}' \otimes I_n)\mathrm{vec}(A_{1,n})$ for any two conformable matrices $A_{1,n}$ and $A_{2,n}$.

In the context of the classical SEM, the 2SLS and the 3SLS estimators estimate $\mathrm{E} Y_n$ by an OLS regression of $Y_n$ on $X_n$. Specifically, $\widehat{\mathrm{E}} Y_n = X_n\hat\Pi_n$ where $\hat\Pi_n = (X_n' X_n)^{-1} X_n' Y_n$.$^{19}$ The LIVE and FIVE estimators employ an initial consistent estimator of the structural parameters, say $\tilde\theta_n$. Given this initial estimator they estimate $\mathrm{E} Y_n$ by $\widehat{\mathrm{E}} Y_n = Y_n^e(\tilde\theta_n) = X_n\tilde\Pi_n$ where $\tilde\Pi_n = \tilde C_n(I_G - \tilde B_n)^{-1}$, and thus utilize explicitly the structure of the reduced form model. In vectorized form, we have$^{20}$
$$\widehat{\mathrm{E}} y_n = \tilde y_n^e(\tilde\theta_n) = (I_{nG} - \tilde B_n' \otimes I_n)^{-1}(\tilde C_n' \otimes I_n) x_n = \left[ (I_G - \tilde B_n')^{-1} \otimes I_n \right](\tilde C_n' \otimes I_n) x_n. \tag{2.30}$$

$^{19}$Note that this is equivalent to the vectorized form $\widehat{\mathrm{E}} y_n = (\hat\Pi_n' \otimes I_n)\mathrm{vec}(X_n)$.

$^{20}$The first equality of (2.30) involves inversion of an $nG \times nG$ matrix. This is not necessary for a classical SEM, as the second equality suggests; it is given only to ease the connection to the corresponding expression of $\widehat{\mathrm{E}} y_n$ for a SEM with network interactions in (2.32) below.

The GS2SLS and GS3SLS estimators considered in Drukker, Egger, and Prucha (2022) differ from the GSLIVE and the GSFIVE estimators considered in this paper in a closely related manner. To see this, observe that, provided $\Vert \bar B_0 \Vert < 1$ for some induced matrix norm $\Vert\cdot\Vert$, we have $(I_{nG} - \bar B_0)^{-1} = \sum_{s=0}^\infty (\bar B_0)^s$, and consequently the reduced form can be written as
$$\mathrm{E} y_n = (I_{nG} - \bar B_0)^{-1}\bar C_0 x_n = \sum_{s=0}^\infty (\bar B_0)^s \bar C_0 x_n.$$
In light of the structure of $\bar B_0$ and $\bar C_0$, we can then approximate $\mathrm{E} Y_n$ as follows with finite $S$:
$$\mathrm{E} Y_n \approx \sum_{l_1,\dots,l_P}\; \sum_{\substack{s_1,\dots,s_P \ge 0 \\ s_1+\cdots+s_P \le S}} W_{l_1,n}^{s_1} W_{l_2,n}^{s_2} \cdots W_{l_P,n}^{s_P} X_n \Pi_{(l_1,s_1),\dots,(l_P,s_P)}, \tag{2.31}$$
where $l_p \in \{1,\dots,P\}$ and the elements of the matrices $\Pi_{(l_1,s_1),\dots,(l_P,s_P)}$ are functions of the structural parameters. Depending on the model, not all combinations of products may appear in the approximation of $\mathrm{E} Y_n$; equivalently, some of the $\Pi_{(l_1,s_1),\dots,(l_P,s_P)}$ may be zero. As a motivating example, we give the explicit expression of the approximated $\mathrm{E} Y_n$ up to the second order for a two-equation system in Chapter A.1.

Analogously to 2SLS and 3SLS for the classical SEM, GS2SLS and GS3SLS do not exploit the structure of the above reduced form model. Adopting notation similar to Drukker, Egger, and Prucha (2022), let $\left( A_{s,n} \right)_{s=1}^S := \left[ A_{1,n}, A_{2,n}, \dots, A_{S,n} \right]$ for any conformable matrices $A_{1,n},\dots,A_{S,n}$, and define
$$X_{1,n} = \left( W_{j_1,n} X_n \right)_{j_1=1}^P,$$
$$X_{2,n} = \left( W_{j_1,n} W_{j_2,n} X_n \right)_{j_1,j_2=1}^P,$$
$$\vdots$$
$$X_{R,n} = \left( W_{j_1,n} W_{j_2,n} \cdots W_{j_R,n} X_n \right)_{j_1,j_2,\dots,j_R=1}^P,$$
and let $H_{R,n} = \left[ X_n, X_{1,n}, \dots, X_{R,n} \right]$. To approximate $\mathrm{E} Y_n$, the GS2SLS and GS3SLS estimators run an OLS regression of $Y_n$ against a collection of the linearly independent columns in
$$\left[ H_{R,n}, W_{1,n} H_{R,n}, \dots, W_{P,n} H_{R,n} \right].$$
Let $\hat\Pi_{(l_1,s_1),\dots,(l_P,s_P)}$ denote the OLS estimator of $\Pi_{(l_1,s_1),\dots,(l_P,s_P)}$ from such a regression; then we can express the approximation of $\mathrm{E} Y_n$ as
$$\widehat{\mathrm{E}} Y_n = \sum_{l_1,\dots,l_P}\; \sum_{\substack{s_1,\dots,s_P \ge 0 \\ s_1+\cdots+s_P \le S}} W_{l_1,n}^{s_1} W_{l_2,n}^{s_2} \cdots W_{l_P,n}^{s_P} X_n \hat\Pi_{(l_1,s_1),\dots,(l_P,s_P)}.$$
The advantage of this approximation is that it is readily computable, but it is not consistent. The above approach differs from the GSLIVE and the GSFIVE estimators, which explicitly exploit the structure of the parameters implied by the reduced form model. Given consistent initial estimates $\tilde\theta_n$, we estimate $\mathrm{E} Y_n$ via
$$\widehat{\mathrm{E}} y_n = \tilde y_n^e(\tilde\theta_n) = \left( I_{nG} - \bar{\tilde B}_n \right)^{-1}\bar{\tilde C}_n x_n, \tag{2.32}$$
$$\bar{\tilde B}_n = \tilde B_n' \otimes I_n + (\tilde\Lambda_n' \otimes I_n)(I_G \otimes W_n).$$

IV Estimators of SAR Models

The GSLIVE (GSFIVE) estimators considered in this paper are also related to Lee (2003) and Kelejian, Prucha, and Yuzefovich (2004), who consider IV estimators with optimal instruments in the context of a single-equation SAR model. To illustrate, consider the following SAR model with first order spatial lags in both $y_n$ and $u_n$:
$$y_n = \lambda W_n y_n + X_n\beta + u_n,$$
$$u_n = \rho M_n u_n + \varepsilon_n.$$
The reduced form of $y_n$ is given by $y_n = (I_n - \lambda W_n)^{-1}(X_n\beta + u_n)$, and thus the optimal instrument for $y_n$ is its mean $\mathrm{E} y_n = (I_n - \lambda W_n)^{-1} X_n\beta$. With $|\lambda| < 1$, we can express $(I_n - \lambda W_n)^{-1} = \sum_{s=0}^\infty \lambda^s W_n^s$. As noted in Kelejian and Prucha (1998), $\mathrm{E} y_n$ can thus be expressed as a weighted sum of the sequence of matrices $X_n, W_n X_n, W_n^2 X_n, \dots$. Therefore, they defined the IV matrix $H_n$ to consist of the linearly independent columns in $\left[ X_n, W_n X_n, \dots, W_n^S X_n \right]$ for finite $S$. Their first step IV estimator is $\hat\delta_n = (\hat Z_n'\hat Z_n)^{-1}\hat Z_n' y_n$ with $\hat Z_n = H_n(H_n' H_n)^{-1} H_n' Z_n$, and thus $\hat\delta_n$ is a 2SLS-type estimator. The GS2SLS and GS3SLS estimators of Drukker, Egger, and Prucha (2022) generalize this idea.

Lee (2003) alternatively estimates $\mathrm{E} y_n$ by $\hat y_n = (I_n - \tilde\lambda_n W_n)^{-1} X_n\tilde\beta_n$, assuming the availability of consistent initial estimates $\tilde\lambda_n$ and $\tilde\beta_n$. His first step IV estimator is thus $\hat\delta_n = (\hat Z_n'\hat Z_n)^{-1}\hat Z_n' y_n$, where $\hat Z_n = \left[ W_n\hat y_n, X_n \right]$. Lee (2003)'s estimator fully exploits the nonlinear structure of the parameters in the reduced form model, i.e., in $\mathrm{E} y_n$. Since the expression for $\hat y_n$ involves the inverse of an $n \times n$ matrix, $\hat Z_n$ may not be computable in large samples. To overcome this difficulty, Kelejian, Prucha, and Yuzefovich (2004) used the finite order approximation $\widehat{W_n\hat y_n} = \sum_{s=0}^S \tilde\lambda_n^s W_n^{s+1} X_n\tilde\beta_n$ in place of $W_n\hat y_n$, and their estimator remains feasible even when $n$ is large. Note that Kelejian, Prucha, and Yuzefovich (2004) allow the order $S$ of the series sum to grow with the sample size and approach infinity. Asymptotically, their estimator is equivalent to that considered in Lee (2003).

Recall that for our network SEM, $\widehat{\mathrm{E}} y_n$ given in (2.32) involves the inverse of $I_{nG} - \bar{\tilde B}_n$, which is an $nG \times nG$ matrix. Therefore, GSLIVE and GSFIVE could again be computationally infeasible when $n$ becomes large. LIVE and FIVE estimators in the context of a classical SEM do not suffer from this issue, since in that case $\widehat{\mathrm{E}} y_n = \left[ (I_G - \tilde B_n')^{-1} \otimes I_n \right]\bar{\tilde C}_n x_n$ and the dimension of $I_G - \tilde B_n'$ does not depend on the sample size $n$. To cope with the difficulty of computing $(I_{nG} - \bar{\tilde B}_n)^{-1}$ for the network SEM, we consider approximating $(I_{nG} - \bar{\tilde B}_n)^{-1}$ with a geometric series of finite order. We defer the details to Section 2.4.2, where we introduce the approximated moment conditions.

2.4 Moment Conditions

In this section, we present the heteroskedasticity-robust moment conditions based on Proposition 2 and Proposition 3. We also give the series-approximated versions of these moment conditions that will be used in defining the GMM estimators in Section 2.5.

2.4.1 Heteroskedasticity-robust Moment Conditions

Recall that by Assumption 4 we have $\mathrm{E}\varepsilon_n = 0$ and
$$\mathrm{E}\varepsilon_n\varepsilon_n' = \begin{bmatrix} \sigma_{11} I_n & \cdots & \sigma_{1G} I_n \\ \vdots & \ddots & \vdots \\ \sigma_{G1} I_n & \cdots & \sigma_{GG} I_n \end{bmatrix},$$
where we drop the subscript zero on the true parameters for notational convenience. Now consider the case where the innovations are heteroskedastic in the sense that
$$\mathrm{E}\varepsilon_n\varepsilon_n' = \begin{bmatrix} \Sigma_{11} & \cdots & \Sigma_{1G} \\ \vdots & \ddots & \vdots \\ \Sigma_{G1} & \cdots & \Sigma_{GG} \end{bmatrix},$$
where $\Sigma_{gh} = \mathrm{diag}_{i=1}^n\{\sigma_{ii,gh}\}$ denotes the true VC matrix block under heteroskedasticity.
In the following we introduce a modification of the moment conditions such that they remain valid under the above form of heteroskedasticity. For this discussion we assume the availability of some initial estimator $\tilde\theta_{g,n}$ such that
$$\sigma_{gh} = \operatorname*{plim}_{n\to\infty}\frac{1}{n}\tilde\varepsilon_{g,n}'\tilde\varepsilon_{h,n}$$
with $\tilde\varepsilon_{g,n} = R_{g,n}(\tilde\rho_{g,n})(y_{g,n} - Z_{g,n}\tilde\delta_{g,n})$. For this discussion we also assume that $\Sigma = (\sigma_{gh})$ is known, while noting that for the empirical implementation $\Sigma$ will be replaced by the corresponding estimator with $(g,h)$-th element given by $n^{-1}\tilde\varepsilon_{g,n}'\tilde\varepsilon_{h,n}$.

Heteroskedasticity-robust Linear Moment Conditions

In light of equation (2.28) and $\varepsilon_n = R_n(\rho_0) u_n(\delta_0)$, under heteroskedasticity, consider the following modified linear moments (at $\theta_0$):
$$m_n^l(\theta_0) = \frac{1}{n} Z_n^e(\theta_0)' R_n(\rho_0)' (\Sigma^{-1}(\sigma_0) \otimes I_n)\varepsilon_n. \tag{2.33}$$
It is straightforward to see that $\mathrm{E}\, m_n^l(\theta_0) = 0$ even under heteroskedasticity, and thus these represent valid full information moment conditions under heteroskedasticity. Of course, the limited information counterpart for the $g$-th equation,
$$m_{g,n}^l(\theta_0, g) = \frac{1}{n} Z_{g,n}^e(\theta_0)' R_{g,n}(\rho_{g,0})'\varepsilon_{g,n}, \tag{2.34}$$
also satisfies $\mathrm{E}\, m_{g,n}^l(\theta_0, g) = 0$ and thus represents valid limited information moment conditions under heteroskedasticity.

Heteroskedasticity-robust Quadratic Moment Conditions

Let $A_n = (A_{gh,n})$, for $g,h = 1,\dots,G$, be some non-stochastic $nG \times nG$ matrix whose blocks $A_{gh,n}$ are of dimension $n \times n$.$^{21}$ It is well known in the literature that to make a quadratic moment condition of the form
$$\mathrm{E}\,\frac{1}{n}\varepsilon_{g,n}' A_{gh,n}\varepsilon_{h,n} = 0$$
robust against heteroskedasticity, we can set the diagonal elements of the matrix in the middle of the quadratic form, $A_{gh,n}$, equal to zero. For the system case we consider moment conditions of the form $\frac{1}{n}\varepsilon_n' A_n\varepsilon_n = \frac{1}{n}\sum_{g=1}^G\sum_{h=1}^G \varepsilon_{g,n}' A_{gh,n}\varepsilon_{h,n}$, and thus $\mathrm{E}\,\frac{1}{n}\varepsilon_n' A_n\varepsilon_n = \frac{1}{n}\sum_{g=1}^G\sum_{h=1}^G \mathrm{tr}(A_{gh,n}\mathrm{E}\varepsilon_{h,n}\varepsilon_{g,n}')$. We see immediately that $\mathrm{E}\,\frac{1}{n}\varepsilon_n' A_n\varepsilon_n = 0$ for any $\mathrm{E}\varepsilon_{h,n}\varepsilon_{g,n}' = \mathrm{diag}_{i=1}^n\{\sigma_{ii,hg}\}$, provided that the diagonal elements of the $A_{gh,n}$ are zero.

$^{21}$Here we let $A_{gh,n}$ denote the $(g,h)$-th $n \times n$ block of the $nG \times nG$ matrix $A_n$.
For the following it proves convenient to introduce the notation
$$\mathrm{MAT}_d\left( B_{gh,n} \right) = B_{gh,n} - \mathrm{diag}(B_{gh,n}) \quad\text{and}\quad \mathrm{MAT}_D\left( B_n \right) = \left( \mathrm{MAT}_d\left( B_{gh,n} \right) \right).$$
That is, $\mathrm{MAT}_d(B_{gh,n})$ is the matrix obtained from $B_{gh,n}$ by setting all diagonal elements equal to zero, and thus $\mathrm{MAT}_D(B_n)$ is obtained from $B_n$ by setting the diagonal elements of each $n \times n$ block equal to zero.$^{22}$ Moreover, we note that the diagonal elements of each $n \times n$ block in
$$(\Sigma^{-1}(\sigma_0) \otimes I_n)\mathrm{MAT}_D\left( B_n \right) \quad\text{or}\quad \mathrm{MAT}_D\left( B_n \right)'(\Sigma^{-1}(\sigma_0) \otimes I_n) \tag{2.35}$$
are zero.$^{23}$ Hence, in light of the above discussion,
$$\frac{1}{n}\mathrm{E}\varepsilon_n'(\Sigma^{-1}(\sigma_0) \otimes I_n)\mathrm{MAT}_D\left( B_n \right)\varepsilon_n = \frac{1}{n}\mathrm{E}\varepsilon_n'\mathrm{MAT}_D\left( B_n \right)'(\Sigma^{-1}(\sigma_0) \otimes I_n)\varepsilon_n = 0.$$

$^{22}$Recall the notational convention that $B_n = (B_{gh,n})$, where $B_{gh,n}$ is the $(g,h)$-th $n \times n$ block of $B_n$.

$^{23}$To see this, note that each $n \times n$ block of $(\Sigma^{-1}(\sigma_0) \otimes I_n)$ is a diagonal matrix and the diagonal elements of each $n \times n$ block of $\mathrm{MAT}_D(B_n)$ are zero by construction. Let $A_n = (A_{gk,n})$ and $C_n = (C_{kh,n})$ be some $nG \times nG$ matrices, with the $(g,k)$-th $n \times n$ block of $A_n$ being $A_{gk,n}$ and the $(k,h)$-th $n \times n$ block of $C_n$ being $C_{kh,n}$. Furthermore, let $A_{gk,n}$ be a diagonal matrix and the diagonal elements of $C_{kh,n}$ be zero (or vice versa). Then the diagonal elements of $A_{gk,n}C_{kh,n}$ are zero. Note that the $(g,h)$-th $n \times n$ block of $A_n C_n$ is $\sum_{k=1}^G A_{gk,n}C_{kh,n}$, and thus the diagonal elements of $\sum_{k=1}^G A_{gk,n}C_{kh,n}$ are also zero. Since this holds for any $(g,h)$-th block of $A_n C_n$, the diagonal elements of each $n \times n$ block of $A_n C_n$ are zero.

To present our heteroskedasticity-robust quadratic moments, it proves helpful to first show that each element of the quadratic moments implied by (2.27) in Proposition 3 can indeed be expressed as a quadratic form in $\varepsilon_n$. In the vector of quadratic components of the scores (2.27), the columns of $V_{g,n}(\theta_0)$ and $\bar V_{g,n}(\theta_0)$ are given by ($n \times 1$ blocks of) $v_n(\theta_0) = S_n^{-1}(\beta_0,\lambda_0) R_n^{-1}(\rho_0)\varepsilon_n$ and $\bar v_n(\theta_0) = (I_G \otimes W_n) v_n(\theta_0)$, respectively, and thus the columns of $V_{g,n}(\theta)$ and $\bar V_{g,n}(\theta)$ are seen to be weighted sums of the $\varepsilon_{g,n}$ for $g = 1,\dots,G$. Also, recall that a typical column of $\bar U_{g,n}$ is of the form $M_{q,n} u_{g,n}$ and $u_{g,n} = R_{g,n}^{-1}(\rho_{g,0})\varepsilon_{g,n}$. In the following lemma we provide the details of how each individual element of $m_{g,n}^q(\theta,\sigma)$ can be expressed as a quadratic function of $\varepsilon_n(\theta)$.

Lemma 1. Let $S_n^{h\cdot}(\beta,\lambda)$ denote the $h$-th $n \times nG$ block of $S_n^{-1}(\beta,\lambda)$ and let $i_{g,G}$ denote the $g$-th column of the identity matrix of dimension $G$. Then the element of $m_{g,n}^{q,\beta}(\theta,\sigma)$ that is associated with the score w.r.t. $b_{hg}$ can, upon replacing $\theta_0$ with $\theta$, be written as
$$\frac{1}{n}\varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n) R_n(\rho)\left( i_{g,G} \otimes S_n^{h\cdot}(\beta,\lambda) \right) R_n^{-1}(\rho)\varepsilon_n(\theta) - \frac{1}{n}\mathrm{tr}(S_n^{hg}(\beta,\lambda)), \tag{2.36}$$
the element of $m_{g,n}^{q,\lambda}(\theta,\sigma)$ that is associated with the score w.r.t. $\lambda_{hg,p}$ can be written as
$$\frac{1}{n}\varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n) R_n(\rho)\left( i_{g,G} \otimes W_{p,n} S_n^{h\cdot}(\beta,\lambda) \right) R_n^{-1}(\rho)\varepsilon_n(\theta) - \frac{1}{n}\mathrm{tr}(W_{p,n} S_n^{hg}(\beta,\lambda)), \tag{2.37}$$
and the element of $m_{g,n}^{q,\rho}(\theta,\sigma)$ that is associated with the score w.r.t. $\rho_{g,q}$ can be written as
$$\frac{1}{n}\varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\left( i_{g,G} i_{g,G}' \otimes M_{q,n} R_{g,n}^{-1}(\rho_g) \right)\varepsilon_n(\theta) - \frac{1}{n}\mathrm{tr}(M_{q,n} R_{g,n}^{-1}(\rho_g)). \tag{2.38}$$

The proof of Lemma 1 is presented in Chapter A.2. Note that under heteroskedasticity of the $\varepsilon_n$, i.e., $\mathrm{E}\varepsilon_n\varepsilon_n' \neq (\Sigma \otimes I_n)$, the expected values of the moments defined in (2.36)-(2.38) are generally not zero. For example, under heteroskedasticity it is readily seen that in general
$$\frac{1}{n}\mathrm{E}\,\varepsilon_n'(\Sigma^{-1} \otimes I_n) R_n(\rho_0)\left( i_{g,G} \otimes S_n^{h\cdot}(\beta_0,\lambda_0) \right) R_n^{-1}(\rho_0)\varepsilon_n - \frac{1}{n}\mathrm{tr}(S_n^{hg}(\beta_0,\lambda_0)) \neq 0,$$
observing that under homoskedasticity we would have
$$\frac{1}{n}\mathrm{E}\,\varepsilon_n'(\Sigma^{-1} \otimes I_n) R_n(\rho_0)\left( i_{g,G} \otimes S_n^{h\cdot}(\beta_0,\lambda_0) \right) R_n^{-1}(\rho_0)\varepsilon_n = \frac{1}{n}\mathrm{tr}\left( i_{g,G} \otimes S_n^{h\cdot}(\beta_0,\lambda_0) \right) = \frac{1}{n}\mathrm{tr}(S_n^{hg}(\beta_0,\lambda_0)),$$
where the inequality above follows since under heteroskedasticity $\mathrm{E}\varepsilon_n\varepsilon_n' \neq (\Sigma \otimes I_n)$. This implies that in general $\mathrm{E}\, m_{g,n}^q(\theta_0,\sigma_0) \neq 0$, and hence the moment vector does not yield a set of valid moment conditions under heteroskedasticity.
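The zero-diagonal device is purely mechanical, so it is easy to verify numerically. The following sketch implements $\mathrm{MAT}_d$ and $\mathrm{MAT}_D$ and checks that $\mathrm{E}\,\varepsilon_n' A_n \varepsilon_n = \sum_{g,h}\mathrm{tr}(A_{gh,n}\mathrm{E}\varepsilon_{h,n}\varepsilon_{g,n}')$ vanishes exactly for any diagonal cross-covariance blocks once every block of $A_n$ has a zero diagonal (the variance profile `sig` is an arbitrary hypothetical choice of ours):

```python
import numpy as np

def MAT_d(B):
    """Zero out the diagonal of a square block (MAT_d in the text)."""
    return B - np.diag(np.diag(B))

def MAT_D(Bn, n, G):
    """Apply MAT_d to every n-by-n block of the nG-by-nG matrix Bn (MAT_D)."""
    out = Bn.copy()
    for g in range(G):
        for h in range(G):
            out[g*n:(g+1)*n, h*n:(h+1)*n] = MAT_d(out[g*n:(g+1)*n, h*n:(h+1)*n])
    return out

rng = np.random.default_rng(2)
n, G = 50, 2
An = MAT_D(rng.normal(size=(n*G, n*G)), n, G)

# With cross-sectionally independent but heteroskedastic innovations,
# E[eps_h eps_g'] = diag(sigma_{ii,hg}); each tr(A_gh diag(.)) is then exactly zero
sig = rng.uniform(0.5, 2.0, size=(n, G, G))   # hypothetical sigma_{ii,gh} profile
expectation = sum(
    np.trace(An[g*n:(g+1)*n, h*n:(h+1)*n] @ np.diag(sig[:, h, g]))
    for g in range(G) for h in range(G)
)
```

The expectation is zero by construction, whatever the (diagonal) heteroskedasticity profile, which is exactly the robustness argument used in the text.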
Following the discussion at the beginning of this section, we can make the quadratic moment conditions robust to heteroskedasticity by restricting the diagonal elements of each of the matrices in the quadratic forms in (2.36), (2.37) and (2.38) to zero.

Full Information Quadratic Moments: In light of (2.36) in Lemma 1, the set of heteroskedasticity-robust quadratic moments originating from $m_{g,n}^{q,\beta}(\theta,\sigma)$ (i.e., the quadratic scores w.r.t. $\beta$ in (2.27)) can be written compactly as
$$m_{n,R}^{q,\beta}(\theta) = \frac{1}{n} L_\beta'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes S_n^{h\cdot}(\beta,\lambda)) R_n^{-1}(\rho) \right)\varepsilon_n(\theta) \right]_{h=1,g=1}^{G,G} \right), \tag{2.39}$$
with $L_\beta = \mathrm{diag}_{g=1}^G\{L_{\beta,g}\}$. Analogously, the set of heteroskedasticity-robust quadratic moments originating from the $m_{g,n}^{q,\lambda}(\theta,\sigma)$ is
$$m_{n,R}^{q,\lambda}(\theta) = \frac{1}{n}\begin{bmatrix} L_{\lambda,1}'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{1,G} \otimes W_{p,n} S_n^{h\cdot}(\beta,\lambda)) R_n^{-1}(\rho) \right)\varepsilon_n(\theta) \right]_{p=1,h=1}^{P,G} \right) \\ \vdots \\ L_{\lambda,G}'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{G,G} \otimes W_{p,n} S_n^{h\cdot}(\beta,\lambda)) R_n^{-1}(\rho) \right)\varepsilon_n(\theta) \right]_{p=1,h=1}^{P,G} \right) \end{bmatrix}. \tag{2.40}$$
Finally, the set of heteroskedasticity-robust quadratic moments originating from $m_{g,n}^{q,\rho}(\theta,\sigma)$ is
$$m_{n,R}^{q,\rho}(\theta) = \frac{1}{n} L_\rho'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( i_{g,G} i_{g,G}' \otimes M_{q,n} R_{g,n}^{-1}(\rho_g) \right)\varepsilon_n(\theta) \right]_{q=1,g=1}^{Q,G} \right), \tag{2.41}$$
with $L_\rho = \mathrm{diag}_{g=1}^G\{L_{\rho,g}\}$.

Note that each element of $m_{n,R}^{q,\beta}(\theta)$ is of the generic form $\frac{1}{n}\varepsilon_n(\theta)' A_n\varepsilon_n(\theta)$, where the diagonal elements of each of the $n \times n$ blocks of $A_n = (\Sigma^{-1}(\sigma_0) \otimes I_n)\mathrm{MAT}_D(B_n)$ are zero, as discussed before Lemma 1. Hence, at $\theta_0$, we have $\mathrm{E}\frac{1}{n}\varepsilon_n' A_n\varepsilon_n = \frac{1}{n}\sum_{g=1}^G\sum_{h=1}^G \mathrm{tr}(A_{gh,n}\mathrm{diag}_{i=1}^n\{\sigma_{ii,hg}\}) = 0$, where $\sigma_{ii,hg} = \mathrm{E}\varepsilon_{i,h,n}\varepsilon_{i,g,n}$. Thus $\mathrm{E}\, m_{n,R}^{q,\beta}(\theta_0) = 0$ holds under heteroskedasticity and represents valid quadratic moment conditions. By analogous arguments we also have $\mathrm{E}\, m_{n,R}^{q,\lambda}(\theta_0) = 0$ and $\mathrm{E}\, m_{n,R}^{q,\rho}(\theta_0) = 0$, and hence all represent valid moment conditions. Stacking $m_{n,R}^{q,\beta}(\theta)$, $m_{n,R}^{q,\lambda}(\theta)$ and $m_{n,R}^{q,\rho}(\theta)$ together, we obtain the vector of heteroskedasticity-robust (full information) quadratic moments for the whole system as
$$m_{n,R}^q(\theta) = \left[ m_{n,R}^{q,\beta}(\theta)', m_{n,R}^{q,\lambda}(\theta)', m_{n,R}^{q,\rho}(\theta)' \right]'.$$

Limited Information Quadratic Moments: If one ignores the cross-equation error structure in $m_n^q(\theta)$, we can obtain the following limited information moment vector for the $g$-th equation:
$$m_{g,n,R}^q(\theta,g) = \frac{1}{n}\begin{bmatrix} L_{\beta,g}'\,\mathrm{vec}\left( \left[ \varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d\left( R_{g,n}(\rho_g) S_n^{hg}(\beta,\lambda) R_{g,n}^{-1}(\rho_g) \right)\varepsilon_{g,n}(\theta_g) \right]_{h=1}^{G} \right) \\ L_{\lambda,g}'\,\mathrm{vec}\left( \left[ \varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d\left( R_{g,n}(\rho_g) W_{p,n} S_n^{hg}(\beta,\lambda) R_{g,n}^{-1}(\rho_g) \right)\varepsilon_{g,n}(\theta_g) \right]_{p=1,h=1}^{P,G} \right) \\ L_{\rho,g}'\,\mathrm{vec}\left( \left[ \varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d\left( M_{q,n} R_{g,n}^{-1}(\rho_g) \right)\varepsilon_{g,n}(\theta_g) \right]_{q=1}^{Q} \right) \end{bmatrix}. \tag{2.42}$$
Observing that each element of $m_{g,n,R}^q(\theta,g)$ is of the generic form $\frac{1}{n}\varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d(B_{gg,n})\varepsilon_{g,n}(\theta_g)$, at $\theta_{g,0}$ we have $\mathrm{E}\frac{1}{n}\varepsilon_{g,n}'\mathrm{MAT}_d(B_{gg,n})\varepsilon_{g,n} = \frac{1}{n}\mathrm{tr}(\mathrm{MAT}_d(B_{gg,n})\mathrm{diag}_{i=1}^n\{\sigma_{ii,gg}\}) = 0$, where $\sigma_{ii,gg} = \mathrm{E}\varepsilon_{i,g,n}\varepsilon_{i,g,n}$. Hence $\mathrm{E}\, m_{g,n,R}^q(\theta_0,g) = 0$, and the limited information quadratic moments thus represent valid moment conditions under heteroskedasticity.

As is known in the literature, GMM estimators based on both the linear and quadratic moments may remain consistent when the linear moments alone are insufficient to identify the parameters. When identification through the linear moments is weak, GMM estimators based on both the linear and the quadratic moments may outperform those that utilize only the linear moments. As remarked, to ensure that our GMM estimators remain feasible when the sample size $n$ gets large, we consider approximated versions of these moments, as discussed in the following section.

2.4.2 Approximated Moments

Recall that $\widehat{\mathrm{E}} y_n$ in (2.32) involves inversion of the $nG \times nG$ matrix $I_{nG} - \bar{\tilde B}_n$, and thus may not be computable when $n$ is large. This issue also plagues ML estimators, in light of the scores (2.16)-(2.19). In general, estimators that involve inversion of matrices whose dimensions depend on the sample size $n$ can be computationally infeasible when $n$ becomes large.
To cope with this problem, we approximate the inverse by a geometric series of finite order $S$, in the spirit of Kelejian, Prucha, and Yuzefovich (2004), and obtain approximated moment conditions from the heteroskedasticity-robust linear and quadratic moments derived above. We adopt these approximated versions of the linear and quadratic moment conditions to construct the GMM estimators, in order to overcome the computational issue involved with inversion of matrices whose dimensions depend on the sample size $n$.

Since $S_n^{-1}(\beta,\lambda) = \left( I_{nG} - \bar B_n \right)^{-1}$ with $\bar B_n = B_n' \otimes I_n + (\Lambda_n' \otimes I_n)(I_G \otimes W_n)$, one can approximate $S_n^{-1}(\beta,\lambda)$ with $S_n^A(\beta,\lambda) = I_{nG} + \bar B_n + \dots + (\bar B_n)^S$, for finite $S$.$^{24}$ Following our notational conventions, we let $S_n^{hg,A}(\beta,\lambda)$ denote the $(h,g)$-th $n \times n$ block of $S_n^A(\beta,\lambda)$ and $S_n^{h\cdot,A}(\beta,\lambda)$ the $h$-th $n \times nG$ block of $S_n^A(\beta,\lambda)$, for $h,g = 1,\dots,G$. Thus, the series approximation to $\mathrm{E} y_n = S_n^{-1}(\beta,\lambda)\mathrm{vec}(X_n C)$ used in defining the approximated GSLIVE and GSFIVE estimators is expressed as
$$\tilde y_n^{e,A}(\theta) = \sum_{s=0}^S (\bar B_n)^s\,\mathrm{vec}(X_n C).$$
Note that, by definition, $Y_{g,n}^e(\theta)$ and $\bar Y_{g,n}^e(\theta)$ in the matrix of instruments $Z_{g,n}^e(\theta) = [Y_{g,n}^e(\theta), \bar Y_{g,n}^e(\theta), X_{g,n}]$ depend on $\tilde y_n^e(\theta)$. Thus, we denote the approximated versions of $Y_{g,n}^e(\theta)$ and $\bar Y_{g,n}^e(\theta)$ as $Y_{g,n}^{e,A}(\theta)$ and $\bar Y_{g,n}^{e,A}(\theta)$, the approximated version of $Z_{g,n}^e(\theta)$ as $Z_{g,n}^{e,A}(\theta) = [Y_{g,n}^{e,A}(\theta), \bar Y_{g,n}^{e,A}(\theta), X_{g,n}]$, and let $Z_n^{e,A}(\theta) = \mathrm{diag}_{g=1}^G\{Z_{g,n}^{e,A}(\theta)\}$.

$^{24}$We find that a relatively low order $S$ already renders rather accurate approximations. In our Monte Carlo simulations, setting $S = 15$ renders approximation errors on the order of $10^{-6}$.

We also approximate $R_{g,n}^{-1}(\rho) = \left( I_n - \sum_{q\in\mathcal I_{g,\rho}}\rho_{q,g} M_{q,n} \right)^{-1}$ with a finite matrix polynomial:
$$R_{g,n}^A(\rho) = I_n + \Big( \sum_{q\in\mathcal I_{g,\rho}}\rho_{q,g} M_{q,n} \Big) + \dots + \Big( \sum_{q\in\mathcal I_{g,\rho}}\rho_{q,g} M_{q,n} \Big)^S.$$

Approximated Linear Moments

In light of the linear moment conditions in Proposition 2 and equation (2.34), the (limited information) linear moments with approximated instruments for the $g$-th equation can be written as
$$m_{g,n}^{l,A}(\theta, g) = \frac{1}{n} Z_{g,n}^{e,A}(\theta)' R_{g,n}(\rho_g)'\varepsilon_{g,n}(\theta_g). \tag{2.43}$$
Moreover, in light of equation (2.33), the vector of (full information) linear moments with approximated instruments is then
$$m_n^{l,A}(\theta) = \frac{1}{n} Z_n^{e,A}(\theta)' R_n(\rho)' (\Sigma^{-1} \otimes I_n)\varepsilon_n(\theta). \tag{2.44}$$
As will be shown in Section 2.5, $m_{g,n}^{l,A}(\theta, g)$ and $m_n^{l,A}(\theta)$ will be used to construct the approximated GSLIVE and the approximated GSFIVE estimators.

Approximated Limited Information Quadratic Moments

For the limited information quadratic moments $m_{g,n,R}^q(\theta,g)$ in (2.42), the approximated version is expressed as
$$m_{g,n,R}^{q,A}(\theta,g) = \frac{1}{n}\begin{bmatrix} L_{\beta,g}'\,\mathrm{vec}\left( \left[ \varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d\left( R_{g,n}(\rho_g) S_n^{hg,A}(\beta,\lambda) R_{g,n}^A(\rho_g) \right)\varepsilon_{g,n}(\theta_g) \right]_{h=1}^{G} \right) \\ L_{\lambda,g}'\,\mathrm{vec}\left( \left[ \varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d\left( R_{g,n}(\rho_g) W_{p,n} S_n^{hg,A}(\beta,\lambda) R_{g,n}^A(\rho_g) \right)\varepsilon_{g,n}(\theta_g) \right]_{p=1,h=1}^{P,G} \right) \\ L_{\rho,g}'\,\mathrm{vec}\left( \left[ \varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d\left( M_{q,n} R_{g,n}^A(\rho_g) \right)\varepsilon_{g,n}(\theta_g) \right]_{q=1}^{Q} \right) \end{bmatrix}. \tag{2.45}$$

Approximated Full Information Quadratic Moments

To obtain the approximated version of, say,
$$\varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes S_n^{h\cdot}(\beta,\lambda)) R_n^{-1}(\rho) \right)\varepsilon_n(\theta)$$
in $m_{n,R}^{q,\beta}(\theta)$, we replace $\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes S_n^{h\cdot}(\beta,\lambda)) R_n^{-1}(\rho) \right)$ with
$$\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes S_n^{h\cdot,A}(\beta,\lambda)) R_n^A(\rho) \right),$$
where $R_n^A(\rho)$ and $S_n^{h\cdot,A}(\beta,\lambda)$ are the approximated versions of $R_n^{-1}(\rho)$ and $S_n^{h\cdot}(\beta,\lambda)$ introduced above. The approximated version of $m_{n,R}^{q,\beta}(\theta)$ in (2.39) is then given by
$$m_{n,R}^{q,\beta,A}(\theta) = \frac{1}{n} L_\beta'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes S_n^{h\cdot,A}(\beta,\lambda)) R_n^A(\rho) \right)\varepsilon_n(\theta) \right]_{h=1,g=1}^{G,G} \right). \tag{2.46}$$
Analogously, to obtain the approximated version of, e.g.,
$$\varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes W_{p,n} S_n^{h\cdot}(\beta,\lambda)) R_n^{-1}(\rho) \right)\varepsilon_n(\theta)$$
in $m_{n,R}^{q,\lambda}(\theta)$, we replace the "middle matrix" with
$$\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes W_{p,n} S_n^{h\cdot,A}(\beta,\lambda)) R_n^A(\rho) \right).$$
The approximated version of $m_{n,R}^{q,\lambda}(\theta)$ in (2.40) is then given by
$$m_{n,R}^{q,\lambda,A}(\theta) = \frac{1}{n}\begin{bmatrix} L_{\lambda,1}'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{1,G} \otimes W_{p,n} S_n^{h\cdot,A}(\beta,\lambda)) R_n^A(\rho) \right)\varepsilon_n(\theta) \right]_{p=1,h=1}^{P,G} \right) \\ \vdots \\ L_{\lambda,G}'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{G,G} \otimes W_{p,n} S_n^{h\cdot,A}(\beta,\lambda)) R_n^A(\rho) \right)\varepsilon_n(\theta) \right]_{p=1,h=1}^{P,G} \right) \end{bmatrix}. \tag{2.47}$$
Finally, for the element $\varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( i_{g,G} i_{g,G}' \otimes M_{q,n} R_{g,n}^{-1}(\rho) \right)\varepsilon_n(\theta)$ in $m_{n,R}^{q,\rho}(\theta)$, we replace the "middle matrix" with
$$\mathrm{MAT}_D\left( i_{g,G} i_{g,G}' \otimes M_{q,n} R_{g,n}^A(\rho) \right),$$
and the approximated version of $m_{n,R}^{q,\rho}(\theta)$ in (2.41) is then given by
$$m_{n,R}^{q,\rho,A}(\theta) = \frac{1}{n} L_\rho'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( i_{g,G} i_{g,G}' \otimes M_{q,n} R_{g,n}^A(\rho) \right)\varepsilon_n(\theta) \right]_{q=1,g=1}^{Q,G} \right). \tag{2.48}$$
Stacking $m_{n,R}^{q,\beta,A}(\theta)$, $m_{n,R}^{q,\lambda,A}(\theta)$ and $m_{n,R}^{q,\rho,A}(\theta)$ together, we obtain the vector of approximated heteroskedasticity-robust (full information) quadratic moments for the whole system as
$$m_{n,R}^{q,A}(\theta) = \left[ m_{n,R}^{q,\beta,A}(\theta)', m_{n,R}^{q,\lambda,A}(\theta)', m_{n,R}^{q,\rho,A}(\theta)' \right]'.$$

2.5 LIVE and FIVE Estimators for Network SEM

In this section, we define the limited and full information estimators based on the approximated linear moment conditions (2.43) and (2.44) as well as the approximated quadratic moment conditions derived in the previous section. In particular, we define and present the implementation steps of our GSLIVE and GSFIVE estimators for $\delta_0$ (along with the efficient GMM estimators for $\rho_0$), as well as the One-Step GMM estimators LQ-GSLIVE and LQ-GSFIVE for $\theta_0$.

Generic Forms of the GSLIVE and the GSFIVE Estimators

Recall the generic moment conditions defined in (2.28). The generic full information moment conditions are given by
$$\frac{1}{n}\mathrm{E}\, Z_n^e(\theta_0)' R_n(\rho_0)' (\Sigma^{-1}(\sigma_0) \otimes I_n) R_n(\rho_0)(y_n - Z_n\delta_0) = 0. \tag{2.49}$$
The generic limited information moment conditions are obtained by replacing $\Sigma^{-1}(\sigma_0)$ with the identity matrix:
$$\frac{1}{n}\mathrm{E}\, Z_{g,n}^e(\theta_0)' R_{g,n}(\rho_0)' R_{g,n}(\rho_0)(y_{g,n} - Z_{g,n}\delta_{0,g}) = 0, \quad g = 1,\dots,G. \tag{2.50}$$
As discussed in Section 2.3.2, the generic forms of the full information and limited information IV estimators can then be derived by solving the sample analogues of equations (2.49) and (2.50), respectively.$^{25}$

$^{25}$Specifically, the resulting generic forms of the limited information and full information IV estimators are $\hat\delta_{g,n} = [\tilde Z_{g,n}^e(\tilde\theta_n)'\breve Z_{g,n}(\tilde\rho_{g,n})]^{-1}\tilde Z_{g,n}^e(\tilde\theta_n)'\breve y_{g,n}(\tilde\rho_{g,n})$ and $\hat\delta_n = [\tilde Z_n^e(\tilde\theta_n)'(\tilde\Sigma_n^{-1} \otimes I_n)\breve Z_n(\tilde\rho_n)]^{-1}\tilde Z_n^e(\tilde\theta_n)'(\tilde\Sigma_n^{-1} \otimes I_n)\breve y_n(\tilde\rho_n)$.

Our GSLIVE and GSFIVE estimators are special cases of the generic limited and full information IV estimators obtained in this fashion. They are obtained by solving the estimator generating equations implied by the heteroskedasticity-robust linear moment conditions in (2.43) and (2.44), respectively. Specifically, the generic form of the GSLIVE estimator is given by
$$\hat\delta_{g,n} = \left[ \tilde Z_{g,n}^{e,A}(\theta_0)'\breve Z_{g,n}(\rho_{g,0}) \right]^{-1}\tilde Z_{g,n}^{e,A}(\theta_0)'\breve y_{g,n}(\rho_{g,0}), \tag{2.51}$$
and the generic form of the GSFIVE estimator is given by
$$\hat\delta_n = \left[ \tilde Z_n^{e,A}(\theta_0)'(\Sigma_0^{-1} \otimes I_n)\breve Z_n(\rho_0) \right]^{-1}\tilde Z_n^{e,A}(\theta_0)'(\Sigma_0^{-1} \otimes I_n)\breve y_n(\rho_0). \tag{2.52}$$

2.5.1 Limited Information Estimators

We now discuss, in a sequence of steps, the implementation details of the GSLIVE estimator of $\delta_{g,0}$ and a GMM estimator of $\rho_{g,0}$ based on the first stage residuals. We will then also define the One-Step GMM estimator of $\theta_{g,0}$ (i.e., the LQ-GSLIVE estimator) that utilizes both the (limited information) linear and quadratic moments.

Step 1a: GSLIVE estimator of $\delta_g$

Let $\tilde\theta_n$ be some consistent initial estimate of $\theta_0$, e.g., the GS2SLS estimator considered in Drukker, Egger, and Prucha (2022). We can then compute the Cochrane-Orcutt transformed matrices $\breve Z_{g,n}(\tilde\rho_{g,n}) = R_{g,n}(\tilde\rho_{g,n}) Z_{g,n}$, $\tilde Z_{g,n}^{e,A}(\tilde\theta_n) = R_{g,n}(\tilde\rho_{g,n}) Z_{g,n}^{e,A}(\tilde\theta_n)$ and $\breve y_{g,n}(\tilde\rho_{g,n}) = R_{g,n}(\tilde\rho_{g,n}) y_{g,n}$. In light of the generic form (2.51), our GSLIVE estimator for $\delta_{g,0}$ is defined as
$$\hat\delta_{g,n} = \left[ \tilde Z_{g,n}^{e,A}(\tilde\theta_n)'\breve Z_{g,n}(\tilde\rho_{g,n}) \right]^{-1}\tilde Z_{g,n}^{e,A}(\tilde\theta_n)'\breve y_{g,n}(\tilde\rho_{g,n}). \tag{2.53}$$
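A minimal single-equation sketch of this estimator generating step, with a hypothetical disturbance weight matrix $M_n$, an assumed initial estimate of $\rho_g$, and stand-in approximated instruments (all names and values are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 150
rho_hat = 0.3                      # hypothetical initial estimate of rho_g

# Hypothetical row-normalized disturbance weight matrix M_n
M = rng.random((n, n)) * (rng.random((n, n)) < 0.05)
np.fill_diagonal(M, 0.0)
M = M / np.maximum(M.sum(axis=1, keepdims=True), 1e-12)

R = np.eye(n) - rho_hat * M        # R_{g,n}(rho_hat), first-order for simplicity

Zg = rng.normal(size=(n, 2))       # regressors of equation g
Ze = Zg + 0.05 * rng.normal(size=(n, 2))   # stand-in for the instruments Z^{e,A}_{g,n}
delta_g0 = np.array([0.7, -1.2])
yg = Zg @ delta_g0 + rng.normal(size=n)

# Cochrane-Orcutt transform, then solve the GSLIVE-style generating equation:
# delta = [Ze_t' Z_t]^{-1} Ze_t' y_t, the single-equation analogue of (2.53)
Ze_t, Z_t, y_t = R @ Ze, R @ Zg, R @ yg
delta_hat = np.linalg.solve(Ze_t.T @ Z_t, Ze_t.T @ y_t)

# First-stage residual variance, used later in the VC estimator
resid = y_t - Z_t @ delta_hat
sigma2_hat = resid @ resid / n
```

The transform and the solve are the entire first step; everything model-specific enters through how `Ze` is constructed.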
We shall also utilize the following estimator for the variance of the limiting distribution of $\hat\delta_{g,n}$:
$$\hat\Psi_{gg,n}^{\delta\delta}(g) = \hat\sigma_{gg,n}\left[ n^{-1}\tilde Z_{g,n}^{e,A}(\tilde\theta_n)'\tilde Z_{g,n}^{e,A}(\tilde\theta_n) \right]^{-1},$$
where $\hat\sigma_{gg,n} = \frac{1}{n}\hat\varepsilon_{g,n}'\hat\varepsilon_{g,n}$ with $\hat\varepsilon_{g,n} = \breve y_{g,n}(\tilde\rho_{g,n}) - \breve Z_{g,n}(\tilde\rho_{g,n})\hat\delta_{g,n}$.

Step 1b: Efficient GMM estimator of $\rho_g$ based on GSLIVE residuals

Let $\hat u_{g,n} = y_{g,n} - Z_{g,n}\hat\delta_{g,n}$ denote the residuals of the $g$-th equation based on the GSLIVE estimate $\hat\delta_{g,n}$ obtained in Step 1a. In light of the moment vector $m_{g,n,R}^{q,A}(\theta,g)$ in (2.45), we can write the (limited information) sample moments with first stage residuals $\hat u_{g,n}$ as
$$\check m_n(\rho_g, \tilde\rho_{g,n}, \hat\delta_{g,n}, g) = \frac{1}{n} L_{\rho,g}'\,\mathrm{vec}\left( \left[ \hat u_{g,n}' R_{g,n}(\rho_g)'\mathrm{MAT}_d\left( M_{q,n} R_{g,n}^A(\tilde\rho_{g,n}) \right) R_{g,n}(\rho_g)\hat u_{g,n} \right]_{q=1}^{Q} \right). \tag{2.54}$$
The efficient GMM estimator of $\rho_g$ is then defined as
$$\hat\rho_{g,n} = \operatorname*{argmin}_{\rho_g}\; \check m_n(\rho_g, \tilde\rho_{g,n}, \hat\delta_{g,n}, g)'\left( \hat\Psi_{gg,n}^{\rho\rho}(g) \right)^{-1}\check m_n(\rho_g, \tilde\rho_{g,n}, \hat\delta_{g,n}, g), \tag{2.55}$$
where $\hat\Psi_{gg,n}^{\rho\rho}(g)$ is an estimator of the VC matrix of the limiting distribution of the normalized sample moments $\check m_n(\rho_g, \tilde\rho_{g,n}, \hat\delta_{g,n}, g)$. For $r, s \in \mathcal I_{g,\rho}$ (where $\mathcal I_{g,\rho}$ is the index set defined in Assumption 2), the $rs$-th element of $\hat\Psi_{gg,n}^{\rho\rho}(g)$ is given by
$$\hat\Psi_{rs,gg,n}^{\rho\rho}(g) = \frac{\hat\sigma_{gg,n}^2}{n}\,\mathrm{tr}\left( \mathrm{MAT}_d\left( M_{r,n} R_{g,n}^A(\tilde\rho_{g,n}) \right)\left[ \mathrm{MAT}_d\left( M_{s,n} R_{g,n}^A(\tilde\rho_{g,n}) \right) + \mathrm{MAT}_d\left( M_{s,n} R_{g,n}^A(\tilde\rho_{g,n}) \right)' \right] \right) + \hat\alpha_{g,r,n}'\hat\Psi_{gg,n}^{\delta\delta}(g)\hat\alpha_{g,s,n}, \tag{2.56}$$
with
$$\hat\alpha_{g,r,n} = -\left[ \frac{1}{n}\tilde Z_{g,n}^{e,A}(\tilde\theta_n)'\left( \mathrm{MAT}_d\left( M_{r,n} R_{g,n}^A(\tilde\rho_{g,n}) \right) + \mathrm{MAT}_d\left( M_{r,n} R_{g,n}^A(\tilde\rho_{g,n}) \right)' \right) R_{g,n}(\tilde\rho_{g,n})\hat u_{g,n} \right].$$
The above expression for $\hat\Psi_{rs,gg,n}^{\rho\rho}(g)$ is derived in light of Theorem 2 (and Theorem 5) in Drukker, Egger, and Prucha (2022). Let $\Psi_{gg,n}^{\rho\rho}(g)$ denote the asymptotic VC matrix of the sample moment vector $\check m_n(\rho_g, \tilde\rho_{g,n}, \hat\delta_{g,n}, g)$. As remarked in Drukker, Egger, and Prucha (2022), the second term in (2.56) stems from the fact that the sample moment vector depends on the estimated residuals $\hat u_{g,n}$.
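The trace expressions entering the VC matrix of the quadratic moments are mechanical to compute. The following sketch builds one such block with hypothetical $M_{q,n}$ matrices and a low-order polynomial $R_{g,n}^A$, following the generic pattern of the elements displayed above (the scale factor and all inputs are our own reconstruction, not the dissertation's exact formulas):

```python
import numpy as np

rng = np.random.default_rng(5)
n, Q = 60, 3

def MAT_d(B):
    """Zero out the diagonal of a square matrix."""
    return B - np.diag(np.diag(B))

# Hypothetical disturbance weight matrices M_q (row-normalized, zero diagonal)
Ms = []
for _ in range(Q):
    M = rng.random((n, n)) * (rng.random((n, n)) < 0.1)
    np.fill_diagonal(M, 0.0)
    Ms.append(M / np.maximum(M.sum(axis=1, keepdims=True), 1e-12))

rho = np.array([0.2, 0.1, 0.05])           # hypothetical rho estimates
P = sum(r * M for r, M in zip(rho, Ms))
RA = np.eye(n) + P + P @ P                 # R^A_{g,n} with truncation order S = 2

sigma2 = 1.3                               # hypothetical sigma_hat_{gg,n}

def psi_rs(r, s):
    """One trace-based VC element: (sigma^4/n) tr(MAT_d[M_r R^A](MAT_d[M_s R^A] + its transpose))."""
    Ar, As = MAT_d(Ms[r] @ RA), MAT_d(Ms[s] @ RA)
    return sigma2**2 / n * np.trace(Ar @ (As + As.T))

Psi = np.array([[psi_rs(r, s) for s in range(Q)] for r in range(Q)])
```

By the cyclic and transpose properties of the trace, the resulting matrix is symmetric with non-negative diagonal, as a VC estimate should be.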
The LQ-GSLIVE Estimator

Complementing the approximated limited information linear moments (2.43) with the approximated quadratic moments \(m^{q,A}_{g,n,R}(\theta, g)\) based on equation (2.45), and letting \(\tilde{\theta}_n\) denote some consistent initial estimate of \(\theta_0\), the vector of linear-quadratic sample moments can then be formed as
\[
m_n(\theta_g, \tilde{\theta}_n, g) = \frac{1}{n}
\begin{bmatrix}
\hat{Z}^{e,A}_{g,n}(\tilde{\theta}_n)' R_{g,n}(\tilde{\rho}_{g,n})' \varepsilon_{g,n}(\theta_g) \\[4pt]
\Big( L_{b,g}' \operatorname{vec}\!\big[ \varepsilon_{g,n}(\theta_g)' \operatorname{MAT}_d\!\big[ R_{g,n}(\tilde{\rho}_{g,n}) S^{hg,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n) R^A_{g,n}(\tilde{\rho}_{g,n}) \big] \varepsilon_{g,n}(\theta_g) \big] \Big)_{h=1}^{G} \\[4pt]
\Big( L_{\lambda,g}' \operatorname{vec}\!\big[ \varepsilon_{g,n}(\theta_g)' \operatorname{MAT}_d\!\big[ R_{g,n}(\tilde{\rho}_{g,n}) W_{p,n} S^{hg,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n) R^A_{g,n}(\tilde{\rho}_{g,n}) \big] \varepsilon_{g,n}(\theta_g) \big] \Big)_{p=1,h=1}^{P,G} \\[4pt]
\Big( L_{\rho,g}' \operatorname{vec}\!\big[ \varepsilon_{g,n}(\theta_g)' \operatorname{MAT}_d\!\big[ M_{q,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \big] \varepsilon_{g,n}(\theta_g) \big] \Big)_{q=1}^{Q}
\end{bmatrix}.
\]
The corresponding efficient GMM estimator can be defined as
\[
\hat{\theta}_{g,LQ,n} = \operatorname*{argmin}_{\theta_g} \; m_n(\theta_g, \tilde{\theta}_n, g)' \left( \hat{\Psi}_{gg,n}(g) \right)^{-1} m_n(\theta_g, \tilde{\theta}_n, g), \tag{2.57}
\]
where \(\hat{\Psi}_{gg,n}(g)\) denotes an estimator for the asymptotic variance-covariance matrix of the normalized sample moments \(\sqrt{n}\, m_n(\theta_g, \tilde{\theta}_n, g)\). Specifically, let the residuals be \(\hat{\varepsilon}_{g,n} = \tilde{y}_{g,n}(\tilde{\rho}_{g,n}) - \tilde{Z}_{g,n}(\tilde{\rho}_{g,n}) \hat{\delta}_{g,n}\) and \(\hat{\sigma}_{gg,n} = n^{-1} \hat{\varepsilon}_{g,n}'\hat{\varepsilon}_{g,n}\). The estimated VCV matrix is then
\[
\hat{\Psi}_{gg,n}(g) = \begin{bmatrix} \hat{\Psi}^{l}_{gg,n}(g) & 0 \\ 0 & \hat{\Psi}^{q}_{gg,n}(g) \end{bmatrix},
\]
where
\[
\hat{\Psi}^{l}_{gg,n}(g) = \frac{\hat{\sigma}_{gg,n}}{n} \hat{Z}^{e,A}_{g,n}(\tilde{\theta}_n)' R_{g,n}(\tilde{\rho}_{g,n})' R_{g,n}(\tilde{\rho}_{g,n}) \hat{Z}^{e,A}_{g,n}(\tilde{\theta}_n)
\]
is the block corresponding to the linear moments and
\[
\hat{\Psi}^{q}_{gg,n}(g) = \begin{bmatrix}
\hat{\Psi}^{bb}_{gg,n}(g) & \hat{\Psi}^{b\lambda}_{gg,n}(g) & \hat{\Psi}^{b\rho}_{gg,n}(g) \\
\hat{\Psi}^{\lambda b}_{gg,n}(g) & \hat{\Psi}^{\lambda\lambda}_{gg,n}(g) & \hat{\Psi}^{\lambda\rho}_{gg,n}(g) \\
\hat{\Psi}^{\rho b}_{gg,n}(g) & \hat{\Psi}^{\rho\lambda}_{gg,n}(g) & \hat{\Psi}^{\rho\rho}_{gg,n}(g)
\end{bmatrix}
\]
is the block corresponding to the quadratic moments. For completeness, the explicit expressions for each block of \(\hat{\Psi}^{q}_{gg,n}(g)\) are given in Chapter A.3. Due to limited space, we only present the general forms of the individual elements of \(\hat{\Psi}^{q}_{gg,n}(g)\) below. A typical element of \(\hat{\Psi}^{bb}_{gg,n}(g)\) is of the form
\[
\frac{\hat{\sigma}^2_{gg,n}}{n} \operatorname{tr}\!\Big( \operatorname{MAT}_d\!\big[ R_{g,n}(\tilde{\rho}_{g,n}) S^{r_1 g,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n) R^A_{g,n}(\tilde{\rho}_{g,n}) \big] \big( \operatorname{MAT}_d\!\big[ R_{g,n}(\tilde{\rho}_{g,n}) S^{r_2 g,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n) R^A_{g,n}(\tilde{\rho}_{g,n}) \big] + \operatorname{MAT}_d\!\big[ R_{g,n}(\tilde{\rho}_{g,n}) S^{r_2 g,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n) R^A_{g,n}(\tilde{\rho}_{g,n}) \big]' \big) \Big),
\]
where \(r_1, r_2 \in \{1, \ldots, G\}\). A typical element of \(\hat{\Psi}^{b\lambda}_{gg,n}(g)\) is of the same form with \(S^{r_2 g,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n)\) replaced by \(W_{p,n} S^{r_2 g,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n)\), where \(r_1, r_2 \in \{1, \ldots, G\}\), \(p \in \{1, \ldots, P\}\); a typical element of \(\hat{\Psi}^{b\rho}_{gg,n}(g)\) replaces the second and third \(\operatorname{MAT}_d\) arguments by \(M_{q,n} R^A_{g,n}(\tilde{\rho}_{g,n})\), where \(r \in \{1, \ldots, G\}\), \(q \in \{1, \ldots, Q\}\); a typical element of \(\hat{\Psi}^{\lambda\lambda}_{gg,n}(g)\) uses \(W_{p_1,n} S^{r_1 g,A}_{n}\) and \(W_{p_2,n} S^{r_2 g,A}_{n}\), where \(r_1, r_2 \in \{1, \ldots, G\}\), \(p_1, p_2 \in \{1, \ldots, P\}\); a typical element of \(\hat{\Psi}^{\lambda\rho}_{gg,n}(g)\) uses \(W_{p,n} S^{rg,A}_{n}\) in the first argument and \(M_{q,n} R^A_{g,n}(\tilde{\rho}_{g,n})\) in the second and third, where \(r \in \{1, \ldots, G\}\), \(p \in \{1, \ldots, P\}\), \(q \in \{1, \ldots, Q\}\); and a typical element of \(\hat{\Psi}^{\rho\rho}_{gg,n}(g)\) uses \(M_{q_1,n} R^A_{g,n}(\tilde{\rho}_{g,n})\) and \(M_{q_2,n} R^A_{g,n}(\tilde{\rho}_{g,n})\), where \(q_1, q_2 \in \{1, \ldots, Q\}\).

2.5.2 Full Information Estimators

We now define, in a sequence of steps, the implementation details of the GSFIVE estimator of \(\delta_0\) and a GMM estimator of \(\rho_0\) based on the first stage residuals. We will then also define the One-Step GMM estimator of \(\theta_0\) (i.e., the LQ-GSFIVE estimator) that utilizes both the (full information) linear and quadratic moments.

Step 2a: GSFIVE estimator of \(\delta\)

As above in defining the limited information estimators, let \(\tilde{\theta}_n\) be some consistent initial estimate of \(\theta_0\).
We can then compute the Cochrane-Orcutt transformed matrices \(\tilde{Z}_n(\tilde{\rho}_n) = R_n(\tilde{\rho}_n) Z_n\), \(\tilde{Z}^{e,A}_n(\tilde{\theta}_n) = R_n(\tilde{\rho}_n) \hat{Z}^{e,A}_n(\tilde{\theta}_n)\) and \(\tilde{y}_n(\tilde{\rho}_n) = R_n(\tilde{\rho}_n) y_n\). In light of the generic form (2.52), our GSFIVE estimator for \(\delta_0\) can be defined as
\[
\hat{\hat{\delta}}_n = \left[ \tilde{Z}^{e,A}_n(\tilde{\theta}_n)' (\tilde{\Sigma}_n^{-1} \otimes I_n) \tilde{Z}_n(\tilde{\rho}_n) \right]^{-1} \tilde{Z}^{e,A}_n(\tilde{\theta}_n)' (\tilde{\Sigma}_n^{-1} \otimes I_n) \tilde{y}_n(\tilde{\rho}_n), \tag{2.58}
\]
where the gh-th element of \(\tilde{\Sigma}_n\) is given by \(\tilde{\sigma}_{gh,n} = n^{-1} \hat{\varepsilon}_{g,n}'\hat{\varepsilon}_{h,n}\) with \(\hat{\varepsilon}_{g,n} = \tilde{y}_{g,n}(\tilde{\rho}_{g,n}) - \tilde{Z}_{g,n}(\tilde{\rho}_{g,n}) \hat{\delta}_{g,n}\). We shall also utilize the following estimator for the variance of the limiting distribution of \(\hat{\hat{\delta}}_n\):
\[
\hat{\hat{\Psi}}^{\delta\delta}_n = \left[ n^{-1} \tilde{Z}^{e,A}_n(\tilde{\theta}_n)' (\hat{\hat{\Sigma}}_n^{-1} \otimes I_n) \tilde{Z}^{e,A}_n(\tilde{\theta}_n) \right]^{-1},
\]
with the gh-th element of \(\hat{\hat{\Sigma}}_n\) being \(\hat{\hat{\sigma}}_{gh,n} = n^{-1} \hat{\hat{\varepsilon}}_{g,n}'\hat{\hat{\varepsilon}}_{h,n}\), where \(\hat{\hat{\varepsilon}}_{g,n} = \tilde{y}_{g,n}(\tilde{\rho}_{g,n}) - \tilde{Z}_{g,n}(\tilde{\rho}_{g,n}) \hat{\hat{\delta}}_{g,n}\). We denote the (g,h)-th block of \(\hat{\hat{\Psi}}^{\delta\delta}_n\) as \(\hat{\hat{\Psi}}^{\delta\delta}_{gh,n}\).

Step 2b: Efficient GMM estimator of \(\rho\) based on GSFIVE residuals

Let \(\hat{\hat{u}}_{g,n} = y_{g,n} - Z_{g,n}\hat{\hat{\delta}}_{g,n}\) denote the residuals of the g-th equation based on the GSFIVE estimate \(\hat{\hat{\delta}}_n\) obtained in Step 2a, and \(\hat{\hat{\varepsilon}}_{g,n}(\rho_g) = R_{g,n}(\rho_g)\hat{\hat{u}}_{g,n}\). In light of the moment vector (2.45), we can then write the (limited information) sample moments with first stage residuals \(\hat{\hat{u}}_{g,n}\) as
\[
\hat{\hat{m}}_n(\rho_g, \hat{\hat{\delta}}_{g,n}, \tilde{\theta}_n, g) = \left( L_{\rho,g}' \operatorname{vec}\!\left[ \hat{\hat{u}}_{g,n}' R_{g,n}(\rho_g)' \operatorname{MAT}_d\!\left[ M_{q,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \right] R_{g,n}(\rho_g) \hat{\hat{u}}_{g,n} \right] \right)_{q=1}^{Q}. \tag{2.59}
\]
The efficient GMM estimator of \(\rho_g\) is then defined as
\[
\hat{\hat{\rho}}_{g,n} = \operatorname*{argmin}_{\rho_g} \; \hat{\hat{m}}_n(\rho_g, \hat{\hat{\delta}}_{g,n}, \tilde{\theta}_n, g)' \big( \hat{\hat{\Psi}}^{\rho\rho}_{gg,n}(g) \big)^{-1} \hat{\hat{m}}_n(\rho_g, \hat{\hat{\delta}}_{g,n}, \tilde{\theta}_n, g), \tag{2.60}
\]
where \(\hat{\hat{\Psi}}^{\rho\rho}_{gg,n}(g)\) is an estimator of the VC matrix of the limiting distribution of the normalized sample moments \(\hat{\hat{m}}_n(\rho_g, \hat{\hat{\delta}}_{g,n}, \tilde{\theta}_n, g)\). For \(r, s \in I_{g,\rho}\) (where \(I_{g,\rho}\) is the index set defined in Assumption 2), the rs-th element of \(\hat{\hat{\Psi}}^{\rho\rho}_{gh,n}(g)\) is given by
\[
\hat{\hat{\psi}}_{rs,gh,n}(g) = \frac{2\hat{\hat{\sigma}}^2_{gh,n}}{n} \operatorname{tr}\!\Big( \operatorname{MAT}_d\!\big[ M_{r,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \big] \big( \operatorname{MAT}_d\!\big[ M_{s,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \big] + \operatorname{MAT}_d\!\big[ M_{s,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \big]' \big) \Big) + \hat{\hat{\alpha}}_{g,r,n}' \hat{\hat{\Psi}}^{\delta\delta}_{gh,n} \hat{\hat{\alpha}}_{h,s,n}, \tag{2.61}
\]
with
\[
\hat{\hat{\alpha}}_{g,r,n} = -\frac{1}{n} \left[ \tilde{Z}^{e,A}_{g,n}(\tilde{\theta}_n)' \left( \operatorname{MAT}_d\!\big[ M_{r,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \big] + \operatorname{MAT}_d\!\big[ M_{r,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \big]' \right) R_{g,n}(\tilde{\rho}_{g,n}) \hat{\hat{u}}_{g,n} \right].
\]

The LQ-GSFIVE Estimator

Complementing the approximated full information linear moments (2.44) with the approximated quadratic moments \(m^{q,\cdot,A}_{n,R}(\theta)\) based on (2.46) - (2.48) in Section 2.4.2, the vector of linear-quadratic sample moments can then be formed as
\[
m_n(\theta, \tilde{\theta}_n) = \left[ m^{l}_n(\theta, \tilde{\theta}_n)',\; m^{\lambda}_n(\theta, \tilde{\theta}_n)',\; m^{\bar{\lambda}}_n(\theta, \tilde{\theta}_n)',\; m^{\rho}_n(\theta, \tilde{\theta}_n)' \right]',
\]
where \(\tilde{\theta}_n\) denotes some consistent initial estimate of \(\theta_0\). The vector of (approximated) linear sample moments is
\[
m^{l}_n(\theta, \tilde{\theta}_n) = \frac{1}{n} \hat{Z}^{e,A}_n(\tilde{\theta}_n)' R_n(\tilde{\rho}_n)' (\tilde{\Sigma}_n^{-1} \otimes I_n) \varepsilon_n(\theta),
\]
and the vectors of (approximated) quadratic sample moments are
\[
m^{\lambda}_n(\theta, \tilde{\theta}_n) = \frac{1}{n} \left( L_{b}' \operatorname{vec}\!\left[ \varepsilon_n(\theta)' (\tilde{\Sigma}_n^{-1} \otimes I_n) \operatorname{MAT}_D\!\big[ R_n(\tilde{\rho}_n)(i_{g,G} \otimes S^{h\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)) R^A_n(\tilde{\rho}_n) \big] \varepsilon_n(\theta) \right] \right)_{g=1,h=1}^{G,G},
\]
where the gh-th element of \(\tilde{\Sigma}_n\) is given by \(\tilde{\sigma}_{gh,n} = n^{-1} \hat{\varepsilon}_{g,n}'\hat{\varepsilon}_{h,n}\) with \(\hat{\varepsilon}_{g,n} = \tilde{y}_{g,n}(\tilde{\rho}_n) - \tilde{Z}_{g,n}(\tilde{\rho}_n) \hat{\delta}_{g,n}\);
\[
m^{\bar{\lambda}}_n(\theta, \tilde{\theta}_n) = \frac{1}{n} \left( L_{\lambda}' \operatorname{vec}\!\left[ \varepsilon_n(\theta)' (\tilde{\Sigma}_n^{-1} \otimes I_n) \operatorname{MAT}_D\!\big[ R_n(\tilde{\rho}_n)(i_{g,G} \otimes W_{p,n} S^{h\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)) R^A_n(\tilde{\rho}_n) \big] \varepsilon_n(\theta) \right] \right)_{p=1,h=1}^{P,G},
\]
and
\[
m^{\rho}_n(\theta, \tilde{\theta}_n) = \frac{1}{n} \left( L_{\rho}' \operatorname{vec}\!\left[ \varepsilon_n(\theta)' (\tilde{\Sigma}_n^{-1} \otimes I_n) \operatorname{MAT}_D\!\big[ i_{g,G} i_{g,G}' \otimes M_{q,n} R^A_{g,n}(\tilde{\rho}_n) \big] \varepsilon_n(\theta) \right] \right)_{g=1,q=1}^{G,Q}.
\]
The corresponding efficient GMM estimator can be defined as
\[
\hat{\theta}_{LQ,n} = \operatorname*{argmin}_{\theta} \; m_n(\theta, \tilde{\theta}_n)' \hat{\Psi}_n^{-1} m_n(\theta, \tilde{\theta}_n), \tag{2.62}
\]
where \(\hat{\Psi}_n\) denotes an estimator for the asymptotic variance-covariance matrix of the normalized sample moments \(\sqrt{n}\, m_n(\theta, \tilde{\theta}_n)\). Specifically,
\[
\hat{\Psi}_n = \begin{bmatrix} \hat{\Psi}^{l}_n & 0 \\ 0 & \hat{\Psi}^{q}_n \end{bmatrix},
\]
where
\[
\hat{\Psi}^{l}_n = \frac{1}{n} \hat{Z}^{e,A}_n(\tilde{\theta}_n)' R_n(\tilde{\rho}_n)' (\tilde{\Sigma}_n^{-1} \otimes I_n) R_n(\tilde{\rho}_n) \hat{Z}^{e,A}_n(\tilde{\theta}_n)
\]
is the block corresponding to the linear moments, and the block corresponding to the quadratic moments is of the form
\[
\hat{\Psi}^{q}_n = \begin{bmatrix}
\hat{\Psi}^{\lambda\lambda}_n & \hat{\Psi}^{\lambda\bar{\lambda}}_n & \hat{\Psi}^{\lambda\rho}_n \\
\hat{\Psi}^{\bar{\lambda}\lambda}_n & \hat{\Psi}^{\bar{\lambda}\bar{\lambda}}_n & \hat{\Psi}^{\bar{\lambda}\rho}_n \\
\hat{\Psi}^{\rho\lambda}_n & \hat{\Psi}^{\rho\bar{\lambda}}_n & \hat{\Psi}^{\rho\rho}_n
\end{bmatrix},
\]
with each sub-matrix consisting of \(G \times G\) sub-blocks.
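Computationally, the GSFIVE step in (2.58) stacks the G transformed equations and weights them by the inverse covariance matrix via a Kronecker product. The following numpy sketch is illustrative only (hypothetical placeholder arrays; `Sigma` stands in for the estimated G x G cross-equation covariance matrix), showing the weighted system IV formula:

```python
import numpy as np

def system_iv(Z_hat, Z_tilde, y_tilde, Sigma):
    """Full-information IV: weight the stacked system by Sigma^{-1} kron I_n."""
    G = Sigma.shape[0]
    n = Z_hat.shape[0] // G
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))
    A = Z_hat.T @ Omega_inv @ Z_tilde
    b = Z_hat.T @ Omega_inv @ y_tilde
    return np.linalg.solve(A, b)

# Toy check: with Sigma = I the weighting drops out, and the estimator
# coincides with the unweighted (limited information) IV estimator.
rng = np.random.default_rng(1)
n, G, k = 50, 2, 4
Z = rng.normal(size=(G * n, k))
delta0 = rng.normal(size=k)
y = Z @ delta0
print(system_iv(Z, Z, y, np.eye(G)))
```

For large n one would avoid forming the \(nG \times nG\) Kronecker matrix explicitly and instead apply \(\tilde{\Sigma}_n^{-1}\) blockwise; the dense version above is only meant to mirror the formula.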
For completeness, the explicit expressions for each block of \(\hat{\Psi}^{q}_n\) are given in Chapter A.3. Due to limited space, we only present the general forms of the individual elements of \(\hat{\Psi}^{q}_n\) below. A typical element of the gh-th block of \(\hat{\Psi}^{\lambda\lambda}_n\) is of the form
\[
\frac{1}{n} \operatorname{tr}\!\Big( \operatorname{MAT}_D\!\big[ R_n(\tilde{\rho}_n)(i_{g,G} \otimes S^{r_1\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)) R^A_n(\tilde{\rho}_n) \big] \big( \operatorname{MAT}_D\!\big[ R_n(\tilde{\rho}_n)(i_{h,G} \otimes S^{r_2\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)) R^A_n(\tilde{\rho}_n) \big] + (\tilde{\Sigma}_n \otimes I_n) \operatorname{MAT}_D\!\big[ R_n(\tilde{\rho}_n)(i_{h,G} \otimes S^{r_2\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)) R^A_n(\tilde{\rho}_n) \big]' (\tilde{\Sigma}_n^{-1} \otimes I_n) \big) \Big),
\]
for \(r_1, r_2 \in \{1, \ldots, G\}\). A typical element of the gh-th block of \(\hat{\Psi}^{\lambda\bar{\lambda}}_n\) is of the same form with \(i_{h,G} \otimes S^{r_2\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)\) replaced by \(i_{h,G} \otimes W_{p,n} S^{r_2\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)\), for \(r_1, r_2 \in \{1, \ldots, G\}\), \(p \in \{1, \ldots, P\}\); a typical element of the gh-th block of \(\hat{\Psi}^{\lambda\rho}_n\) replaces the second and third \(\operatorname{MAT}_D\) arguments by \(i_{h,G} i_{h,G}' \otimes M_{q,n} R^A_{h,n}(\tilde{\rho}_{h,n})\), for \(r \in \{1, \ldots, G\}\), \(q \in \{1, \ldots, Q\}\); a typical element of the gh-th block of \(\hat{\Psi}^{\bar{\lambda}\bar{\lambda}}_n\) uses \(i_{g,G} \otimes W_{p_1,n} S^{r_1\cdot,A}_n\) and \(i_{h,G} \otimes W_{p_2,n} S^{r_2\cdot,A}_n\), for \(r_1, r_2 \in \{1, \ldots, G\}\), \(p_1, p_2 \in \{1, \ldots, P\}\); a typical element of the gh-th block of \(\hat{\Psi}^{\bar{\lambda}\rho}_n\) uses \(i_{g,G} \otimes W_{p,n} S^{r\cdot,A}_n\) in the first argument and \(i_{h,G} i_{h,G}' \otimes M_{q,n} R^A_{h,n}(\tilde{\rho}_{h,n})\) in the second and third, for \(r \in \{1, \ldots, G\}\), \(p \in \{1, \ldots, P\}\), \(q \in \{1, \ldots, Q\}\); and a typical element of the gh-th block of \(\hat{\Psi}^{\rho\rho}_n\) uses \(i_{g,G} i_{g,G}' \otimes M_{q_1,n} R^A_{g,n}(\tilde{\rho}_{g,n})\) and \(i_{h,G} i_{h,G}' \otimes M_{q_2,n} R^A_{h,n}(\tilde{\rho}_{h,n})\), for \(q_1, q_2 \in \{1, \ldots, Q\}\).

2.6 Identification Condition

To discuss the identification conditions via the linear moment conditions, we consider the following general system of G equations with spatial lags in both the endogenous variables and the disturbance process. Some parts of the discussion follow closely the relevant discussion in Appendix F of Drukker, Egger, and Prucha (2022). Recall from (2.4) that the stacked system can be written as
\[
y_n = B^*_0 y_n + C^*_0 x_n + u_n, \qquad u_n = P^*_0 u_n + \varepsilon_n,
\]
where \(B^*_0 = (B_0' \otimes I_n) + (\Lambda_0' \otimes I_n)(I_G \otimes W_n)\), \(C^*_0 = C_0' \otimes I_n\), and \(P^*_0 = (P_0' \otimes I_n)(I_G \otimes M_n)\).^{26} Assuming invertibility of \(S_n(\lambda_0, b_0) = I_{nG} - B^*_0\) and \(R_n(\rho_0) = I_{nG} - P^*_0\), the reduced form of the system is given by
\[
y_n = S_n^{-1}(\lambda_0, b_0)\left( C^*_0 x_n + u_n \right), \qquad u_n = R_n^{-1}(\rho_0)\varepsilon_n.
\]
Since the assumptions on \(\varepsilon_n\) imply \(E\varepsilon_n = 0\) and \(E\varepsilon_n\varepsilon_n' = \Sigma_0 \otimes I_n\), the expected value and the variance of \(y_n\) are given by
\[
E y_n = S_n^{-1}(\lambda_0, b_0) C^*_0 x_n, \tag{2.63}
\]
\[
\operatorname{Var}(y_n) = S_n^{-1}(\lambda_0, b_0) R_n^{-1}(\rho_0) (\Sigma_0 \otimes I_n) R_n^{-1}(\rho_0)' S_n^{-1}(\lambda_0, b_0)'. \tag{2.64}
\]
Recall that \(Z_{g,n} = [Y_{g,n}, \bar{Y}_{g,n}, X_{g,n}]\), and thus the best instrument for \(Z_{g,n}\) is \(E Z_{g,n} = [E Y_{g,n}, E \bar{Y}_{g,n}, X_{g,n}]\). In light of the linear moments \(m^{l}_{g,n}(\theta)\) in (2.26), as well as those in Drukker, Egger, and Prucha (2022), we let \(H_{g,n}\) denote (generically) the matrix of instruments for the linear moments corresponding to the g-th equation. The linear moment conditions then take the general form \(E H_{g,n}' u_{g,n} = 0\).

^{26} As before, we let \(y_n = [y_{1,n}', \ldots, y_{G,n}']'\), \(u_n = [u_{1,n}', \ldots, u_{G,n}']'\) and \(\varepsilon_n = [\varepsilon_{1,n}', \ldots, \varepsilon_{G,n}']'\) be the stacked vectors of the endogenous variables, the disturbances of the structural equations, and the innovations, respectively. Let \(x_n = [x_{1,n}', \ldots, x_{K,n}']'\) denote the stacked vector of the K exogenous variables.
The parameter matrices \(B_0 = (b_{lg,0})_{G \times G}\) (simultaneous effects), \(\Lambda_0 = (\lambda_{lg,s,0})_{PG \times G}\) (spatial autoregressive parameters), \(C_0 = (c_{lg,0})_{K \times G}\) (parameters on the exogenous variables), and \(P_0 = (\rho_{g,q,0})_{QG \times G}\) (spatial autoregressive parameters in the disturbance process) are defined conformably.

Recall that \(u_{g,n} = y_{g,n} - Z_{g,n}\delta_{g,0}\), where \(\delta_{g,0}\) denotes the vector of (true) structural parameters appearing in equation g. It then follows that the linear moments can be written as
\[
E H_{g,n}' u_{g,n}(\delta_g) = E H_{g,n}'(y_{g,n} - Z_{g,n}\delta_g) = E H_{g,n}'\big(u_{g,n} - Z_{g,n}(\delta_g - \delta_{g,0})\big) = H_{g,n}' E Z_{g,n} (\delta_{g,0} - \delta_g).
\]
In line with Kelejian and Prucha (1998), identification based on the linear moment conditions for the g-th equation requires \(H_{g,n}' E Z_{g,n}\) to have full column rank, which in turn requires \(E Z_{g,n}\) to be of full column rank.^{27} To provide guidance on where this condition may fail, we next discuss two such scenarios.

2.6.1 Scenario I

In this scenario, let us first consider the extreme case in which the model does not contain any exogenous variables, i.e., \(C_0 = 0\). Then \(E y_n = 0\) by (2.63), and thus \(E Y_{g,n} = 0\) and \(E \bar{Y}_{g,n} = 0\). It follows that \(E Z_{g,n} = [E Y_{g,n}, E \bar{Y}_{g,n}, X_{g,n}]\) is not of full column rank. This in turn implies that \(H_{g,n}' E Z_{g,n}\) is not of full column rank, and thus the linear moments cannot identify all structural parameters \(\delta_{g,0}\). Apart from a complete failure of identification by linear moments under this scenario, we expect estimators based only on linear moment conditions to perform poorly when the parameters on the exogenous variables are "small". Since the values of the elements of \(C_0\) depend on the chosen units of measurement of the exogenous variables, "small" is best interpreted as corresponding to a small ratio of the variance (signal) stemming from the exogenous variables to the variance (noise) of the disturbances. As such, we expect identification via the linear moments to be strong (weak) when the signal-to-noise ratio is large (small).
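The rank failure in Scenario I can be verified numerically: with \(C_0 = 0\), equation (2.63) gives \(E y_n = 0\), so every instrument column built from expected endogenous variables vanishes. The sketch below is a hypothetical single-equation SAR example (circulant weight matrix, not the design used in the simulations) illustrating this mechanism:

```python
import numpy as np

n = 30
# Row-normalized circulant weight matrix: each unit's two ring neighbours.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

lam = 0.4
S_inv = np.linalg.inv(np.eye(n) - lam * W)  # reduced-form multiplier
x = np.ones(n)

for c in (1.0, 0.0):
    Ey = S_inv @ (c * x)   # E y = S^{-1} C0 x, cf. (2.63)
    EWy = W @ Ey           # expected spatial lag, an instrument column
    print(c, np.max(np.abs(EWy)))
```

When `c = 0` the instrument column \(E[Wy]\) is identically zero, so any instrument block containing it is rank deficient; for small but nonzero `c` the column is close to zero relative to the disturbance noise, which is the weak identification case studied in the simulations.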
In addition to the linear moments, the quadratic moments may help identify (some of) the parameters. To see this, note that \(\operatorname{Var}(y_n)\) in (2.64) is still a function of the structural parameters even if \(C_0 = 0\). For contributions on identification with the help of quadratic moment conditions see, e.g., Lee (2007) and Kuersteiner and Prucha (2020) within a single equation framework, and, e.g., Liu (2014, 2019, 2020), Liu and Saraiva (2019), and Yang and Lee (2017, 2019) for contributions within a systems framework.

^{27} To avoid under-identification, we assume implicitly that \(\operatorname{rank}(H_{g,n}) \ge \operatorname{rank}(Z_{g,n})\).

2.6.2 Scenario II

In addition to the low signal-to-noise ratio described in Scenario I, one may also encounter the following weak identification scenario in empirical applications when allowing for spatial lags on \(X_n\). For ease of illustration, we consider the following two-equation model without spatial lags in the first equation:
\[
y_{1,n} = b_{21} y_{2,n} + c_{11} x_{1,n} + c_{21} x_{2,n} + c_{31} x_{3,n} + \varepsilon_{1,n}, \tag{2.65}
\]
\[
y_{2,n} = b_{12} y_{1,n} + \left[ \lambda_{22,1} W_{1,n} + \lambda_{22,2} W_{2,n} \right] y_{2,n} + c_{72} x_{4,n} + c_{82} x_{5,n} + c_{92} x_{6,n} + \varepsilon_{2,n}.
\]
Pre-multiplying both sides of the first equation by \((I_n - \lambda_{11,1} W_{1,n})\) and collecting terms, we obtain
\[
y_{1,n} = b_{21} y_{2,n} + \lambda_{11,1} W_{1,n} y_{1,n} + \lambda_{21,1} W_{1,n} y_{2,n} + c_{11} x_{1,n} + c_{21} x_{2,n} + c_{31} x_{3,n} + c_{41} W_{1,n} x_{1,n} + c_{51} W_{1,n} x_{2,n} + c_{61} W_{1,n} x_{3,n} + v_{1,n}, \tag{2.66}
\]
with \(v_{1,n} = (I_n - \lambda_{11,1} W_{1,n}) \varepsilon_{1,n}\) and the following parameter restrictions satisfied simultaneously:
\[
\lambda_{21,1} = -\lambda_{11,1} b_{21}, \quad c_{41} = -\lambda_{11,1} c_{11}, \quad c_{51} = -\lambda_{11,1} c_{21}, \quad c_{61} = -\lambda_{11,1} c_{31}. \tag{2.67}
\]
The first equation of model (2.65) and equation (2.66) are observationally equivalent given the above parameter restrictions, and hence the spatial parameters \(\lambda_{11,1}\) and \(\lambda_{21,1}\) are not identifiable. In connection with our discussion at the beginning of Section 2.6, note that under model (2.66) the best instrument for \(Z_{1,n}\) is
\[
E Z_{1,n} = \left[ E y_{2,n},\; W_{1,n} E y_{1,n},\; W_{1,n} E y_{2,n},\; X_{1,n},\; W_{1,n} X_{1,n} \right],
\]
where \(X_{1,n} = [x_{1,n}, x_{2,n}, x_{3,n}]\). Under (2.65), \(W_{1,n} E y_{1,n} = b_{21} W_{1,n} E y_{2,n} + W_{1,n} X_{1,n} \gamma_1\) where \(\gamma_1 = [c_{11}, c_{21}, c_{31}]'\), and thus \(E Z_{1,n}\) is in general not of full column rank.

In light of the above discussion, any point in the parameter space satisfying the parameter restrictions in (2.67) constitutes a "non-identification" point. It is of interest to explore the finite sample performance of the estimators under both the strong and the weak identification cases, i.e., when the true parameter values are far away from and close to these "non-identification" points. To do so, let
\[
\lambda_{21,1} = -(\lambda_{11,1} + \phi) b_{21}, \quad c_{41} = -(\lambda_{11,1} + \phi) c_{11}, \quad c_{51} = -(\lambda_{11,1} + \phi) c_{21}, \quad c_{61} = -(\lambda_{11,1} + \phi) c_{31},
\]
where the parameter \(\phi\) governs the size of the deviation from the corresponding "non-identification" point. A larger value of \(\phi\) corresponds to a point in the parameter space further away from the "non-identification" point, and hence identification through the linear moments based on \(X_n\) and the spatial lags of \(X_n\) is expected to be stronger than in cases with smaller values of \(\phi\). Specifically, \(\lambda_{21,1} = -(\lambda_{11,1} + \phi) b_{21}\) indicates a point that deviates from the "non-identification" point in the "negative" direction of \(\lambda_{21,1}\) with size \(\phi b_{21}\); \(c_{k1} = -(\lambda_{11,1} + \phi) c_{k-3,1}\) indicates a point that deviates from the "non-identification" point in the "negative" direction of \(c_{k1}\) with size \(\phi c_{k-3,1}\), for k = 4, 5, 6. In general, the strength of identification via the linear moments depends on the size of the deviation and much less so on its sign.

2.7 Monte Carlo Simulations

We investigate the finite sample properties of the proposed estimators and compare their performance with several existing estimators in the literature, under both the strong and weak identification cases. Corresponding to the discussion in Section 2.6, we consider two scenarios in which weak identification issues can arise. Scenario I is designed to compare the performance of the estimators under high and low signal-to-noise ratios. We do so by varying the size of the parameters on the exogenous variables \(X_n\).
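The Scenario II designs can be generated mechanically from the restrictions around (2.67): given the free parameters and a deviation size (denoted generically as `phi` here, standing in for the deviation parameter in the text), the restricted coefficients follow. A small illustrative helper, with hypothetical names:

```python
def scenario2_coefs(lam11_1, b21, c1, phi):
    """Coefficients implied by deviating from the non-identification
    point by phi (cf. the restrictions around (2.67)); c1 holds the
    first equation's exogenous coefficients (c11, c21, c31)."""
    lam21_1 = -(lam11_1 + phi) * b21
    c4, c5, c6 = (-(lam11_1 + phi) * ck for ck in c1)
    return lam21_1, (c4, c5, c6)

# At phi = 0 the restrictions in (2.67) hold exactly, i.e. the design
# sits on a non-identification point; larger |phi| moves it away.
print(scenario2_coefs(0.3, 0.15, (1.0, 1.0, 1.0), 0.0))
```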
Note that Scenario I is in line with the weak IV cases that arise in empirical applications. Scenario II extends the case considered in, e.g., Kuersteiner and Prucha (2020), in which identification becomes weak near a "singular point" in the parameter space. Due to space limitations, in this section we focus on the strong and weak identification cases under homoskedasticity, as well as the strong identification case with heteroskedastic disturbances. In Chapter A.4, we document results of additional robustness tests, including cases with an alternative design of the spatial weight matrices as well as correlated exogenous variables.

For the purpose of comparison, in addition to the GSLIVE, GSFIVE, LQ-GSLIVE and LQ-GSFIVE estimators considered in this dissertation, we also report the finite sample performance of the (quasi-)maximum likelihood estimator (MLE), the generalized spatial 2SLS (GS2SLS) first introduced in Kelejian and Prucha (1998), and the generalized spatial 3SLS (GS3SLS) first introduced in Kelejian and Prucha (2004) and extended to the simultaneous equation SARAR model with higher order spatial lags in Drukker, Egger, and Prucha (2022). We also implement the linear-quadratic GS2SLS (LQ-GS2SLS) and the linear-quadratic GS3SLS (LQ-GS3SLS) considered in Drukker, Egger, and Prucha (2022).

2.7.1 Data Generation Process

We next describe the way we generate the exogenous variables \(x_{k,n}\), the disturbance vector \(\varepsilon_n\), and the spatial weight matrices.

Exogenous Matrix: Let n denote the sample size. We generate an \(n \times 6\) exogenous matrix \(X_n = [x_{1,n}, x_{2,n}, \ldots, x_{6,n}]\) as follows. The elements of \(x_{j,n}\) are generated as i.i.d. normal with mean \(\mu_x = 1\) and variance \(\sigma_x^2 = 1\) for j = 1, ..., 6, and the columns of \(X_n\) are uncorrelated. \(X_n\) is generated once for all Monte Carlo experiments.

Disturbances: In the Monte Carlo simulations, we consider two-equation systems (i.e., G = 2).
We generate the \(n \times 2\) disturbance matrix \(E_n = [\varepsilon_{1,n}, \varepsilon_{2,n}]\) as follows. Let \(\varepsilon_{i\cdot,n} = [\varepsilon_{i1,n}, \varepsilon_{i2,n}]\) be the i-th row of \(E_n\). The \(\varepsilon_{i\cdot,n}\) are generated i.i.d. normal in i, with mean \(\mu_\varepsilon = 0\) and variance \(\sigma_\varepsilon^2 = 1\). In addition, we set the covariance between the i-th elements of \(\varepsilon_{1,n}\) and \(\varepsilon_{2,n}\) to 0.5 (i.e., \(\operatorname{cov}(\varepsilon_{i1}, \varepsilon_{i2}) = 0.5\) for i = 1, ..., n). We generated 500 different matrices \(E_n = [\varepsilon_{1,n}, \varepsilon_{2,n}]\), one for each of the 500 Monte Carlo trials.

Spatial Weight Matrices: We consider the north-east modified-rook design of Arraiz, Drukker, Kelejian, and Prucha (2010), adapted to the current model with second order spatial lags. For the convenience of the reader, we repeat the details of their design in the following. First, consider a matrix in terms of a square grid with both the x and y coordinates only taking on the discrete values \(1, 1.5, 2, 2.5, \ldots, \bar{m}\). Let the units in the northeast quadrant of this matrix be at the indicated discrete coordinates \(m \le x \le \bar{m}\) and \(m \le y \le \bar{m}\), where m can be seen as a cut-off value. Let the remaining units be located only at integer values of the coordinates: \(x = 1, 2, \ldots, \bar{m} - 1\) and \(y = 1, 2, \ldots, \bar{m} - 1\). Under this construction, the number of units located in the northeast quadrant is inversely related to the cut-off value m. As such, we refer to this matrix as a north-east modified rook matrix. For clarity, we illustrate such a matrix for the case in which m = 2 and \(\bar{m}\) = 5, with the units indicated by stars:

[Figure: grid of unit locations for the illustrative case; both axes run from 1.0 to 5.0 in steps of 0.5, with stars marking the units (half-step coordinates in the northeast quadrant, integer coordinates elsewhere).]

To generate the spatial weight matrices \(W_{1,n}\) and \(W_{2,n}\), we first define the Euclidean distance between any two units \(i_1\) and \(i_2\), with coordinates \((x_1, y_1)\) and \((x_2, y_2)\) respectively, as
\[
d(i_1, i_2) = \left[ (x_1 - x_2)^2 + (y_1 - y_2)^2 \right]^{1/2}.
\]
Given this distance measure, we define the (i, j)-th element of our row-normalized weight matrix \(W_{1,n}\) as
\[
w_{ij,1} = w^*_{ij} \Big/ \sum_{l=1}^{n} w^*_{il},
\]
where \(w^*_{ij} = 1\) if \(0 < d(i, j) \le 1\) and \(w^*_{ij} = 0\) otherwise; and the (i, j)-th element of our row-normalized weight matrix \(W_{2,n}\) as
\[
w_{ij,2} = w^{**}_{ij} \Big/ \sum_{l=1}^{n} w^{**}_{il},
\]
where \(w^{**}_{ij} = 1\) if \(1 < d(i, j) \le 2\) and \(w^{**}_{ij} = 0\) otherwise. We further set \(M_{1,n} = W_{1,n}\) and \(M_{2,n} = W_{2,n}\). For our experiment, we consider the case in which m = 5 and \(\bar{m}\) = 15, and thus the sample size is n = 486. In this specification, the North-East (NE) sector accounts for about 75% of the units in the sample.

2.7.2 Implemented Estimators

To avoid confusion, we list the estimators considered in the experiments along with the notation that appears in Table 2.1 - Table 2.7:

1. ML estimator, \(\hat{\theta}_{ML}\);
2. GS2SLS \(\hat{\delta}\) and GMM estimator \(\hat{\rho}\) considered in Drukker, Egger, and Prucha (2022), denoted as \(\hat{\theta}_{GS2SLS} = [\hat{\delta}', \hat{\rho}']'\);
3. GS3SLS \(\hat{\hat{\delta}}\) and GMM estimator \(\hat{\hat{\rho}}\) considered in Drukker, Egger, and Prucha (2022), denoted as \(\hat{\theta}_{GS3SLS} = [\hat{\hat{\delta}}', \hat{\hat{\rho}}']'\);
4. Approximated GSLIVE \(\hat{\theta}_{GSLIVE}\) in (2.53) and the GMM estimator for \(\rho_g\) in (2.55);
5. Approximated GSFIVE \(\hat{\theta}_{GSFIVE}\) in (2.58) and the GMM estimator for \(\rho_g\) in (2.60);
6. LQ-GS2SLS \(\hat{\theta}_{LQ-GS2SLS}\) considered in Drukker, Egger, and Prucha (2022);
7. LQ-GS3SLS \(\hat{\theta}_{LQ-GS3SLS}\) considered in Drukker, Egger, and Prucha (2022);
8. Approximated Linear-Quadratic GSLIVE \(\hat{\theta}_{LQ-GSLIVE}\) in (2.57);
9. Approximated Linear-Quadratic GSFIVE \(\hat{\theta}_{LQ-GSFIVE}\) in (2.62).

As discussed, the (LQ-)GSLIVE and (LQ-)GSFIVE estimators considered in this dissertation require some initial estimates. In the Monte Carlo simulations, we use the GS2SLS estimates as the initial estimates for the GSLIVE, and then use the GSLIVE estimates as the initial estimates when implementing the GSFIVE. Analogously, we use the LQ-GS2SLS estimates as the initial estimates for the LQ-GSLIVE, which in turn serve as the initial estimates for the LQ-GSFIVE.
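The weight-matrix construction described in Section 2.7.1 can be sketched as follows. This is an illustrative implementation under one reading of the design (half-step coordinates in the northeast quadrant, integer coordinates elsewhere), not the authors' code, so the resulting unit count depends on how the boundary of the quadrant is interpreted; the small case below uses a cut-off of 3 on a grid running to 5 purely for illustration.

```python
import numpy as np

def build_units(m, m_bar):
    """Units: half-step grid in the NE quadrant, integer grid elsewhere."""
    units = set()
    half = [v / 2 for v in range(2 * m, 2 * m_bar + 1)]  # m, m+0.5, ..., m_bar
    for x in half:
        for y in half:
            units.add((x, y))
    for x in range(1, m_bar + 1):
        for y in range(1, m_bar + 1):
            if not (x >= m and y >= m):   # outside the NE quadrant
                units.add((float(x), float(y)))
    return sorted(units)

def weight_matrix(units, lo, hi):
    """Row-normalized W with w*_ij = 1 if lo < d(i, j) <= hi."""
    pts = np.array(units)
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    W = ((d > lo) & (d <= hi)).astype(float)
    rs = W.sum(axis=1, keepdims=True)
    rs[rs == 0] = 1.0                     # guard against isolated units
    return W / rs

units = build_units(3, 5)                 # small illustrative case
W1 = weight_matrix(units, 0.0, 1.0)       # first-order neighbours
W2 = weight_matrix(units, 1.0, 2.0)       # second-order ring
print(len(units), W1.sum(axis=1).max())
```

Row normalization guarantees that the maximum absolute row sums of \(W_1\) and \(W_2\) equal one, which is what the stability restrictions on the parameter space in Section 2.7.3 rely on.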
2.7.3 Performance of Estimators under Strong and Weak Identification: Scenario I

In light of our discussion in Section 2.6.1, it is of interest to compare the performance of the proposed estimators under both the strong and weak identification set-ups. We suppress the subscript n in the following where not necessary.

Model Specification: The model considered in the experiment is given by
\[
y_1 = b_{21} y_2 + \left[ \lambda_{11,1} W_1 + \lambda_{11,2} W_2 \right] y_1 + c_{11} x_1 + c_{21} x_2 + c_{31} x_3 + u_1,
\]
\[
y_2 = b_{12} y_1 + \left[ \lambda_{22,1} W_1 + \lambda_{22,2} W_2 \right] y_2 + c_{42} x_4 + c_{52} x_5 + c_{62} x_6 + u_2,
\]
\[
u_g = \left[ \rho_{g1} W_1 + \rho_{g2} W_2 \right] u_g + \varepsilon_g, \quad g = 1, 2.
\]
For simplicity, we denote \(c_{\cdot 1} = [c_{11}, c_{21}, c_{31}]'\) and \(c_{\cdot 2} = [c_{42}, c_{52}, c_{62}]'\). In the simulations, we experiment with both \(c_{\cdot 1} = c_{\cdot 2} = [1, 1, 1]'\) and \(c_{\cdot 1} = c_{\cdot 2} = [0.4, 0.4, 0.4]'\), corresponding to the high and low signal-to-noise ratio cases. We expect the identification power of the linear moments to be weakened in the latter case. We therefore refer to the former as the strong identification case and to the latter as the weak identification case in Table 2.1 - Table 2.3 below.

Parameter Space: Recall that \(S(\lambda, b) = I_{2n} - B^*\), where
\[
B^* = \begin{bmatrix} \lambda_{11,1} W_1 + \lambda_{11,2} W_2 & b_{21} I_n \\ b_{12} I_n & \lambda_{22,1} W_1 + \lambda_{22,2} W_2 \end{bmatrix},
\]
and \(R(\rho) = I_{2n} - P^*\), where
\[
P^* = \begin{bmatrix} \rho_{11} W_1 + \rho_{12} W_2 & 0 \\ 0 & \rho_{21} W_1 + \rho_{22} W_2 \end{bmatrix}.
\]
For the reduced form (2.5) to exist, we need to ensure the existence of \(S(\lambda, b)^{-1} = \sum_{h=0}^{\infty} (B^*)^h\) and \(R(\rho)^{-1} = \sum_{h=0}^{\infty} (P^*)^h\), which in turn requires \(\|B^*\| < 1\) and \(\|P^*\| < 1\) for some induced matrix norm. Row-normalized \(W_1\) and \(W_2\) imply \(\|W_1\|_\infty = 1\) and \(\|W_2\|_\infty = 1\), where \(\|A\|_\infty\) denotes the maximum absolute row sum of A. To ensure \(\|B^*\|_\infty < 1\), we need to restrict
\[
\|b_{21} I_n + \lambda_{11,1} W_1 + \lambda_{11,2} W_2\|_\infty \le |b_{21}| \|I_n\|_\infty + |\lambda_{11,1}| \|W_1\|_\infty + |\lambda_{11,2}| \|W_2\|_\infty = |b_{21}| + |\lambda_{11,1}| + |\lambda_{11,2}| < 1,
\]
\[
\|b_{12} I_n + \lambda_{22,1} W_1 + \lambda_{22,2} W_2\|_\infty \le |b_{12}| \|I_n\|_\infty + |\lambda_{22,1}| \|W_1\|_\infty + |\lambda_{22,2}| \|W_2\|_\infty = |b_{12}| + |\lambda_{22,1}| + |\lambda_{22,2}| < 1.
\]
This is equivalent to requiring
\[
\max\{ |b_{21}| + |\lambda_{11,1}| + |\lambda_{11,2}|,\; |b_{12}| + |\lambda_{22,1}| + |\lambda_{22,2}| \} < 1.
\]
(2.68) The inequalities follows from triangular inequalities for matrix norms. Similarly, to ensure ||P ?||? < 1, we need to restrict ||?11W1 + ?12W2||? ? |?11|||W1||? + |?12|||W2||? = |?11|+ |?12| < 1 and ||?21W1 + ?22W2||? ? |?21|||W1||? + |?22|||W2||? = |?21|+ |?22| < 1 This is equivalent to require max{|?11|+ |?12|, |?21|+ |?22|} < 1. (2.69) In light of above, we consider the following parameter combinations 72 1. b21 ? {?0.15, 0.15}, b12 = 0.3 2. ?11,1 ? {?0.3, 0, 0.3, 0.5}, ?11,2 ? {?0.2, 0, 0.2} 3. ?22,1 = 0.3, ?22,2 = 0.15 4. ?11 ? {?0.2, 0.2}, ?12 = 0.1 5. ?21 = 0.1, ?22 = 0 Note that the inequalities (2.68) and (2.69) are satisfied with above parameter sets. These parameter choices constitute 48 different experiment settings for each of the weak and the strong identification cases and thus 96 experiments in total. Due to the limited space, we report the Monte Carlo results for each case of c.1 = c.2 = [1, 1, 1] ? (i.e., the strong identification case) and c ?.1 = c.2 = [0.4, 0.4, 0.4] (i.e., the weak identification case) with the three parameter constella- tions in the tables below Table 2.1 Table 2.2 Table 2.3 b21 = 0.15 b12 = 0.3 b21 = 0.15 b12 = 0.3 b21 = 0.15 b12 = 0.3 ?11,1 = 0.3 ?11,2 = 0.2 ?11,1 = 0.5 ?11,2 = 0.2 ?11,1 = ?0.3 ?11,2 = ?0.2 ?22,1 = 0.3 ?22,2 = 0.15 ?22,1 = 0.3 ?22,2 = 0.15 ?22,1 = 0.3 ?22,2 = 0.15 ?11 = 0.2 ?12 = 0.1 ?11 = 0.2 ?12 = 0.1 ?11 = 0.2 ?12 = 0.1 ?21 = 0.1 ?22 = 0 ?21 = 0.1 ?22 = 0 ?21 = 0.1 ?22 = 0 We report the Median and the RMSE of the obtained estimates based on 500 Monte Carlo repetitions. Following, e.g., Kelejian and Prucha (1998), the RMSE is calculated as RMSE = 73 [ ]2 bias2 + IQ/1.35 , where the bias is the absolute difference between the median of the em- pirical distribution and the true parameter value, and IQ is an inter-quantile range. That is, IQ = c1 ? c2 where c1 is the 0.75 quantile and c2 is the 0.25 quantile. 
Note that if the dis- tribution is normal, the median is equal to the mean and IQ/1.35 is approximately equal to the standard deviation. As mentioned in Kelejian, Prucha, and Yuzefovich (2004), an important fea- ture of this modified RMSE measure is that it is based on quantiles which always exist. The standard measure of the root mean square error is based on the first and second moments which, as pointed out by Kelejian and Prucha (1999) among others, may not always exist. Thus the standard measure may not be well defined. 74 75 Table 2.1: Median and RMSE of Scenario I, homoskedasticity, Parameter Constellation 1 Strong i.d. TRUE ??ML ??GS2SLS ??GS3SLS ??GSLIV E ??GSFIV E ??LQ?GS2SLS ??LQ?GS3SLS ??LQ?GSLIV E ??LQ?GSFIV E Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE b21 0.150 0.149 0.024 0.165 0.029 0.153 0.026 0.150 0.026 0.150 0.025 0.151 0.026 0.147 0.025 0.151 0.025 0.151 0.025 ?11,1 0.300 0.301 0.052 0.307 0.067 0.303 0.061 0.303 0.057 0.300 0.053 0.309 0.065 0.300 0.059 0.304 0.057 0.302 0.054 ?11,2 0.200 0.201 0.050 0.186 0.068 0.197 0.061 0.198 0.057 0.201 0.052 0.191 0.060 0.206 0.058 0.198 0.057 0.200 0.052 ?11 0.200 0.190 0.099 0.182 0.112 0.191 0.110 0.187 0.106 0.193 0.102 0.175 0.110 0.190 0.105 0.192 0.109 0.195 0.100 ?12 0.100 0.069 0.123 0.101 0.128 0.092 0.128 0.067 0.131 0.073 0.128 0.096 0.123 0.091 0.130 0.097 0.136 0.094 0.118 b12 0.300 0.298 0.021 0.312 0.024 0.299 0.022 0.299 0.021 0.299 0.022 0.298 0.022 0.298 0.022 0.300 0.021 0.302 0.021 ?22,1 0.300 0.299 0.052 0.295 0.068 0.298 0.057 0.293 0.061 0.297 0.052 0.300 0.067 0.292 0.057 0.295 0.063 0.296 0.054 ?22,2 0.150 0.157 0.056 0.145 0.066 0.152 0.063 0.157 0.063 0.156 0.057 0.151 0.063 0.160 0.058 0.155 0.065 0.155 0.056 ?21 0.100 0.089 0.096 0.084 0.121 0.092 0.113 0.087 0.106 0.085 0.104 0.086 0.117 0.096 0.117 0.097 0.118 0.099 0.102 ?22 0.000 -0.043 0.150 -0.018 0.162 -0.026 0.163 -0.043 0.161 -0.043 0.165 -0.016 0.162 -0.020 0.168 -0.022 0.168 
-0.021 0.144 Weak i.d. TRUE ??ML ??GS2SLS ??GS3SLS ??GSLIV E ??GSFIV E ??LQ?GS2SLS ??LQ?GS3SLS ??LQ?GSLIV E ??LQ?GSFIV E Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE b21 0.150 0.149 0.059 0.227 0.093 0.180 0.069 0.153 0.064 0.154 0.064 0.157 0.060 0.138 0.067 0.163 0.060 0.156 0.062 ?11,1 0.300 0.308 0.125 0.347 0.214 0.347 0.235 0.307 0.142 0.307 0.137 0.325 0.166 0.320 0.162 0.305 0.138 0.305 0.129 ?11,2 0.200 0.197 0.123 0.089 0.216 0.141 0.222 0.200 0.146 0.191 0.132 0.154 0.152 0.186 0.147 0.180 0.142 0.188 0.127 ?11 0.200 0.183 0.149 0.120 0.239 0.132 0.241 0.183 0.180 0.191 0.166 0.127 0.191 0.156 0.184 0.190 0.174 0.195 0.153 ?12 0.100 0.077 0.163 0.194 0.219 0.140 0.212 0.061 0.195 0.074 0.182 0.127 0.184 0.097 0.191 0.108 0.172 0.102 0.165 b12 0.300 0.296 0.052 0.366 0.082 0.314 0.061 0.299 0.053 0.302 0.056 0.302 0.054 0.296 0.064 0.308 0.055 0.306 0.054 ?22,1 0.300 0.294 0.122 0.296 0.196 0.307 0.213 0.284 0.152 0.291 0.137 0.298 0.148 0.292 0.151 0.286 0.153 0.288 0.124 ?22,2 0.150 0.158 0.131 0.095 0.191 0.120 0.203 0.169 0.150 0.158 0.146 0.145 0.141 0.164 0.140 0.148 0.157 0.158 0.133 ?21 0.100 0.092 0.163 0.078 0.229 0.078 0.260 0.100 0.213 0.091 0.185 0.092 0.196 0.108 0.204 0.107 0.210 0.102 0.167 ?22 0.000 -0.051 0.194 0.018 0.208 -0.009 0.220 -0.053 0.228 -0.050 0.228 -0.017 0.192 -0.044 0.205 -0.024 0.222 -0.026 0.193 1 Results are based on 500 Monte Carlo trials with sample size n = 486; ?? = 1. 76 Table 2.2: Median and RMSE of Scenario I, homoskedasticity, Parameter Constellation 2 Strong i.d. 
TRUE ??ML ??GS2SLS ??GS3SLS ??GSLIV E ??GSFIV E ??LQ?GS2SLS ??LQ?GS3SLS ??LQ?GSLIV E ??LQ?GSFIV E Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE b21 0.150 0.149 0.025 0.165 0.029 0.153 0.026 0.150 0.026 0.150 0.025 0.151 0.026 0.147 0.025 0.151 0.025 0.151 0.025 ?11,1 0.500 0.501 0.050 0.521 0.067 0.513 0.062 0.503 0.057 0.503 0.051 0.520 0.065 0.507 0.060 0.502 0.055 0.502 0.052 ?11,2 0.200 0.202 0.047 0.171 0.068 0.187 0.057 0.201 0.054 0.200 0.049 0.180 0.061 0.196 0.054 0.198 0.055 0.200 0.047 ?11 0.200 0.191 0.099 0.160 0.122 0.177 0.116 0.184 0.111 0.196 0.102 0.163 0.116 0.181 0.112 0.193 0.116 0.197 0.099 ?12 0.100 0.070 0.122 0.110 0.124 0.098 0.125 0.070 0.129 0.072 0.127 0.107 0.123 0.099 0.125 0.097 0.128 0.091 0.117 b12 0.300 0.299 0.021 0.310 0.023 0.298 0.021 0.298 0.021 0.299 0.021 0.299 0.021 0.298 0.022 0.300 0.021 0.302 0.020 ?22,1 0.300 0.298 0.053 0.295 0.070 0.302 0.057 0.295 0.062 0.298 0.051 0.300 0.067 0.294 0.058 0.296 0.063 0.297 0.053 ?22,2 0.150 0.155 0.055 0.142 0.066 0.149 0.060 0.159 0.060 0.154 0.055 0.148 0.064 0.157 0.058 0.153 0.062 0.154 0.054 ?21 0.100 0.087 0.098 0.081 0.120 0.085 0.116 0.084 0.109 0.084 0.103 0.086 0.117 0.094 0.121 0.097 0.116 0.099 0.104 ?22 0.000 -0.045 0.148 -0.015 0.160 -0.021 0.163 -0.042 0.163 -0.040 0.167 -0.014 0.158 -0.019 0.166 -0.016 0.166 -0.024 0.151 Weak i.d. 
TRUE ??ML ??GS2SLS ??GS3SLS ??GSLIV E ??GSFIV E ??LQ?GS2SLS ??LQ?GS3SLS ??LQ?GSLIV E ??LQ?GSFIV E Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE b21 0.150 0.149 0.061 0.226 0.092 0.179 0.065 0.151 0.064 0.155 0.065 0.136 0.061 0.142 0.063 0.164 0.062 0.156 0.065 ?11,1 0.500 0.504 0.119 0.631 0.219 0.633 0.227 0.505 0.152 0.516 0.131 0.542 0.176 0.535 0.162 0.507 0.136 0.503 0.121 ?11,2 0.200 0.197 0.111 0.018 0.241 0.051 0.220 0.205 0.146 0.186 0.134 0.166 0.176 0.162 0.151 0.184 0.129 0.190 0.114 ?11 0.200 0.187 0.152 0.025 0.265 0.039 0.262 0.184 0.200 0.186 0.175 0.148 0.231 0.149 0.216 0.187 0.187 0.192 0.157 ?12 0.100 0.070 0.150 0.223 0.203 0.180 0.185 0.062 0.186 0.082 0.162 0.105 0.169 0.113 0.168 0.101 0.158 0.105 0.155 b12 0.300 0.298 0.051 0.356 0.074 0.307 0.057 0.297 0.054 0.303 0.055 0.287 0.051 0.294 0.060 0.307 0.055 0.305 0.051 ?22,1 0.300 0.300 0.121 0.298 0.201 0.332 0.221 0.278 0.159 0.297 0.139 0.312 0.149 0.299 0.153 0.288 0.159 0.294 0.127 ?22,2 0.150 0.155 0.125 0.073 0.192 0.094 0.201 0.176 0.160 0.155 0.154 0.153 0.143 0.152 0.145 0.148 0.154 0.151 0.127 ?21 0.100 0.092 0.161 0.062 0.234 0.048 0.269 0.109 0.226 0.082 0.190 0.076 0.199 0.096 0.208 0.106 0.211 0.104 0.168 ?22 0.000 -0.051 0.190 0.034 0.206 0.008 0.215 -0.065 0.229 -0.050 0.222 -0.020 0.185 -0.031 0.203 -0.022 0.219 -0.024 0.186 1 Results are based on 500 Monte Carlo trials with sample size n = 486; ?? = 1. 77 Table 2.3: Median and RMSE of Scenario I, homoskedasticity, Parameter Constellation 3 Strong i.d. 
(entries are Med/RMSE)
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.147/0.023   0.165/0.028   0.153/0.024   0.150/0.025   0.148/0.024   0.150/0.025   0.145/0.025   0.151/0.024   0.149/0.023
λ11,1    -0.300   -0.297/0.055  -0.321/0.063  -0.313/0.062  -0.298/0.059  -0.301/0.055  -0.304/0.061  -0.306/0.058  -0.298/0.057  -0.299/0.056
λ11,2    -0.200   -0.200/0.059  -0.198/0.064  -0.185/0.063  -0.203/0.065  -0.198/0.060  -0.192/0.067  -0.179/0.065  -0.204/0.063  -0.202/0.060
ρ11      0.200    0.189/0.090   0.216/0.104   0.208/0.101   0.184/0.100   0.191/0.102   0.190/0.101   0.199/0.100   0.191/0.102   0.195/0.092
ρ12      0.100    0.069/0.125   0.079/0.146   0.072/0.145   0.067/0.139   0.066/0.132   0.074/0.134   0.076/0.139   0.095/0.140   0.086/0.130
b12      0.300    0.297/0.022   0.313/0.025   0.301/0.022   0.298/0.022   0.299/0.022   0.299/0.022   0.300/0.024   0.301/0.023   0.300/0.022
λ22,1    0.300    0.296/0.053   0.293/0.067   0.293/0.060   0.294/0.061   0.295/0.052   0.295/0.068   0.291/0.059   0.296/0.062   0.294/0.052
λ22,2    0.150    0.157/0.056   0.154/0.067   0.157/0.059   0.157/0.062   0.156/0.054   0.153/0.067   0.164/0.059   0.155/0.064   0.155/0.056
ρ21      0.100    0.091/0.092   0.094/0.118   0.099/0.116   0.088/0.102   0.086/0.101   0.089/0.115   0.102/0.113   0.099/0.116   0.098/0.095
ρ22      0.000    -0.045/0.151  -0.023/0.162  -0.034/0.170  -0.046/0.164  -0.043/0.165  -0.019/0.162  -0.021/0.171  -0.027/0.164  -0.025/0.144
Weak i.d.
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.143/0.057   0.228/0.093   0.185/0.069   0.152/0.064   0.152/0.060   0.154/0.059   0.137/0.063   0.162/0.057   0.154/0.058
λ11,1    -0.300   -0.291/0.127  -0.421/0.203  -0.417/0.201  -0.297/0.154  -0.320/0.136  -0.331/0.146  -0.324/0.148  -0.311/0.140  -0.307/0.129
λ11,2    -0.200   -0.202/0.124  -0.203/0.176  -0.148/0.187  -0.218/0.162  -0.187/0.155  -0.166/0.159  -0.170/0.168  -0.222/0.147  -0.218/0.133
ρ11      0.200    0.176/0.146   0.295/0.197   0.281/0.192   0.184/0.180   0.210/0.160   0.230/0.153   0.220/0.149   0.200/0.154   0.193/0.148
ρ12      0.100    0.067/0.182   0.050/0.216   0.020/0.230   0.064/0.231   0.048/0.217   0.048/0.204   0.050/0.221   0.093/0.197   0.095/0.190
b12      0.300    0.293/0.059   0.378/0.092   0.328/0.065   0.298/0.058   0.302/0.056   0.304/0.056   0.306/0.065   0.311/0.056   0.302/0.053
λ22,1    0.300    0.291/0.126   0.305/0.187   0.280/0.201   0.284/0.152   0.283/0.139   0.286/0.140   0.280/0.149   0.279/0.159   0.286/0.128
λ22,2    0.150    0.163/0.131   0.135/0.188   0.164/0.201   0.160/0.155   0.163/0.141   0.159/0.140   0.169/0.144   0.160/0.160   0.162/0.129
ρ21      0.100    0.096/0.162   0.090/0.222   0.123/0.247   0.089/0.200   0.102/0.177   0.106/0.189   0.121/0.199   0.113/0.219   0.104/0.164
ρ22      0.000    -0.056/0.200  -0.008/0.218  -0.046/0.243  -0.054/0.231  -0.050/0.224  -0.028/0.199  -0.050/0.212  -0.026/0.227  -0.031/0.200
¹ Results are based on 500 Monte Carlo trials with sample size n = 486; σ_ε = 1.

2.7.4 Performance of Estimators under Strong and Weak Identification: Scenario II

Model Specification: In light of our discussion in Section 2.6.2, we consider the following model for Scenario II:

y1 = b21 y2 + λ11,1 W1 y1 + λ21,1 W1 y2 + c11 x1 + c21 x2 + c31 x3 + c41 W1 x1 + c51 W1 x2 + c61 W1 x3 + ε1,   (2.70)
y2 = b12 y1 + (λ22,1 W1 + λ22,2 W2) y2 + c72 x4 + c82 x5 + c92 x6 + ε2,

with the following parameter restrictions:

λ21,1 = -(λ11,1 + δ) b21,    c41 = -(λ11,1 + δ) c11,    (2.71)
c51 = -(λ11,1 + δ) c21,      c61 = -(λ11,1 + δ) c31,

where δ is the deviation parameter introduced below.
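To make the restrictions in (2.71) concrete, the following Python sketch (function name hypothetical; δ denotes the deviation parameter) computes the restricted parameters. With δ = 1, λ11,1 = 0.3, b21 = 0.15 and c11 = c21 = c31 = 1, it reproduces, up to floating point, the TRUE values λ21,1 = -0.195 and c41 = c51 = c61 = -1.3 reported in Table 2.4 for the strong identification case.

```python
# Restricted parameters implied by (2.71):
#   lam21_1 = -(lam11_1 + delta) * b21,  c41 = -(lam11_1 + delta) * c11,  etc.
def restricted_params(lam11_1, b21, c11, c21, c31, delta):
    """Return (lam21_1, c41, c51, c61) implied by the restrictions in (2.71)."""
    scale = -(lam11_1 + delta)
    return scale * b21, scale * c11, scale * c21, scale * c31

# Strong identification (delta = 1), Parameter Constellation 1:
print(restricted_params(0.3, 0.15, 1.0, 1.0, 1.0, delta=1.0))
# Weak identification (delta = 0.4):
print(restricted_params(0.3, 0.15, 1.0, 1.0, 1.0, delta=0.4))
```

The weak identification case δ = 0.4 likewise yields λ21,1 = -0.105 and c41 = c51 = c61 = -0.7, matching the TRUE column of the weak identification panel.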
Parameter Space: To test parameter values close to and away from the "non-identification" points, we consider δ = 0.4, 0.8, 1, where δ = 0.4 corresponds to the case closest to the "non-identification" point and thus to weak identification, and δ = 1 corresponds to strong identification. Recall that S = I_2n - B*, where

B* = | λ11,1 W1    b21 I_n + λ21,1 W1   |
     | b12 I_n     λ22,1 W1 + λ22,2 W2  |.

Again, for the reduced form to exist, we need to ensure the existence of S^{-1} = Σ_{h=0}^∞ (B*)^h, which in turn requires ||B*|| < 1 for some induced matrix norm. Again, row-normalized W1 and W2 imply ||W1||_∞ = 1 and ||W2||_∞ = 1. To ensure ||B*||_∞ < 1, we restrict

|b21| ||I_n||_∞ + |λ11,1| ||W1||_∞ + |λ21,1| ||W1||_∞ = |b21| + |λ11,1| + |λ21,1| < 1,
|b12| ||I_n||_∞ + |λ22,1| ||W1||_∞ + |λ22,2| ||W2||_∞ = |b12| + |λ22,1| + |λ22,2| < 1.

This is equivalent to requiring

max{ |b21| + |λ11,1| + |λ21,1|,  |b12| + |λ22,1| + |λ22,2| } < 1.   (2.72)

The inequalities follow from the triangle inequality for matrix norms. In light of this, we consider the following parameter combinations:

1. deviation parameter δ ∈ {0.4, 0.8, 1};
2. b21 ∈ {-0.15, 0.15}, b12 = 0.3;
3. λ11,1 ∈ {-0.3, 0, 0.3, 0.5}; λ21,1 depends on λ11,1, b21 and δ as specified in (2.71);
4. λ22,1 = 0.3, λ22,2 = 0.15;
5. c11 = c21 = c31 = 1; c41, c51, c61 depend on c11, c21, c31, respectively, as well as on λ11,1 and δ, as specified in (2.71);
6. c72 = c82 = c92 = 1.

Note that inequality (2.72) is satisfied by the above parameter sets. These parameter combinations constitute 24 different experiment settings. Due to limited space, we report Monte Carlo results for the following three parameter constellations:

Table 2.4: b21 = 0.15, b12 = 0.3, λ11,1 = 0.3,  λ22,1 = 0.3, λ22,2 = 0.15
Table 2.5: b21 = 0.15, b12 = 0.3, λ11,1 = 0.5,  λ22,1 = 0.3, λ22,2 = 0.15
Table 2.6: b21 = 0.15, b12 = 0.3, λ11,1 = -0.3, λ22,1 = 0.3, λ22,2 = 0.15

In all three cases, (1) λ21,1 depends on λ11,1, b21 and δ
as specified in (2.71); (2) c11 = c21 = c31 = 1, and c41, c51, c61 depend on c11, c21, c31, respectively, as well as on λ11,1 and δ, as specified in (2.71); (3) c72 = c82 = c92 = 1. For each parameter constellation, we report results for δ = 1 (the strong identification case) and δ = 0.4 (the weak identification case).

Table 2.4: Median and RMSE of Scenario II, homoskedasticity, Parameter Constellation 1
Strong i.d. (entries are Med/RMSE)
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.150/0.026   0.164/0.030   0.155/0.027   0.150/0.026   0.150/0.027   0.153/0.026   0.156/0.026   0.152/0.026   0.152/0.026
λ11,1    0.300    0.295/0.048   0.328/0.081   0.319/0.070   0.302/0.072   0.300/0.069   0.297/0.056   0.311/0.053   0.300/0.053   0.298/0.048
λ21,1    -0.195   -0.198/0.038  -0.194/0.044  -0.191/0.042  -0.197/0.046  -0.198/0.045  -0.199/0.044  -0.198/0.038  -0.200/0.039  -0.199/0.038
c41      -1.300   -1.298/0.090  -1.307/0.105  -1.307/0.098  -1.299/0.107  -1.297/0.099  -1.286/0.115  -1.303/0.097  -1.298/0.108  -1.300/0.092
c51      -1.300   -1.295/0.101  -1.315/0.113  -1.304/0.102  -1.303/0.118  -1.296/0.104  -1.303/0.127  -1.300/0.102  -1.300/0.119  -1.299/0.105
c61      -1.300   -1.292/0.088  -1.301/0.094  -1.303/0.092  -1.288/0.102  -1.295/0.095  -1.287/0.117  -1.298/0.089  -1.291/0.098  -1.294/0.089
b12      0.300    0.299/0.025   0.314/0.028   0.306/0.025   0.300/0.025   0.300/0.025   0.301/0.025   0.307/0.027   0.302/0.025   0.303/0.024
λ22,1    0.300    0.294/0.043   0.299/0.061   0.299/0.056   0.294/0.062   0.295/0.057   0.302/0.049   0.300/0.045   0.299/0.043   0.297/0.043
λ22,2    0.150    0.152/0.043   0.164/0.059   0.153/0.053   0.158/0.057   0.152/0.054   0.151/0.050   0.155/0.044   0.153/0.043   0.152/0.044
Weak i.d.
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.149/0.026   0.165/0.030   0.156/0.027   0.149/0.026   0.149/0.026   0.153/0.026   0.156/0.026   0.152/0.026   0.152/0.026
λ11,1    0.300    0.288/0.068   0.414/0.207   0.382/0.175   0.286/0.186   0.295/0.171   0.294/0.078   0.317/0.079   0.297/0.073   0.296/0.068
λ21,1    -0.105   -0.106/0.034  -0.112/0.039  -0.108/0.037  -0.109/0.040  -0.108/0.036  -0.110/0.044  -0.114/0.036  -0.110/0.039  -0.108/0.034
c41      -0.700   -0.690/0.108  -0.787/0.164  -0.767/0.139  -0.695/0.183  -0.707/0.155  -0.689/0.134  -0.706/0.107  -0.690/0.121  -0.694/0.109
c51      -0.700   -0.685/0.111  -0.788/0.178  -0.759/0.151  -0.683/0.183  -0.700/0.159  -0.700/0.139  -0.711/0.109  -0.701/0.119  -0.691/0.110
c61      -0.700   -0.687/0.095  -0.781/0.167  -0.761/0.143  -0.687/0.190  -0.696/0.159  -0.682/0.117  -0.712/0.100  -0.684/0.106  -0.691/0.097
b12      0.300    0.299/0.023   0.311/0.025   0.304/0.023   0.300/0.023   0.300/0.024   0.300/0.024   0.306/0.025   0.302/0.024   0.304/0.023
λ22,1    0.300    0.296/0.041   0.289/0.063   0.299/0.055   0.294/0.061   0.295/0.054   0.299/0.047   0.297/0.044   0.298/0.043   0.298/0.043
λ22,2    0.150    0.153/0.041   0.160/0.058   0.148/0.054   0.156/0.060   0.153/0.055   0.151/0.048   0.152/0.043   0.152/0.044   0.152/0.041
¹ Results are based on 500 Monte Carlo trials with sample size n = 486; σ_ε = 1.

Table 2.5: Median and RMSE of Scenario II, homoskedasticity, Parameter Constellation 2
Strong i.d.
(entries are Med/RMSE)
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.150/0.025   0.164/0.030   0.155/0.027   0.150/0.026   0.149/0.027   0.153/0.026   0.156/0.027   0.152/0.027   0.152/0.026
λ11,1    0.500    0.495/0.038   0.526/0.059   0.518/0.050   0.501/0.055   0.502/0.048   0.500/0.046   0.511/0.042   0.500/0.041   0.498/0.039
λ21,1    -0.225   -0.229/0.040  -0.219/0.047  -0.214/0.047  -0.230/0.047  -0.226/0.048  -0.229/0.045  -0.225/0.039  -0.231/0.040  -0.230/0.040
c41      -1.500   -1.500/0.093  -1.492/0.098  -1.502/0.098  -1.498/0.105  -1.501/0.100  -1.486/0.116  -1.502/0.093  -1.497/0.103  -1.501/0.094
c51      -1.500   -1.498/0.104  -1.503/0.108  -1.503/0.103  -1.500/0.114  -1.499/0.105  -1.499/0.127  -1.499/0.104  -1.496/0.119  -1.500/0.106
c61      -1.500   -1.493/0.092  -1.488/0.100  -1.496/0.090  -1.489/0.105  -1.495/0.089  -1.489/0.121  -1.495/0.094  -1.489/0.105  -1.494/0.089
b12      0.300    0.299/0.023   0.317/0.029   0.309/0.026   0.302/0.026   0.301/0.026   0.302/0.025   0.309/0.026   0.303/0.024   0.304/0.023
λ22,1    0.300    0.294/0.044   0.302/0.061   0.304/0.058   0.297/0.063   0.298/0.059   0.304/0.049   0.303/0.046   0.299/0.044   0.298/0.044
λ22,2    0.150    0.151/0.044   0.170/0.063   0.155/0.055   0.155/0.062   0.155/0.055   0.152/0.057   0.157/0.048   0.150/0.046   0.153/0.046
Weak i.d.
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.150/0.026   0.166/0.030   0.157/0.027   0.149/0.027   0.150/0.026   0.153/0.026   0.157/0.026   0.152/0.026   0.152/0.025
λ11,1    0.500    0.486/0.057   0.617/0.168   0.586/0.141   0.489/0.140   0.501/0.125   0.500/0.064   0.522/0.066   0.497/0.056   0.494/0.055
λ21,1    -0.135   -0.137/0.035  -0.132/0.036  -0.130/0.038  -0.139/0.040  -0.136/0.039  -0.139/0.043  -0.143/0.036  -0.141/0.039  -0.140/0.034
c41      -0.900   -0.891/0.097  -0.974/0.133  -0.956/0.114  -0.892/0.145  -0.906/0.123  -0.885/0.127  -0.905/0.095  -0.895/0.111  -0.895/0.100
c51      -0.900   -0.890/0.110  -0.971/0.146  -0.947/0.121  -0.892/0.156  -0.899/0.124  -0.901/0.130  -0.907/0.104  -0.900/0.115  -0.893/0.106
c61      -0.900   -0.885/0.094  -0.963/0.124  -0.954/0.119  -0.892/0.148  -0.899/0.124  -0.886/0.113  -0.908/0.094  -0.888/0.104  -0.891/0.093
b12      0.300    0.298/0.023   0.313/0.027   0.306/0.026   0.300/0.024   0.300/0.024   0.301/0.024   0.308/0.026   0.303/0.024   0.305/0.023
λ22,1    0.300    0.296/0.042   0.292/0.062   0.304/0.056   0.295/0.061   0.295/0.055   0.300/0.048   0.298/0.045   0.299/0.043   0.298/0.043
λ22,2    0.150    0.153/0.041   0.160/0.059   0.145/0.054   0.156/0.058   0.153/0.055   0.151/0.049   0.152/0.044   0.151/0.043   0.152/0.042
¹ Results are based on 500 Monte Carlo trials with sample size n = 486; σ_ε = 1.

Table 2.6: Median and RMSE of Scenario II, homoskedasticity, Parameter Constellation 3
Strong i.d.
(entries are Med/RMSE)
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.151/0.025   0.161/0.028   0.154/0.027   0.150/0.026   0.150/0.026   0.153/0.026   0.155/0.026   0.152/0.027   0.152/0.026
λ11,1    -0.300   -0.304/0.061  -0.314/0.092  -0.315/0.078  -0.299/0.101  -0.304/0.084  -0.313/0.070  -0.310/0.063  -0.302/0.065  -0.300/0.060
λ21,1    -0.105   -0.107/0.035  -0.115/0.040  -0.109/0.038  -0.107/0.041  -0.106/0.038  -0.108/0.044  -0.111/0.035  -0.111/0.039  -0.110/0.036
c41      -0.700   -0.692/0.102  -0.699/0.134  -0.691/0.113  -0.697/0.132  -0.694/0.117  -0.687/0.134  -0.692/0.106  -0.695/0.118  -0.696/0.103
c51      -0.700   -0.696/0.106  -0.702/0.127  -0.689/0.115  -0.698/0.130  -0.702/0.117  -0.702/0.138  -0.694/0.105  -0.702/0.115  -0.699/0.110
c61      -0.700   -0.692/0.091  -0.686/0.118  -0.682/0.114  -0.695/0.129  -0.691/0.117  -0.676/0.121  -0.691/0.096  -0.686/0.107  -0.695/0.091
b12      0.300    0.298/0.023   0.310/0.025   0.302/0.023   0.299/0.023   0.300/0.023   0.300/0.023   0.305/0.024   0.301/0.022   0.302/0.024
λ22,1    0.300    0.295/0.042   0.294/0.057   0.289/0.055   0.293/0.060   0.295/0.051   0.300/0.047   0.296/0.043   0.296/0.043   0.297/0.041
λ22,2    0.150    0.153/0.041   0.159/0.056   0.160/0.052   0.156/0.059   0.156/0.053   0.152/0.050   0.156/0.043   0.153/0.043   0.151/0.041
Weak i.d.
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.150/0.026   0.161/0.028   0.154/0.026   0.150/0.027   0.150/0.026   0.153/0.026   0.155/0.026   0.152/0.026   0.152/0.026
λ11,1    -0.300   -0.306/0.079  -0.355/0.201  -0.368/0.185  -0.302/0.238  -0.319/0.198  -0.318/0.092  -0.318/0.090  -0.305/0.086  -0.302/0.081
λ21,1    -0.015   -0.015/0.036  -0.020/0.039  -0.017/0.037  -0.019/0.040  -0.018/0.039  -0.017/0.046  -0.018/0.037  -0.019/0.041  -0.018/0.034
c41      -0.100   -0.094/0.119  -0.058/0.216  -0.057/0.176  -0.102/0.243  -0.088/0.213  -0.082/0.146  -0.086/0.123  -0.091/0.133  -0.091/0.121
c51      -0.100   -0.091/0.117  -0.054/0.213  -0.043/0.184  -0.090/0.248  -0.094/0.207  -0.091/0.150  -0.087/0.120  -0.096/0.126  -0.097/0.119
c61      -0.100   -0.096/0.102  -0.050/0.206  -0.051/0.183  -0.093/0.238  -0.086/0.208  -0.074/0.130  -0.085/0.110  -0.085/0.120  -0.093/0.103
b12      0.300    0.298/0.023   0.309/0.023   0.302/0.022   0.299/0.022   0.300/0.022   0.300/0.023   0.304/0.024   0.302/0.022   0.302/0.024
λ22,1    0.300    0.296/0.040   0.288/0.061   0.286/0.057   0.295/0.061   0.294/0.053   0.299/0.047   0.293/0.045   0.297/0.043   0.296/0.042
λ22,2    0.150    0.153/0.041   0.161/0.058   0.161/0.053   0.157/0.060   0.155/0.052   0.153/0.050   0.155/0.044   0.152/0.042   0.152/0.042
¹ Results are based on 500 Monte Carlo trials with sample size n = 486; σ_ε = 1.

2.7.5 Heteroskedasticity

The ML estimator is in general inconsistent under heteroskedasticity and can exhibit considerable bias. The moment conditions of the GMM estimators considered in this dissertation are robust to heteroskedasticity, and hence the GMM estimators remain consistent. To document this advantage of the considered GMM estimators over the ML estimator, we conduct the following experiments.
Model Specification: We consider a model specification similar to that of Scenario I, but with one fewer exogenous variable in each of equations 1 and 2:

y1 = b21 y2 + (λ11,1 W1 + λ11,2 W2) y1 + c11 x1 + c21 x2 + u1,
y2 = b12 y1 + (λ22,1 W1 + λ22,2 W2) y2 + c32 x4 + c42 x5 + u2,
ug = (ρg1 W1 + ρg2 W2) ug + εg,   g = 1, 2.

The x_k's (k = 1, 2, 4, 5) are columns of the matrix of exogenous variables (X) that we generated before.

Spatial Weights Matrices: We adopt the "dumbbell-shaped" design of the weights matrix considered in Arraiz, Drukker, Kelejian, and Prucha (2010) with n = 500, adapted to the current model with second-order spatial lags. To do so, we first generate a matrix Wn in which (approximately) the first 100 and the last 100 units each have 10 neighbors ahead and 10 neighbors behind, while the middle (approximately) 300 units are 3-ahead and 3-behind. We then adopt the "ring" design and generate W1,n in which the first 100 and the last 100 units each have 5 neighbors ahead and 5 neighbors behind, with the middle (approximately) 300 units 1-ahead and 1-behind. To generate W2,n, we net out W1,n from Wn; thus each of the first and last 100 units in W2,n has the 6th to 10th units ahead and behind as neighbors, while each of the middle 300 units has the 2nd and 3rd units ahead and behind as neighbors. For clarity, we present an example of a "dumbbell-shaped"
W1,n with n = 30 as follows:

[Row-normalized 30×30 example matrix omitted from this extraction: each unit's neighbors receive equal weight, the reciprocal of its number of neighbors (0.1, 0.2, or 0.5), so that units in the first and last blocks have many low-weight neighbors while units in the middle have few high-weight neighbors.]

Disturbances: To generate heteroskedastic disturbances, we take the i-th element of the innovation vector ε_g,n as

ε_{g,n,i} = σ_{n,i} ξ_{g,n,i},    σ_{n,i} = σ · d_{n,i} / (n^{-1} Σ_{j=1}^{n} d_{n,j}),

where ξ_{g,n,i} is i.i.d. N(0, 1) for g = 1, 2 and the correlation between ξ_{1,n} and ξ_{2,n} is 0.5. We denote by d_{n,i} the number of neighbors of the i-th unit, which depends on the relative position of the i-th unit in the network. We note that the average standard deviation of the elements of ε_{g,n} is σ. By setting σ = 1, we maintain the average standard deviation of ε_{g,n,i} at 1 for i = 1, ..., n and g = 1, 2, the same as σ_ε in the homoskedastic case.

Parameter Space: We report the Monte Carlo results for the case [c11, c21]' = [1, 1]' and [c32, c42]' = [1, 1]'
(i.e., the strong identification case) with the four parameter constellations below:

Para 1: b21 = 0.15, b12 = 0.3; λ11,1 = 0.3, λ22,1 = 0.3; λ11,2 = 0,   λ22,2 = 0.15; ρ11 = -0.2, ρ21 = 0.1; ρ12 = 0.1, ρ22 = 0
Para 2: b21 = 0.15, b12 = 0.3; λ11,1 = 0.3, λ22,1 = 0.3; λ11,2 = 0.2, λ22,2 = 0.15; ρ11 = -0.2, ρ21 = 0.1; ρ12 = 0.1, ρ22 = 0
Para 3: b21 = 0.15, b12 = 0.3; λ11,1 = 0.3, λ22,1 = 0.3; λ11,2 = 0,   λ22,2 = 0.15; ρ11 = 0.2,  ρ21 = 0.1; ρ12 = 0.1, ρ22 = 0
Para 4: b21 = 0.15, b12 = 0.3; λ11,1 = 0.3, λ22,1 = 0.3; λ11,2 = 0.2, λ22,2 = 0.15; ρ11 = 0.2,  ρ21 = 0.1; ρ12 = 0.1, ρ22 = 0

To save space, we report only the estimates of the parameters in the first equation of the model. We document a sample of Monte Carlo results under heteroskedasticity in Table 2.7 below.

2.7.6 Remarks

Under homoskedasticity and strong identification, all estimators considered in the study perform reasonably closely. As expected, the MLE is the most efficient estimator and shows the smallest RMSE under nearly all scenarios and parameter constellations. Both LQ-GS3SLS and LQ-GSFIVE perform very close to the MLE in these scenarios. In addition, under strong identification, estimators that utilize only the linear moments also perform close to their counterparts that utilize both the linear and the quadratic moments. Specifically, GS3SLS and GSFIVE show similar RMSEs in comparison to, e.g., LQ-GS3SLS and LQ-GSFIVE. Finally, since the disturbance terms ε_g,n and ε_h,n are correlated across equations (i.e., g ≠ h), full information estimators in general outperform their limited information counterparts throughout the experiments. Thus, we see that GSFIVE and LQ-GSFIVE outperform GSLIVE and LQ-GSLIVE, respectively. As remarked, the MLE is in general inconsistent under heteroskedastic disturbances. The MLE shows significant bias when estimating the spatial autoregressive parameters in the regression equation (i.e., the λ's), as well as the spatial autoregressive parameters in the disturbances (i.e., the ρ's).
Since the other estimators implemented are robust to heteroskedasticity, they show much smaller bias than the MLE. Under weak identification, LQ-GSFIVE performs very close to the MLE in both Scenarios I and II. In particular, we note that LQ-GSFIVE shows considerable efficiency gains over LQ-GS3SLS, and that LQ-GSLIVE outperforms LQ-GS2SLS under weak identification as well. In general, similar observations hold when we compare GS2SLS with GSLIVE and GS3SLS with GSFIVE. These results highlight the advantage of using approximated optimal instruments when constructing the moments, which better exploit the underlying structure of the parameters in the reduced form model. In addition, we see that LQ-GSLIVE (LQ-GSFIVE) in general outperforms GSLIVE (GSFIVE) under weak identification. This observation is in line with the existing literature: quadratic moments can help with identification when the linear moments are weak. Not surprisingly, this finite sample efficiency gain is not significant under strong identification. In Chapter A.4, we document additional simulation results for settings with the "dumbbell-shaped" design of spatial weights matrices and correlated xn's. In general, the above observations also hold in those cases.

Table 2.7: Median and RMSE of Scenario I under Heteroskedasticity, Parameter Constellations 1-4
Strong i.d.
Para 1 (entries are Med/RMSE)
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.162/0.028   0.163/0.029   0.154/0.025   0.152/0.028   0.152/0.026   0.162/0.030   0.155/0.028   0.155/0.028   0.154/0.025
λ11,1    0.300    0.227/0.078   0.303/0.045   0.307/0.043   0.302/0.047   0.301/0.039   0.298/0.046   0.300/0.042   0.301/0.046   0.300/0.038
λ11,2    0.000    0.042/0.053   -0.015/0.039  -0.010/0.034  -0.003/0.035  -0.003/0.032  -0.011/0.037  -0.006/0.035  -0.006/0.035  -0.005/0.032
ρ11      -0.200   -0.089/0.119  -0.233/0.144  -0.234/0.145  -0.211/0.097  -0.212/0.096  -0.219/0.127  -0.218/0.139  -0.218/0.136  -0.210/0.122
ρ12      0.100    0.035/0.109   0.099/0.123   0.092/0.121   0.085/0.132   0.085/0.125   0.090/0.134   0.091/0.141   0.095/0.134   0.095/0.121
Para 2
b21      0.150    0.164/0.030   0.163/0.030   0.155/0.026   0.152/0.028   0.152/0.027   0.162/0.030   0.155/0.030   0.154/0.029   0.154/0.026
λ11,1    0.300    0.233/0.072   0.304/0.045   0.307/0.043   0.303/0.044   0.301/0.038   0.299/0.043   0.300/0.041   0.302/0.044   0.301/0.039
λ11,2    0.200    0.242/0.051   0.185/0.039   0.190/0.036   0.196/0.036   0.197/0.031   0.190/0.039   0.194/0.035   0.194/0.036   0.195/0.033
ρ11      -0.200   -0.092/0.117  -0.233/0.147  -0.233/0.145  -0.211/0.097  -0.213/0.097  -0.219/0.130  -0.217/0.138  -0.216/0.139  -0.208/0.124
ρ12      0.100    0.042/0.105   0.097/0.125   0.090/0.119   0.084/0.131   0.084/0.124   0.088/0.133   0.090/0.144   0.099/0.133   0.100/0.123
Para 3
b21      0.150    0.155/0.032   0.165/0.035   0.155/0.033   0.152/0.035   0.152/0.033   0.164/0.036   0.155/0.037   0.156/0.035   0.154/0.034
λ11,1    0.300    0.269/0.042   0.302/0.040   0.305/0.037   0.301/0.040   0.300/0.037   0.300/0.041   0.302/0.037   0.299/0.040   0.299/0.037
λ11,2    0.000    0.018/0.039   -0.018/0.044  -0.010/0.041  -0.003/0.042  -0.003/0.040  -0.015/0.043  -0.008/0.042  -0.005/0.043  -0.004/0.041
ρ11      0.200    0.150/0.076   0.177/0.121   0.178/0.119   0.205/0.099   0.199/0.098   0.183/0.111   0.189/0.123   0.193/0.121   0.198/0.103
ρ12      0.100    0.062/0.089   0.104/0.110   0.095/0.107   0.109/0.109   0.096/0.110   0.099/0.109   0.102/0.118   0.098/0.121   0.095/0.106
Para 4
b21      0.150    0.159/0.034   0.166/0.036   0.155/0.035   0.152/0.035   0.152/0.034   0.165/0.037   0.156/0.039   0.156/0.036   0.154/0.034
λ11,1    0.300    0.275/0.037   0.303/0.040   0.305/0.038   0.302/0.040   0.300/0.036   0.301/0.039   0.303/0.038   0.300/0.041   0.299/0.036
λ11,2    0.200    0.210/0.035   0.183/0.045   0.190/0.041   0.197/0.042   0.197/0.040   0.185/0.044   0.191/0.041   0.194/0.043   0.196/0.039
ρ11      0.200    0.146/0.078   0.177/0.122   0.179/0.122   0.205/0.102   0.199/0.097   0.183/0.113   0.189/0.124   0.196/0.119   0.199/0.106
ρ12      0.100    0.072/0.086   0.102/0.109   0.094/0.105   0.108/0.108   0.094/0.107   0.097/0.109   0.101/0.118   0.101/0.118   0.100/0.112
¹ Results are based on 500 Monte Carlo trials with sample size n = 486; σ_ε = 1.

2.8 Concluding Remarks

In this chapter, we proposed a new class of generalized method of moments (GMM) estimators for simultaneous equation models (SEMs) with higher-order network interdependence. In essence, these estimators utilize approximations to the optimal instruments in constructing the linear moments, in the same spirit as Lee (2003) and Kelejian, Prucha, and Yuzefovich (2004). We also considered GMM estimators that utilize both the linear moments and the quadratic moments that originate from the scores of the log-likelihood function.
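For intuition, the idea of approximated optimal instruments can be illustrated in the simplest single-equation SAR setting y = λWy + Xβ + u, where the optimal instrument for the spatial lag Wy is its conditional mean W E[y] = W(I_n - λW)^{-1}Xβ evaluated at preliminary estimates. The sketch below (illustrative only; it is not the full-system implementation used in this chapter, and all names are hypothetical) computes such an instrument:

```python
import numpy as np

def optimal_instrument(W, X, lam_hat, beta_hat):
    """Approximated optimal instrument for the spatial lag W y in a SAR model
    y = lam * W y + X beta + u: the predicted mean W E[y] = W (I - lam W)^{-1} X beta,
    evaluated at preliminary (e.g., 2SLS) estimates lam_hat, beta_hat."""
    n = W.shape[0]
    # Solve (I - lam W) Ey = X beta instead of inverting or truncating the series
    Ey = np.linalg.solve(np.eye(n) - lam_hat * W, X @ beta_hat)
    return W @ Ey

# Small row-normalized "ring" weights matrix: 1-ahead, 1-behind neighbors
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
rng = np.random.default_rng(0)
X = rng.standard_normal((n, 2))
Z_opt = optimal_instrument(W, X, lam_hat=0.3, beta_hat=np.array([1.0, 0.5]))
```

The full-information LIVE/FIVE estimators apply the same principle system-wide, with the reduced form of the entire network SEM playing the role of (I_n - λW)^{-1}; solving the linear system keeps the computation feasible in large samples.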
Towards deriving the estimators, we showed that (1) the linear parts of the ML scores can be viewed as a set of estimator generating equations from which a generic form of instrumental variable (IV) estimators can be derived. This result extends its relevant counterparts in Hausman (1975), Hendry (1976) and Prucha and Kelejian (1984) in the context of classical SEMs; and (2) the new estimators incorporate the underlying ideas of the LIVE and the FIVE estimators proposed by Brundy and Jorgenson (1971) in the context of classical SEMs. In constructing the instruments, the new estimators take into account the nonlinear a priori parameter restrictions in the reduced form when estimating the expected value of the endogenous components. Furthermore, our new GMM estimators that utilize both the linear and the quadratic moments remain robust to heteroskedasticity of unknown form and computationally feasible even when the sample size (i.e., the size of the network) becomes large. The Monte Carlo results show that the new GMM estimators outperform their existing counterparts, e.g., the 2SLS-type and 3SLS-type estimators considered in Drukker, Egger, and Prucha (2022), when the instruments are weak.

Chapter 3: Empirical Application: Demand Estimation for Retail Gasoline Markets with Network Dependence

3.1 Introduction

In this chapter, we illustrate the empirical relevance of the considered model with spatial/network interactions and of the estimation methods with approximated optimal instruments, i.e., the GSLIVE and the GSFIVE estimators. Specifically, we estimate a demand system with a spatial network component. The example we choose is the retail gasoline market of several subregions of Greater Vancouver, Canada. We seek to estimate the station-level demand elasticities as well as the (spatial) elasticity of substitution under a variety of popular network structures based on different proximity measures.
Demand elasticities for gasoline at the aggregate level are well documented in the literature, while estimates at the station level are relatively scarce, perhaps due to data limitations.¹ We also compute the impact measures, in the spirit of Anselin et al. (2001), that interpret the estimates of the coefficients on the exogenous regressors in network models. Changes in gasoline prices attract a lot of attention from consumers and regulators for several reasons. First, households spend a considerable share of their income on gasoline products. According to the U.S. Energy Information Administration (EIA), in 2017 the average U.S. household expenditure on gasoline was $1,977, equivalent to about 4% of annual household expenditure. For Canada, gasoline (including diesel) also accounts for about 50% of total energy expenditure per household. Second, pricing of gasoline in the retail market is relatively transparent but changes frequently. It is common to observe intra-day fluctuations in retail prices of over 5%. Classical spatial competition models, e.g., the Hotelling model, typically assume that the cost of switching stations is related to the physical location of firms. Thus firm locations are often treated as the unique aspect of product differentiation in these models. Our application fits the context of spatial competition models reasonably well. Regular retail gasoline is a nearly homogeneous good in terms of chemical content. However, gas stations differ in geographical location as well as in station attributes, e.g., retail brand and the menu of services offered at the station, and these differences create product differentiation. In the context of retail gasoline markets, consumers face travel costs or search costs when switching between gasoline stations.

¹ For some of the recent works on aggregate demand elasticity, see Hughes et al. (2008), Park and Zhao (2010), and Levin et al. (2017), among others.
This allows retailers to exercise local market power and generates price dispersion. In other words, these costs lead consumers to consider nearby gas stations as close substitutes after controlling for station characteristics or perceived product quality (e.g., brand name). Given that spatially differentiated gas stations compete with neighboring stations in price, we think that the equilibrium prices and sales volumes of all stations are simultaneously determined in a competitive system. Therefore, it is desirable to account for the network structure explicitly when modeling the demand system for retail gasoline markets. We consider a theoretical model of spatial competition based on Pinkse and Slade (2004), in which sellers are downstream firms and buyers are households or individuals. This model is simple yet flexible enough to allow for spatial differentiation among individual stations and thus captures the main feature of spatial competition in this market. We then deduce the econometric specification for the demand equation that maps to the simultaneous equation model with spatial dependence (SE-SARAR) considered in the previous chapters of the dissertation. For the study of competition in the retail gasoline market, properly defining the extent of the market is important for obtaining consistent estimates of the structural parameters. For our application, this issue is equivalent to properly specifying the spatial network matrix Wn that appears in the demand equation. However, as discussed in the theoretical chapters of this dissertation, construction of the spatial network matrix often requires prior knowledge of the market as well as certain assumptions about the nature of competition. For example, in the literature, empirical researchers typically assume that individual stations compete only with their immediate neighbors and not with stations located beyond a certain threshold distance, e.g., a 2-mile radius.
In this application, we consider six different metrics for constructing the spatial network matrices, based on different measures of closeness. For example, we consider measures such as common boundaries, same-street dummies, travel distance, nearest neighbors, and other hybrid measures. We find that metrics based on related measures often yield similar results for the own-price elasticity as well as the spatial elasticity of substitution. As noted in, e.g., Anselin et al. (2001), interpreting the estimated coefficients of exogenous regressors is less direct in the presence of the network structures Wn than in regressions without network interactions at the individual level. For example, if a change in the k-th exogenous variable of station i has an effect via spatial correlation on other stations, and they in turn feed back to station i, the corresponding coefficient, say βk, does not denote the total effect of a unit change in xk on the endogenous variable, e.g., sales volume. Therefore, we compute the direct and indirect impact measures that have been adopted in the spatial literature for consistent interpretation of the coefficients of station and market characteristics in our spatial network model. The rest of this chapter is organized as follows. In Section 3.2, we discuss the literature on price and spatial competition in retail gasoline markets; we highlight how our application relates to and differs from these existing works. In Section 3.3, we present our adaptation of the theoretical model considered in Pinkse and Slade (2004) as well as the econometric specification that will be used for estimation. We also describe in detail the six different measures of closeness and their corresponding spatial network matrices in this section. Due to limited space, however, we focus on the common boundary measure in the main text and document results associated with the other metrics in Chapter B.
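To fix ideas on how such proximity-based network matrices can be constructed, the following Python sketch (with hypothetical station coordinates, not our actual data) builds row-normalized Wn's from two of the closeness measures discussed above, a travel-distance threshold and nearest neighbors:

```python
import numpy as np

def row_normalize(A):
    """Scale each row of a 0/1 adjacency matrix to sum to one; rows with no
    neighbors (isolated stations) are left as zeros."""
    s = A.sum(axis=1, keepdims=True)
    return np.divide(A, s, out=np.zeros_like(A, dtype=float), where=s > 0)

def w_distance_threshold(coords, radius):
    """Stations i and j are neighbors if their Euclidean distance is below `radius`."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = ((d < radius) & (d > 0)).astype(float)  # exclude self-neighbors
    return row_normalize(A)

def w_nearest_neighbors(coords, k):
    """Each station's neighbors are its k closest stations."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a station is not its own neighbor
    A = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :k]
    np.put_along_axis(A, idx, 1.0, axis=1)
    return row_normalize(A)

# Hypothetical station coordinates (km); the last station is remote
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.2], [5.0, 5.0]])
W_dist = w_distance_threshold(coords, radius=2.0)
W_knn = w_nearest_neighbors(coords, k=2)
```

Row normalization makes each row of Wn a set of averaging weights over a station's competitors, which also delivers the ||Wn||_∞ = 1 property used when bounding the spatial parameters in Chapter 2.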
In Section 3.4, we describe the data set that we assembled from various sources, including the online platform GasBuddy.com, the marketing agency Kalibrate, and the 2016 Canadian Census. In Section 3.5, we discuss the instruments that we use to address the endogeneity of prices. Section 3.6 documents the estimation results as well as the impact measures computed based on these estimates. Section 3.7 concludes.

3.2 Related Literature

The literature on price competition in retail gasoline markets is vast.[2] One major line of the literature concerns the observed dynamics of retail prices. Eckert (2003), Noel (2007) and Doyle, Muehlegger, and Samphantharak (2010) find that Edgeworth cycles in retail gasoline prices are associated with lower market concentration and a greater presence of independent stations that operate convenience stores. Noel (2007) and Atkinson (2009) use high-frequency station-level retail price data and find that price cuts in an Edgeworth cycle are typically initiated by smaller stations, while price restorations are led by larger brands. Barron, Umbeck, and Waddell (2008) and Slade (1992) focus on dynamic price responses between neighboring stations. Atkinson, Eckert, and West (2014) study price patterns and volatility changes in Canadian cities with high-frequency price data obtained from GasBuddy.com. Compared to these studies, our empirical study abstracts from the dynamic aspect of stations' pricing strategies, with an emphasis instead on the network-generated cross-sectional interdependence in demand. As discussed below, our study also differs from the existing empirical works that focus on spatial price competition and/or strategic pricing behaviors among individual stations.

[2] We only provide a partial review here due to limited space. For a more comprehensive review, please refer to, e.g., Eckert (2013).
Many works have studied the determinants of price dispersion and uniformity utilizing station-level price data. With city-level data, Sen (2003, 2005) reveals that wholesale prices and the market share of smaller firms play major roles in determining retail price levels. There are mixed results about the effect of local concentration measures on retail prices. Van Meerbeeck (2003), Barron, Taylor, and Umbeck (2004), Eckert and West (2004) and Götz and Gugler (2006), among others, find that higher station density, measured by the number of stations within a certain travel-distance radius, is associated with lower prices and a lower level of price dispersion. Hosken, McMillan, and Taylor (2008), in contrast, report no association between local station density and price when all brands are included in the regression, although distance to the closest stations enters with a positive sign on price. With a spatial lag model, Pennerstorfer (2009) finds that the competition-increasing effect of independent retailers is muted by a "composition effect", which implies that branded stations can charge higher prices when a local market is populated by unbranded stations that are perceived to be of lower quality. There are also a number of works that adopt concentration measures not necessarily related to travel distance but based on spatial adjacency. Pennerstorfer and Weiss (2013) document that an increase in the spatial clustering of same-brand stations reduces the degree of competition between firms and increases equilibrium prices. Clemenz and Gugler (2006) find evidence that concentration within a station's ZIP code is associated with higher margins. Using price data in Sheffield, England, Ning and Haining (2003) also find evidence of a positive association between a station's price and the prices of stations in the same local cluster. These works motivate the specifications of the spatial network matrices Wn in our empirical study.
In general, we construct our Wn's with measures of closeness based on the presence of immediately adjacent stations (common boundary), a radius of 2 miles of travel distance, or the presence of other stations on the same street. Overall, we find that the spatial parameter is positive and significant for the specifications based on the common boundary or same-street measures, but tends to be smaller for measures based on travel distance.[3] We interpret these results as evidence that stations tend to compete more vigorously with their neighbors or with stations located along a main commuter route, rather than with all stations within an arbitrary area. However, as discussed, the econometric model we adopt differs from those in these works, so this should not be interpreted as a direct comparison. Eckert and West (2004, 2005), Ning and Haining (2003), Barron, Taylor, and Umbeck (2004) and Hosken, McMillan, and Taylor (2008), among others, also find evidence of a relationship between the services offered at retail stations and price levels. We control for these factors as station characteristics in this empirical study.

Instead of aggregate demand elasticities, we focus on demand estimation at the level of individual stations. Results at the station level are scarce, possibly due to data limitations.[4] Houde (2012) used a relatively long panel of station-level price and sales volume data for Quebec City to estimate a structural spatial model taking into account the road network and the commuting patterns of residents.[5][6] In the current study, we complement the price data with sales data, which enables us to estimate the price elasticities. In comparison to Houde (2012), we use a much simpler framework with the network structure being explicitly specified.

[3] To see this, one may compare the results in Table 3.5 and Table B.7 in Chapter B.4.
[4] For example, it is often costly to obtain accurate data on sales volume at the station level even at monthly frequency.
We note that our results imply that demand is less elastic than in Houde (2012). One possible explanation is that his multi-address model allows consumers to purchase gasoline from stations along their commuting routes in addition to those near their homes, so that each station in the market faces more competition. From a modeling perspective, his model allows for a more flexible substitution pattern. In contrast, we adopt, e.g., the common boundary measure when constructing the network matrix W, and thus the underlying assumption is more aligned with the single-address model in Houde (2012)'s terms. However, one merit of the current framework is that it is much less demanding in terms of data on road structure and auxiliary information, e.g., traffic flows, commuting patterns of residents, etc., that may be unavailable for many markets of interest. The current model is also much less computationally expensive. Therefore, the class of network models considered in this paper may still be appealing in the early stages of policy analysis on a differentiated-product market. We provide more details below after presenting our econometric model.

[5] For work involving the estimation of a structural model, see also Manuszak (2010), who uses data on volumes, prices and characteristics for stations in Maui and Kauai to estimate a model designed for the analysis of an upstream merger.
[6] Houde (2012) focuses primarily on the multi-address model mainly because it allows consumers to purchase gasoline from stations along commuter routes or shopping paths. It is thus reasonable to view the multi-address model as more realistic than the traditional single-address model, which assumes consumers purchase gasoline only from stations near their residence.
3.3 Model

3.3.1 Theoretical Motivation

Following Pinkse and Slade (2004), the demand model is based on a linear-quadratic indirect-utility function in which the prices of the differentiated products as well as individual incomes have been normalized by the price of the outside good. For ease of aggregation to station-level demands, we assume the individual indirect-utility functions are in Gorman polar form, so that aggregation does not depend on the distribution of (unobserved) consumer heterogeneity or income levels. In this setting, the aggregate-demand equation can be shown to be linear in both (log) normalized prices and income.[7] Demand for station i is then given by

    \ln(q_{i,n}) = a_{i,n} + \sum_j w_{ij,n} \ln(p_{j,n}) + \eta \ln(y_{i,n}),   i = 1, ..., n,   (3.1)

where w_{ij,n} is the ij-th element of the n x n matrix W_n, p_{j,n} is the j-th element of the (normalized) vector of prices p_n, and y_{i,n} denotes the (aggregate) income of the consumers within the census tract in which station i resides. For generality, it can be assumed that a_{i,n} is a linear function of station i's characteristics x_{i,n}, i.e., a_{i,n} = x_{i,n} \beta_x.

As suggested in, e.g., Pinkse et al. (2002) and Pinkse and Slade (2004), the diagonal elements of W_n, which can be interpreted as the own-price elasticities, are also assumed to depend on the station characteristics, i.e., w_{ii,n}(x_{i,n}). The off-diagonal elements of W_n represent cross-price elasticities and are assumed to be functions of a vector of measures of the distance between stations by some metric. In Pinkse et al. (2002)'s semi-parametric setting, w_{ij,n} is assumed to be a series of the form \sum_k \alpha_k d_{k,ij}, where d_{k,ij} is the k-th basis function of some distance measure and \alpha_k is its parameter to be estimated. Hence, to adapt their model to our parametric scheme, we normalize W_n = (w_{ij,n}) such that w_{ii,n} = b and w_{ij,n} = \lambda \tilde{w}_{ij,n} for i \neq j, where b and \lambda are parameters and \tilde{w}_{ij,n} is a (nonlinear) function of some distance measure. In other words, we decompose the term \sum_j w_{ij,n} \ln(p_{j,n}) into the sum of b \ln(p_{i,n}) and \lambda \sum_{j \neq i} \tilde{w}_{ij,n} \ln(p_{j,n}). Thus b represents the own-price elasticity of station i and \lambda is the spatial autoregressive parameter that represents the cross-station elasticity of substitution. In the following, we assume the elements of W_n are exogenous and, by abuse of notation, that w_{ii,n} = 0 for i = 1, ..., n.[8]

Collecting a_{i,n} = x_{i,n} \beta_x and \eta \ln(y_{i,n}) into X_{i,n} \beta, we can then stack the demand schedule (3.1) for individual stations over i and rewrite the demand system in matrix notation:

    \ln q_n(p_n) = b \ln(p_n) + \lambda W_n \ln(p_n) + X_n \beta + u_n,   (3.2)

where X_n includes a constant term and the set of station as well as local market characteristics. In addition, u_n captures unobserved station or regional characteristics. We note that u_n can be heteroskedastic and correlated across stations. To accommodate these possibilities, we assume that u_n follows a spatial autoregressive process:

    u_n = \rho_1 W_n u_n + \varepsilon_n,

where W_n u_n captures the cross-sectional correlation between stations. The estimation theory we proposed in the previous chapters accommodates this specification. We assume E[\varepsilon_{i,n} | X_n, W_n] = 0, which in turn implies that the unobserved characteristics u_n are mean independent of the observed characteristics, i.e., E[u_{i,n} | X_n, W_n] = 0.

For the supply side, we assume each station faces a marginal cost of the form

    c_i = \exp(x^c_{i,n} \gamma + v_i),   (3.3)

where x^c_{i,n} represents factors that shift the marginal cost of station i. In principle, we can allow for the marginal cost factors to be spatially correlated. In other words, if x_{k,n} is one "basic" factor, then one column of x^c_n could be W_n x_{k,n}.

[7] Please refer to Chapter B.1 for more details.
[8] Note that in (3.1), w_{ii,n} \neq 0 in general. In the decomposition of \sum_j w_{ij,n} \ln(p_{j,n}), \tilde{w}_{ii,n} = 0, and thus the W_n in (3.2) should in fact be denoted \tilde{W}_n. We drop the tilde for the presentation that follows.
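For later reference, the spatial autoregressive disturbance process, written as u_n = \rho_1 W_n u_n + \varepsilon_n, can be solved explicitly. A short derivation, under the standard assumption that I_n - \rho_1 W_n is nonsingular:

```latex
% Reduced form of the SAR disturbance process, assuming I_n - \rho_1 W_n is invertible
u_n = \rho_1 W_n u_n + \varepsilon_n
\quad\Longrightarrow\quad
(I_n - \rho_1 W_n)\, u_n = \varepsilon_n
\quad\Longrightarrow\quad
u_n = (I_n - \rho_1 W_n)^{-1} \varepsilon_n .
```

This makes explicit how an innovation at one station propagates to its neighbors, its neighbors' neighbors, and so on, through the powers of W_n in the expansion of the inverse.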
The above formulation allows for the existence of a Nash equilibrium in pure strategies, such that prices satisfy the first-order condition

    p_{i,n} = c_{i,n} - \frac{dp_{i,n}}{dq_{i,n}} q_{i,n},   (3.4)

which in turn can be derived as the price response function from a profit-maximization problem of standard form:

    \max_{p_{i,n}} \pi_i(p_n) = (p_{i,n} - c_{i,n}) q_{i,n}(p_n) - F_{i,n},

where the quantity demanded is given by

    q_{i,n}(p_n) = \exp( \alpha_i + b \ln(p_{i,n}) + \lambda \sum_{j \neq i} \tilde{w}_{ij,n} \ln(p_{j,n}) + x_{i,n} \beta + u_{i,n} ),

as discussed, and the term F_{i,n} denotes the fixed cost. Note that (dp_{i,n}/dq_{i,n}) q_{i,n} = p_{i,n}/b under our specification, and thus one may alternatively work with the log-transformed version of (3.4):

    \ln(p_{i,n}) = \kappa + \ln(c_{i,n}) = \kappa + x^c_{i,n} \gamma + v_{i,n},   (3.5)

where \kappa = -\ln(1 + 1/b).

3.3.2 Econometric Specification

In light of the above discussion and (3.2), we consider the following specification of the supply and demand system with a network component:

    \ln(q_n) = b \ln(p_n) + \lambda W_n \ln(p_n) + X_n \beta + u_n,   (3.6)
    u_n = \rho_1 W_n u_n + \varepsilon_n.

For ease of interpretation, we adopt a log-log specification of the demand equation. We let \ln(q_n) denote the (log) quantity demanded at price \ln(p_n); X_n collects the common regressors that are expected to shift the demand equation, and it includes the constant. The parameters of primary interest are b and \lambda. As in the classical system of supply and demand, b can be interpreted as the (market-average) own-price elasticity. As we explain below, \lambda captures the competition intensity in the market. In the spirit of Pinkse and Slade (2004), \lambda W_n can be viewed as an approximation of the cross-price elasticities between stations. As in most empirical IO studies, we rely on supply-side instruments to address the endogeneity of (log) prices, \ln(p_n).[9] Details of the instruments will be discussed in Section 3.5 below.[10]

[9] See, e.g., Berry, Levinsohn, and Pakes (1995) and Nevo (2001).
See also MacKay and Miller (2021) for a more detailed discussion of identification strategies via supply-side instruments, demand-side instruments, and the covariance structure of a supply and demand system.

The deviation from a classical supply and demand system comes with the inclusion of the spatial lag W_n \ln(p_n) and the spatial coefficient \lambda. As remarked before, the spatial weights matrix W_n is specified with some measure of proximity between units. We experiment with six different metrics that appear in the literature:[11]

1. A binary W_1 = (w_{1,ij}) with w_{1,ij} = 1 if i and j share a common boundary (explained below);

2. A binary W_2 = (w_{2,ij}) with w_{2,ij} = 1 if i and j are within 2 miles of travel distance, and w_{2,ij} = 0 otherwise;

3. A numerical W_3 = (w_{3,ij}) with w_{3,ij} being the reciprocal of the travel distance between station i and station j, within the census blocks adjacent to i. The implicit assumption is that stations that are geographically close compete more vigorously;

4. A binary W_4 = (w_{4,ij}) with w_{4,ij} = 1 if j is i's nearest neighbor, where closeness is measured by travel distance. Note that this is the most local measure and the relationship need not be symmetric. By construction, a station competes directly with only one rival;

5. A binary W_5 = (w_{5,ij}) with w_{5,ij} = 1 if stations i and j are on the same street and zero otherwise. This construction reflects the empirical finding that most competition occurs along main streets and commuter routes;[12]

[10] The price equation is modeled in a reduced-form fashion as \ln(p_n) = X_{1,n} \pi_1 + Z_{1,n} \pi_2 + v_n, where X_{1,n} denotes the included instruments (with a column of constants) and Z_{1,n} denotes the excluded instruments. Note that the supply schedules of individual firms are typically unobserved and cannot be specified. Hence, a supply equation in the traditional form \ln(q^s_n) = b^s \ln(p^s_n) + X_{1,n} \pi_1 + Z_{1,n} \pi_2 + v_n is not always justifiable in oligopoly settings, and thus b^s is not readily interpretable.
Hence, we abstract from modeling the supply side in a structural way.

[11] For example, Pinkse and Slade (1998) considered six different metrics based on both Euclidean distance and common street/boundary measures. The metrics considered in this application resemble those considered in their paper. However, we consider the travel distance between stations instead of the Euclidean distance, as the former better reflects local traffic conditions and road structures.
[12] See, e.g., Houde (2012) and Pennerstorfer (2009), among others.

6. A numerical W_6 = (w_{6,ij}) based on the combined measures of W_3 and W_5. Under this scheme, w_{6,ij} is the product of w_{3,ij} and w_{5,ij}, i.e., the reciprocal of the travel distance between stations i and j that are located on the same street with pairwise travel distance less than (or equal to) 2 miles. The underlying assumption concerning competition is that stations primarily compete with other stations located on the same street, and that the intensity of competition diminishes with travel distance.

For ease of presentation, we focus on the specification with W_{1,n} in the main text. Additional results associated with the other weight matrices are documented in Chapter B.4. Figure 3.1 below illustrates the first construction for the Vancouver market area. Each individual station is marked by a red cross. The edges are segments that bisect the distance between any two stations, provided there is no other station in between. Therefore, two stations sharing the same edge are treated as closest neighbors in the first specification of W_n. We color each sub-market area according to the (log of) local population density, measured by the number of residents per square kilometer. A polygon with a darker color indicates an area of higher population density. In the second specification of W_n, w_{ij,n} = 1 if the travel distance between stations i and j is smaller than 2 miles, and w_{ij,n} = 0 otherwise.
In the third specification, we consider a continuous measure and let the elements of W_n depend on the pairwise travel distance between stations located in adjacent census blocks. The fourth specification is the most local one, as we assume stations only compete with their closest neighbors; in other words, each station has only one competitor under this construction. The fifth specification requires additional information about the local road structure. In particular, we treat stations i and j located along the same street (up to 3 blocks away) as competitors and assign w_{ij,n} = 1, and we let w_{ij,n} = 0 otherwise. The last specification builds on W_3 and W_5. Specifically, we restrict competition to stations located on the same street as in W_5 but, in addition, allow such competition to decay with the travel distance between stations i and j. As is common practice in the empirical literature, we normalize each W_n by its maximum row sum when it comes to estimation. We emphasize that the W_n's need not be symmetric by construction.

Figure 3.1: Market Area based on Common Boundaries

The specification of the demand equation is in line with, and extends, the network demand equation specified in Pinkse and Slade (2004), where they analyze the effects of mergers on brand competition and pricing in the UK brewing industry. In their specification, the own-price and cross-price elasticities are captured by an n x n matrix B_n. The term B_n \ln(p_n) in their model is equivalent to b \ln(p_n) + \lambda W_n \ln(p_n) in model (3.6) above. They estimated the cross-product demand elasticity matrix semi-parametrically and utilized information on the proximity between brands based on product similarities (e.g., alcohol content, flavor, etc.). Our construction shares the same spirit in that W_n can be viewed as being constructed with some measure of inverse travel distance between stations.
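As a sketch of the weight-matrix constructions and the maximum-row-sum normalization described above, the following illustrates W_2 through W_6 on a toy five-station market; the distance matrix, the same-street indicators, and the adjacent-census-block restriction (omitted here for W_3) are all hypothetical:

```python
import numpy as np

n = 5
# Hypothetical pairwise travel distances in miles (the real study uses travel
# distances between the 151 Vancouver-area stations).
dist = np.array([
    [0.0, 1.2, 3.5, 0.8, 4.0],
    [1.2, 0.0, 1.9, 2.5, 3.1],
    [3.5, 1.9, 0.0, 2.2, 1.5],
    [0.8, 2.5, 2.2, 0.0, 2.8],
    [4.0, 3.1, 1.5, 2.8, 0.0],
])
# Hypothetical same-street indicator (symmetric, zero diagonal).
same_street = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])

off_diag = ~np.eye(n, dtype=bool)

# W2: binary, 1 if within 2 miles of travel distance.
W2 = ((dist <= 2.0) & off_diag).astype(float)

# W3: reciprocal of travel distance (adjacent-census-block restriction omitted).
W3 = np.where(off_diag, 1.0 / np.where(dist > 0, dist, np.inf), 0.0)

# W4: nearest neighbor by travel distance (need not be symmetric).
W4 = np.zeros((n, n))
masked = np.where(off_diag, dist, np.inf)
W4[np.arange(n), masked.argmin(axis=1)] = 1.0

# W5: same street.
W5 = same_street.astype(float)

# W6: reciprocal distance, restricted to same-street pairs within 2 miles.
W6 = W3 * W5 * (dist <= 2.0)

def normalize(W):
    """Normalize by the maximum row sum, as done before estimation."""
    return W / W.sum(axis=1).max()

W2n = normalize(W2)
```

The maximum-row-sum normalization (rather than row normalization) preserves the relative connectedness of stations across rows while bounding the spectral radius, which is the property needed for the stability conditions in the theoretical chapters.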
Note that this proximity measure is the key differentiating factor among products in the retail gasoline market, given that gasoline is largely homogeneous in terms of content. The difference is that we reparameterize the cross-station elasticity of substitution matrix as \lambda W_n, so that \lambda can be viewed as a measure of competition intensity between neighboring stations. Indeed, our construction is more restrictive than theirs, as b is a single parameter and thus captures only the average own-price elasticity of the entire market. Also, since W_n is pre-defined based on some distance measure, it is likely to be less flexible than the semi-parametric construction considered in Pinkse and Slade (2004) and may thus miss some of the substitution patterns. These observations suggest potential interest in developing semi-parametric or non-parametric estimation methods for this class of general network SEMs in future research.

3.4 Data

The data set consists of several components from different sources, and we now describe each in turn. We collect retail prices of regular gasoline at the station level from GasBuddy.com. Each gas station is associated with a unique ID number that enables us to collect price information at a set frequency with an algorithm. The gasoline prices are in Canadian cents. We collect daily data for 151 retail stations that belong to several sub-regions in the Greater Vancouver area, including Vancouver City and adjacent suburbs.[13] The sample covers two periods: 09/12/2019 to 10/16/2019 and 03/11/2020 to 04/08/2020. Approximately 91% of the stations appearing in our sample receive one or more price reports within a day.
The remaining 9% of stations are typically located on roads with less traffic, and thus price information may only be updated every few days.[14] The sales volume data is proprietary and was purchased from Kalibrate Ltd., a leading marketing and consulting firm.[15] The observations are the (monthly) sales volumes of the 151 stations for the survey periods of September 2019 and March 2020.

Existing works that rely on survey data often use the price information contained in the survey. Since these surveys are often conducted only at monthly frequencies, such price information is incomplete and may be misleading. The issue could be mitigated if one works with a long panel spanning many months, or if the distribution of prices among stations is unchanged in terms of ranking. To alleviate such concerns, we compute the average of the daily prices for stations in the sample, matched to the survey periods of the volume data. The final data set is a cross-sectional monthly price and volume data set for the 151 stations in the sample. Table 3.1 below provides summary statistics on the price and sales variables.

Since our price data is an average over the sampling period, one may worry about the effect of price movements on consumers' decisions and thus on demand. One such example is the Edgeworth cycle generated by price wars between gas stations in the local market. If consumers perceive that stations are actively undercutting their prices to match their competitors, they may delay purchases and wait for prices to fall. On the other hand, if consumers expect the price war to end soon, or prices to increase in the near future, they may stock up on some inventory while prices are low. In Figure 3.2 below, we plot the average station-level retail price in this market during the period of August 1st to November 30th, 2019, as well as the period of February 1st to May 31st, 2020. The shaded areas indicate the time periods of our empirical exercise, i.e., the periods for which we collect the sales volume data. During the two periods, although there are significant price trends, there is no significant pattern of the Edgeworth cycle that was widely documented in the literature during the late 1990s and early 2000s.[16] In Chapter B.2, we provide a more detailed discussion and additional results on margins.

As suggested in, e.g., Eckert (2003) and Noel (2007), a greater presence of independent stations should lead to more cycling activity and less sticky pricing, as independents often initiate an undercutting process and generate price movements akin to an Edgeworth cycle. In light of this, one possible explanation for the price pattern shown in Figure 3.2 is the declining share of independent stations in Vancouver since the 1990s. In the current sample, only 10% of the stations are of minor brands or independent.[17] In addition, since the upward trend is reasonably mild during the period, we assume that the intertemporal effect on consumers' purchases is of minor concern.

[13] The complete list of regions includes Vancouver City, Burnaby, New Westminster, North Vancouver, Port Coquitlam, Port Moody, and Richmond.
[14] Atkinson (2008) discussed sample selection issues with consumer-reported data (e.g., GasBuddy). In general, he finds that consumer-reported data are reliable for answering questions that require daily and major-brand station prices. However, it is risky to use them to study issues concerning within-day price dynamics.
[15] Kalibrate recently acquired Kent Marketing Ltd., a consulting firm specializing in the Canadian petroleum industry.
[16] For the relevant literature, see Eckert (2003), Noel (2007), Noel (2008), Atkinson (2009), among others.
[17] Table 3.1 documents the list of major brands that operate in Vancouver.
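The matching of daily price reports to the survey window of the volume data can be sketched as follows; the station IDs, dates, and prices here are hypothetical, while the September 2019 window dates are from the text:

```python
from datetime import date
from statistics import mean

# Hypothetical daily price reports: station id -> {date: price in Canadian cents}.
daily_prices = {
    "station_A": {date(2019, 9, 12): 155.9, date(2019, 9, 13): 154.9,
                  date(2019, 10, 20): 149.9},  # last report falls outside the window
    "station_B": {date(2019, 9, 15): 152.9, date(2019, 9, 20): 153.9},
}

# Survey window matched to the September 2019 volume data.
start, end = date(2019, 9, 12), date(2019, 10, 16)

def window_average(prices, start, end):
    """Average the daily prices that fall inside the matched survey window."""
    in_window = [p for d, p in prices.items() if start <= d <= end]
    return mean(in_window) if in_window else None

avg_price = {sid: window_average(p, start, end) for sid, p in daily_prices.items()}
```

The result is one average price per station per survey period, matching the cross-sectional structure of the monthly volume data.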
In constructing the instruments for prices discussed below, we also collect the rack prices of Suncor and Shell gasoline sold at local distribution terminals.[18] In the retail gasoline market of Vancouver, stations are typically supplied by major refineries under some type of contract.[19] For example, Petro-Canada stations are operated and supplied directly by Suncor. We assume most independent stations purchase supplies on the spot market.[20]

Table 3.1: Summary Statistics of Retail Prices and Sales Volume

  September 2019    Retail Price           Sales
                    Mean      Std Dev      Mean     Std Dev
  All               155.10    3.15         12.91    0.59
  Shell             155.84    2.80         13.11    0.53
  Suncor            155.38    2.23         12.92    0.41
  Chevron           156.33    1.14         12.89    0.57
  Husky             154.84    2.05         12.48    0.72
  Rack Price
    Suncor           90.35
    Shell            90.47

  March 2020        Retail Price           Sales
                    Mean      Std Dev      Mean     Std Dev
  All                97.30    3.68         12.49    0.73
  Shell              96.42    2.81         12.70    0.52
  Suncor             98.03    3.93         12.52    0.44
  Chevron            97.05    2.68         12.39    0.94
  Husky              96.97    4.16         12.26    0.88
  Rack Price
    Suncor           49.50
    Shell            49.55

  Note: Prices are in Canadian cents; sales volumes are in logged liters.

Figure 3.2: Dynamics of Average Retail Price. (a) August to November, 2019; (b) February to April, 2020.

We collect information on station characteristics, such as the availability of a car wash, a service station, and the size of the convenience store, from both GasBuddy.com and the data provided by Kent Marketing Ltd. Table 3.2 reports the fraction of stations, overall and for the four major brands in our sample, that offer each type of service. As suggested in the literature, these station characteristics could partially explain the price levels observed at individual stations. We control for car wash and service station in the regression as dummy variables, and for the size of the convenience store as a continuous variable. These three variables are treated as common regressors in both the demand equation and the price equation. We abstract from controlling for brand fixed effects, as they appear to be less important in another set of preliminary results and given the relatively small sample we are working with.

Table 3.2: Summary Statistics of Station Characteristics

                     All     Shell   Suncor   Chevron   Husky
  C-store            90.07   90.91   87.93    92.86     92.86
  Car Wash           12.58   22.73   20.69     0.00     14.29
  Service Station     9.27    9.09   17.24     0.00      0.00

  Note: Fraction of stations with each type of facility; numbers in percentages.

For the variables that potentially shift consumer demand, we collect data on the number of drivers (in units of 1,000 drivers per census tract), the median income of residents (in 1,000 Canadian dollars), the fraction of long-distance commuters, as well as a measure of transportation mode for each census tract, from the 2016 Canada Census Database. We note that there are 126 census tracts in the sample and at most 4 gasoline stations within a single census tract. Moreover, 71% of the census tracts in the data set have only one station, and 37% of the stations have no neighboring station within the same tract. Therefore, regional variables at the census tract level may provide some additional "quasi-station-specific" variation in the cross-section.

[18] Parkland is now the only operating refinery in this area, which supplies Chevron and Esso, along with several other minor brands. Unfortunately, rack prices for Parkland are not available.
[19] Slade (1998) lists four different vertical arrangements between a supplier and a station displaying its brand of gasoline in Canadian markets: (1) company-owned and -operated stations; (2) commissioned agent stations; (3) lessee dealer stations; and (4) dealer-owned stations.
[20] See Houde (2012) for a similar assumption.
In the data, there are four levels of commuting distance: (1) commuting within the census subdivision (CSD) of residence; (2) commuting to a different CSD within the census division (CD) of residence; (3) commuting to a different CSD and CD of residence; and (4) commuting to a different province or territory, with increasing distance of daily transportation. With the number of commuters of each type for each census tract (or census subdivision), we construct an index assigning a weight of 1 to the first type of commuter, 2 to the second type, and so on. The index can thus be viewed as a proxy for the average daily commuting distance of a census tract. The presumption is that long-distance commuters naturally demand more gasoline if they drive. A drawback of this argument is that commuters do not necessarily refuel their tanks near their place of residence, and thus a multi-address model would preferably be considered, as shown in Houde (2012). In principle, such additional information could be embodied in the network matrix W_n, or in a sequence of W_n's that represent different layers of networks. Unfortunately, due to data availability, we abstract from this extended analysis. Finally, we also compute the fraction of commuters that drive to work ("Travel Mode"), given that some commuters choose alternative modes of transportation and thus demand less gasoline (e.g., taking public transportation or walking). Another limitation of the data is that we do not observe the transportation mode taken by long-distance commuters. However, given that public transportation is usually available in densely populated areas, and such areas account for only a small fraction of the census tracts in the sample, we argue that this last concern is likely minor.
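The weighted commuting-distance index described above can be sketched as follows; the commuter counts are made up for illustration:

```python
def commute_index(counts):
    """Weighted average of commuter types for one census tract.

    counts: numbers of commuters of types (1) within-CSD, (2) cross-CSD
    within-CD, (3) cross-CD, (4) cross-province, in that order; the weights
    1 through 4 follow the increasing commuting distance of each type.
    """
    weights = (1, 2, 3, 4)
    total = sum(counts)
    return sum(w * c for w, c in zip(weights, counts)) / total if total else None

# Hypothetical tract: 800 within-CSD, 150 cross-CSD, 40 cross-CD, 10 cross-province.
idx = commute_index((800, 150, 40, 10))  # values near 1 indicate mostly short commutes
```

An index near 1 indicates a tract dominated by short-distance commuters, while values toward 4 indicate longer average daily commutes and, presumably, higher gasoline demand per driver.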
3.5 Instruments and Identification

In the model, we assume that the managers of gasoline stations know the station-specific tastes and cost factors that are unobservable (to econometricians) before they choose retail prices; prices are thus likely to be correlated with the unobservables. Instrumental variables are used to address this endogeneity concern. Price is determined by marginal costs and markups, and valid instruments may shift either component. In light of this, we consider two types of supply-side instruments.

The IVs belonging to the first type may be viewed as related to some measure of market power. Specifically, we include the number of competitors and the average size of competitors (measured by the number of pumps), following Houde (2012). As discussed in, e.g., Pennerstorfer and Weiss (2013), stations with fewer neighboring competitors tend to exhibit larger (local) market power and hence a higher ability to charge higher markups. In addition, inspecting the sample suggests that larger stations (measured by the number of pumps) tend to be vertically integrated and thus may have stronger local market power.[21] Moreover, as suggested in the 2019 market report by Kent Marketing Ltd. (now part of Kalibrate), larger stations also tend to be newer and have better amenities, which may also contribute to their local market power.[22]

In the context of the retail gasoline market, station i's direct rival(s) often compete with another set of stations simultaneously. The sub-market served by station i may also (partially) overlap with the sub-markets served by its direct rivals. As such, rivals' characteristics (potentially through interaction with their prices) may affect station i's demand through network interactions. For example, a high price set by a high-quality rival (e.g., a station with high brand loyalty) will have a different effect on station i than a high price set by a low-quality rival (e.g., an independent station).
The model we considered does not rule out this endogeneity and we need additional instruments. The instruments we consider is the station/sub-market characteristics of the indirect 21The average number of pumps for Shell, Chevron, Husky and independent brands in the sample are 8.5, 8.8, 7.4, and 6.6, respectively. 22Report can be purchased at https://kalibrate.com/insights/report/data-intelligence/ 2019-fuel-census/. 112 rivals of station i, i.e., the direct rivals of station i?s rivals that do not compete directly with station i. This approach follows closely Fan (2013) in the context of newspaper market, in which both prices and product characteristics are assumed to be endogenous. One main similarity between the retail gasoline market and the newspaper market considered in Fan (2013) is that, firm A has a (partially) overlapping sub-market with firm B, which in turn compete with its direct rivals in the other overlapping (sub)-markets. The intuition for why the characteristics of indirect rivals can be used as instruments is as follows. Consider three stations A, B and C, such that A compete directly with B and B compete directly with C, but A does not compete directly with C. The variation in station/sub-market characteristics of station B influence the demand for station B and thus affect its prices. Because station A and B are competitors, B?s decision on prices affects A?s decision. Station C?s characteristics would also affect B?s decision in an analogous fashion, but since C and A are not compete directly, C?s characteristics would not affect A?s demand. In summary, we assume that variation in C?s characteristics would shift B?s prices, in a way that should not affect A?s demand.23 Specifically, the instruments we consider include the station characteristics (dummies of car wash and service station) of A?s indirect rivals, as well as their count and average sizes. Table 3.3: Correlation of Station Characteristics in Neighboring Markets Car Wash Service No. 
Correlation   0.252      0.311     0.591    0.046

Note that the availability of a car wash and a service station at station i are included instruments. Table 3.3 reports the correlation between the included and the excluded instruments. Specifically, we report the correlation between the availability of a car wash and a service station at station i's indirect rivals and at station i itself, as well as the correlations between the count (and average size) of station i's indirect rivals and those of station i's direct rivals. In general, the included instruments are not highly correlated with the excluded instruments. Heuristically, these results give us some confidence that the excluded instruments affect endogenous prices differently than their included counterparts and thus affect sales volume primarily through prices.

23 In the spatial/social network literature, this type of instrument is often referred to as neighbor's-neighbor's characteristics.

The second type of supply-side IVs can be viewed as cost-based, as they are related to the rack prices at which independent stations could purchase inputs from local outlets operated by major brands. This set of instruments includes the interaction terms of the presence of branded stations with the rack prices of those brands, as well as an interaction term of brand and travel distance to the corresponding distribution terminals.24 Following Houde (2012), we also exploit the fact that there exists a small amount of cross-sectional dispersion in the posted rack prices of the Shell and Suncor outlets located in this area. We construct instrumental variables that focus only on Shell and Suncor rack prices, assuming unbranded retailers purchase gasoline on the spot market from the supplier with the lowest price. In particular, for each station i, we construct an instrumental variable that interacts Shell's rack price with a dummy variable equal to one if another Shell station is located within the same neighborhood.
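Returning to the first set of instruments: the indirect-rival set described above can be read off the network matrix itself. A minimal sketch, with a hypothetical five-station adjacency matrix and car-wash dummies (all numbers are illustrative, not from the sample):

```python
import numpy as np

# Hypothetical adjacency matrix: W[i, j] = 1 if stations i and j are direct rivals.
W = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])

# (W @ W)[i, j] > 0 iff some station competes directly with both i and j.
two_step = (W @ W) > 0

# Indirect rivals: rivals-of-rivals, excluding direct rivals and the station itself.
indirect = two_step & (W == 0) & ~np.eye(len(W), dtype=bool)

# Station characteristics (car-wash dummies); the instrument averages them over
# each station's indirect rivals, alongside the indirect-rival count.
car_wash = np.array([1, 0, 1, 0, 1])
counts = indirect.sum(axis=1)
iv_car_wash = (indirect.astype(int) @ car_wash) / np.maximum(counts, 1)
```

In this toy chain network, station 0's only indirect rival is station 2 (reached through station 1), so its instrument equals station 2's car-wash dummy.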
Note that the definition of the local market depends on the specification of the spatial matrices Wn; specifically, wij = 1 if stations i and j are located within the same neighborhood. A similar instrumental variable is constructed for Petro-Canada stations, which are operated by Suncor. These two sets of instruments capture two sources of variation that are correlated with price: the presence of Shell and other vertically integrated stations in the same locality, and the dispersion of their rack prices. Finally, the interaction term of brand and travel distance to the corresponding refinery exploits the cross-sectional variation in travel distance from refineries to retail stations. If a station belongs to a brand, it may obtain more favorable input prices than the independent stations, which are assumed to purchase inputs on the spot market. The source of lower costs could be either pre-planned and optimized transportation routes or other types of internal discounts that are unobserved to the econometrician.

24 Shell still operates a local terminal for distribution of petroleum products.

Table 3.4 documents the first-stage OLS results and F-statistics for the main specification for all six constructions of the Wn's. In the main specification, we include all of the supply-side instruments described above: the characteristics of station i's direct rivals, those of the direct rivals of i's direct rivals net of i's direct rivals (i.e., station i's indirect rivals), and the cost-based IVs discussed above.
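The rack-price interaction instrument can be assembled along the following lines; the brand flags, neighborhood matrix, and rack price below are all hypothetical placeholders, not values from the data:

```python
import numpy as np

# Hypothetical data for 4 stations: Shell flags and the neighborhood matrix W
# (W[i, j] = 1 if stations i and j are located within the same neighborhood).
is_shell = np.array([1, 0, 1, 0])
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])
shell_rack = 1.32  # posted Shell rack price (illustrative)

# Dummy: another Shell station is located within the same neighborhood.
near_shell = (W @ is_shell) > 0

# Instrument: Shell rack price interacted with the same-neighborhood dummy.
iv_shell = shell_rack * near_shell.astype(float)
```

The analogous Petro-Canada instrument would replace `is_shell` and `shell_rack` with the Suncor counterparts.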
In Chapter B.3, we also report the results for specification (2), which focuses on the characteristics of i's direct rivals and the cost-based IVs, as well as specification (3), which focuses on the characteristics of i's indirect rivals, along with results based on alternative specifications of the Wn's.25 In general, we see that the selected instruments are reasonably strong in light of the F-statistics.26,27

25 The same set of specifications is adopted in reporting the estimation results in the next section.
26 Note that in settings with a single endogenous variable the Kleibergen and Paap (2006) Wald statistic is equivalent to a heteroskedasticity-robust F-statistic. The explicit expression is given in Chapter B.3.
27 To the best of our knowledge, tests for over-identifying restrictions (e.g., Hansen's J-test) have not been formally developed for spatial SEMs at this moment.

3.6 Estimation Results and Impact Measures

Table 3.5 below documents the main estimation results based on the first specification of the spatial network matrix Wn discussed above. We compare the estimates with different sets of supply-side instruments, under the columns of specifications (1)-(3). Tables B.6-B.10 in Chapter B.4 document the estimation results based on the remaining specifications of the spatial networks listed above, i.e., W2 to W6. More detailed definitions of variables are reported

Table 3.4: First-stage OLS Regression and tests for IV power
                         W1        W2        W3        W4        W5        W6
No. Nb                 -0.041    -0.081**  -0.020**  -0.041    -0.041
                       (0.039)   (0.020)   (0.004)   (0.037)   (0.037)
Avg. Nb Size            0.030    -0.034**   0.012     0.065**  -0.013    -0.013
                       (0.043)   (0.005)   (0.011)   (0.020)   (0.027)   (0.027)
No. Indirect Nb        -0.142**  -0.050**   0.008*    0.010     0.010
                       (0.017)   (0.010)   (0.004)   (0.023)   (0.023)
Avg. Indirect Nb Size   0.104    -0.197**  -0.144     0.056*    0.151**   0.151**
                       (0.075)   (0.054)   (0.099)   (0.020)   (0.024)   (0.024)
Indirect Nb. Car Wash  -0.066**  -0.104**  -0.033**   0.034**
                       -0.016    -0.016
                       (0.022)   (0.012)   (0.004)   (0.008)   (0.011)   (0.011)
Indirect Nb. Service    0.024*    0.035**  -0.052**   0.026**  -0.203**  -0.203**
                       (0.009)   (0.005)   (0.011)   (0.003)   (0.033)   (0.033)
Suncor X Rack           0.163**   0.137**   0.034*    0.092**   0.189**   0.189**
                       (0.052)   (0.037)   (0.013)   (0.027)   (0.030)   (0.030)
Shell X Rack            0.055*    0.102**   0.104     0.157**  -0.028    -0.028
                       (0.024)   (0.026)   (0.062)   (0.033)   (0.023)   (0.023)
Dist Refinery           0.012**   0.044    -0.315     0.145**   0.141**   0.141**
                       (0.002)   (0.200)   (0.206)   (0.018)   (0.017)   (0.017)
Period Dummy            Yes       Yes       Yes       Yes       Yes       Yes
Weak IV (F-stat)        36.49     59.81     65.24     46.54     27.80     27.80
Chi-sq crit-val (5%)    20.53     20.53     20.53     19.86     20.53     20.53
Sample size             302       302       302       302       302       302
1 Heteroskedasticity-robust SEs are in parentheses; * (p < 0.05), ** (p < 0.01).
2 We report effective first-stage F-statistics based on Olea and Pflueger (2013) and Stock-Yogo weak-identification test critical values.

in Table B.11. In addition, we test for the presence of network spillovers in the data, based on the test proposed in Liu and Prucha (2018). The details of this test statistic and the relevant results are reported in Table B.13. In general, we find strong evidence of the presence of spatial interdependence in prices in the demand equation.

3.6.1 Main Estimation Results

With specification (1), the OLS estimate of the demand elasticity is biased downward in magnitude, and the spatial autoregressive parameter λ1 (on the term W ln(price)) in the demand system is not significant. The IV estimates of the demand elasticity and the spatial autoregressive parameters are close in magnitude and are in general significant. This gives additional confidence in the estimates, as both the LQ-GS2SLS and LQ-GSLIVE estimators are expected to yield similar results, at least when the IVs are reasonably strong. The demand elasticity is estimated to range from -5 to -6, depending on the set of IVs used in the estimation. These estimates are lower in magnitude than those reported by Houde (2012).
Recall that Houde (2012) reports an average store-level price elasticity of demand between -22 and -15 for the multi-address model, depending on IV choices. This could relate to two differences in the model specification. First, in constructing the network matrix W, we do not account explicitly for the possibility of purchasing gasoline along the route of a daily commute or shopping trip. In Houde (2012)'s terminology, the current model should be viewed as a variant of the 'single-address' model.28 Although this specification is popular in the literature, it is likely to bias the demand elasticity downward (in magnitude), since it only allows for substitution among the closest rivals and excludes those located further away but along a connected street or highway with high traffic flows. For the single-address model, Houde (2012) reports estimates ranging from -15 to -6, which are closer to our estimates. Second, Houde (2012) adopts a random-coefficients logit demand specification, which allows for a more flexible substitution pattern than the log-linear form we assume.29

Recall that in specification (3), we focused on the characteristics of indirect rivals when instrumenting for prices at own stations. One question is whether we have included the proper set of indirect rivals when constructing the instruments. With W1,n, such a concern could be reasonable, observing that the demand elasticity in specification (3) is somewhat smaller than that obtained under specification (2), in which we focus on the characteristics of direct rivals.

28 In essence, the 'single-address' model assumes a consumer's trip to gas stations always starts from his/her home. This construction omits the possibility of re-routing to a gas station located near a daily commute or shopping route.
29 To the best of our knowledge, accommodating flexible forms of the demand schedule in a spatial model has not been well explored in the literature.
One possible explanation is that the number of a station's indirect rivals under W1,n is larger than the number of stations that indirectly affect station i's demand in reality. In our sample, the average number of indirect rivals of each station is 14. Hence, the variation in the average characteristics of these indirect rivals may partially mask the variation that fits our story about using the characteristics of indirect rivals. In the context of the current model, such 'over-counting' could be driven by the fact that W1,n partially ignores the traffic and road conditions in local markets; e.g., i and j may share a common boundary by our construction yet not compete with each other because there is a highway/river between them. Such a hypothesis can be tested by varying the construction of the Wn's, which controls the scope of the direct and indirect neighbors of station i. We notice that under alternative constructions of Wn, the above observations can be reversed. For example, the scope of indirect competition under W3,n is more local than under W1,n, in the sense that it takes into account the travel distance between stations and thus puts more 'weight' on close neighbors in modeling the competition. Also, W4,n is constructed based on the closest neighbor only, and thus the scope of competition is much smaller, with the number of direct competitors being 1. Under these constructions, the aforementioned concern of over-counting indirect rivals is much alleviated. The results (see Tables B.7 and B.8) show that demand estimates using the characteristics of indirect rivals (i.e., specification (3)) are larger than those obtained using only the characteristics of direct rivals (i.e., specification (2)). The above discussion calls for a systematic way of identifying sub-markets. In the context of retail gasoline markets, recent contributions include Perdiguero and Borrell (2012), Bantle et al. (2018), and Ulrick et al. (2020).
However, to our knowledge, a universally accepted approach, especially in the spatial econometrics literature, has not been developed.

As remarked above, the parameter b can be interpreted as the market-average own-price elasticity. In the spatial literature, λ typically captures the intensity of spillovers among cross-sectional units. In model (3.6), the matrix λWn is equivalent to the cross-price elasticity matrix in Pinkse and Slade (2004). Furthermore, recall that we normalize Wn by its maximum row sum, so wij,n is in general not a binary entry. For example, if the maximum row sum of Wn is 10, then wij,n = 0.1 if i and j are neighbors by the selected metric (i.e., wij,n ≠ 0). Hence, λ can be viewed as a market weighted-average measure of the cross-price elasticity between stations. Also note that $w_{i\cdot,n} p_n$ gives the weighted average price of the stations neighboring station i. For W1,n, the typical nonzero ij-th element is wij,n = 0.083. Hence, an estimate of λ̂1 = 3.72 (e.g., the LQ-GS2SLS estimate in specification (1)) implies that a 1% price increase at a neighboring station j leads, on average, to a 0.31% (3.72 × 0.083) increase in sales at station i.

3.6.2 Impact Measures

Interpreting the estimates on non-price regressors is less direct in the presence of the network structure Wn than in regressions without network interactions at the individual level. Specifically, if a change in the k-th exogenous variable of station i has an effect, via spatial correlation, on sales at the other stations, which in turn feed back to station i, the corresponding coefficient, say βk, does not denote the total effect of a unit change in xk on the sales of station i. The final effect is sometimes referred to as the Average Total Direct Impact, following LeSage and Pace (2009). The direct impact measure on the (log) sales of station i associated with its k-th exogenous variable in X is computed as $\frac{1}{n}\sum_{i=1}^{n} \partial E[\ln(q_i)]/\partial x_{i,k}$.
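For a SAR-type demand system with reduced form $E[\ln q] = (I - \lambda W)^{-1} X\beta$, both impact measures are simple functionals of the matrix of partial derivatives. A minimal sketch with an illustrative row-normalized network and illustrative parameter values (none of the numbers below come from the estimates):

```python
import numpy as np

# Illustrative row-normalized 4-station network, spatial parameter, and
# coefficient on the k-th exogenous variable.
W = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.5, 0.0, 0.0, 0.5],
              [0.0, 0.5, 0.5, 0.0]])
lam, beta_k = 0.4, 0.05

n = W.shape[0]
# S[i, j] = dE[ln q_i] / dx_{j,k}, the full matrix of network-propagated effects.
S = np.linalg.inv(np.eye(n) - lam * W) * beta_k

direct = np.trace(S) / n  # Average Total Direct Impact: (1/n) sum_i dE[ln q_i]/dx_{i,k}
total = S.sum() / n       # Average Total Impact from an Observation: (1/n) sum_j sum_i ...
```

With a row-normalized W, the row sums of $(I - \lambda W)^{-1}$ equal $1/(1-\lambda)$, so the average total impact reduces to $\beta_k/(1-\lambda)$, which the sketch reproduces.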
Table 3.6 reports the estimated Average Total Direct Impact for the station/neighborhood characteristics that appear in the demand equation, for both W1,n and W2,n. In light of the formulae, it should be clear that the magnitudes of the impact measures depend on the specific construction of the Wn's. For interpretation, we focus on the results obtained with W1,n as an example. Under W1,n, adding a car wash at station i would decrease demand at station i by 3.8%-4.0% (depending on the estimator), and adding a service station is associated with 1.38%-1.51% lower demand. In addition, larger convenience stores seem to be correlated with lower sales volume, with about 4.3% lower sales per 100-square-meter (about 1,000 square feet) increase in C-store space. The direct impact of 1,000 more drivers in the census tract where station i is located is a 3.4%-4.4% increase in sales volume. In contrast, the direct impact of 1,000 more Canadian dollars of income in the census tract where station i is located implies a 0.6%-3.2% decrease in sales volume for station i. One possible explanation for this seemingly contradictory result is that the prices charged by stations located in higher-income neighborhoods tend to be higher, and hence some residents opt for stations located along their commuting routes for lower prices.30 In addition, one can compute the impact on sales at station i if xk of the j-th (j ≠ i) station changes and propagates through the network. Following LeSage and Pace (2009), we refer to this as the Average Total Impact from an Observation. In our context, one may interpret this measure as the average impact of a (unit) change in a station characteristic at station j on demand at station i. The impact on the (log) sales of station i associated with the k-th exogenous variable of station j is computed as $\frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{n} \partial E[\ln(q_i)]/\partial x_{j,k}$.
Under W1,n, when station i competes with a station j that offers car wash service, demand at station i is lower by about 10%-13%, taking into account network propagation. Similarly, adding a service station at j is associated with a 10%-15% reduction in demand at station i. In addition, demand at station i is 3.8%-4.0% lower if it competes with a station with a 100-square-meter-larger C-store. Sales volume at station i is 3.4%-4.4% higher if the census tract of its competitor has 1,000 more drivers, after accounting for propagation through the network. In contrast, the indirect impact of 1,000 more Canadian dollars of income in the census tracts where station i's competitors are located implies a 0.6%-3.2% decrease in sales volume for station i. As remarked, these estimated impact measures depend crucially on the construction of the network matrix Wn.

30 See Houde (2012) for a similar argument.

Table 3.5: Estimation Results with W based on common boundaries
                        (1)                        (2)                        (3)
                 OLS      LQ-GS2SLS  LQ-GSLIVE  LQ-GS2SLS  LQ-GSLIVE  LQ-GS2SLS  LQ-GSLIVE
ln(price)       -3.807**  -5.945*    -5.952*    -6.063*    -6.036*    -5.082*    -5.012*
                (1.242)   (2.432)    (2.382)    (2.714)    (2.689)    (2.768)    (2.429)
W*ln(price)      2.090     3.723*     3.734*     4.242      4.224*     3.190*     3.210*
                (1.361)   (1.811)    (1.725)    (2.179)    (1.847)    (1.437)    (1.429)
Car wash         0.038     0.026      0.048      0.028      0.075      0.047      0.052
                (0.111)   (0.126)    (0.157)    (0.127)    (0.150)    (0.116)    (0.139)
Service Station  0.051     0.077*     0.081*     0.082*     0.089*     0.060      0.057
                (0.037)   (0.038)    (0.036)    (0.038)    (0.037)    (0.038)    (0.036)
C-store Size     0.050     0.046      0.047      0.047      0.048      0.050      0.054
                (0.035)   (0.035)    (0.034)    (0.036)    (0.035)    (0.035)    (0.033)
No. Drivers      0.025*    0.044*     0.034*     0.043*     0.026      0.048*     0.050**
                (0.012)   (0.015)    (0.014)    (0.015)    (0.015)    (0.015)    (0.014)
Med Income       0.005    -0.032     -0.006     -0.035      0.000     -0.095     -0.025
                (0.233)   (0.246)    (0.281)    (0.247)    (0.287)    (0.246)    (0.276)
Commute Dist.   -1.877*
                          -1.769     -1.793     -1.762     -1.827     -1.897     -1.893
                (0.902)   (1.018)    (1.455)    (1.028)    (1.496)    (1.020)    (1.430)
Travel Mode     -0.016*   -0.017*    -0.019*    -0.016*    -0.014*    -0.020*    -0.022*
                (0.006)   (0.007)    (0.010)    (0.007)    (0.011)    (0.007)    (0.010)
λ1               0.261**   0.127      0.445      0.130      0.444      0.127      0.445
                (0.074)   (5.749)    (2.351)    (4.041)    (2.247)    (4.258)    (2.517)
Period Dummy     Yes       Yes                   Yes                   Yes
Sample size      302       302                   302                   302
R-square         0.15
1 Panels (1), (2), (3) correspond to different sets of supply-side IVs; SEs in parentheses are robust to heteroskedasticity; * (p < 0.05), ** (p < 0.01). wij = 0.13 if i and j are neighbors.

Table 3.6: Impact Measures
                         LQ-GS2SLS            LQ-GSLIVE
                      Direct    Indirect    Direct    Indirect
W1 Demand Side Var.
Car wash             -0.0401   -0.0988     -0.0378   -0.1260
Service Station      -0.0138   -0.1048     -0.0151   -0.1518
C-store Size         -0.0433   -0.0392     -0.0440   -0.0379
No. Drivers           0.0439    0.0439      0.0340    0.0340
Med Income           -0.0315   -0.0315     -0.0056   -0.0056
W2 Demand Side Var.
Car wash             -0.0926   -0.0924     -0.0671   -0.0669
Service Station       0.0132    0.0133     -0.0261   -0.0261
C-store Size         -0.0975   -0.0973     -0.1548   -0.1546
No. Drivers           0.0532    0.0532      0.0522    0.0522
Med Income           -0.1592   -0.1592     -0.1235   -0.1235
1 W1: network based on closest neighbors; W2: network based on neighbors within a 2-mile radius.
2 Estimates are based on regression specification (1).
3 'Direct' refers to the 'Average Total Direct Impact'; 'Indirect' refers to the 'Average Total Impact from an Observation'.

3.7 Concluding Remarks

In this application, we estimate a demand equation with network dependence for (a sub-region of) the retail gasoline market in Vancouver, Canada. The data set includes observations on both sales volume and station-level prices, along with station- and census-tract-level characteristics. We find that the own-price elasticity is between -12 and -4, depending on the set of IVs and the specific construction of the network matrices Wn, which governs the degree of competition to some extent. The cross-station price elasticity is in general between 0.6 and
6, depending on the network specification. These results are largely consistent with economic theory. However, Houde (2012)'s multi-address model reports an own-price demand elasticity within the range of -22 to -15, which is much more elastic than our estimates. Setting aside possible unobserved city-level or market-level fixed effects, one possible explanation is that we build networks based on different sources of information. Houde (2012) relies heavily on local traffic patterns and road structure and thus allows a station to compete with another located far away if both lie on a main commuting route or a segment of highway. Our approach to constructing the spatial network matrices is in line with the majority of the existing literature.31 Our proximity measures exploit information on stations' neighborhoods or the pairwise travel distance between nearby stations. Thus, we implicitly assume that competition is largely local. Allowing for a more flexible network structure in the current econometric framework is one possible direction for future work. We also computed the direct and indirect impact measures associated with the station and neighborhood characteristics. In general, the number of drivers in the local neighborhood and the share of residents who drive to work are important factors for retail gasoline demand. Finally, we do find that the estimation results are heavily influenced by the construction of the network matrices, a well-documented phenomenon in the empirical spatial literature.

31 See, e.g., Pennerstorfer (2009), Pennerstorfer and Weiss (2013), Pinkse et al. (2002), among others.

Appendix A: Appendix to Chapter 2

A.1 Appendix: Example expression of EYn

For ease of presentation, we drop the subscript n on matrices when appropriate and the subscript 0 on parameters in what follows.
Consider the following SE-SARAR model consisting of two equations, i.e., $G = 2$:
$$y_1 = b_{21} y_2 + [\lambda_{11,1} W_1 + \lambda_{11,2} W_2]\, y_1 + c_{11} x_1 + c_{21} x_2 + u_1,$$
$$y_2 = b_{12} y_1 + [\lambda_{22,1} W_1 + \lambda_{22,2} W_2]\, y_2 + c_{32} x_3 + c_{42} x_4 + u_2,$$
and thus
$$B^{*} = \begin{bmatrix} \bar{B}_{11} & b_{21} I_n \\ b_{12} I_n & \bar{B}_{22} \end{bmatrix}, \qquad C^{*} = \begin{bmatrix} c_{11} I_n & c_{21} I_n & 0 & 0 \\ 0 & 0 & c_{32} I_n & c_{42} I_n \end{bmatrix}.$$
We consider an approximation to $Ey = [Ey_1', Ey_2']'$ up to the second order:
$$\sum_{s=0}^{2} (B^{*})^{s} C^{*} x,$$
where $x = \mathrm{vec}(X)$ and $X = [x_1, x_2, x_3, x_4]$. In particular, note that
$$B^{*}C^{*} = \begin{bmatrix} c_{11}\bar{B}_{11} & c_{21}\bar{B}_{11} & b_{21}c_{32} I_n & b_{21}c_{42} I_n \\ b_{12}c_{11} I_n & b_{12}c_{21} I_n & c_{32}\bar{B}_{22} & c_{42}\bar{B}_{22} \end{bmatrix},$$
$$(B^{*})^{2}C^{*} = \begin{bmatrix} c_{11}\big(b_{21}b_{12} I_n + \bar{B}_{11}^{2}\big) & c_{21}\big(b_{21}b_{12} I_n + \bar{B}_{11}^{2}\big) & b_{21}c_{32}\big(\bar{B}_{11} + \bar{B}_{22}\big) & b_{21}c_{42}\big(\bar{B}_{11} + \bar{B}_{22}\big) \\ b_{12}c_{11}\big(\bar{B}_{11} + \bar{B}_{22}\big) & b_{12}c_{21}\big(\bar{B}_{11} + \bar{B}_{22}\big) & c_{32}\big(b_{21}b_{12} I_n + \bar{B}_{22}^{2}\big) & c_{42}\big(b_{21}b_{12} I_n + \bar{B}_{22}^{2}\big) \end{bmatrix},$$
with
$$\bar{B}_{11} = \lambda_{11,1} W_1 + \lambda_{11,2} W_2, \quad (A.1)$$
$$\bar{B}_{22} = \lambda_{22,1} W_1 + \lambda_{22,2} W_2, \quad (A.2)$$
$$\bar{B}_{11}^{2} = \lambda_{11,1}^{2} W_1^{2} + \lambda_{11,2}^{2} W_2^{2} + \lambda_{11,1}\lambda_{11,2}(W_1 W_2 + W_2 W_1), \quad (A.3)$$
$$\bar{B}_{22}^{2} = \lambda_{22,1}^{2} W_1^{2} + \lambda_{22,2}^{2} W_2^{2} + \lambda_{22,1}\lambda_{22,2}(W_1 W_2 + W_2 W_1).$$
Consequently,
$$D = \sum_{s=0}^{2} (B^{*})^{s} C^{*} = \begin{bmatrix} D_{11} & D_{12} & D_{13} & D_{14} \\ D_{21} & D_{22} & D_{23} & D_{24} \end{bmatrix}, \quad (A.4)$$
with
$$D_{11} = \Big(\sum_{s=0}^{2} \bar{B}_{11}^{s} + b_{12}b_{21} I_n\Big) c_{11} = \big((1+b_{12}b_{21})I_n + \lambda_{11,1}W_1 + \lambda_{11,2}W_2 + \lambda_{11,1}^{2}W_1^{2} + \lambda_{11,2}^{2}W_2^{2} + \lambda_{11,1}\lambda_{11,2}(W_1W_2 + W_2W_1)\big) c_{11},$$
$$D_{12} = \Big(\sum_{s=0}^{2} \bar{B}_{11}^{s} + b_{12}b_{21} I_n\Big) c_{21},$$
$$D_{13} = (I_n + \bar{B}_{11} + \bar{B}_{22})\, b_{21}c_{32} = \big(I_n + (\lambda_{11,1}+\lambda_{22,1})W_1 + (\lambda_{11,2}+\lambda_{22,2})W_2\big)\, b_{21}c_{32},$$
$$D_{14} = (I_n + \bar{B}_{11} + \bar{B}_{22})\, b_{21}c_{42},$$
$$D_{21} = (I_n + \bar{B}_{11} + \bar{B}_{22})\, b_{12}c_{11},$$
$$D_{22} = (I_n + \bar{B}_{11} + \bar{B}_{22})\, b_{12}c_{21},$$
$$D_{23} = \Big(\sum_{s=0}^{2} \bar{B}_{22}^{s} + b_{12}b_{21} I_n\Big) c_{32} = \big((1+b_{12}b_{21})I_n + \lambda_{22,1}W_1 + \lambda_{22,2}W_2 + \lambda_{22,1}^{2}W_1^{2} + \lambda_{22,2}^{2}W_2^{2} + \lambda_{22,1}\lambda_{22,2}(W_1W_2 + W_2W_1)\big) c_{32},$$
$$D_{24} = \Big(\sum_{s=0}^{2} \bar{B}_{22}^{s} + b_{12}b_{21} I_n\Big) c_{42}.$$
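The quality of the second-order truncation of $Ey = (I_{2n} - B^{*})^{-1} C^{*} x$ can be checked numerically. A small sketch under hypothetical parameter values and randomly generated weight matrices (every number and helper name here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

def row_norm(M):
    """Zero the diagonal and normalize each row to sum to one."""
    M = M - np.diag(np.diag(M))
    return M / M.sum(axis=1, keepdims=True)

W1 = row_norm(rng.random((n, n)))
W2 = row_norm(rng.random((n, n)))
I = np.eye(n)

# Hypothetical parameters, kept small so the Neumann series converges quickly.
b21, b12 = 0.1, 0.1
B11 = 0.10 * W1 + 0.05 * W2          # \bar B_11
B22 = 0.08 * W1 + 0.06 * W2          # \bar B_22
Bstar = np.block([[B11, b21 * I], [b12 * I, B22]])
Cstar = np.block([[1.0 * I, 0.5 * I, 0 * I, 0 * I],
                  [0 * I, 0 * I, 0.8 * I, 0.3 * I]])

X = rng.normal(size=(n, 4))
x = X.flatten(order="F")             # x = vec(X), column-major stacking

Ey_exact = np.linalg.solve(np.eye(2 * n) - Bstar, Cstar @ x)
Ey_approx = sum(np.linalg.matrix_power(Bstar, s) @ Cstar @ x for s in range(3))
err = np.max(np.abs(Ey_exact - Ey_approx))   # third-order truncation error
```

With the spectral radius of $B^{*}$ well below one, the omitted terms are of third and higher order, so the approximation error is small relative to the elements of $Ey$.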
Given the above results, and recalling that $x = \mathrm{vec}(X)$ with $X = [x_1, x_2, x_3, x_4]$, the second-order approximation to $Ey = [Ey_1', Ey_2']'$ can be expressed as
$$\sum_{s=0}^{2} (B^{*})^{s} C^{*} x = \begin{bmatrix} \big(\sum_{s=0}^{2} \bar{B}_{11}^{s} + b_{12}b_{21} I_n\big)(c_{11}x_1 + c_{21}x_2) + (I_n + \bar{B}_{11} + \bar{B}_{22})(b_{21}c_{32}x_3 + b_{21}c_{42}x_4) \\ (I_n + \bar{B}_{11} + \bar{B}_{22})(b_{12}c_{11}x_1 + b_{12}c_{21}x_2) + \big(\sum_{s=0}^{2} \bar{B}_{22}^{s} + b_{12}b_{21} I_n\big)(c_{32}x_3 + c_{42}x_4) \end{bmatrix}.$$
With the expressions in (A.4) and the expanded terms, we can express the approximated $Ey_1$ and $Ey_2$ as
$$\begin{aligned} Ey_1 &\approx \Big(\sum_{s=0}^{2} \bar{B}_{11}^{s} + b_{12}b_{21} I_n\Big)(c_{11}x_1 + c_{21}x_2) + (I_n + \bar{B}_{11} + \bar{B}_{22})(b_{21}c_{32}x_3 + b_{21}c_{42}x_4) \\ &= c_{11}(1 + b_{12}b_{21})x_1 + c_{21}(1 + b_{12}b_{21})x_2 + b_{21}c_{32}x_3 + b_{21}c_{42}x_4 \quad (A.5) \\ &\quad + c_{11}\big(\lambda_{11,1}W_1 + \lambda_{11,2}W_2\big)x_1 + c_{21}\big(\lambda_{11,1}W_1 + \lambda_{11,2}W_2\big)x_2 \\ &\quad + b_{21}c_{32}\big[(\lambda_{11,1}+\lambda_{22,1})W_1 + (\lambda_{11,2}+\lambda_{22,2})W_2\big]x_3 + b_{21}c_{42}\big[(\lambda_{11,1}+\lambda_{22,1})W_1 + (\lambda_{11,2}+\lambda_{22,2})W_2\big]x_4 \\ &\quad + c_{11}\big[\lambda_{11,1}^{2}W_1^{2} + \lambda_{11,1}\lambda_{11,2}(W_1W_2 + W_2W_1) + \lambda_{11,2}^{2}W_2^{2}\big]x_1 + c_{21}\big[\lambda_{11,1}^{2}W_1^{2} + \lambda_{11,1}\lambda_{11,2}(W_1W_2 + W_2W_1) + \lambda_{11,2}^{2}W_2^{2}\big]x_2, \end{aligned}$$
and
$$\begin{aligned} Ey_2 &\approx (I_n + \bar{B}_{11} + \bar{B}_{22})(b_{12}c_{11}x_1 + b_{12}c_{21}x_2) + \Big(\sum_{s=0}^{2} \bar{B}_{22}^{s} + b_{12}b_{21} I_n\Big)(c_{32}x_3 + c_{42}x_4) \\ &= b_{12}c_{11}x_1 + b_{12}c_{21}x_2 + c_{32}(1 + b_{12}b_{21})x_3 + c_{42}(1 + b_{12}b_{21})x_4 \quad (A.6) \\ &\quad + b_{12}c_{11}\big[(\lambda_{11,1}+\lambda_{22,1})W_1 + (\lambda_{11,2}+\lambda_{22,2})W_2\big]x_1 + b_{12}c_{21}\big[(\lambda_{11,1}+\lambda_{22,1})W_1 + (\lambda_{11,2}+\lambda_{22,2})W_2\big]x_2 \\ &\quad + c_{32}\big(\lambda_{22,1}W_1 + \lambda_{22,2}W_2\big)x_3 + c_{42}\big(\lambda_{22,1}W_1 + \lambda_{22,2}W_2\big)x_4 \\ &\quad + c_{32}\big[\lambda_{22,1}^{2}W_1^{2} + \lambda_{22,1}\lambda_{22,2}(W_1W_2 + W_2W_1) + \lambda_{22,2}^{2}W_2^{2}\big]x_3 + c_{42}\big[\lambda_{22,1}^{2}W_1^{2} + \lambda_{22,1}\lambda_{22,2}(W_1W_2 + W_2W_1) + \lambda_{22,2}^{2}W_2^{2}\big]x_4. \end{aligned}$$
In light of (A.5) and (A.6), we see that the second-order approximations for $Ey_1$ and $Ey_2$ are sums over terms of the generic form
$$\alpha(\theta)\, W_{l_1}^{s_1} W_{l_2}^{s_2} x_k, \quad \text{or equivalently} \quad W_{l_1}^{s_1} W_{l_2}^{s_2} x_k\, \alpha(\theta), \quad (A.7)$$
with $\alpha(\theta)$
denoting a (scalar) function of the structural parameters, $s_1, s_2 = 0, 1, 2$, $l_1, l_2 = 1, 2$ and $k = 1, \ldots, 4$. Recalling that $X = [x_1, x_2, x_3, x_4]$, the second-order approximation for $Ey_1$ in (A.5) can be written as
$$\begin{aligned} Ey_1 &\approx W_1^{0}W_2^{0}X\alpha_{1,(1,0),(2,0)} + W_1^{1}W_2^{0}X\alpha_{1,(1,1),(2,0)} + W_1^{0}W_2^{1}X\alpha_{1,(1,0),(2,1)} + W_1^{2}W_2^{0}X\alpha_{1,(1,2),(2,0)} \\ &\quad + W_1^{0}W_2^{2}X\alpha_{1,(1,0),(2,2)} + W_1^{1}W_2^{1}X\alpha_{1,(1,1),(2,1)} + W_2^{1}W_1^{1}X\alpha_{1,(2,1),(1,1)} \\ &= \sum_{l_1,l_2}\ \sum_{\substack{s_1=0,\, s_2=0 \\ s_1+s_2\le 2}}^{2} W_{l_1}^{s_1}W_{l_2}^{s_2}X\alpha_{1,(l_1,s_1),(l_2,s_2)}, \quad (A.8) \end{aligned}$$
with
$$\begin{aligned} \alpha_{1,(1,0),(2,0)} &= [(b_{12}b_{21}+1)c_{11},\ (b_{12}b_{21}+1)c_{21},\ b_{21}c_{32},\ b_{21}c_{42}]', \\ \alpha_{1,(1,1),(2,0)} &= [\lambda_{11,1}c_{11},\ \lambda_{11,1}c_{21},\ b_{21}c_{32}(\lambda_{11,1}+\lambda_{22,1}),\ b_{21}c_{42}(\lambda_{11,1}+\lambda_{22,1})]', \\ \alpha_{1,(1,0),(2,1)} &= [\lambda_{11,2}c_{11},\ \lambda_{11,2}c_{21},\ b_{21}c_{32}(\lambda_{11,2}+\lambda_{22,2}),\ b_{21}c_{42}(\lambda_{11,2}+\lambda_{22,2})]', \\ \alpha_{1,(1,2),(2,0)} &= [\lambda_{11,1}^{2}c_{11},\ \lambda_{11,1}^{2}c_{21},\ 0,\ 0]', \\ \alpha_{1,(1,0),(2,2)} &= [\lambda_{11,2}^{2}c_{11},\ \lambda_{11,2}^{2}c_{21},\ 0,\ 0]', \\ \alpha_{1,(1,1),(2,1)} &= [\lambda_{11,1}\lambda_{11,2}c_{11},\ \lambda_{11,1}\lambda_{11,2}c_{21},\ 0,\ 0]', \\ \alpha_{1,(2,1),(1,1)} &= [\lambda_{11,1}\lambda_{11,2}c_{11},\ \lambda_{11,1}\lambda_{11,2}c_{21},\ 0,\ 0]', \end{aligned}$$
and the other $\alpha_{1,(l_1,s_1),(l_2,s_2)}$'s restricted to zero. One can check that, in the above, the generic form of a summand is
$$W_{l_1}^{s_1}W_{l_2}^{s_2}X\alpha_{g,(l_1,s_1),(l_2,s_2)} = \sum_{k=1}^{4} W_{l_1}^{s_1}W_{l_2}^{s_2} x_k\, \alpha_{g,(l_1,s_1),(l_2,s_2),k},$$
where $\alpha_{g,(l_1,s_1),(l_2,s_2),k}$ denotes the $k$-th element of $\alpha_{g,(l_1,s_1),(l_2,s_2)}$, $s_1, s_2$ denote the powers of each $W$, and the index $g$ corresponds to the $y_g$ being approximated. We note that the summands $W_{l_1}^{s_1}W_{l_2}^{s_2} x_k\, \alpha_{g,(l_1,s_1),(l_2,s_2),k}$ appearing on the right-hand side of the above equation conform with the generic form given in (A.7). We also note that the above expression allows for both $W_1W_2 x_k \alpha_{g,(1,1),(2,1),k}$ and $W_2W_1 x_k \alpha_{g,(2,1),(1,1),k}$ as summands in the second-order approximations. Analogously, the second-order approximation to $Ey_2$ can then be expressed as
+ W 2W 01 2 2,(1,0),(2,1) 1 2X?2,(1,2),(2,0) l1=?1,l2=2 l1=?1,l2=2 + W 0 2 1 11W2X?2,(1,0),(2,2) + W1W2X?2,(1,1),(2,1) l1=?1,l2=2 l1=1,l2=2 + W 1 12W1X?2,(2,1),(1,1) ?l1=?2,l2=?1 = W s1 s2? ?? ? l W1 l X?2 2,(l1,s1),(l2,s2), (A.9)l1,l2 s1=0 s2=0 s1+s2?2 131 with [ ]? ?2,(1,0),(2,0) = [b12c11, b12c21, (b12b21 + 1)c32, (b12b21 + 1)c42 , ]? ?2,(1,1),(2,0) = [b12c11(?11,1 + ?22,1), b12c21(?11,1 + ?22,1), ?22,1c32, ?22,1c42] ,? ?2,(1,0),(2,1) = [b12c11(?11,2 + ?22,2),]b12c21(?11,2 + ?22,2), ?22,2c32, ?22,2c42 ,? ? 2 22,(1,2),(2,0) = [0, 0, ?22,1c32, ?22,1c42] ,? ? 2 22,(1,0),(2,2) = [0, 0, ?22,2c32, ?22,2c42 , ]? ?2,(1,1),(2,1) = [0, 0, ?22,1?22,2c32, ?22,1?22,2c42]? ?2,(2,1),(1,1) = 0, 0, ?22,1?22,2c32, ?22,1?22,2c42 , and other ?2,(l1,s1),(l2,s2)?s being restricted to zeros. Combing (A.8) and (A.9), the second order approximation to EY is then a special case of ?? ? EY ? W s1l W s2 l X?1 2 (l1,s1),(l2,s2), (A.10) l1,l2 ?s1=?0?s2=?0 s1+s2?2 with ?(l1,s1),(l2,s2) = [?1,(l1,s1),(l2,s2),?2,(l1,s1),(l2,s2)] and some of the ?(l1,s1),(l2,s2)?s are restricted to zeros. Note that above expression conforms with and is a special case of (2.31) as stated in Section 2.3.3. Since we can cover the cases where l1 = 1, l2 = 2 and l1 = 2, l2 = 1 with choices of s1 and s2, we can write (A.10) more explicitly by ? ? ? ? EY ? s1 s2 s1 s2n ? ?? ?W1 W2 X?(1,s1),(2,s2) +s1=0 s2=0 ?s1=?0?s2=?W2 W1 X?(2,s1),(1,s2).0 s1+s2?2 s1+s2?2 132 A.2 Proofs of Chapter 2 A.2.1 Preliminary Results In addition to the notations introduced in the main text, we introduce the following two types of matrices that are needed in deriving the scores of the log-likelihood functions. Commutation Matrix: For any matrix A of size n ? m, the commutation matrix Kn?m is defined by vec(A?n) = Kn?mvec(An). It is readily checked that ? ? ? ???? Im ? i ?1,n ?? K = ??? . ?n?m .? . ???? Im ? i?n,n observing that A? = [a? , . . . , a? ] and (I ? i? ?n 1.,n n.,n m j,n)vec(An) = aj.,n. 
For simplicity, we refer to this $K_{n \times m}$ as the commutation matrix on $A_n$.

Stacked Block Matrices: Let $A_n = (A_{gk,n})$ be an $nG \times nK$ matrix with blocks $A_{gk,n}$ of dimension $n \times n$, and let $B_n = [B_{1,n}', \ldots, B_{H,n}']'$ be an $nH \times n$ matrix with blocks $B_{h,n}$ of dimension $n \times n$. Then
$$\mathrm{vec}_n(A_n, B_n) = \big[\mathrm{tr}(A_{11,n}B_{1,n}), \ldots, \mathrm{tr}(A_{11,n}B_{H,n}), \ldots, \mathrm{tr}(A_{G1,n}B_{1,n}), \ldots, \mathrm{tr}(A_{G1,n}B_{H,n}), \ldots, \mathrm{tr}(A_{1K,n}B_{1,n}), \ldots, \mathrm{tr}(A_{1K,n}B_{H,n}), \ldots, \mathrm{tr}(A_{GK,n}B_{1,n}), \ldots, \mathrm{tr}(A_{GK,n}B_{H,n})\big]' \quad (A.11)$$
denotes the $GKH \times 1$ vector composed of the traces of the $n \times n$ blocks of the matrix
$$\begin{bmatrix} A_{11,n}B_{1,n} & \cdots & A_{1K,n}B_{1,n} \\ \vdots & \ddots & \vdots \\ A_{11,n}B_{H,n} & \cdots & A_{1K,n}B_{H,n} \\ \vdots & \ddots & \vdots \\ A_{G1,n}B_{1,n} & \cdots & A_{GK,n}B_{1,n} \\ \vdots & \ddots & \vdots \\ A_{G1,n}B_{H,n} & \cdots & A_{GK,n}B_{H,n} \end{bmatrix}.$$
As a special case, let $B_n = I_n$, and thus
$$\mathrm{vec}_n(A_n, I_n) = \mathrm{vec}_n(A_n) = [\mathrm{tr}(A_{11,n}), \ldots, \mathrm{tr}(A_{G1,n}), \ldots, \mathrm{tr}(A_{1K,n}), \ldots, \mathrm{tr}(A_{GK,n})]'$$
denotes the $GK \times 1$ vector composed of the traces of the $n \times n$ blocks. Note that for $n = 1$ we have $A_{gk,n} = a_{gk}$ and $\mathrm{vec}_1(A_n) = \mathrm{vec}(A_n)$, where $\mathrm{vec}(\cdot)$ denotes the standard vectorization operator. We also note that $\mathrm{vec}_n(A_n^{-1}, B_n)$ with $A_n^{-1} = (A_n^{gk})$ is covered as a special case of the above definition, with $A_{gk,n}$ replaced by $A_n^{gk}$. To save on notation, we suppress the subscript $n$ when not necessary.

We use the following lemmata in deriving the scores of the log-likelihood function.

Lemma 2. Let $A$ be a matrix whose elements depend on some vector $\theta = [\theta_1, \ldots, \theta_m]'$. Assume $A$ is nonsingular; then $\frac{\partial \ln|A|}{\partial \theta} = \mathrm{vec}(A^{-1\prime})' \frac{\partial \mathrm{vec}(A)}{\partial \theta}$.

Proof. By Corollary 29 of Dhrymes and Guerard (1978), $\frac{\partial |A|}{\partial \theta} = |A|\,\mathrm{vec}(A^{-1\prime})' \frac{\partial \mathrm{vec}(A)}{\partial \theta}$. Further noting that $\frac{\partial \ln|A|}{\partial \theta} = \frac{1}{|A|}\frac{\partial |A|}{\partial \theta}$, the statement of the lemma follows by combining these two results.

Lemma 3. Let $A$ be a matrix of size $p \times q$. Then
$$\frac{\partial \mathrm{vec}(A' \otimes I_n)}{\partial \mathrm{vec}(A)} = \big[I_p \otimes (K_{n \times q} \otimes I_n)(I_q \otimes \mathrm{vec}(I_n))\big] K_{p \times q}.$$

Proof. By the formula on page 55 of Magnus and Neudecker (2019),1 we can write $\mathrm{vec}(A' \otimes I_n) = [I_p \otimes$
$(K_{n \times q} \otimes I_n)(I_q \otimes \mathrm{vec}(I_n))]\,\mathrm{vec}(A') \quad (A.12)$
$$= \big[I_p \otimes (K_{n \times q} \otimes I_n)(I_q \otimes \mathrm{vec}(I_n))\big] K_{p \times q}\,\mathrm{vec}(A).$$
It then follows that
$$\frac{\partial \mathrm{vec}(A' \otimes I_n)}{\partial \mathrm{vec}(A)} = \big[I_p \otimes (K_{n \times q} \otimes I_n)(I_q \otimes \mathrm{vec}(I_n))\big] K_{p \times q},$$
as claimed.

1 The formula states: let $A$ be $p \times q$ and $B$ be $m \times n$. We then have $\mathrm{vec}(A \otimes B) = (I_q \otimes T)\,\mathrm{vec}(A) = (H \otimes I_m)\,\mathrm{vec}(B)$, where $T = (K_{n \times p} \otimes I_m)(I_p \otimes \mathrm{vec}(B))$ and $H = (I_q \otimes K_{n \times p})(\mathrm{vec}(A) \otimes I_n)$.

Lemma 4. With $\sigma$ being the column vector consisting of the nonzero upper-diagonal elements of $\Sigma^{-1}(\theta)$, we have
$$\frac{\partial\, u'(\theta)R'(\rho)(\Sigma^{-1}(\theta) \otimes I_n)R(\rho)u(\theta)}{\partial \sigma} = \mathrm{vec}(E(\theta)'E(\theta))' L_{\sigma},$$
where $E(\theta)$ is defined by $\varepsilon(\theta) = \mathrm{vec}(E(\theta))$ with $\varepsilon(\theta) = R(\rho)u(\theta)$.

Proof. We utilize the following two propositions in Dhrymes and Guerard (1978):
- Proposition 89: Let $A_1, A_2, A_3$ be suitably dimensioned matrices. Then $\mathrm{tr}(A_1A_2A_3) = \mathrm{vec}(A_1')'(A_3' \otimes I)\,\mathrm{vec}(A_2)$.
- Proposition 98: Let $A$ be $m \times n$ and $X$ be $n \times m$; then $\frac{\partial\, \mathrm{tr}(AX)}{\partial X} = A'$. If $X$ is a function of the elements of the vector $\theta$, then $\frac{\partial\, \mathrm{tr}(AX)}{\partial \theta} = \frac{\partial\, \mathrm{tr}(AX)}{\partial \mathrm{vec}(X)} \frac{\partial \mathrm{vec}(X)}{\partial \theta} = \mathrm{vec}(A')' \frac{\partial \mathrm{vec}(X)}{\partial \theta}$.

Applying Proposition 89 in Dhrymes and Guerard (1978), we can write
$$u'(\theta)R'(\rho)(\Sigma^{-1}(\theta) \otimes I_n)R(\rho)u(\theta) = \mathrm{vec}(E(\theta))'(\Sigma^{-1}(\theta) \otimes I_n)\,\mathrm{vec}(E(\theta)) = \mathrm{tr}(E(\theta)'E(\theta)\Sigma^{-1}(\theta)).$$
We next apply Proposition 98 in Dhrymes and Guerard (1978) to obtain
$$\frac{\partial\, \mathrm{tr}(E(\theta)'E(\theta)\Sigma^{-1}(\theta))}{\partial \sigma} = \frac{\partial\, \mathrm{tr}(E(\theta)'E(\theta)\Sigma^{-1}(\theta))}{\partial \mathrm{vec}(\Sigma^{-1}(\theta))} \frac{\partial \mathrm{vec}(\Sigma^{-1}(\theta))}{\partial \sigma} = \mathrm{vec}(E(\theta)'E(\theta))' L_{\sigma},$$
where by definition $L_{\sigma} = \frac{\partial \mathrm{vec}(\Sigma^{-1}(\theta))}{\partial \sigma}$.2 The statement of the lemma now follows.

2 Note that the selector matrix with respect to $\sigma$, $L_{\sigma}$, is of dimension $G^2 \times G(G+1)/2$.

Lemma 5. Let $A = (A_{ij})$ be an $nG \times nG$ matrix with blocks $A_{ij}$ of dimension $n \times n$, and let $B = (B_h)$ be an $nH \times n$ matrix with blocks $B_h$ of dimension $n \times n$. We then have
$$\begin{aligned} & K_{HG \times G}'\big[I_{HG} \otimes (I_G \otimes \mathrm{vec}(I_n)')(K_{n \times G}' \otimes I_n)\big]\big[(I_G \otimes B) \otimes I_{nG}\big]\mathrm{vec}(A') \\ &= \big[\mathrm{tr}(A_{11}B_1), \ldots, \mathrm{tr}(A_{11}B_H), \ldots, \mathrm{tr}(A_{G1}B_1), \ldots, \mathrm{tr}(A_{G1}B_H), \ldots, \mathrm{tr}(A_{1G}B_1), \ldots, \mathrm{tr}(A_{1G}B_H), \ldots, \mathrm{tr}(A_{GG}B_1), \ldots, \mathrm{tr}(A_{GG}B_H)\big]' \\ &= \mathrm{vec}_n(A, B), \end{aligned}$$
where $K_{HG \times G}$ is the commutation matrix on any $HG \times G$ matrix $T$, such that $\mathrm{vec}(T') = K_{HG \times G}\,\mathrm{vec}(T)$.

Proof. By the definition of the commutation matrix $K_{n \times G}$, we have
i ?1,n Kn?G = ???? . .. ???? ??? IG ? i?n,n where ii,n is the ith column of In. Hence, K ?n?G = [IG ? i1,n, ..., IG ? in,n]. Therefore, (IG ? vec(I )?n )(K ? ?n?G ? In) = (IG ? vec(In) )[IG ? i1,n ? In, ..., IG ? in,n ? In] = [IG ? vec(In)?(i1,n ? In), ..., IG ? vec(I )?n (in,n ? In)] = [IG ? i?1,n, ..., IG ? i?n,n] where the last equality follows because (i?i,n ? In)vec(In) = vec(Inii,n) = ii,n. 137 Also note that [IG ? i?1,n, ..., IG ? i?n,n](Bh ? InG) =[(I ? i?G 1,n)(b11,h ? InG) + (IG ? i?2,n)(b21,h ? InG) + ...+ (IG ? i?n,n)(bn1,h ? InG), ..., (IG ? i? ? ?1,n)(b1n,h ? InG) + (IG ? i2,n)(b2n,h ? InG) + ...+ (IG ? in,n)(bnn,h ? InG)] =[I ? ? ?G ? b.1,h, IG ? b.2,h, ..., IG ?B.n,h] where b.i,h denotes the i-th column of Bh. 138 It then follows that ? [ ] [ ]KHG?G IHG ? (I ? vec(I )? ?G n )(Kn?G ? In) (IG ?B)? InG v?ec(A?)? ??? B1 ? InG ??? ?? ??? ?? ?? . ?.. ??? ? ?? IG ? i ? 1,n, ..., IG ? i? ??n,n ?????? BH ? InG ?? =K ? . .HG?G ??? . ??????? ? . . . ???? ? vec(A?) ? ? ? ?? I ?G ? i1,n, ..., IG ? i? ?n,n ???? B ?1 ? InG ??. ? HG diagonal blocks ???? .. ???? ? ?? BH ? InG? G? ?diagonal blocks ??? I ?B? ... I ?? G 1.,1 G ?B ?n.,1 ????? .. . . .. ?? . . . ??? ? ? ? I ?B? ?? G ? ? 1.,H ... IG ?Bn.,H =K ? ? ? ? ?? ? . . ? HG G ? . ??? vec(A ? ? )???? IG ?B ? 1.,1 ... I ?B? ?G n.,1 ? ?? ?? .. . . ?? . . ... ??? ? ?? I ?B? ? ? G 1.,H ... IG ?Bn.,H ? ? ?G di?agonal blocks ? ??? tr(A?(i i?? 1,G 1,G ?B ? ? ? ? 1)) ???? ???? tr(A (i1,Gi1,G ?B1)) ????? .? . . ? ? ?? ? ??? ... ???? ?? tr(A? ? ? ?? (i1,Gi ? G,G ?B?1)) ???? ??? tr(A ?(i ? ?1,Gi1,G ?BH)) ? ?? ? ? ..? . ?? ???? ? ? ? ? ??? . ?? . . ???? tr(A?(i i? ?B? ? ? ?1,G ? ?1,G H)) tr1(A39(iG,Gi1,G ?B1)) ??? ??? ?? ? ?? .. ? ???? .. ? ?? ? . . ???? ? ? ?? ?? tr(A ?(i ? ? ? ? ?1,GiG,G ?BH)) ??? ??? tr(A (iG,Gi ?1,G ?BH)) ?? =K ?HG?G ?????? ... ?? ???? = ???? ..? . ?? ? ?? (A.13)?? tr(A ?(i ? ? ? ? ? ? ?G,Gi1,G ?B1)) ?? ?? tr(A (i1,GiG,G ?B1))? ??? ? .. ? . ??? ? ? ?? .. ?. ? ? tr(A?(i i? ?B?? )) ?? ?? ??? ?? ? ? ?? 
??G,G G,G 1 tr(A (i i? ?B? ?1,G G,G H))? ????? .. ?. ???? ?? ? ?? . ? .. ??? ??? ?? tr(A ?(iG,Gi ? 1,G ?B?H)) ? ???? ? ? ? ? ? ?? .. ?? ? ?? tr(A (i ?G,GiG,G ?B1)) ?? ? . ? ???? .. ?. ???? tr(A?(i i? ?B? ?G,G G,G H)) tr(A (iG,Gi? ?G,G ?BH)) The second to the last equality of above follows because, for i, j = 1, ..., G and h = 1, ..., H: [0, ..., i?? ? b?i,G .1,h, ?..?., i? ? ?i,G ? b.n,?h, ..., 0]vec(A ) jth 1? n2G block =vec(i i? ? ?i,G j,G ?Bh) vec(A ) =tr(A?(i ?j,Gii,G ?B?h)) where the last equality follows from, e.g., Proposition 88 in Dhrymes and Guerard (1978).3 Finally, note that tr(A?(ij,Gi?i,G ?B?h)) = tr((i ?i,Gij,G ?Bh)A) = tr(AjiBh), and thus K ? [ I ? (I ? vec(I )? ] [ ] H[G?G HG G n )(K ? n?G ? In) (IG ?B)? I ?nG vec(A ) = tr(A11B1), ..., tr(A11BH), ..., tr(AG1B1), ..., tr(AG1BH), ]? ..., tr(A1GB1), ..., tr(A1GBH), ..., tr(AGGB1), ..., tr(AGGBH) = vecn(A,B) as claimed. A.2.2 Proof of Proposition 1 Let x denote a n? 1 vector with elements being functions of the r-element vector ?. Let A be n? n matrix and elements in A are independent of ?. Then for B = x?Ax, ?B ? ?x= x (A? + A) . ?? ?? 3The Proposition 88 in Dhrymes and Guerard (1978) states that: Let A, B be suitably dimensioned matrices. Then tr(AB) = vec(A?)?vec(B) = vec(B?)?vec(A) 140 If A is symmetric, it then follows that ?B ? ?x= 2x A . ?? ?? Alternatively, if the elements of A depend on some vector ? and A is nonsingular, Lemma 2 shows that ?ln|A| = vec(A?1?)? ?vec(A) . ?? ?? Using these results, we first write the score vectors of the log-likelihood function (2.15) as ?lnL(?) = vec(S(?, ?)?1 ? ??vec(S(?, ?))) ? u? ?S(?, ?)y(?)R?(?)(??1(?)? In)R(?) ??g ??g ??g (A.14) ?lnL(?) ?C?(?)x = u?(?)R?(?)(??1(?)? In)R(?) (A.15) ??g ??g ?lnL(?) ?1? ??vec(S(?, ?)) ?S(?, ?)y= vec(S(?, ?) ) ? u?(?)R?(?)(??1(?)? In)R(?) ??g ??g ??g (A.16) ?lnL(?) ?1? ??vec(R(?)) ? ? ?R(?)u(?)= vec(R(?) ) u (?)R?(?)(??1(?)? In) (A.17) ??g ??g ??g ?lnL(?) n ??vec(? ?1(?)) ? 1 ?u ?(?)R?(?)(??1(?)? In)R(?)u(?) 
= vec(?(?)) (A.18) ?? 2 ?? 2 ?? We start off by simplifying the first terms in the score vectors. Recall that S(?, ?) = InG ? (B? ? In) ? (?? ? In)(IG ? W ) and R(?) = InG ? (P ? ? In)(IG ?M). Apply Proposition 86 of Dhrymes and Guerard (1978) 4, we can obtain 4The Proposition 86 of Dhrymes and Guerard (1978) states: Let A, B be n ? m, m ? q respectively. Then vec(AB) = (B? ? In)vec(A) = (Iq ?A)vec(B). 141 [ ] vec[(?? ? In)(IG ?W )] = [(IG ?W ?)? I ?nG] vec(? ? In) (A.19) vec[(P ? ? I ? ?n)(IG ?M)] = (IG ?M )? InG vec(P ? In) (A.20) Also, by the definition of selection matrices, ?vec(B) ?vec(?) ?vec(P ) = ig,G ? L?,g, = ig,G ? L?,g, and = ig,G ? L?,g. (A.21) ??g ??g ??g In light of (A.19), (A.20), (A.21) and apply Lemma 3, we can obtain the following ?vec(S(?, ?)) ?vec(B? ? In) ?vec(B) = ? ??g [ ?vec(B) ??g ] = ? IG ? (Kn?G ? In)(IG ? vec(In)) KG?G(ig,G ? L?,g) (A.22) ?vec(S(?, ?)) ???vec[(? ? In)(IG ?W )] ?vec(?)= ??g [ ?vec(?) ] ??g ? ? ? ? ?vec(? ? ? In) ?vec(?) = (IG W ) InG [ ] [ ?vec(?) ??g ] = ? (IG ?W ?)? InG IPG ? (Kn?G ? In)(IG ? vec(In)) KPG?G(ig,G ? L?,g) (A.23) ?vec(R(?)) ?vec[(P ?? ? In)(IG ?M)] ?vec(P )= ??g [ ?vec(P ) ] ??g ? ? ? ? ?vec(P ? ? In) ?vec(P ) = (IG M ) InG [ ] [ ?vec(P ) ??g ] = ? (I ?G ?M )? InG IQG ? (Kn?G ? In)(IG ? vec(In)) KQG?G(ig,G ? L?,g) (A.24) where KG?G, KPG?G and KQG?G are the commutation matrices on B, ? and P , respectively. Now, to further simplify the above expressions, we[take the followin]g steps: First, we apply Lemma 5 taking A = S?1 and B = In and, noting that (IG ? In)? InG = In2G2 , to obtain 142 ? [ ] v[ec(S(?, ?) ?1 )? IG ? (Kn?G ? In)(IG ? vec(In)) KG?G(ig,G ? L?,g)] = tr(S(?, ?)11), ..., tr(S(?, ?)G1), ..., tr(S(?, ?)1G), ..., tr(S(?, ?)GG) (ig,G ? L?,g) =vec (S?1n (?, ?)) ?(ig,G ? L?,g) =vecn(S(?, ?) .g)?L?,g, (A.25) where S(?, ?)ij denotes the ijth n? n block of S?1(?, ?). Next, take A = S?1 and B = W , it then follows from Lemma 5 that ? [ vec(S(?, ?)?1 )? (I ?W ? ] [ ] [ G )? InG IPG ? (Kn?G ? In)(IG ? 
vec(In)) KPG?G(ig,G ? L?,g) = tr(S(?, ?)11W1), ..., tr(S(?, ?) 11WP ), ..., tr(S(?, ?) G1W1), ..., tr(S(?, ?) G1WP ), ] ..., tr(S(?, ?)1GW1), ..., tr(S(?, ?) 1GWP ), ..., tr(S(?, ?) GGW1), ..., tr(S(?, ?) GGWP ) (ig,G ? L?,g) =vec (S?1n (?, ?),W ) ?(ig,G ? L?,g) =vecn(S(?, ?) .g,W )?L?,g. (A.26) Next, let A = R?1 and B = M ,it then follows from Lemma 5 that ? [ v?ec(R(?)?1 )? (I ?M ? ] [ ] G )? InG IQG ? (Kn?G ? In)(IG ? vec(In)) KQG?G(ig,G ? L?,g) ? ? =?? ?t?r(R?1 ?11 (?)M1), ..., tr??(R1 (?)MQ), 0, ..., 0?, ..., 0?, ..., 0, tr(R?1 ?1G (?)M?1?), ..., tr(RG (?)MQ?)?? (ig,G ? L?,g) 1st 1?QG block Gth 1?QG block =vecn(R ?1(?),M)?(ig,G ? L?,g) =vec ?1n(Rg (?g),M) ?L?,g. (A.27) Now equations (A.25) - (A.27) represents the simplified first terms of the scores w.r.t. ?g, ?g and 143 ?g. We now simplifying the second terms in the scores. First, we note that ?C?(?)x ?(IG ?X)vec(C) ?vec(C) = = (IG ?X)(ig,G ? L?,g) = ig,G ?Xg (A.28) ??g ?vec(C) ??g Next, recalling that vec(Y ) = (IG?W )vec(Y ) and vec(U) = (IG?M)vec(U) and applying again Proposition 86 of Dhrymes and Guerard (1978), we have (B? ? In)vec(Y ) = (IG ? Y )vec(B) (A.29) (?? ? In)(IG ?W )vec(Y ) = (?? ? In)vec(Y ) = vec(Y ?) = (IG ? Y )vec(?), (A.30) (P ? ? In)(I ?G ?M)vec(U) = (P ? In)vec(U) = vec(UP ) = (IG ? U)vec(P ). Consequently ?S(?, ?)y ?[InG ?B? ?n]y ?(B ? In)vec(Y )= = = ?(IG ? Y ) (A.31) ?vec(B) ?vec(B) ?vec(B) ?S(?, ?)y ?[I ? ?nG ?Bn]y ??[(?n ? In)(IG ?Wn)]vec(Y )= = = ?(IG ? Y ), (A.32) ?vec(?) ?vec(?) ?vec(?) ?R(?)u ?[I ? P ?nG n ]u ??[(P ? n ? In)(IG ?Mn)]vec(U)= = = ?(IG ? U) ?vec(P ) ?vec(P ) ?vec(P ) 144 and thus ?S(?, ?)y ??S(?, ?)y ?vec(B)= = ?(IG ? Y )(ig,G ? L?,g) = ig,G ? Yg, (A.33) ??g ?vec(B) ??g ?S(?, ?)y ??S(?, ?)y ?vec(?)= = ?(IG ? Y )(ig,G ? L?,g) = ig,G ? Y g, (A.34) ??g ?vec(?) ??g ?R(?)u ? ?R(?)u ?vec(P )= = ?(IG ? U)(ig,G ? L?,g) = ig,G ? U g. (A.35) ??g ?vec(P ) ??g Substituting (A.28) - (A.35) into the second terms of the scores and observe that (??1(?)? 
In)R(?)(ig,G ?X ) = (?.gg ? In)Rg(?g)Xg, (??1(?)? In)R(?)(i .gg,G ? Yg) = (? ? In)Rg(?g)Yg, (??1(?)? In)R(?)(ig,G ? Y g) = (?.g ? In)Rg(?g)Y g, (??1(?)? In)(ig,G ? U g) = (?.g ? In)U g, the second terms of the scores can then be written as u?(?)R?(?)(?.g ? In)Rg(?g)Xg, (A.36) u?(?)R?(?)(?.g ? In)Rg(?g)Yg, (A.37) u?(?)R?(?)(?.g ? In)Rg(?g)Y g, (A.38) u?(?)R?(?)(?.g ? In)U g. (A.39) 145 Finally, note that ?vec(? ?1(?)) = L? by definition and recall Lemma 4 shows that?? ?u?(?)R?(?)(??1(?)? In)R(?)u(?) ( )? = vec E(?)?E(?) L?. ?? Substituting these results into (A.18), we obtain obtain ?lnL(?) 1 [ ? ( ? ) ]?= nvec(?(?)) ? vec E(?) E(?) L?. (A.40) ?? 2 We combine above results (A.25), (A.26), (A.27) for simplifying first terms of the scores, (A.36) - (A.39) for simplifying second terms of the scores, as well as (A.40). We can then re-write scores in (A.14) - (A.18) as ?lnL(?) =? vecn(S(?, ?).g)?L ? ? .g?,g + u (?)R (?)(? ? In)Rg(?g)Yg, ??g ?lnL(?) =u?(?)R?(?)(?.g ? In)Rg(?g)Xg, ??g ?lnL(?) =? vec (S(?, ?).g,W )?L + u?n ?,g (?)R?(?)(?.g ? In)Rg(?g)Y g, ??g ?lnL(?) =? vec (R?1n g (?g),M)?L ??,g + u (?)R?(?)(?.g ? In)U g,??g ?lnL(?) 1 [ ( ) ]? = nvec(?(?))? ? vec E(?)?E(?) L?. ?? 2 146 Recall the notation stacked blocked matrices defined in Chapter A.2.1, we have vec (S(?, ?).g)?L = [tr(S1gn ?,g n (?, ?)), . . . , tr(S Gg n (?, ?))]L?,g, vecn(S(?, ?) .g,W )?L 1g?,g = [tr(Sn (?, ?)W1,n), . . . , tr(S 1g n (?, ?)WP,n), . . . , tr(SGgn (?, ?)W1,n), . . . , tr(S Gg n (?, ?)WP,n)]L?,g, vec (R?1n g (?g),M) ?L?,g = [tr(R ?1 ?1 g,n(?g)M1,n, . . . , tr(Rg,n(?g)MQ,n)]L?,g. Taking transposes of above expressions of scores, it then follows ?lnLn(?, ?) =Y ? R (? )?(?g.? g,n g,n g (?)? In)Rn(?)un(?)? L ? ?,gvecn(S(?, ?) .g), ??g ?lnLn(?, ?) ? ? g. ? =Xg,nRg,n(?g) (? (?)? In)Rn(?)un(?),??g ?lnLn(?, ?) ? ? =Y g,nRg,n(? ? g. g) (? (?)? In)Rn(?)u (?)? L?n ?,gvecn(S(?, ?).g,W ),??g ?lnLn(?, ?) ? =U g,n(?g) ?(?g.(?)? In)Rn(?)un(?)? L? ?1 ?? ?,g vecn(Rg (?g),M), g ?lnLn(?, ?) 1 ? [ ] ? = L? 
nvec(?(?))? vec(En(?) ?En(?)) , ?? 2 The statement of the proposition now follows. A.2.3 Proof of Proposition 2 Under Assumption 4 and note that ml 1 e ? g.g(?0) = Z?g(?0) (?0 ? In)? is linear in ?. Thusn Emlg(?0) = 0 as the proposition claimed. 147 A.2.4 Proof of Proposition 3 For the quadratic moments, note that at ?0, we have ?g.(?0) = EE ?g,0E0 =?[?g1,0, ..., ?gG,0]. Next, observe that v = S?1(?0, ? )u = S?10 (?0, ? )R?10 (?0)? and thus vl = G lr ?1 r=1 S Rr ?r. Consequently, for the l-th vector in V ? ? g.g (?)Rg(?g)(? (?) ? In)R(?)u(?) at the true parameter values we have ?G ?G Ev?R? (?g. ? I )Ru = E ??R??1 lr? ? gsl g 0 n r r S Rg ?0 ?s (A.41) r?=1 ? s=1G G = tr[R??1Slr? ?r Rg? gs 0 E?s? ? r] (A.42) ?r=?1 s=1G G = tr[R??1Slr? ?r Rg? gs 0 ?sr,0] (A.43) ?r=1 s=1G ?G = tr[R??1Slr?R? ] ?gsr g 0 ?sr,0 = tr[R ??1 lg? ? lg g S Rg] = tr(S ) r=1 s=1 ? ? observing that G ?gs G gss=1 0 ?sr,0 = 0 for r ?= g and s=1 ?0 ?sr,0 = 1 for r = g. This proves that EV ?g (?0)R ? g. g(?0,g)(?0 ? In)R(?0)u(?0) = ??,g(?0, ?0) ? Next, since v = (IG ? W )v and thus v = W G lr ?1l,p p r=1 S Rr ?r for l = 1, , , .G and p = 148 1, ..., P . In light of the proof above, we the have ?G ?G Ev? R? (?g. ? I )Ru =E ??R??1Slr? ?l,p g 0 n r r WpR? ? gs g 0 ?s ?r=?1 s=1G G = tr[R??1Slr?r W ?R? ?gsp g 0 E? ? s?r] ?r=1 s=1G ?G = tr[R??1Slr?W ?R? gsr p g?0 ?sr,0] ?r=1 s=1G ?G = tr[R??1r S lr?W ? ? gs ??1 lg? ? lgpRg] ?0 ?sr,0 = tr[Rg S Rg] = tr(S Wp). r=1 s=1 ? This shows that EV ? g.g(?0)Rg(?0,g)(?0 ? In)R(?0)u(?0) = ??,g(?0, ?0) For the moment conditions w.r.t. ?g, recall that U g = [M1ug, ...,MQug] and hence Eu? M ?(?g.g q 0 ? In)Ru = ?? R?1?M ? g. g g q(?0 ? In)??G = E?? R?1?M ? ?gsg g q 0 ?s ?s=1G = tr(R?1?M ? ?gsE? ??g q 0 s g) ?s=1G = tr(R?1?M ? gsg q ?0 ?gs,0) s=1 = tr(R?1g Mq) ? ? where the last equality follows because G ?gss=1 0 ?gs,0 = 1. This proves that EU g(? g. 0 ? In)Ru = ??,g(?g,0). Putting above results together, the claim of the proposition follows. 
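Each step of the proof above rests on the same fact: the expectation of a quadratic form equals the trace of the quadratic-form matrix times the covariance matrix of the disturbances. The numpy sketch below verifies this identity on arbitrary stand-in matrices (Q and L here are illustrative placeholders, not the model matrices R, S or W):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# An arbitrary quadratic-form matrix Q and a covariance Omega = L L'.
Q = rng.standard_normal((n, n))
L = rng.standard_normal((n, n))
Omega = L @ L.T

# For eps = L z with z ~ iid(0, 1): E[eps' Q eps] = E[z' (L'QL) z] = tr(L'QL),
# which equals tr(Q Omega) by the cyclic property of the trace.
lhs = np.trace(L.T @ Q @ L)
rhs = np.trace(Q @ Omega)
assert np.isclose(lhs, rhs)

# Monte Carlo confirmation of the same expectation (agrees up to simulation noise).
z = rng.standard_normal((n, 200_000))
eps = L @ z
mc = np.einsum('in,ij,jn->n', eps, Q, eps).mean()
print(mc, rhs)
```

In the proposition, the role of Omega is played by E ε_s ε'_r = σ_{sr,0} I_n, which is what collapses the double sums to single traces such as tr(S^{lg}W_p).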
A.2.5 Proof of Lemma 1

Recall from equation (2.27) that the set of quadratic moments

m^q_{g,n}(θ) = (1/n) [ V_{g,n}(θ)'R_{g,n}(ρ_g)'(σ^{g.}_n(σ) ⊗ I_n)ε_n(θ) − α_{β,g,n}(λ, β) ;
                      V̄_{g,n}(θ)'R_{g,n}(ρ_g)'(σ^{g.}_n(σ) ⊗ I_n)ε_n(θ) − α_{λ,g,n}(λ, β) ;
                      Ū_{g,n}(δ_g)'(σ^{g.}_n(σ) ⊗ I_n)ε_n(θ) − α_{ρ,g,n}(ρ_g) ],

for g = 1, ..., G, is based on the score of the (log-)likelihood function, where

α_{β,g,n}(λ, β) = L'_{β,g}[tr(S_n^{1g}(λ, β)), ..., tr(S_n^{Gg}(λ, β))]',   (A.44)
α_{λ,g,n}(λ, β) = L'_{λ,g}[tr(S_n^{1g}(λ, β)W_{1,n}), ..., tr(S_n^{1g}(λ, β)W_{P,n}), ..., tr(S_n^{Gg}(λ, β)W_{1,n}), ..., tr(S_n^{Gg}(λ, β)W_{P,n})]',   (A.45)–(A.46)
α_{ρ,g,n}(ρ_g) = L'_{ρ,g}[tr(R_{g,n}^{-1}(ρ_g)M_{1,n}), ..., tr(R_{g,n}^{-1}(ρ_g)M_{Q,n})]'.

Let v_{h,n}(θ) be the h-th n × 1 block in V_n(θ), and assume v_{h,n}(θ) is one column in V_{g,n}(θ). In light of m^q_{g,n}(θ) and the expression for V_{g,n}(θ) deduced in Section 2.3.1, we see that

v_{h,n}(θ) = Σ_{l=1}^G S_n^{hl}(λ, β)R_{l,n}^{-1}(ρ_l)ε_{l,n}(θ).

The (full information) quadratic moment associated with the score w.r.t., e.g., b_{hg} in the g-th equation is thus given by

(1/n) Σ_{l=1}^G ε_{l,n}(θ)'R_{l,n}^{-1}(ρ_l)'S_n^{hl}(λ, β)'R_{g,n}(ρ_g)'(σ^{g.} ⊗ I_n)ε_n(θ) − (1/n) tr(S_n^{hg}(λ, β)).   (A.47)

Recall that S_n^{h.}(λ, β) denotes the h-th n × nG block of S_n^{-1}(λ, β) and i_{g,G} denotes the g-th column of the identity matrix of dimension G. We further note that (A.47) is equivalent to

(1/n) ε_n(θ)'R_n^{-1}(ρ)'(i'_{g,G} ⊗ S_n^{h.}(λ, β)')R'_n(ρ)(Σ^{-1} ⊗ I_n)ε_n(θ) − (1/n) tr(S_n^{hg}(λ, β)).   (A.48)

To see this, note that i'_{g,G} ⊗ S_n^{h.}(λ, β)' is in fact an nG × nG matrix of zeros except for its g-th nG × n (column) block, which stacks the blocks S_n^{h1}(λ, β)', ..., S_n^{hG}(λ, β)'. Thus the g-th column block of R_n^{-1}(ρ)'(i'_{g,G} ⊗ S_n^{h.}(λ, β)')R'_n(ρ) stacks the blocks R_{l,n}^{-1}(ρ_l)'S_n^{hl}(λ, β)'R_{g,n}(ρ_g)', l = 1, ..., G, while (Σ^{-1} ⊗ I_n)ε_n(θ) stacks the blocks (σ^{l.} ⊗ I_n)ε_n(θ). Carrying out the multiplication therefore yields

(1/n) Σ_{l=1}^G ε_{l,n}(θ)'R_{l,n}^{-1}(ρ_l)'S_n^{hl}(λ, β)'R_{g,n}(ρ_g)'(σ^{g.} ⊗ I_n)ε_n(θ)

as desired. With entirely analogous steps, one can write the element in V̄_{g,n}(θ)'R_{g,n}(ρ_g)'(σ^{g.}_n(σ) ⊗ I_n)ε_n(θ) − α_{λ,g,n}(λ, β) that is associated with the score w.r.t., e.g., λ_{hg,p}, as

(1/n) ε_n(θ)'R_n^{-1}(ρ)'(i'_{g,G} ⊗ S_n^{h.}(λ, β)'W'_{p,n})R_n(ρ)'(Σ^{-1} ⊗ I_n)ε_n(θ) − (1/n) tr(S_n^{hg}(λ, β)W_{p,n}),   (A.49)

and the element in Ū_{g,n}(δ_g)'(σ^{g.}_n(σ) ⊗ I_n)ε_n(θ) − α_{ρ,g,n}(ρ_g) that is associated with the score w.r.t., e.g., ρ_{g,q}, as

(1/n) ε_n(θ)'(i_{g,G}i'_{g,G} ⊗ R_{g,n}^{-1}(ρ_g)'M'_{q,n})(Σ^{-1} ⊗ I_n)ε_n(θ) − (1/n) tr(R_{g,n}^{-1}(ρ_g)M_{q,n}).   (A.50)

Upon replacing θ with θ̃ and taking transposes of the expressions (A.48), (A.49) and (A.50), the claim of Lemma 1 now follows.

A.3 Explicit Expressions of VCV Matrices

A.3.1 Explicit Expression of Ψ̃^q_{gg,n}(g)

Recall that the block corresponding to the quadratic moments is of the form

Ψ̃^q_{gg,n}(g) = [ Ψ̃^{ββ}_{gg,n}(g)  Ψ̃^{βλ}_{gg,n}(g)  Ψ̃^{βρ}_{gg,n}(g) ;
                  Ψ̃^{λβ}_{gg,n}(g)  Ψ̃^{λλ}_{gg,n}(g)  Ψ̃^{λρ}_{gg,n}(g) ;
                  Ψ̃^{ρβ}_{gg,n}(g)  Ψ̃^{ρλ}_{gg,n}(g)  Ψ̃^{ρρ}_{gg,n}(g) ],

and recall the general forms of the individual elements in a sub-matrix of Ψ̃^q_{gg,n}(g), say Ψ̃^{ββ}_{gg,n}(g), presented in Section 2.5.1. For ease of presentation, we define the following matrices:

Ā^{ψ,l,A}_{r,g} = MAT_d[R_{g,n}(ρ̃_{g,n}) S_n^{rg,A}(λ̃_n, β̃_n) R^A_{g,n}(ρ̃_{g,n})],
Ā^{ψ,l,A}_{r,p,g} = MAT_d[R_{g,n}(ρ̃_{g,n}) W_{p,n} S_n^{rg,A}(λ̃_n, β̃_n) R^A_{g,n}(ρ̃_{g,n})],
Ā^{ψ,l,A}_{q,g} = MAT_d[M_{q,n} R^A_{g,n}(ρ̃_{g,n})].

The superscript l is used to highlight that these matrices are associated with the limited information moments, in contrast to the full information matrices, e.g., Ā^{ψ,f,A}_{r,g}, defined in the next section. Recall that β_g, λ_g and ρ_g are m_{g,β} × 1, m_{g,λ} × 1 and m_{g,ρ} × 1 vectors of parameters. Throughout this subsection, write D(X) ≡ X + X' as shorthand. Then:

• The m_{g,β} × m_{g,β} block Ψ̃^{ββ}_{gg,n}(g) is (2σ̃²_{gg,n}/n) L'_{β,g} T^{ββ} L_{β,g}, where T^{ββ} is the G × G matrix with (r, s) element tr(Ā^{ψ,l,A}_{r,g} D(Ā^{ψ,l,A}_{s,g})).

• The m_{g,β} × m_{g,λ} block Ψ̃^{βλ}_{gg,n}(g) is (2σ̃²_{gg,n}/n) L'_{β,g} T^{βλ} L_{λ,g}, where T^{βλ} is the G × GP matrix with (r, (s, p)) element tr(Ā^{ψ,l,A}_{r,g} D(Ā^{ψ,l,A}_{s,p,g})). In accordance with the order of the quadratic moments m^{q,A}_{g,n,R}(θ̃, g) in defining the LQ-GSLIVE, the columns are ordered with p = 1, ..., P running fastest within each s; e.g., the first row (ignoring the selection matrices) is [tr(Ā^{ψ,l,A}_{1,g} D(Ā^{ψ,l,A}_{1,1,g})), ..., tr(Ā^{ψ,l,A}_{1,g} D(Ā^{ψ,l,A}_{1,P,g})), ..., tr(Ā^{ψ,l,A}_{1,g} D(Ā^{ψ,l,A}_{G,1,g})), ..., tr(Ā^{ψ,l,A}_{1,g} D(Ā^{ψ,l,A}_{G,P,g}))].

• The m_{g,β} × m_{g,ρ} block Ψ̃^{βρ}_{gg,n}(g) is (2σ̃²_{gg,n}/n) L'_{β,g} T^{βρ} L_{ρ,g}, where T^{βρ} is the G × Q matrix with (r, q) element tr(Ā^{ψ,l,A}_{r,g} D(Ā^{ψ,l,A}_{q,g})).

• The m_{g,λ} × m_{g,λ} block Ψ̃^{λλ}_{gg,n}(g) is (2σ̃²_{gg,n}/n) L'_{λ,g} T^{λλ} L_{λ,g}, where T^{λλ} is the GP × GP matrix with ((r, p), (s, p')) element tr(Ā^{ψ,l,A}_{r,p,g} D(Ā^{ψ,l,A}_{s,p',g})), rows and columns each ordered with the W-index (p, resp. p') running fastest, as in the ordering of m^{q,A}_{g,n,R}(θ̃, g).

• The m_{g,λ} × m_{g,ρ} block Ψ̃^{λρ}_{gg,n}(g) is (2σ̃²_{gg,n}/n) L'_{λ,g} T^{λρ} L_{ρ,g}, where T^{λρ} is the GP × Q matrix with ((r, p), q) element tr(Ā^{ψ,l,A}_{r,p,g} D(Ā^{ψ,l,A}_{q,g})).

• Lastly, the m_{g,ρ} × m_{g,ρ} block Ψ̃^{ρρ}_{gg,n}(g) is (2σ̃²_{gg,n}/n) L'_{ρ,g} T^{ρρ} L_{ρ,g}, where T^{ρρ} is the Q × Q matrix with (q, q') element tr(Ā^{ψ,l,A}_{q,g} D(Ā^{ψ,l,A}_{q',g})).

The remaining blocks follow by symmetry of Ψ̃^q_{gg,n}(g).

A.3.2 Explicit Expression of Ψ̃^q_n

For ease of exposition, we define

Ā^{ψ,f,A}_{r,g} = MAT_D[R_n(ρ̃_n)(i_{g,G} ⊗ S_n^{r.,A}(λ̃_n, β̃_n))R^A_n(ρ̃_n)],
Ā^{ψ,f,A}_{r,p,g} = MAT_D[R_n(ρ̃_n)(i_{g,G} ⊗ W_{p,n}S_n^{r.,A}(λ̃_n, β̃_n))R^A_n(ρ̃_n)],
Ā^{ψ,f,A}_{q,g} = MAT_D[i_{g,G}i'_{g,G} ⊗ M_{q,n}R^A_{g,n}(ρ̃_{g,n})].

The superscript f is used to highlight that these matrices are associated with the full information moments, in contrast to the limited information matrices, e.g., Ā^{ψ,l,A}_{r,g}, defined in the previous section. Recall that the block corresponding to the quadratic moments is of the form

Ψ̃^q_n = [ Ψ̃^{ββ}_n  Ψ̃^{βλ}_n  Ψ̃^{βρ}_n ;
           Ψ̃^{λβ}_n  Ψ̃^{λλ}_n  Ψ̃^{λρ}_n ;
           Ψ̃^{ρβ}_n  Ψ̃^{ρλ}_n  Ψ̃^{ρρ}_n ],

with each sub-matrix consisting of G × G sub-blocks, and recall the general forms of the individual elements in each sub-block, say Ψ̃^{ββ}_{gh,n}, presented in Section 2.5.2. Writing D̃(X) ≡ X + (Σ̃_n ⊗ I_n)X'(Σ̃^{-1}_n ⊗ I_n) as shorthand, the blocks are, explicitly:

• The gh-th block of Ψ̃^{ββ}_n (of size m_{g,β} × m_{h,β}) is (1/n) L'_{β,g} T^{ββ}_{gh} L_{β,h}, where T^{ββ}_{gh} is the G × G matrix with (r, s) element tr(Ā^{ψ,f,A}_{r,g} D̃(Ā^{ψ,f,A}_{s,h})).

• The gh-th block of Ψ̃^{βλ}_n (of size m_{g,β} × m_{h,λ}) is (1/n) L'_{β,g} T^{βλ}_{gh} L_{λ,h}, where T^{βλ}_{gh} is the G × GP matrix with (r, (s, p)) element tr(Ā^{ψ,f,A}_{r,g} D̃(Ā^{ψ,f,A}_{s,p,h})). To clarify the ordering, the first row in the bracket is [tr(Ā^{ψ,f,A}_{1,g} D̃(Ā^{ψ,f,A}_{1,1,h})), ..., tr(Ā^{ψ,f,A}_{1,g} D̃(Ā^{ψ,f,A}_{1,P,h})), ..., tr(Ā^{ψ,f,A}_{1,g} D̃(Ā^{ψ,f,A}_{G,1,h})), ..., tr(Ā^{ψ,f,A}_{1,g} D̃(Ā^{ψ,f,A}_{G,P,h}))].

• The gh-th block of Ψ̃^{βρ}_n (of size m_{g,β} × m_{h,ρ}) is (1/n) L'_{β,g} T^{βρ}_{gh} L_{ρ,h}, where T^{βρ}_{gh} is the G × Q matrix with (r, q) element tr(Ā^{ψ,f,A}_{r,g} D̃(Ā^{ψ,f,A}_{q,h})).

• The gh-th block of Ψ̃^{λλ}_n (of size m_{g,λ} × m_{h,λ}) is (1/n) L'_{λ,g} T^{λλ}_{gh} L_{λ,h}, where T^{λλ}_{gh} is the GP × GP matrix with ((r, p), (s, p')) element tr(Ā^{ψ,f,A}_{r,p,g} D̃(Ā^{ψ,f,A}_{s,p',h})), rows and columns each ordered with the W-index running fastest, as above.

• The gh-th block of Ψ̃^{λρ}_n (of size m_{g,λ} × m_{h,ρ}) is (1/n) L'_{λ,g} T^{λρ}_{gh} L_{ρ,h}, where T^{λρ}_{gh} is the GP × Q matrix with ((r, p), q) element tr(Ā^{ψ,f,A}_{r,p,g} D̃(Ā^{ψ,f,A}_{q,h})).

• Lastly, the gh-th block of Ψ̃^{ρρ}_n (of size m_{g,ρ} × m_{h,ρ}) is (1/n) L'_{ρ,g} T^{ρρ}_{gh} L_{ρ,h}, where T^{ρρ}_{gh} is the Q × Q matrix with (q, q') element tr(Ā^{ψ,f,A}_{q,g} D̃(Ā^{ψ,f,A}_{q',h})).

The remaining blocks follow by symmetry of Ψ̃^q_n.

A.4 Additional Monte Carlo Results

In this section, we report additional Monte Carlo results utilizing the "dumbbell-shaped" weights matrix considered in Section 2.7.5.
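The exact "dumbbell-shaped" design is specified in Section 2.7.5 and is not reproduced here. Purely as an illustration of the kind of weights matrix involved, the sketch below builds a generic dumbbell graph — two densely connected clusters joined by a short path — and row-normalizes it; the cluster and bridge sizes are assumptions for illustration only, not the values used in the simulations:

```python
import numpy as np

def dumbbell_weights(m=10, bridge=4):
    """Row-normalized weights for two fully connected clusters of size m
    joined by a path of `bridge` units. Illustrative stand-in only; the
    actual design used in the experiments is given in Section 2.7.5."""
    n = 2 * m + bridge
    A = np.zeros((n, n))
    A[:m, :m] = 1.0                        # first cluster: fully connected
    A[-m:, -m:] = 1.0                      # second cluster: fully connected
    np.fill_diagonal(A, 0.0)               # no self-neighbors
    for i in range(m - 1, m + bridge):     # path linking the two clusters
        A[i, i + 1] = A[i + 1, i] = 1.0
    return A / A.sum(axis=1, keepdims=True)  # row-normalization

W = dumbbell_weights()
assert np.allclose(W.sum(axis=1), 1.0)     # each row sums to one
assert np.allclose(np.diag(W), 0.0)        # zero diagonal, as required of W_n
```

Row-normalization keeps the spectral radius of W at one, which is the usual normalization ensuring the parameter space for the spatial-lag coefficients is well defined.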
A.4.1 Results with Alternative W_n's

For the robustness check, we experiment with the same model specification as in Section 2.7 for each scenario, but with a subset of the parameter constellations described in the main text. Specifically, for Scenario I, we experiment with

Eqn 1:  b21 = 0.15,  λ11,1 = 0.3,  λ11,2 = 0.2,  ρ11 = 0.2,  ρ12 = 0.1
Eqn 2:  b12 = 0.3,   λ22,1 = 0.3,  λ22,2 = 0.15, ρ21 = 0.1,  ρ22 = 0

We consider both the case with c.1 = c.2 = [1, 1, 1]' and the case with c.1 = c.2 = [0.4, 0.4, 0.4]' in this scenario.

Scenario II. We experiment with

Eqn 1:  b21 = 0.15,  λ11,1 = 0.3,  λ21,1 depends on λ11,1, b21 and the deviation parameter
Eqn 2:  b12 = 0.3,   λ22,1 = 0.3,  λ22,2 = 0.15

For the parameters on the exogenous variables, we let

• Equation 1: c11, c21, c31 = 1; c41, c51, c61 depend on c11, c21, c31, respectively, as well as on λ11,1 and the deviation parameter,
• Equation 2: c72, c82, c92 = 1,

and we set the deviation parameter to 1 and 0.4 (corresponding to the strong identification and the weak identification cases, respectively).

Scenario I

Table A.1: Median and RMSE of Scenario I, homoskedasticity, alternative W_n's
(first column: true value; remaining entries Med/RMSE for θ̂_ML, θ̂_GS2SLS, θ̂_GS3SLS, θ̂_GSLIVE, θ̂_GSFIVE, θ̂_LQ-GS2SLS, θ̂_LQ-GS3SLS, θ̂_LQ-GSLIVE, θ̂_LQ-GSFIVE)

Strong i.d.
b21     0.150   0.151/0.023   0.164/0.028   0.154/0.025   0.152/0.024   0.152/0.024   0.151/0.024   0.151/0.024   0.154/0.024   0.154/0.025
λ11,1   0.300   0.303/0.031   0.307/0.036   0.311/0.035   0.301/0.034   0.303/0.032   0.311/0.036   0.309/0.032   0.301/0.036   0.301/0.032
λ11,2   0.200   0.196/0.032   0.178/0.044   0.188/0.037   0.194/0.036   0.196/0.033   0.186/0.040   0.191/0.037   0.193/0.036   0.193/0.035
ρ11     0.200   0.191/0.062   0.182/0.072   0.185/0.072   0.191/0.066   0.194/0.069   0.183/0.071   0.189/0.071   0.196/0.069   0.196/0.062
ρ12     0.100   0.095/0.070   0.111/0.078   0.105/0.078   0.092/0.075   0.092/0.078   0.100/0.077   0.100/0.077   0.098/0.080   0.105/0.068
b12     0.300   0.301/0.022   0.313/0.025   0.302/0.023   0.303/0.022   0.302/0.021   0.302/0.023   0.301/0.021   0.304/0.022   0.303/0.023
λ22,1   0.300   0.300/0.029   0.306/0.037   0.308/0.033   0.299/0.034   0.298/0.030   0.304/0.037   0.305/0.033   0.299/0.035   0.298/0.030
λ22,2   0.150   0.151/0.030   0.132/0.041   0.140/0.036   0.149/0.036   0.148/0.032   0.143/0.039   0.144/0.033   0.149/0.036   0.148/0.031
ρ21     0.100   0.097/0.064   0.088/0.069   0.090/0.071   0.096/0.069   0.103/0.069   0.088/0.070   0.096/0.068   0.098/0.072   0.102/0.069
ρ22     0.000  -0.014/0.075   0.002/0.085  -0.006/0.087  -0.017/0.090  -0.017/0.089  -0.007/0.089  -0.009/0.090  -0.007/0.092  -0.010/0.078

Weak i.d.
b21     0.150   0.152/0.060   0.223/0.090   0.181/0.069   0.155/0.060   0.158/0.061   0.161/0.058   0.158/0.067   0.169/0.063   0.162/0.063
λ11,1   0.300   0.311/0.080   0.356/0.116   0.367/0.121   0.298/0.086   0.308/0.084   0.355/0.107   0.352/0.101   0.302/0.089   0.306/0.080
λ11,2   0.200   0.187/0.077   0.081/0.154   0.108/0.134   0.189/0.089   0.182/0.087   0.130/0.114   0.145/0.105   0.175/0.092   0.177/0.082
ρ11     0.200   0.183/0.096   0.114/0.157   0.108/0.163   0.190/0.116   0.183/0.110   0.133/0.132   0.142/0.126   0.186/0.117   0.188/0.098
ρ12     0.100   0.107/0.096   0.225/0.174   0.190/0.152   0.101/0.117   0.108/0.108   0.163/0.131   0.146/0.128   0.116/0.125   0.117/0.099
b12     0.300   0.304/0.050   0.360/0.079   0.318/0.062   0.308/0.057   0.309/0.053   0.310/0.054   0.308/0.056   0.317/0.056   0.308/0.056
λ22,1   0.300   0.297/0.071   0.347/0.110   0.359/0.116   0.291/0.088   0.300/0.078   0.329/0.093   0.328/0.092   0.294/0.092   0.295/0.075
λ22,2   0.150   0.148/0.074   0.050/0.134   0.070/0.125   0.148/0.089   0.137/0.078   0.103/0.104   0.104/0.098   0.135/0.089   0.139/0.072
ρ21     0.100   0.107/0.102   0.025/0.144   0.019/0.153   0.106/0.113   0.101/0.112   0.057/0.123   0.065/0.119   0.106/0.119   0.107/0.106
ρ22     0.000  -0.017/0.105   0.089/0.154   0.071/0.142  -0.018/0.128  -0.008/0.117   0.028/0.125   0.022/0.121  -0.004/0.126  -0.008/0.107

¹Results are based on 500 Monte Carlo trials with sample size n = 486; σ = 1.

Scenario II

Table A.2: Median and RMSE of Scenario II, homoskedasticity, alternative W_n's
(first column: true value; remaining entries Med/RMSE for θ̂_ML, θ̂_GS2SLS, θ̂_GS3SLS, θ̂_GSLIVE, θ̂_GSFIVE, θ̂_LQ-GS2SLS, θ̂_LQ-GS3SLS, θ̂_LQ-GSLIVE, θ̂_LQ-GSFIVE)

Strong i.d.
b21     0.150   0.148/0.024   0.163/0.027   0.155/0.025   0.150/0.024   0.149/0.024   0.166/0.029   0.156/0.026   0.152/0.024   0.150/0.024
λ11,1   0.300   0.300/0.027   0.324/0.050   0.320/0.045   0.304/0.045   0.301/0.043   0.317/0.036   0.314/0.033   0.305/0.032   0.303/0.030
λ21,1  -0.195  -0.195/0.027  -0.198/0.032  -0.191/0.032  -0.196/0.031  -0.195/0.030  -0.205/0.031  -0.197/0.029  -0.199/0.029  -0.196/0.027
c41    -1.300  -1.302/0.065  -1.312/0.075  -1.314/0.065  -1.300/0.074  -1.300/0.067  -1.303/0.070  -1.308/0.062  -1.298/0.070  -1.303/0.064
c51    -1.300  -1.300/0.070  -1.306/0.078  -1.309/0.074  -1.297/0.080  -1.301/0.075  -1.301/0.081  -1.309/0.070  -1.298/0.082  -1.303/0.070
c61    -1.300  -1.291/0.074  -1.311/0.082  -1.304/0.071  -1.295/0.085  -1.292/0.070  -1.302/0.078  -1.300/0.074  -1.295/0.079  -1.294/0.074
b12     0.300   0.300/0.023   0.312/0.026   0.307/0.024   0.301/0.024   0.301/0.024   0.314/0.027   0.308/0.024   0.303/0.023   0.305/0.024
λ22,1   0.300   0.299/0.027   0.311/0.037   0.307/0.036   0.299/0.038   0.299/0.035   0.309/0.030   0.306/0.029   0.303/0.029   0.301/0.027
λ22,2   0.150   0.149/0.026   0.145/0.035   0.145/0.033   0.149/0.037   0.151/0.032   0.145/0.029   0.146/0.028   0.147/0.028   0.149/0.026

Weak i.d.
b21     0.150   0.149/0.024   0.165/0.029   0.157/0.026   0.149/0.024   0.149/0.024   0.165/0.029   0.156/0.025   0.151/0.024   0.150/0.024
λ11,1   0.300   0.299/0.041   0.414/0.162   0.390/0.129   0.302/0.110   0.299/0.106   0.323/0.052   0.323/0.051   0.305/0.045   0.302/0.044
λ21,1  -0.105  -0.104/0.029  -0.116/0.032  -0.109/0.030  -0.107/0.028  -0.106/0.031  -0.119/0.032  -0.112/0.031  -0.109/0.030  -0.106/0.029
c41    -0.700  -0.696/0.066  -0.791/0.145  -0.769/0.117  -0.705/0.127  -0.702/0.116  -0.715/0.076  -0.717/0.069  -0.698/0.074  -0.700/0.063
c51    -0.700  -0.696/0.074  -0.790/0.148  -0.776/0.121  -0.703/0.127  -0.710/0.112  -0.718/0.086  -0.715/0.078  -0.699/0.085  -0.698/0.077
c61    -0.700  -0.685/0.076  -0.789/0.147  -0.767/0.122  -0.698/0.120  -0.694/0.113  -0.713/0.086  -0.704/0.076  -0.695/0.082  -0.690/0.076
b12     0.300   0.300/0.025   0.314/0.027   0.307/0.025   0.302/0.025   0.302/0.024   0.316/0.029   0.307/0.026   0.304/0.025   0.305/0.025
λ22,1   0.300   0.299/0.026   0.305/0.033   0.307/0.032   0.299/0.033   0.297/0.033   0.305/0.028   0.304/0.027   0.301/0.027   0.300/0.027
λ22,2   0.150   0.150/0.025   0.143/0.036   0.140/0.033   0.149/0.037   0.151/0.032   0.142/0.028   0.144/0.026   0.147/0.027   0.149/0.025

¹Results are based on 500 Monte Carlo trials with sample size n = 486; σ = 1.

A.4.2 Correlated x.k's

Recall that we generated each column in X_n = [x_{1,n}, x_{2,n}, ..., x_{6,n}] as i.i.d. normal with mean μ_x = 1 and variance σ_x = 1. Let x_{i.,n} = [x_{i1,n}, x_{i2,n}, ..., x_{i6,n}] be the i-th row of X_n. We now set the covariance between x_{ij,n} and x_{ik,n} to 0.25 for j ≠ k, so that the variance-covariance matrix of the elements of x_{i.,n} is the 6 × 6 matrix

[ 1     0.25  ...  0.25
  0.25  1     ...  0.25
  ...   ...   ...  ...
  0.25  ...   0.25 1    ]

for i = 1, ..., n. X_n is generated once for all Monte Carlo experiments.
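A draw of X_n with the equicorrelated covariance above can be obtained through a Cholesky factor of the variance-covariance matrix. A minimal numpy sketch (the seed is arbitrary; as in the text, X_n would be drawn once and held fixed across replications):

```python
import numpy as np

k, n = 6, 486
# Unit variances with 0.25 off-diagonal covariances, as specified in the text.
Sigma = np.full((k, k), 0.25) + 0.75 * np.eye(k)
L = np.linalg.cholesky(Sigma)            # Sigma = L L'

rng = np.random.default_rng(486)         # arbitrary seed
# Each row: mean-1 normal vector with covariance Sigma.
X = 1.0 + rng.standard_normal((n, k)) @ L.T

assert np.allclose(L @ L.T, Sigma)       # the factorization reproduces Sigma
assert X.shape == (n, k)
```

The equicorrelation matrix is positive definite (its eigenvalues are 0.75 and 2.25 here), so the Cholesky factorization always exists for this design.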
The model specifications and parameter choices are the same as those in Section A.4.1, but here we focus only on the strong-identification cases.

Scenario I

Table A.3: Median and RMSE of Scenario I, homoskedasticity, correlated Xn, strong i.d.

Param      TRUE     ML           GS2SLS       GS3SLS       GSLIVE       GSFIVE       LQ-GS2SLS    LQ-GS3SLS    LQ-GSLIVE    LQ-GSFIVE
                    Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE
b21         0.150   0.151 0.023  0.165 0.027  0.155 0.024  0.152 0.023  0.151 0.023  0.152 0.023  0.150 0.025  0.153 0.024  0.152 0.025
λ11,1       0.300   0.302 0.026  0.302 0.029  0.307 0.028  0.301 0.028  0.302 0.025  0.304 0.029  0.305 0.027  0.301 0.027  0.302 0.027
λ11,2       0.200   0.197 0.028  0.184 0.033  0.189 0.029  0.196 0.028  0.197 0.027  0.193 0.029  0.194 0.028  0.194 0.029  0.195 0.027
ρ11         0.200   0.192 0.059  0.189 0.068  0.187 0.067  0.193 0.065  0.191 0.066  0.192 0.065  0.194 0.066  0.198 0.064  0.197 0.057
ρ12         0.100   0.094 0.065  0.105 0.075  0.101 0.078  0.093 0.073  0.098 0.075  0.096 0.073  0.098 0.076  0.099 0.077  0.101 0.067
b12         0.300   0.300 0.020  0.313 0.023  0.301 0.020  0.301 0.021  0.300 0.019  0.301 0.020  0.302 0.020  0.302 0.020  0.302 0.021
λ22,1       0.300   0.299 0.023  0.300 0.025  0.306 0.026  0.299 0.025  0.298 0.024  0.300 0.026  0.302 0.025  0.299 0.025  0.300 0.024
λ22,2       0.150   0.151 0.024  0.140 0.027  0.145 0.025  0.150 0.024  0.151 0.023  0.147 0.024  0.148 0.023  0.150 0.025  0.149 0.024
ρ21         0.100   0.097 0.061  0.095 0.063  0.095 0.066  0.098 0.063  0.099 0.064  0.097 0.063  0.100 0.065  0.102 0.064  0.102 0.062
ρ22         0.000  -0.017 0.076 -0.005 0.085 -0.009 0.086 -0.015 0.089 -0.017 0.090 -0.012 0.085 -0.012 0.089 -0.007 0.087 -0.008 0.076
1 Results are based on 500 Monte Carlo trials with sample size n = 486; σ² = 1.

Scenario II

Table A.4: Median and RMSE of Scenario II, homoskedasticity, correlated Xn, strong i.d.

Param      TRUE     ML           GS2SLS       GS3SLS       GSLIVE       GSFIVE       LQ-GS2SLS    LQ-GS3SLS    LQ-GSLIVE    LQ-GSFIVE
                    Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE
b21         0.150   0.148 0.023  0.161 0.026  0.153 0.024  0.150 0.023  0.149 0.022  0.164 0.027  0.155 0.023  0.152 0.024  0.150 0.023
λ11,1       0.300   0.301 0.025  0.315 0.039  0.312 0.035  0.301 0.035  0.301 0.033  0.313 0.031  0.312 0.028  0.304 0.027  0.304 0.026
λ21,1      -0.195  -0.197 0.026 -0.195 0.032 -0.190 0.032 -0.195 0.033 -0.195 0.030 -0.202 0.030 -0.196 0.028 -0.200 0.029 -0.197 0.028
c41        -1.300  -1.301 0.074 -1.311 0.085 -1.313 0.072 -1.301 0.086 -1.300 0.074 -1.302 0.081 -1.307 0.070 -1.298 0.082 -1.301 0.072
c51        -1.300  -1.301 0.079 -1.304 0.089 -1.309 0.083 -1.299 0.096 -1.302 0.082 -1.300 0.090 -1.308 0.079 -1.297 0.090 -1.301 0.080
c61        -1.300  -1.292 0.080 -1.307 0.095 -1.304 0.082 -1.294 0.094 -1.292 0.081 -1.300 0.088 -1.300 0.081 -1.296 0.089 -1.293 0.082
b12         0.300   0.300 0.021  0.313 0.026  0.307 0.024  0.302 0.025  0.302 0.025  0.314 0.026  0.307 0.023  0.303 0.021  0.304 0.022
λ22,1       0.300   0.299 0.026  0.312 0.033  0.308 0.031  0.300 0.031  0.299 0.029  0.311 0.029  0.307 0.026  0.302 0.026  0.302 0.026
λ22,2       0.150   0.150 0.020  0.147 0.026  0.147 0.025  0.150 0.026  0.151 0.025  0.147 0.024  0.147 0.021  0.149 0.024  0.149 0.020
1 Results are based on 500 Monte Carlo trials with sample size n = 486; σ² = 1.

Appendix B: Appendix to Chapter 3

B.1 Theoretical Motivation for the Demand Equation

We consider an indirect utility function of the Gorman polar form:

    v_i(p, y) = (y_i - f_i(p)) / g(p),

where y_i denotes consumer income (or wealth) and both f_i(p) and g(p) are homogeneous of degree one in p. In our context, i denotes the index of stations. Thus we implicitly work with "aggregated" indirect utility and demand functions; the aggregation is at the level of the census tract in which station i resides. In what follows, we view g(p) as a price index for normalization.
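For this Gorman polar form, Roy's identity gives the demand schedule h_i = -(∂v_i/∂p_i)/(∂v_i/∂y_i) = ∂f_i/∂p_i + (y_i - f_i(p))(∂g/∂p_i)/g(p). The sketch below checks this numerically with finite differences; the particular functions f and g are hypothetical illustrations (both homogeneous of degree one), not taken from the text.

```python
# Illustrative Gorman polar form v(p, y) = (y - f(p)) / g(p), with
# hypothetical f(p) = 2*p1 + 3*p2 and g(p) = p1^0.4 * p2^0.6.
f = lambda p1, p2: 2.0 * p1 + 3.0 * p2
g = lambda p1, p2: p1 ** 0.4 * p2 ** 0.6
v = lambda p1, p2, y: (y - f(p1, p2)) / g(p1, p2)

def num_deriv(fun, x, h=1e-6):
    """Central finite-difference derivative."""
    return (fun(x + h) - fun(x - h)) / (2.0 * h)

p1, p2, y = 1.5, 2.0, 50.0

# Roy's identity: demand for good 1 is -(dv/dp1)/(dv/dy).
dv_dp1 = num_deriv(lambda t: v(t, p2, y), p1)
dv_dy = num_deriv(lambda t: v(p1, p2, t), y)
roy = -dv_dp1 / dv_dy

# Closed form implied by the Gorman structure:
# df/dp1 + (y - f(p)) * (dg/dp1) / g(p).
df_dp1 = num_deriv(lambda t: f(t, p2), p1)
dg_dp1 = num_deriv(lambda t: g(t, p2), p1)
closed = df_dp1 + (y - f(p1, p2)) * dg_dp1 / g(p1, p2)

assert abs(roy - closed) < 1e-4
```

The agreement of the two expressions is what licenses writing the demand schedule directly in terms of ∂f_i/∂p_i and ∂g/∂p_i in the derivation that follows.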
Roy's identity implies that the demand schedule can be written as

    h_i(p, y) = - (∂v_i(p, y)/∂p_i) / (∂v_i(p, y)/∂y_i) = ∂f_i(p)/∂p_i + (y_i - f_i(p)) (∂g(p)/∂p_i) / g(p).

Let ∂f_i(p)/∂p_i = α_i*, ∂g(p)/∂p_i = α*, and view g(p) = p_0 as a price index that does not vary across consumers. Then the above demand equation can be written equivalently as

    h_i(p, y) = α_i* + α f_i*(p) + α* y_i,                                        (B.1)

where f_i*(p) denotes the "normalized" version of f_i(p), which is linear in p. The above model accommodates a log-log transformation in the following sense. Note that

    ln h_i(p, y) = ln( α_i* + α f_i*(p) + α* y_i ),

and a first-order Taylor series expansion of the above demand function at (p_0, y_0) can be written as

    ln h_i(p, y) ≈ ln h_i(p_0, y_0) + [∂h_i(p, y)/∂p_i |_{p_0}] (p_i - p_{i,0}) / h_i(p_0, y_0) + [∂h_i(p, y)/∂y_i |_{y_0}] (y_i - y_0) / h_i(p_0, y_0).

Noting further that the first-order linear approximation ln(x) - ln(x_0) ≈ (x - x_0)/x_0 holds, we then have

    ln h_i(p, y) ≈ ln h_i(p_0, y_0) + [∂h_i(p, y)/∂p_i |_{p_0}] (ln p_i - ln p_{i,0}) / (h_i(p_0, y_0)/p_{i,0}) + [∂h_i(p, y)/∂y_i |_{y_0}] (ln y_i - ln y_{i,0}) / (h_i(p_0, y_0)/y_{i,0}).        (B.2)

In the spirit of Pinkse and Slade (2004)'s modeling assumptions, we also allow [∂h_i(p, y)/∂p_i |_{p_0}] p_{i,0}/h_i(p_0, y_0) to be a nonlinear function of p. For simplicity, we assume such nonlinear functions can be approximated by a series function Σ_{t=1}^T γ_t W_t p.¹ In the context of spatial markets, one may take w_ij,t to denote some distance measure between firms i and j. In the Pinkse and Slade (2004) context, i is the brand index in the sample and the distance measure could be a function of the alcohol content of each product/brand. Since ∂h_i(p, y)/∂y = α*, we let [∂h_i(p, y)/∂y_i |_{y_0}] y_{i,0}/h_i(p_0, y_0) = α* y_{i,0}/h_i(p_0, y_0). For simplicity, we assume α* y_{i,0}/h_i(p_0, y_0) = γ to be a constant. Thus the coefficient on (log) aggregate income ln(y_i) does not depend on proximity measures or on individual stations.

¹ This formulation shares the spirit of semi-parametric estimators. See Pinkse et al. (2002) for an example on spatial competition, among others.

In summary, in light of (B.2), we can formulate the log-demand equation for station i as

    ln(q_i) = α_i + β Σ_j w_ij ln(p_j) + γ ln(y_i),

where α_i may vary between stations and thus reflects station-specific effects/characteristics.

B.2 Edgeworth Cycle

B.2.1 Retail Margins

Panel (a) of Figures B.1 and B.2 plots the retail margin, defined as the station-level retail price minus the rack price of the corresponding wholesale outlet.² The margin plots do not show strong evidence of the presence of an Edgeworth cycle during this period. Panels (b) and (c) of Figures B.1 and B.2 plot the retail margin calculated as the retail price on day t minus the spot rack price on day t - 5 and day t - 10, respectively, since stations often maintain inventory that was purchased/ordered one to two weeks earlier. These figures still do not clearly show the asymmetric price fluctuations that mark an Edgeworth cycle.

² For stations without a supplier agreement with either Shell or Suncor, we use the average rack price of these two outlets.

[Figure B.1: Average Retail Margin Computed with Spot Rack Price and Rack Prices of 5 and 10 Days' Lead, Aug-Nov 2019; panels (a)-(c).]

[Figure B.2: Average Retail Margin Computed with Spot Rack Price and Rack Prices of 5 and 10 Days' Lead, Feb-Apr 2020; panels (a)-(c).]

B.2.2 Markov Switching Regression (MSR)

To further diagnose the possible presence of Edgeworth cycles, we fit the daily prices and margins of the Vancouver market to a dynamic Markov switching model with two states, in the spirit of Noel (2007a,b). Tables B.1 and B.2 document the estimation results for the Markov switching model and the estimated expected durations of the relenting phase (State 1) and the undercutting phase (State 2). Table B.3 reports the switching probabilities between State 1 and State 2 for both periods.
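In a two-state Markov switching model, the expected duration of a state with self-transition probability p_ss is 1/(1 - p_ss). A minimal sketch, using the September 2019 switching probabilities of specification (1) in Table B.3, roughly reproduces the expected durations reported in Table B.1 (small gaps arise from rounding of the reported probabilities):

```python
def expected_duration(p_stay):
    """Expected sojourn time (in days) of a Markov state with
    self-transition probability p_stay."""
    return 1.0 / (1.0 - p_stay)

# Table B.3, specification (1), September 2019: p11 = 0.966, p22 = 0.960.
d1 = expected_duration(0.966)  # relenting phase; Table B.1 reports 29.522
d2 = expected_duration(0.960)  # undercutting phase; Table B.1 reports 25.229
```

The near-equality of the two durations is another way of seeing the symmetry noted in the text: an Edgeworth cycle would instead imply a short relenting phase and a long undercutting phase.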
Generically, the model can be written as

    y_t = μ_{s_t} + X_t β + Z_t γ_{s_t} + ε_t,

where y_t is the dependent variable, X_t is the matrix of exogenous variables with state-invariant coefficients β, Z_t is the matrix of exogenous variables with state-dependent coefficients γ_{s_t}, and μ_{s_t} is the state-dependent intercept. In the current context, the dependent variable is set to be the average station price (in Canadian cents) at time t (specification (1)), the average station-level retail margin computed with spot rack prices (specification (2)), and the average station-level retail margin computed with rack prices of 10 days' lead (specification (3)). For each specification, X_t includes the first- and second-order lags of the dependent variable, to account for potential trend and momentum in the dependent variable that are not state-dependent. We do allow for a state-dependent intercept and state-dependent volatility of ε_t.

The estimated switching probabilities (Table B.3) show that prices tend to stay in the current state, i.e., to continue in State 1 in period t + 1 if the process is in State 1 in period t. This is in contrast to the results reported in, e.g., Noel (2007a,b), where the probabilities of moving from the relenting phase (State 1) to the undercutting phase (State 2) are generally above 90%, while those of moving from the undercutting phase (State 2) to the relenting phase (State 1) are often less than 10%. In other words, their results are consistent with the asymmetric pattern of price undercutting and restoration that an Edgeworth cycle features (e.g., p11 much lower than p12), while ours are not. Together with Figures B.1 and B.2, we conclude that there is no strong evidence of the presence of an Edgeworth cycle during the sample period of this empirical study.

Table B.1: Within-Regime Estimates and Expected Duration in Days, September 2019

                               (1)                 (2)                 (3)
Main
L.price                        0.387* (0.177)
L2.price                       0.163 (0.156)
L.Margin                                           0.060 (0.172)
L2.Margin                                         -0.066 (0.164)
L.Margin (10 days lead)                                                0.386** (0.130)
L2.Margin (10 days lead)                                               0.209* (0.115)
State 1
Constant                       68.576* (22.304)    62.794** (14.296)   26.979** (5.517)
σ_s1                           0.881** (0.166)     0.351 (0.232)       1.494** (0.245)
State 2
Constant                       71.825** (23.260)   62.790** (14.271)   26.407** (5.331)
σ_s2                           0.089 (0.193)       1.177** (0.198)     0.094 (0.182)
Expected Duration State 1      29.522              16.586              2.834
Expected Duration State 2      25.229              18.585              5.664
Observations                   33                  33                  33
1 Standard errors in parentheses; * (p ≤ 0.05), ** (p ≤ 0.01).

Table B.2: Within-Regime Estimates and Expected Duration in Days, March 2020

                               (1)                 (2)                 (3)
Main
L.price                        0.146 (0.153)
L2.price                      -0.152 (0.136)
L.Margin                                           0.260 (0.169)
L2.Margin                                          0.082 (0.157)
L.Margin (10 days lead)                                                0.214 (0.166)
L2.Margin (10 days lead)                                               0.105 (0.164)
State 1
Constant                       152.266** (26.253)  39.122** (11.155)   41.114** (11.994)
σ_s1                           0.825** (0.163)     0.421 (0.232)       0.369 (0.221)
State 2
Constant                       157.127** (27.042)  40.654** (11.681)   43.465** (12.707)
σ_s2                           0.150 (0.199)       1.275** (0.183)     1.190** (0.186)
Expected Duration State 1      23.395              15.586              7.834
Expected Duration State 2      21.210              12.585              4.664
Observations                   31                  31                  31
1 Standard errors in parentheses; * (p ≤ 0.05), ** (p ≤ 0.01).

Table B.3: Switching Probabilities

                                           (1)              (2)              (3)
September 2019
p11 (relenting → relenting)                0.966** (0.042)  0.940** (0.096)  0.647** (0.176)
p12 (relenting → undercutting)             0.034 (0.042)    0.060 (0.096)    0.353* (0.176)
p21 (undercutting → relenting)             0.040 (0.040)    0.054 (0.089)    0.177 (0.111)
p22 (undercutting → undercutting)          0.960** (0.040)  0.946** (0.089)  0.823** (0.111)
March 2020
p11 (relenting → relenting)                0.969** (0.038)  0.956** (0.059)  0.953** (0.064)
p12 (relenting → undercutting)             0.031 (0.038)    0.044 (0.059)    0.047 (0.064)
p21 (undercutting → relenting)             0.040 (0.052)    0.037 (0.048)    0.036 (0.043)
p22 (undercutting → undercutting)          0.960** (0.052)  0.963** (0.048)  0.964** (0.043)
1 Standard errors in parentheses; * (p ≤ 0.05), ** (p ≤ 0.01).

B.3 Test for IV Power

In this section, we present the heteroskedasticity-robust F-statistics used for the empirical application. Parts of the presentation follow closely that of Andrews et al. (2018). For generality of the discussion, we consider the following linear instrumental variables (IV) model with a single outcome variable Y_n:

    Y_n = X_{1,n} β + X_{2,n} γ_1 + ε_n,                                          (B.3)
    X_{1,n} = Z_{1,n} Π + X_{2,n} γ_2 + V_n,                                      (B.4)

where X_{1,n} is an n × K1 matrix of (potentially) endogenous regressors, X_{2,n} is an n × K2 matrix of exogenous regressors, and Z_{1,n} is the n × H matrix of instruments. In light of the above construction, we maintain the following assumptions: E(Z_{1,n}' ε_n) = 0, E(Z_{1,n}' V_{k,n}) = 0 for all k, E(X_{2,n}' ε_n) = 0, and E(X_{2,n}' V_{k,n}) = 0. We are interested in estimating β consistently, but X_{1,n} is potentially endogenous in the sense that we may have E(ε_n' V_{k,n}) ≠ 0. Substituting for X_{1,n} in (B.3), we obtain the equation

    Y_n = Z_{1,n} δ + X_{2,n} γ_3 + U_n,                                          (B.5)

with δ = Πβ. Following common terminology, we refer to (B.3) as the structural form, (B.4) as the first stage, and (B.5) as the reduced form. Let M_{X2,n} = I - X_{2,n}(X_{2,n}' X_{2,n})^{-1} X_{2,n}' and Q̂_{ZZ|X2} = (1/n) Z_{1,n}' M_{X2,n} Z_{1,n}. The 2SLS estimator can be written as

    β̂_{2SLS} = (Π̂' Q̂_{ZZ|X2} Π̂)^{-1} Π̂' Q̂_{ZZ|X2} δ̂,

where Π̂ is the first-stage OLS estimator and δ̂ the OLS estimator of the reduced-form parameter δ. For a single endogenous regressor, write π̂ = Π̂ for the H × 1 vector of first-stage coefficients and let

    Ω̂_ΠΠ = Q̂_{ZZ|X2}^{-1} ( n^{-2} Z_{1,n}' M_{X2,n} diag(V̂_n V̂_n') M_{X2,n} Z_{1,n} ) Q̂_{ZZ|X2}^{-1},

where V̂_n denotes the first-stage residuals. The heteroskedasticity-robust F-statistic is then given by

    HFR = π̂' Ω̂_ΠΠ^{-1} π̂,

which is asymptotically χ²_H-distributed under the null hypothesis π = 0. In the following tables, we report the effective first-stage F-statistic proposed by Olea and Pflueger (2013), which relies on an estimate of the variance that is robust to non-homoskedasticity. Explicitly, the effective F-statistic is computed as

    F_eff = π̂' Q̂_{ZZ|X2} π̂ / tr(Ω̂_ΠΠ Q̂_{ZZ|X2}).

Table B.4a: First-Stage OLS Regression and Tests for IV Power

W1                             (1)               (2)               (3)
No. Nb                        -0.041 (0.039)    -0.221** (0.038)
Avg. Nb Size                   0.030 (0.043)    -0.010 (0.050)
No. Indirect Nb               -0.142** (0.017)                    -0.151** (0.015)
Avg. Indirect Nb Size          0.104 (0.075)                       0.104 (0.082)
Nb. Car Wash                  -0.066** (0.022)  -0.068** (0.024)
Nb. Service                    0.024* (0.009)    0.022* (0.010)
Suncor X Rack                  0.163** (0.052)   0.164* (0.062)    0.172** (0.056)
Shell X Rack                   0.055* (0.024)    0.052 (0.029)     0.049 (0.026)
Dist Refinery                  0.012** (0.002)   0.013** (0.002)   0.012** (0.002)
Weak IV (F-stat)               36.486            30.290            40.804
χ² crit-val (5%)               20.530            18.370            19.860

W4                             (1)               (2)               (3)
No. Nb
Avg. Nb Size                   0.065** (0.020)   0.081** (0.022)
No. Indirect Nb
Avg. Indirect Nb Size          0.056* (0.020)                      0.044* (0.021)
Nb. Car Wash                   0.034** (0.008)   0.040** (0.008)
Nb. Service                    0.026** (0.003)   0.026** (0.003)
Suncor X Rack                  0.092** (0.027)   0.089* (0.032)    0.101** (0.028)
Shell X Rack                   0.157** (0.033)   0.113** (0.039)   0.170** (0.035)
Dist Refinery                  0.145** (0.018)   0.141** (0.021)   0.147** (0.019)
Weak IV (F-stat)               46.538            30.658            48.847
χ² crit-val (5%)               19.860            16.850            19.280

W2                             (1)               (2)               (3)
No. Nb                        -0.081** (0.020)  -0.129** (0.018)
Avg. Nb Size                  -0.034** (0.005)  -0.037** (0.006)
No. Indirect Nb               -0.050** (0.010)                    -0.069** (0.008)
Avg. Indirect Nb Size         -0.197** (0.054)                    -0.076 (0.056)
Nb. Car Wash                  -0.104** (0.012)  -0.119** (0.012)
Nb. Service                    0.035** (0.005)   0.037** (0.005)
Suncor X Rack                  0.137** (0.037)   0.167** (0.044)   0.080* (0.038)
Shell X Rack                   0.102** (0.026)   0.113** (0.031)   0.074* (0.028)
Dist Refinery                  0.044 (0.200)     0.269 (0.235)     0.215 (0.217)
Weak IV (F-stat)               59.806            36.789            55.517
χ² crit-val (5%)               20.530            18.370            19.860

W5                             (1)               (2)               (3)
No. Nb                        -0.041 (0.037)     0.006 (0.038)
Avg. Nb Size                  -0.013 (0.027)    -0.068* (0.030)
No. Indirect Nb                0.010 (0.023)                      -0.003 (0.022)
Avg. Indirect Nb Size          0.151** (0.024)                     0.148** (0.026)
Nb. Car Wash                  -0.016 (0.011)    -0.019 (0.011)
Nb. Service                   -0.203** (0.033)  -0.200** (0.035)
Suncor X Rack                  0.189** (0.030)   0.188** (0.035)   0.176** (0.030)
Shell X Rack                  -0.028 (0.023)    -0.042 (0.028)    -0.036 (0.025)
Dist Refinery                  0.141** (0.017)   0.151** (0.020)   0.142** (0.018)
Weak IV (F-stat)               27.797            24.030            31.931
χ² crit-val (5%)               20.530            18.370            19.860

Table B.4b: First-Stage OLS Regression and Tests for IV Power (continued)

W3                             (1)               (2)               (3)
No. Nb                        -0.020** (0.004)  -0.009** (0.003)
Avg. Nb Size                   0.012 (0.011)    -0.084** (0.009)
No. Indirect Nb                0.008* (0.004)                     -0.006* (0.003)
Avg. Indirect Nb Size         -0.144 (0.099)                      -0.050 (0.105)
Nb. Car Wash                  -0.033** (0.004)  -0.033** (0.004)
Nb. Service                   -0.052** (0.011)  -0.050** (0.011)
Suncor X Rack                  0.034* (0.013)    0.027 (0.014)     0.039** (0.012)
Shell X Rack                   0.104 (0.062)    -0.025 (0.070)     0.072 (0.065)
Dist Refinery                 -0.315 (0.206)     0.153 (0.235)    -0.161 (0.221)
Weak IV (F-stat)               65.243            50.940            67.045
χ² crit-val (5%)               20.530            18.370            19.860

W6                             (1)               (2)               (3)
No. Nb                        -0.041 (0.037)     0.006 (0.038)
Avg. Nb Size                  -0.013 (0.027)    -0.068* (0.030)
No. Indirect Nb                0.010 (0.023)                      -0.003 (0.022)
Avg. Indirect Nb Size          0.151** (0.024)                     0.148** (0.026)
Nb. Car Wash                  -0.016 (0.011)    -0.019 (0.011)
Nb. Service                   -0.203** (0.033)  -0.200** (0.035)
Suncor X Rack                  0.189** (0.030)   0.188** (0.035)   0.176** (0.030)
Shell X Rack                  -0.028 (0.023)    -0.042 (0.028)    -0.036 (0.025)
Dist Refinery                  0.141** (0.017)   0.151** (0.020)   0.142** (0.018)
Weak IV (F-stat)               27.797            24.030            31.931
χ² crit-val (5%)               20.530            18.370            19.860
1 Panels (1), (2), (3) correspond to different sets of IVs; robust SEs in parentheses; * (p < 0.05), ** (p < 0.01).
2 We report effective first-stage F-statistics based on Olea and Pflueger (2013) and Stock-Yogo weak ID test critical values.

B.4 Additional Empirical Results

Table B.6: Estimation Results with W2,n (2-mile radius)

                   OLS                (1) LQ-GS2SLS      (1) LQ-GSLIVE      (2) LQ-GS2SLS      (2) LQ-GSLIVE      (3) LQ-GS2SLS      (3) LQ-GSLIVE
Demand
ln(price)         -3.996** (1.233)   -12.217** (3.751)  -12.250** (3.068)  -9.980* (3.870)    -9.989** (3.282)   -9.274* (3.720)    -9.327* (3.228)
W*ln(price)       -7.301 (4.162)      1.821 (1.336)      1.928 (1.923)      1.145 (1.707)      1.147 (1.097)      0.138 (1.543)      0.258 (1.332)
Car wash           0.042 (0.113)     -0.261 (0.126)     -0.006 (0.132)     -0.200 (0.121)     -0.005 (0.119)     -0.082 (0.120)     -0.001 (0.118)
Service Station    0.047 (0.036)      0.038 (0.040)     -0.001 (0.042)      0.038 (0.038)      0.035 (0.038)      0.045 (0.038)      0.039 (0.038)
C-store Size      -0.047 (0.035)     -0.036 (0.038)     -0.094* (0.040)    -0.038 (0.036)     -0.042 (0.036)     -0.046 (0.036)     -0.041 (0.036)
No. Drivers        0.053* (0.023)     0.053* (0.026)     0.052 (0.028)      0.041 (0.025)      0.058* (0.024)     0.051* (0.024)     0.047* (0.025)
Med Income         0.020 (0.239)     -0.159 (0.266)     -0.123 (0.284)     -0.156 (0.254)     -0.029 (0.252)     -0.142 (0.251)     -0.023 (0.249)
Commute Dist.     -1.814* (0.912)    -1.472 (1.027)     -1.533 (1.148)     -1.541 (0.976)     -1.581 (0.977)     -1.609 (0.965)     -1.607 (1.002)
Travel Mode       -0.015* (0.006)    -0.012 (0.007)     -0.010 (0.008)     -0.010 (0.006)     -0.016* (0.006)    -0.009 (0.006)     -0.013* (0.006)
λ1                 0.274 (2.540)    -15.533** (0.460)  -15.522** (0.694)   -8.751** (0.584)   -8.743** (1.232)   -7.084** (0.029)   -7.008** (1.367)
Period Dummy       Yes (all IV specifications)
Sample size        302
R-square (OLS)     0.15
1 Panels (1), (2), (3) correspond to different sets of supply-side IVs; SEs in parentheses are robust to heteroskedasticity; * (p < 0.05), ** (p < 0.01). w_ij = 0.0588 if i and j are neighbors.

Table B.7: Estimation Results with W3,n (common boundary and reciprocal of the travel distance)

                   OLS                (1) LQ-GS2SLS      (1) LQ-GSLIVE      (2) LQ-GS2SLS      (2) LQ-GSLIVE      (3) LQ-GS2SLS      (3) LQ-GSLIVE
Demand
ln(price)         -3.345* (1.234)    -6.944* (2.806)    -6.815** (2.048)   -5.730* (2.055)    -5.712* (2.035)    -7.419** (2.856)   -7.320** (2.077)
W*ln(price)        1.633** (0.565)    1.317* (0.628)     1.418* (0.558)     0.749 (0.627)      1.438* (0.539)     1.247* (0.625)     1.326* (0.475)
Car wash           0.022 (0.110)     -0.172 (0.113)      0.083 (0.111)     -0.012 (0.111)      0.053 (0.110)     -0.008 (0.112)      0.086 (0.111)
Service Station    0.048 (0.036)      0.049 (0.036)      0.032 (0.035)      0.050 (0.035)      0.031 (0.035)      0.050 (0.036)      0.032 (0.036)
C-store Size      -0.042 (0.034)     -0.050 (0.034)     -0.031 (0.033)     -0.046 (0.033)     -0.030 (0.033)     -0.046 (0.034)     -0.031 (0.034)
No. Drivers        0.046 (0.022)      0.037 (0.026)      0.037 (0.028)      0.035 (0.025)      0.037 (0.028)      0.038 (0.026)      0.034 (0.027)
Med Income        -0.041 (0.231)     -0.436 (0.250)      0.137 (0.257)      0.167 (0.242)      0.132 (0.254)     -0.052 (0.247)      0.192 (0.259)
Commute Dist.     -1.585 (0.897)     -1.472 (1.060)     -1.441 (1.151)     -1.289 (1.032)     -1.407 (1.146)     -1.452 (1.039)     -1.267 (1.132)
Travel Mode       -0.012* (0.006)    -0.004 (0.007)     -0.017* (0.008)    -0.013 (0.007)     -0.018* (0.008)    -0.009 (0.007)     -0.017* (0.008)
λ1                 2.430 (1.476)      2.572** (0.806)    4.227** (1.040)    2.739** (0.054)    4.373** (1.338)    2.327* (0.959)     4.167** (0.648)
Period Dummy       Yes (all IV specifications)
Sample size        302
R-square (OLS)     0.17
1 Panels (1), (2), (3) correspond to different sets of supply-side IVs; SEs in parentheses are robust to heteroskedasticity; * (p < 0.05), ** (p < 0.01).

Table B.8: Estimation Results with W4,n (nearest neighbor)

                   OLS                (1) LQ-GS2SLS      (1) LQ-GSLIVE      (2) LQ-GS2SLS      (2) LQ-GSLIVE      (3) LQ-GS2SLS      (3) LQ-GSLIVE
Demand
ln(price)          3.688 (2.553)     -6.814* (2.964)    -6.833* (2.938)    -4.393 (2.792)     -4.176 (2.359)    -11.579** (3.140)  -11.358** (3.074)
W*ln(price)        0.630 (1.506)      6.479* (2.252)     6.403* (2.301)     4.374* (1.907)     4.369* (1.776)    10.545** (2.962)    9.444** (3.023)
Car wash           0.037 (0.112)      0.111 (0.118)      0.110 (0.147)      0.093 (0.118)     -0.102 (0.143)      0.229* (0.113)    -0.273* (0.122)
Service Station    0.048* (0.023)     0.069* (0.027)     0.060* (0.026)     0.064* (0.027)     0.058* (0.028)     0.072* (0.033)     0.082 (0.040)
C-store Size      -0.047 (0.035)     -0.045 (0.035)     -0.059 (0.040)     -0.047 (0.035)      0.011 (0.042)     -0.049 (0.036)     -0.002 (0.062)
No. Drivers        0.056* (0.022)     0.070** (0.023)    0.065* (0.023)     0.068* (0.023)     0.082** (0.024)    0.078** (0.023)    0.100** (0.022)
Med Income        -0.008 (0.214)      0.042 (0.334)     -0.020 (0.389)      0.036 (0.407)     -0.349 (0.439)      0.013 (0.436)      0.025 (0.491)
Commute Dist.     -1.878* (0.903)    -2.063* (0.923)    -2.095* (0.923)    -2.069* (0.938)    -1.988* (0.831)    -2.052* (0.927)    -1.994* (0.707)
Travel Mode       -0.016 (0.006)     -0.016* (0.006)    -0.047* (0.016)    -0.017* (0.006)    -0.029 (0.016)     -0.013* (0.006)    -0.080** (0.025)
λ1                 0.124* (0.061)     0.005 (1.813)      0.335 (2.599)      0.032 (1.116)      0.337 (1.557)     -0.040 (1.076)      0.336 (1.590)
Period Dummy       Yes (all IV specifications)
Sample size        302
R-square (OLS)     0.08
1 Panels (1), (2), (3) correspond to different sets of supply-side IVs; SEs in parentheses are robust to heteroskedasticity; * (p < 0.05), ** (p < 0.01).

Table B.9: Estimation Results with W5,n (common street)

                   OLS                (1) LQ-GS2SLS      (1) LQ-GSLIVE      (2) LQ-GS2SLS      (2) LQ-GSLIVE      (3) LQ-GS2SLS      (3) LQ-GSLIVE
Demand
ln(price)         -3.945** (1.225)   -7.213* (2.646)    -7.078* (2.369)    -6.786* (2.697)    -6.679** (2.159)   -7.008* (3.094)    -6.980* (2.864)
W*ln(price)        0.654 (0.294)      1.967** (0.512)    1.349* (0.529)     1.359* (0.509)     1.331* (0.540)     1.201* (0.513)     1.405* (0.519)
Car wash           0.042 (0.111)     -0.034 (0.114)     -0.102 (0.122)     -0.026 (0.113)     -0.101 (0.121)     -0.080 (0.114)     -0.103 (0.131)
Service Station    0.047 (0.036)      0.043 (0.037)      0.066 (0.036)      0.046 (0.037)      0.065 (0.036)      0.039 (0.037)      0.066 (0.036)
C-store Size      -0.046 (0.035)     -0.043 (0.035)     -0.064 (0.034)     -0.040 (0.035)     -0.063 (0.034)     -0.047 (0.035)     -0.064 (0.034)
No. Drivers        0.051* (0.023)     0.048* (0.024)     0.056* (0.024)     0.051* (0.024)     0.058* (0.024)     0.050* (0.024)     0.056* (0.023)
Med Income        -0.057 (0.236)     -0.066 (0.245)     -0.147 (0.248)     -0.058 (0.245)     -0.136 (0.254)     -0.074 (0.245)     -0.151 (0.250)
Commute Dist.     -1.746 (0.907)     -1.599 (0.981)     -1.961* (0.971)    -1.625 (0.984)     -2.048* (0.960)    -1.635 (0.982)     -1.971* (0.948)
Travel Mode       -0.014* (0.006)    -0.018* (0.006)    -0.015* (0.006)    -0.017* (0.006)    -0.015* (0.006)    -0.014* (0.006)    -0.015* (0.006)
λ1                 0.272** (0.055)    0.109 (0.410)      0.098 (0.322)      0.118 (0.388)      0.100 (0.321)      0.110 (0.435)      0.098 (0.320)
Period Dummy       Yes (all IV specifications)
Sample size        302
R-square (OLS)     0.15
1 Panels (1), (2), (3) correspond to different sets of supply-side IVs; SEs in parentheses are robust to heteroskedasticity; * (p < 0.05), ** (p < 0.01).

Table B.10: Estimation Results with W6,n (hybrid measure of common street and travel distance)

                   OLS                (1) LQ-GS2SLS      (1) LQ-GSLIVE      (2) LQ-GS2SLS      (2) LQ-GSLIVE      (3) LQ-GS2SLS      (3) LQ-GSLIVE
ln(price)         -3.946** (1.225)   -8.875** (2.526)   -9.112** (2.303)   -6.286** (2.608)   -6.428** (2.043)   -7.692* (3.008)    -7.698* (2.746)
W*ln(price)        0.650* (0.299)     0.129 (0.329)      0.135 (0.332)      0.648* (0.321)     0.614* (0.331)     0.609 (0.327)      0.625 (0.335)
Car wash           0.042 (0.111)     -0.147 (0.118)     -0.047 (0.120)     -0.195 (0.116)     -0.279* (0.118)    -0.688** (0.124)   -0.317* (0.127)
Service Station    0.047 (0.036)      0.041 (0.038)      0.040 (0.038)      0.048 (0.038)      0.045 (0.037)      0.055 (0.040)      0.035 (0.038)
C-store Size      -0.046 (0.035)     -0.043 (0.036)     -0.047 (0.036)     -0.048 (0.036)     -0.029 (0.036)     -0.082* (0.038)    -0.019 (0.036)
No. Drivers        0.051* (0.023)     0.040 (0.024)      0.048* (0.024)     0.055* (0.024)     0.091** (0.024)    0.051* (0.025)     0.072** (0.023)
Med Income        -0.057 (0.236)     -0.130 (0.250)     -0.071 (0.253)     -0.130 (0.246)      0.119 (0.251)     -0.479 (0.261)      0.071 (0.269)
Commute Dist.     -1.747 (0.907)     -1.504 (0.980)     -1.574 (0.967)     -1.624 (0.968)     -3.185** (0.954)   -1.683 (1.011)     -2.474** (0.974)
Travel Mode       -0.014* (0.006)    -0.020** (0.006)   -0.006 (0.006)     -0.022** (0.006)   -0.017* (0.006)    -0.009 (0.006)     -0.014* (0.006)
λ1                 0.157** (0.040)    0.046 (0.408)      0.088 (0.354)      0.052 (0.367)      0.062 (0.343)      0.012 (0.412)      0.049 (0.395)
Period Dummy       Yes (all IV specifications)
Sample size        302
R-square (OLS)     0.15
1 Panels (1), (2), (3) correspond to different sets of supply-side IVs; SEs in parentheses are robust to heteroskedasticity; * (p < 0.05), ** (p < 0.01).
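The "Weak IV (F-stat)" rows above report the effective first-stage F-statistic of Section B.3, F_eff = π̂'Q̂π̂ / tr(Ω̂Q̂). A minimal sketch of that computation for a single endogenous regressor is below; the simulated data and all names are illustrative, not the dissertation's data or code.

```python
import numpy as np

def effective_F(X1, X2, Z):
    """Effective first-stage F (in the spirit of Olea and Pflueger, 2013) for a
    single endogenous regressor X1, exogenous controls X2, and instruments Z."""
    n = len(X1)
    # Partial the exogenous controls out of the instruments and the regressor.
    M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
    Zt = M2 @ Z
    Q = Zt.T @ Zt / n                              # Q_hat_{ZZ|X2}
    pi = np.linalg.solve(Zt.T @ Zt, Zt.T @ X1)     # first-stage OLS coefficients
    v = M2 @ X1 - Zt @ pi                          # first-stage residuals
    S = Zt.T @ (Zt * (v ** 2)[:, None]) / n ** 2   # heteroskedasticity-robust "meat"
    Qinv = np.linalg.inv(Q)
    Omega = Qinv @ S @ Qinv                        # robust VC estimate for pi_hat
    return float(pi @ Q @ pi / np.trace(Omega @ Q))

# Simulated illustration with strong instruments.
rng = np.random.default_rng(1)
n = 500
Z = rng.standard_normal((n, 3))
X2 = np.column_stack([np.ones(n), rng.standard_normal(n)])
X1 = Z @ np.array([1.0, 0.5, 0.5]) + rng.standard_normal(n)
F = effective_F(X1, X2, Z)
```

Under homoskedasticity this statistic reduces to (a scaled version of) the usual first-stage Wald F, which is why large values signal strong instruments here as well.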
Table B.11: Definition of Regression Variables

ln(volume)              log of monthly sales volume in liters
ln(price)               log of monthly average price in Canadian cents
Car wash                1 if the station provides a car wash, 0 otherwise
Service Station         1 if the station has a service station, 0 otherwise
No. Drivers             number of car drivers living in the census tract in which the station is located (1 unit = 100 people)
Med Income              median income of the residents of the census tract in which the station is located (unit = 10,000 Canadian dollars)
Commute Dist.           index of commute distance (between 0 and 1)
Travel Mode             index of transportation mode (between 0 and 1)
No. Nb                  number of neighboring stations
Avg. Nb Size            average size of neighboring stations, measured by the number of pumps
C-store Size/No. Pump   size of the C-store (square meters)/number of pumps
Suncor X Rack           dummy for Suncor's presence in the neighborhood, interacted with the monthly average rack price
Shell X Rack            dummy for Shell's presence in the neighborhood, interacted with the monthly average rack price
Dist Refinery           distance to the closest refinery, by brand (in kilometers)

B.5 Impact Measures

Table B.12: Impact Measures

W3                      LQ-GS2SLS             LQ-GSLIVE
Demand Side Var.        Direct     Indirect   Direct     Indirect
Car wash               -0.0433    -0.0463    -0.0178    -0.0207
Service Station        -0.0266    -0.0302    -0.0282    -0.0318
C-store Size           -0.0598    -0.0608    -0.0399    -0.0410
No. Drivers             0.0371     0.0371     0.0373     0.0373
Med Income             -0.4364    -0.4364     0.1374     0.1374

W4                      LQ-GS2SLS             LQ-GSLIVE
Demand Side Var.        Direct     Indirect   Direct     Indirect
Car wash               -0.0127    -0.0347    -0.0128    -0.0789
Service Station         0.0388     0.0112     0.0306    -0.0523
C-store Size           -0.0428    -0.0404    -0.0559    -0.0486
No. Drivers             0.0701     0.0701     0.0646     0.0646
Med Income              0.0423     0.0423    -0.0195    -0.0195

W5                      LQ-GS2SLS             LQ-GSLIVE
Demand Side Var.        Direct     Indirect   Direct     Indirect
Car wash               -0.0339    -0.0613    -0.0403    -0.0675
Service Station         0.0190    -0.0022     0.0423     0.0214
C-store Size           -0.0452    -0.0470    -0.0663    -0.0681
No. Drivers             0.0477     0.0477     0.0556     0.0556
Med Income             -0.0657    -0.0657    -0.1470    -0.1470

W6                      LQ-GS2SLS             LQ-GSLIVE
Demand Side Var.        Direct     Indirect   Direct     Indirect
Car wash               -0.0508    -0.0832    -0.0407    -0.0731
Service Station         0.0128    -0.0122     0.0120    -0.0130
C-store Size           -0.0458    -0.0480    -0.0494    -0.0516
No. Drivers             0.0405     0.0405     0.0479     0.0479
Med Income             -0.1295    -0.1295    -0.0707    -0.0707
1 Panels correspond to estimates obtained with W3-W6, respectively.
2 Estimates are based on regression specification (1).
3 "Direct" refers to the "Average Total Direct Impact"; "Indirect" refers to the "Average Total Impact from an Observation".

B.6 Test for Network Dependence

Table B.13: Test for Network Dependence

                 W1        W2       W3       W4       W5       W6
I²_y             165.28    59.31    400.10   53.52    46.72    19.47
Critical Value   14.07     21.03    19.68    14.07    19.68    19.68
1 Critical values are based on the 0.01 significance level.

The following test statistic is derived based on Liu and Prucha (2018) and is a special case of that presented in their Theorem 2. Consider the linear model

    y = Zδ + u,

where Z contains the exogenous and endogenous regressors. The null hypothesis is given by

    H₀ʸ: Ey = Xβ and cov(y) is diagonal.

Note that X denotes only the exogenous variables, so that under H₀ʸ there is no network-generated dependence. Denote the IV/GMM estimator of the parameters δ of the above model by δ̂; in our implementation, we used the LQ-GS2SLS estimates. Denote Ẑ = X(X'X)⁻¹X'Z, Σ̂ = diag(û_i²), Σ̂_k = diag(û_i ξ̂_ik), and Σ̂_kl = diag(ξ̂_ik ξ̂_il), where û_i is the i-th element of û = y - Zδ̂ and ξ̂_ik is the (i,k)-th element of Ξ̂ = Z - Ẑ. Let W̄ = (W + W')/2,

    V̂ = [ V̂_Y ]          Φ̂ = [ Φ̂_YY    Φ̂_YZ    Φ̂_YU  ]
        [ V̂_Z ]              [ Φ̂_YZ'   Φ̂_ZZ    Φ̂_ZU  ]
        [ V̂_U ],             [ Φ̂_YU'   Φ̂_ZU'   Φ̂_UU  ],

where V̂_Y = û'W̄y, V̂_Z = (û'W̄Z)', V̂_U = û'W̄û,

    Φ̂_YY = 2tr(W̄Σ̂W̄Σ̂) + 2 Σ_{k=1}^K δ̂_k tr(W̄Σ̂W̄Σ̂_k' + W̄Σ̂W̄Σ̂_k) + Σ_{k=1}^K Σ_{l=1}^K δ̂_k δ̂_l tr(W̄Σ̂_kW̄Σ̂_l + W̄Σ̂_klW̄Σ̂) + δ̂'Ẑ'W̄'M_Ẑ Σ̂ M_Ẑ W̄Ẑδ̂,

    Φ̂_YZ = [ 2tr(W̄Σ̂W̄Σ̂_l) + Σ_{k=1}^K δ̂_k tr(W̄Σ̂_kW̄Σ̂_l + W̄Σ̂_klW̄Σ̂') ]_{l=1,...,K} + δ̂'Ẑ'W̄'M_Ẑ Σ̂ M_Ẑ W̄Ẑ,

    Φ̂_YU = 2tr(W̄Σ̂W̄Σ̂) + 2 Σ_{k=1}^K δ̂_k tr(W̄Σ̂_kW̄Σ̂),

    Φ̂_ZZ = [ tr(W̄Σ̂_kW̄Σ̂_l + W̄Σ̂_klW̄Σ̂) ]_{k,l=1,...,K} + Ẑ'W̄'M_Ẑ Σ̂ M_Ẑ W̄Ẑ,

    Φ̂_ZU = [ 2tr(W̄Σ̂_kW̄Σ̂) ]_{k=1,...,K},

    Φ̂_UU = 2tr(W̄Σ̂W̄Σ̂),

K denotes the number of columns of Z, and M_Ẑ = I_n - Ẑ(Ẑ'Ẑ)⁻¹Ẑ'. In their Theorem 2, Liu and Prucha (2018) showed that (1/n)Φ̂ is a consistent estimator of the variance-covariance matrix of n^{-1/2}V̂, and the generalized test statistic I²_y is given by

    I²_y = (LV̂)'(LΦ̂L')⁻¹(LV̂),                                                    (B.6)

where L is a selector matrix such that LΦ̂L' is nonsingular. Their Theorem 2 implies I²_y →_d χ²(rank(L)); note that the degrees of freedom of the χ² distribution are given by rank(L).

Bibliography

[1] Daron Acemoglu, Suresh Naidu, Pascual Restrepo, and James A Robinson. Democracy does cause growth. Journal of Political Economy, 127(1):47-100, 2019.
[2] David R Agrawal. The tax gradient: Spatial aspects of fiscal competition. American Economic Journal: Economic Policy, 7(2):1-29, 2015.
[3] Isaiah Andrews, James Stock, and Liyang Sun. Weak instruments in IV regression: Theory and practice. 2018.
[4] Luc Anselin. Spatial Econometrics: Methods and Models, volume 4. Springer Science & Business Media, 1988.
[5] Luc Anselin. Thirty years of spatial econometrics. Papers in Regional Science, 89(1):3-25, 2010.
[6] Luc Anselin and Raymond JGM Florax. New directions in spatial econometrics: Introduction. In New Directions in Spatial Econometrics, pages 3-18. Springer, 1995.
[7] Luc Anselin et al. Spatial econometrics. A Companion to Theoretical Econometrics, 310-330, 2001.
[8] Irani Arraiz, David M Drukker, Harry H Kelejian, and Ingmar R Prucha. A spatial Cliff-Ord-type model with heteroskedastic innovations: Small and large sample results. Journal of Regional Science, 50(2):592-614, 2010.
[9] Andrea Ascani, Alessandra Faggian, and Sandro Montresor. The geography of COVID-19 and the structure of local economies: The case of Italy.
Journal of Regional Science, 61(2):407-441, 2021.
[10] Benjamin Atkinson. On retail gasoline pricing websites: Potential sample selection biases and their implications for empirical research. Review of Industrial Organization, 33(2):161-175, 2008.
[11] Benjamin Atkinson. Retail gasoline price cycles: Evidence from Guelph, Ontario using bi-hourly, station-specific retail price data. The Energy Journal, 30(1), 2009.
[12] Benjamin Atkinson, Andrew Eckert, and Douglas S West. Daily price cycles and constant margins: Recent events in Canadian gasoline retailing. The Energy Journal, 35(3), 2014.
[13] Coralio Ballester, Antoni Calvó-Armengol, and Yves Zenou. Who's who in networks. Wanted: The key player. Econometrica, 74(5):1403-1417, 2006.
[14] Badi H Baltagi and Georges Bresson. Maximum likelihood estimation and Lagrange multiplier tests for panel seemingly unrelated regressions with spatial lag and spatial errors: An application to hedonic housing prices in Paris. Journal of Urban Economics, 69(1):24-42, 2011.
[15] Badi H Baltagi and Ying Deng. EC3SLS estimator for a simultaneous system of spatial autoregressive equations with random effects. Econometric Reviews, 34(6-10):659-694, 2015.
[16] Badi H Baltagi, Peter Egger, and Michael Pfaffermayr. Estimating models of complex FDI: Are there third-country effects? Journal of Econometrics, 140(1):260-281, 2007.
[17] Badi H Baltagi, Peter Egger, and Michael Pfaffermayr. Estimating regional trade agreement effects on FDI in an interdependent world. Journal of Econometrics, 145(1-2):194-208, 2008.
[18] Melissa Bantle, Matthias Muijs, and Ralf Dewenter. A new price test in geographic market definition: An application to the German retail gasoline market. Technical report, Diskussionspapier, 2018.
[19] John M Barron, Beck A Taylor, and John R Umbeck. Number of sellers, average prices, and price dispersion. International Journal of Industrial Organization, 22(8-9):1041-1066, 2004.
[20] John M Barron, John R Umbeck, and Glen R Waddell. Consumer and competitor reactions: Evidence from a field experiment. International Journal of Industrial Organization, 26(2):517-531, 2008.
[21] Kristian Behrens, Cem Ertur, and Wilfried Koch. "Dual" gravity: Using spatial econometrics to control for multilateral resistance. Journal of Applied Econometrics, 27(5):773-794, 2012.
[22] Steven Berry, James Levinsohn, and Ariel Pakes. Automobile prices in market equilibrium. Econometrica, pages 841-890, 1995.
[23] Bruce A Blonigen, Ronald B Davies, Glen R Waddell, and Helen T Naughton. FDI in space: Spatial autoregressive relationships in foreign direct investment. European Economic Review, 51(5):1303-1325, 2007.
[24] Lawrence E Blume, William A Brock, Steven N Durlauf, and Yannis M Ioannides. Identification of social interactions. In Handbook of Social Economics, volume 1, pages 853-964. Elsevier, 2011.
[25] Phillip Bonacich. Power and centrality: A family of measures. American Journal of Sociology, 92(5):1170-1182, 1987.
[26] James M Brundy and Dale W Jorgenson. Efficient estimation of simultaneous equations by instrumental variables. The Review of Economics and Statistics, pages 207-224, 1971.
[27] Antoni Calvó-Armengol, Eleonora Patacchini, and Yves Zenou. Peer effects and social networks in education. The Review of Economic Studies, 76(4):1239-1267, 2009.
[28] G. Clemenz and K. Gugler. Locational choice and price competition: Some empirical results for the Austrian retail gasoline market. Empirical Economics, 31:291-312, 2006.
[29] Andrew Cliff and J.K. Ord. Spatial Autocorrelation. London: Pion, 1973.
[30] Andrew Cliff and J.K. Ord. Spatial Processes, Models and Applications. London: Pion, 1981.
[31] Ethan Cohen-Cole, Xiaodong Liu, and Yves Zenou. Multivariate choices and identification of social interactions. Journal of Applied Econometrics, 33(2):165-178, 2018.
[32] Timothy G Conley and Ethan Ligon. Economic distance and cross-country spillovers. Journal of Economic Growth, 7(2):157-187, 2002.
[33] Phoebus J Dhrymes and John Guerard. Introductory Econometrics, volume 4. Springer, 1978.
[34] Joseph Doyle, Erich Muehlegger, and Krislert Samphantharak. Edgeworth cycles revisited. Energy Economics, 32(3):651-660, 2010.
[35] David M Drukker, Peter H Egger, and Ingmar R Prucha. Simultaneous equations models with higher-order spatial or social network interactions. Econometric Theory, pages 1-48, 2022.
[36] Andrew Eckert. Retail price cycles and the presence of small firms. International Journal of Industrial Organization, 21(2):151-170, 2003.
[37] Andrew Eckert. Empirical studies of gasoline retailing: A guide to the literature. Journal of Economic Surveys, 27(1):140-166, 2013.
[38] Andrew Eckert and Douglas S West. Retail gasoline price cycles across spatially dispersed gasoline stations. The Journal of Law and Economics, 47(1):245-273, 2004.
[39] Andrew Eckert and Douglas S West. Price uniformity and competition in a retail gasoline market. Journal of Economic Behavior & Organization, 56(2):219-237, 2005.
[40] Cem Ertur and Wilfried Koch. Growth, technological interdependence and spatial externalities: Theory and evidence. Journal of Applied Econometrics, 22(6):1033-1062, 2007.
[41] Cem Ertur and Antonio Musolesi. Weak and strong cross-sectional dependence: A panel data analysis of international technology diffusion. Journal of Applied Econometrics, 32(3):477-503, 2017.
[42] Ying Fan. Ownership consolidation and product characteristics: A study of the US daily newspaper market. American Economic Review, 103(5):1598-1628, 2013.
[43] Bernard Fingleton and Nikodem Szumilo. Simulating the impact of transport infrastructure investment on wages: A dynamic spatial panel model approach. Regional Science and Urban Economics, 75:148-164, 2019.
[44] Roberto Gallardo, Brian Whitacre, Indraneel Kumar, and Sreedhar Upendram. Broadband metrics and job productivity: A look at county-level data. The Annals of Regional Science, 66(1):161-184, 2021.
[45] Heather D Gibson, Stephen G Hall, Pavlos Petroulas, Vassilis Spiliotopoulos, and George S Tavlas. The effect of emergency liquidity assistance (ELA) on bank lending during the euro area crisis. Journal of International Money and Finance, 108:102154, 2020.
[46] Georg Götz and Klaus Gugler. Market concentration and product variety under spatial competition: Evidence from retail gasoline. Journal of Industry, Competition and Trade, 6(3-4):225-234, 2006.
[47] Sebastian Hauptmeier, Ferdinand Mittermaier, and Johannes Rincke. Fiscal competition over taxes and public inputs. Regional Science and Urban Economics, 42(3):407-419, 2012.
[48] Jerry A Hausman. An instrumental variable approach to full information estimators for linear and certain nonlinear econometric models. Econometrica, pages 727-738, 1975.
[49] David F Hendry. The structure of simultaneous equations estimators. Journal of Econometrics, 4(1):51-88, 1976.
[50] Roger A Horn and Charles R Johnson. Matrix Analysis. Cambridge University Press, 1985.
[51] Daniel S Hosken, Robert S McMillan, and Christopher T Taylor. Retail gasoline pricing: What do we know? International Journal of Industrial Organization, 26(6):1425-1436, 2008.
[52] Jean-François Houde. Spatial differentiation and vertical mergers in retail markets for gasoline. American Economic Review, 102(5):2147-82, 2012.
[53] Jonathan Hughes, Christopher R Knittel, and Daniel Sperling. Evidence of a shift in the short-run price elasticity of gasoline demand. The Energy Journal, 29(1), 2008.
[54] P Wilner Jeanty, Mark Partridge, and Elena Irwin. Estimation of a spatial simultaneous equation model of population migration and housing price dynamics. Regional Science and Urban Economics, 40(5):343-352, 2010.
[55] Harry H Kelejian and Ingmar R Prucha.
A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. The Journal of Real Estate Finance and Economics, 17(1):99?121, 1998. [56] Harry H Kelejian and Ingmar R Prucha. A generalized moments estimator for the au- toregressive parameter in a spatial model. International economic review, 40(2):509?533, 1999. 194 [57] Harry H Kelejian and Ingmar R Prucha. Estimation of simultaneous systems of spatially interrelated cross sectional equations. Journal of econometrics, 118(1-2):27?50, 2004. [58] Harry H Kelejian and Ingmar R Prucha. Specification and estimation of spatial autoregres- sive models with autoregressive and heteroskedastic disturbances. Journal of economet- rics, 157(1):53?67, 2010. [59] Harry H Kelejian, Ingmar R Prucha, and Yevgeny Yuzefovich. Instrumental variable esti- mation of a spatial autoregressive model with autoregressive disturbances: Large and small sample results. In Spatial and spatiotemporal econometrics. Emerald Group Publishing Limited, 2004. [60] Frank Kleibergen and Richard Paap. Generalized reduced rank tests using the singular value decomposition. Journal of econometrics, 133(1):97?126, 2006. [61] Guido M Kuersteiner and Ingmar R Prucha. Dynamic spatial panel models: Networks, common shocks, and sequential exogeneity. Econometrica, 88(5):2109?2146, 2020. [62] Jim Lee and Yuxia Huang. Covid-19 impact on us housing markets: evidence from spatial regression models. Spatial Economic Analysis, pages 1?21, 2022. [63] Lung-fei Lee. Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econometric Reviews, 22(4):307?335, 2003. [64] Lung-fei Lee. Gmm and 2sls estimation of mixed regressive, spatial autoregressive models. Journal of Econometrics, 137(2):489?514, 2007. [65] Lung-Fei Lee. Identification and estimation of econometric models with group interac- tions, contextual factors and fixed effects. 
Journal of Econometrics, 140(2):333?374, 2007. [66] James LeSage and Robert Kelley Pace. Introduction to spatial econometrics. Chapman and Hall/CRC, 2009. [67] Laurence Levin, Matthew S Lewis, and Frank A Wolak. High frequency evidence on the demand for gasoline. American Economic Journal: Economic Policy, 9(3):314?47, 2017. [68] Xu Lin and Lung-fei Lee. Gmm estimation of spatial autoregressive models with unknown heteroskedasticity. Journal of Econometrics, 157(1):34?52, 2010. [69] Xiaodong Liu. Identification and efficient estimation of simultaneous equations network models. Journal of Business & Economic Statistics, 32(4):516?536, 2014. [70] Xiaodong Liu. Simultaneous equations with binary outcomes and social interactions. Econometric Reviews, 38(8):921?937, 2019. [71] Xiaodong Liu. Gmm identification and estimation of peer effects in a system of simulta- neous equations. Journal of Spatial Econometrics, 1(1):1?27, 2020. 195 [72] Xiaodong Liu and Ingmar R Prucha. A robust test for network generated dependence. Journal of econometrics, 207(1):92?113, 2018. [73] Xiaodong Liu and Paulo Saraiva. Gmm estimation of spatial autoregressive models in a system of simultaneous equations with heteroskedasticity. Econometric Reviews, 38(4): 359?385, 2019. [74] Alexander MacKay and Nathan Miller. Estimating models of supply and demand: Instru- ments and covariance restrictions. Available at SSRN 3025845, 2021. [75] Jan R Magnus and Heinz Neudecker. Matrix differential calculus with applications in statistics and econometrics. John Wiley & Sons, 2019. [76] Mark D Manuszak. Predicting the impact of upstream mergers on downstream markets with an application to the retail gasoline industry. International Journal of Industrial Organization, 28(1):99?111, 2010. [77] Shohei Nakamura and Paolo Avner. Spatial distributions of job accessibility, housing rents, and poverty: The case of nairobi. Journal of Housing Economics, 51:101743, 2021. [78] Aviv Nevo. 
Measuring market power in the ready-to-eat cereal industry. Econometrica, 69(2):307?342, 2001. [79] Xiaoming Ning and Robert Haining. Spatial pricing in interdependent markets: a case study of petrol retailing in sheffield. Environment and Planning A, 35(12):2131?2159, 2003. [80] Michael D Noel. Edgeworth price cycles, cost-based pricing, and sticky pricing in retail gasoline markets. The Review of Economics and Statistics, 89(2):324?334, 2007. [81] Michael D Noel. Edgeworth price cycles: Evidence from the toronto retail gasoline market. The Journal of Industrial Economics, 55(1):69?92, 2007. [82] Michael D Noel. Edgeworth price cycles and focal prices: Computational dynamic markov equilibria. Journal of Economics & Management Strategy, 17(2):345?377, 2008. [83] Jose? Luis Montiel Olea and Carolin Pflueger. A robust test for weak instruments. Journal of Business & Economic Statistics, 31(3):358?369, 2013. [84] Sung Y Park and Guochang Zhao. An estimation of us gasoline demand: A smooth time- varying cointegration approach. Energy Economics, 32(1):110?120, 2010. [85] Dieter Pennerstorfer. Spatial price competition in retail gasoline markets: evidence from austria. The Annals of Regional Science, 43(1):133?158, 2009. [86] Dieter Pennerstorfer and Christoph Weiss. Spatial clustering and market power: Evidence from the retail gasoline market. Regional Science and Urban Economics, 43(4):661?675, 2013. 196 [87] Jordi Perdiguero and Joan-Ramon Borrell. Driving competition in local gasoline markets. Document de Treball No. XREAP2012-04, 2012. [88] Joris Pinkse and Margaret E Slade. Contracting in space: An application of spatial statistics to discrete-choice models. Journal of Econometrics, 85(1):125?154, 1998. [89] Joris Pinkse and Margaret E Slade. Mergers, brand competition, and the price of a pint. European Economic Review, 48(3):617?643, 2004. [90] Joris Pinkse, Margaret E Slade, and Craig Brett. Spatial price competition: a semipara- metric approach. 
Econometrica, 70(3):1111?1153, 2002. [91] Ingmar R Prucha and Harry H Kelejian. The structure of simultaneous equation estima- tors: A generalization towards nonnormal disturbances. Econometrica: Journal of the Econometric Society, pages 721?736, 1984. [92] Anindya Sen. Higher prices at canadian gas pumps: international crude oil prices or local market concentration? an empirical investigation. Energy Economics, 25(3):269?288, 2003. [93] Anindya Sen. Does increasing the market share of smaller firms result in lower prices? empirical evidence from the canadian retail gasoline industry. Review of Industrial Orga- nization, 26(4):371?389, 2005. [94] Margaret E Slade. Vancouver?s gasoline-price wars: An empirical exercise in uncovering supergame strategies. The Review of Economic Studies, 59(2):257?276, 1992. [95] Margaret E Slade. Strategic motives for vertical separation: evidence from retail gasoline markets. Journal of Law, Economics, & Organization, pages 84?113, 1998. [96] Shawn W Ulrick, Seth B Sacher, Paul R Zimmerman, and John M Yun. Defining geo- graphic markets with willingness-to-travel circles. Supreme Court Economic Review, 28 (1):241?284, 2020. [97] Wim Van Meerbeeck. Competition and local market conditions on the belgian retail gaso- line market. De Economist, 151(4):369?388, 2003. [98] Luya Wang, Kunpeng Li, and Zhengwei Wang. Quasi maximum likelihood estimation for simultaneous spatial autoregressive models. 2014. [99] Peter Whittle. On stationary processes in the plane. Biometrika, pages 434?449, 1954. [100] Kai Yang and Lung-fei Lee. Identification and qml estimation of multivariate and simulta- neous equations spatial autoregressive models. Journal of Econometrics, 196(1):196?214, 2017. [101] Kai Yang and Lung-fei Lee. Identification and estimation of spatial dynamic panel simul- taneous equations models. Regional Science and Urban Economics, 76:32?46, 2019. 197