ABSTRACT

Title of Dissertation: LIVE AND FIVE ESTIMATION OF SIMULTANEOUS EQUATIONS MODELS WITH HIGHER-ORDER SPATIAL AND SOCIAL INTERACTIONS

Jiankun Chen
Doctor of Philosophy, 2022

Dissertation Directed by: Professors Ingmar Prucha and Andrew Sweeting, Department of Economics

The first part of the dissertation introduces a new class of limited and full information GMM estimators for simultaneous equation systems (SEM) with network interdependence modeled by Cliff-Ord type spatial lags (Cliff and Ord (1973, 1981)). We consider the same model specification as that in Drukker, Egger, and Prucha (2022) and allow for higher-order spatial lags in the dependent variables, the exogenous variables and the disturbances. The network is defined in terms of a measure of proximity and can accommodate a wide class of dependence structures that may appear in both micro and macro economic settings. We show that the scores of the log-likelihood function can be viewed as a weighted sum of linear and quadratic components that motivate valid moment conditions. One contribution of this dissertation is to show that the linear moments can be written so as to permit an instrumental variable (IV) interpretation, extending existing results in the context of classical SEMs. In constructing the linear moments, the instruments exploit the nonlinear structure of the parameters implied by the reduced form model, while those utilized by the existing 2SLS- and 3SLS-type estimators do not. From this perspective, the new estimation methodology incorporates the ideas underlying the LIVE and the FIVE estimators in Brundy and Jorgenson (1971) for classical SEMs, as well as the IV estimators using optimal instruments for spatial autoregressive (SAR) models. In addition to the linear IV estimators, we also consider one-step GMM estimators that utilize both the linear and quadratic moments implied by the scores.
Our new LIVE and FIVE estimators for the network SEMs remain computationally feasible even in large samples and are robust against heteroskedasticity of unknown form. Monte Carlo simulations show that the new estimators in general outperform the existing 2SLS- and 3SLS-type estimators for this class of models when the instruments are weak.

In the second part of the dissertation, we estimate the consumer demand for gasoline in the market of Vancouver, Canada. We employ a demand system with a spatial network component, utilizing the model and the estimation methods considered in the first part. Demand elasticities for gasoline at the aggregate level are well documented in the literature, while estimates at the station level are relatively scarce. We estimate the station-level demand elasticities as well as the (spatial) elasticity of substitution under a variety of network structures based on different proximity measures. We collected station-level data on retail prices, sales volume and station characteristics of the 151 stations, as well as the characteristics of local markets, for September 2019 and March 2020. To deal with the endogeneity of prices, existing works typically exploit variation in the characteristics of each station's direct competitors. We argue that in a geographically continuous market, this strategy may not be sufficient. In the spirit of Fan (2013), our instruments also exploit variation in the characteristics of the competitors of each station's competitors (indirect competitors). We find that the own-price demand elasticity is between −12 and −4 while the cross-station price elasticity is in general between 0.6 and 6, depending on the construction of the network matrices that govern the degree of competition. We also report the impact measures that provide interpretations of the estimated coefficients of the exogenous variables in the context of spatial network models.
We find that the availability of service stations has in general contributed positively to the sales volume at a station. In general, a station located within a neighborhood with more drivers faces stronger demand.

LIVE AND FIVE ESTIMATION OF SIMULTANEOUS EQUATIONS MODELS WITH HIGHER-ORDER SPATIAL AND SOCIAL INTERACTIONS

by

Jiankun Chen

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2022

Advisory Committee:
Professors Ingmar Prucha and Andrew Sweeting, Co-Chairs
Professor John Chao
Professor Guido Kuersteiner
Professor Haluk Ünal

© Copyright by Jiankun Chen 2022

Dedication

To my mother and father, Shumei Li and Fengyu Chen, for their endless love and encouragement. To my aunt and uncle, Shuling Li and Feng Yan, for their unconditional support since my childhood. To Alan Taylor and Kenneth Elzinga, who inspired and encouraged me to embark on the journey of economic research.

Acknowledgments

I would like to express my sincerest gratitude to my advisors, Professors Ingmar Prucha and Andrew Sweeting. I am deeply indebted to them for their invaluable support, encouragement and patience during my research. This dissertation would not have been possible without their guidance. I would also like to thank Professors John Chao, Guido Kuersteiner, and Haluk Ünal for serving on my committee and offering their helpful comments. My gratitude also goes to all the faculty, staff and students in the economics department at the University of Maryland. They make the department a friendly and supportive place.

Table of Contents

Dedication
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: LIVE and FIVE Estimation of the Simultaneous Equation Models with Higher Order Spatial/Social Interactions
 2.1 Introduction
 2.2 Model
  2.2.1 Structural Form Model
  2.2.2 Reduced Form and Structural Model with Exclusion Restrictions
  2.2.3 Model Assumptions
 2.3 Maximum Likelihood Estimation and Estimator Generating Equations
  2.3.1 Scores of the Log-likelihood Function
  2.3.2 IV Interpretation and Estimator Generating Equations
  2.3.3 Connection to LIVE/FIVE and Optimal IV Estimation
 2.4 Moment Conditions
  2.4.1 Heteroskedasticity-robust Moment Conditions
  2.4.2 Approximated Moments
 2.5 LIVE and FIVE Estimators for Network SEM
  2.5.1 Limited Information Estimators
  2.5.2 Full Information Estimators
 2.6 Identification Condition
  2.6.1 Scenario I
  2.6.2 Scenario II
 2.7 Monte Carlo Simulations
  2.7.1 Data Generation Process
  2.7.2 Implemented Estimators
  2.7.3 Performance of Estimators under Strong and Weak Identifications, Scenario I
  2.7.4 Performance of Estimators under Strong and Weak Identifications, Scenario II
  2.7.5 Heteroskedasticity
  2.7.6 Remarks
 2.8 Concluding Remarks
Chapter 3: Empirical Application: Demand Estimation for Retail Gasoline Market with Network Dependence
 3.1 Introduction
 3.2 Related Literature
 3.3 Model
  3.3.1 Theoretical Motivation
  3.3.2 Econometric Specification
 3.4 Data
 3.5 Instruments and Identification
 3.6 Estimation Results and Impact Measures
  3.6.1 Main Estimation Results
  3.6.2 Impact Measures
 3.7 Concluding Remarks
Chapter A: Appendix to Chapter 2
 A.1 Appendix: Example expression of EYn
 A.2 Proofs of Chapter 2
  A.2.1 Preliminary Results
  A.2.2 Proof of Proposition 1
  A.2.3 Proof of Proposition 2
  A.2.4 Proof of Proposition 3
  A.2.5 Proof of Lemma 1
 A.3 Explicit Expressions of VCV matrices
  A.3.1 Explicit Expression of ??qgg,n(g)
  A.3.2 Explicit Expression of ??qn
 A.4 Additional Monte Carlo Results
  A.4.1 Results with Alternative Wn's
  A.4.2 Correlated x.k's
Chapter B: Appendix to Chapter 3
 B.1 Theoretical Motivation for the Demand Equation
 B.2 Edgeworth Cycle
  B.2.1 Retail Margins
  B.2.2 Markov Switching Regression (MSR)
 B.3 Test for IV power
 B.4 Additional Empirical Results
 B.5 Impact Measures
 B.6 Test for Network Dependence
Bibliography

List of Tables

2.1 Median and RMSE of Scenario I, homoskedasticity, Parameter Constellation 1
2.2 Median and RMSE of Scenario I, homoskedasticity, Parameter Constellation 2
2.3 Median and RMSE of Scenario I, homoskedasticity, Parameter Constellation 3
2.4 Median and RMSE of Scenario II, homoskedasticity, Parameter Constellation 1
2.5 Median and RMSE of Scenario II, homoskedasticity, Parameter Constellation 2
2.6 Median and RMSE of Scenario II, homoskedasticity, Parameter Constellation 3
2.7 Median and RMSE of Scenario I under Heteroskedasticity, Parameter Constellations 1-4
3.1 Summary Statistics of Retail Prices, Sales Volume
3.2 Summary Statistics of Station Characteristics
3.3 Correlation of Station Characteristics in Neighboring Markets
3.4 First-stage OLS Regression and tests for IV power
3.5 Estimation Results with W based on common boundaries
3.6 Impact Measures
A.1 Median and RMSE of Scenario I, homoskedasticity, alternative Wn's
A.2 Median and RMSE of Scenario II, homoskedasticity, alternative Wn's
A.3 Median and RMSE of Scenario I, homoskedasticity, correlated Xn
A.4 Median and RMSE of Scenario II, homoskedasticity, correlated Xn
B.1 Within-Regime Estimates and Expected Duration in Days, September 2019
B.2 Within-Regime Estimates and Expected Duration in Days, September 2020
B.3 Switching Probabilities
B.4a First-stage OLS Regression and tests for IV power
B.4b First-stage OLS Regression and tests for IV power (continued)
B.6 Estimation Results with W2,n (2-miles radius)
B.7 Estimation Results with W3,n (common boundary and reciprocal of the travel distance)
B.8 Estimation Results with W4,n (nearest neighbor)
B.9 Estimation Results with W5,n (common street)
B.10 Estimation Results with W6,n (hybrid measure of common street and travel distance)
B.11 Definition of Regression Variables
B.12 Impact Measures
B.13 Test for Network Dependence

List of Figures

3.1 Market Area based on Common Boundaries
3.2 Dynamics of Average Retail Price
B.1 Average Retail Margin Computed with Spot Rack Price, Rack Price of 5 and 10 Days' Lead, Aug - Nov, 2019
B.2 Average Retail Margin Computed with Spot Rack Price, Rack Price of 5 and 10 Days' Lead, Feb - Apr, 2020

Chapter 1: Introduction

The empirical literature has documented substantial evidence of cross-sectional interdependence among observations both at the micro and at the macro level. At the micro level, cross-sectional units are often individuals, firms or plants, whereas at the macro level the units could be states or countries. These works often utilize data with spatial features and/or employ models that allow for network structures among cross-sectional units. Given the large size of the relevant literature, we mention only a few works here. Using county-level data, Gallardo, Whitacre, Kumar, and Upendram (2021) study the impact of broadband access on job productivity. Nakamura and Avner (2021) analyze the spatial distributions of job accessibility and housing rents in Nairobi, Kenya. Fingleton and Szumilo (2019) estimate the impact of investment in high-speed rail infrastructure on wage levels in England and Wales.
In their analysis of the causal effects of democracy on economic growth, Acemoglu, Naidu, Restrepo, and Robinson (2019) explicitly model spatial correlation among variables (e.g., GDP and shocks) to control for regionally correlated omitted factors. At the micro level, Pinkse, Slade, and Brett (2002) estimate spatial price competition among wholesale gasoline terminals. Using bank-level data for euro area countries, Gibson, Hall, Petroulas, Spiliotopoulos, and Tavlas (2020) analyze the spillover effects on other banks of providing emergency liquidity assistance (ELA) to one bank during the euro area crisis. There are also works that apply spatial models to study social networks, including Ballester, Calvó-Armengol, and Zenou (2006), Lee (2007), Calvó-Armengol, Patacchini, and Zenou (2009), Blume, Brock, Durlauf, and Ioannides (2011), Liu (2014) and Cohen-Cole, Liu, and Zenou (2018).1

The first part of the dissertation considers the estimation of simultaneous equation models (SEM) with network interdependence. An important class of models for spatial networks originates from the single-equation model introduced by Cliff and Ord (1973, 1981). This model can be viewed as a variant of the model introduced by Whittle (1954) and is often referred to as a spatial autoregressive (SAR) model; see, e.g., Anselin (1988). In SAR models, cross-sectional interactions are modeled through Cliff-Ord type spatial lags.2 We consider the same model specification as that in Drukker, Egger, and Prucha (2022) and allow for higher-order spatial lags in the dependent variables, the exogenous variables and the disturbances. Following their terminology, we will refer to this class of models as the simultaneous spatial autoregressive model with spatially autoregressive disturbances (the SE-SARAR model).
1 In addition, Baltagi and Bresson (2011) and Jeanty, Partridge, and Irwin (2010) study housing price spillovers at the county level. Hauptmeier, Mittermaier, and Rincke (2012) focus on fiscal competition over taxes and public inputs. Agrawal (2015) studies spatial fiscal competition in local tax rates near district borders. Conley and Ligon (2002) and Ertur and Koch (2007) document spatial spillovers in economic growth. Behrens, Ertur, and Koch (2012) suggest that bilateral trade flows exhibit spatial interdependence, and Baltagi, Egger, and Pfaffermayr (2007), Baltagi, Egger, and Pfaffermayr (2008) and Blonigen, Davies, Waddell, and Naughton (2007) find similar patterns in bilateral foreign direct investment. Ertur and Musolesi (2017) examine technological knowledge spillovers among countries. Some recent works address the relation between COVID-19 and economic activities. For example, Ascani, Faggian, and Montresor (2021) use Italian data to show that the geographical concentration of economic activity in specific areas of the country acts as a vehicle of disease transmission and thus generates a core-periphery pattern in the geography of COVID-19. Lee and Huang (2022) analyze the shifting housing preferences during the pandemic in the United States.

2 In the introduction part of Chapter 2, we explain the term spatial lags in detail.

In this dissertation, we consider a new class of generalized method of moments (GMM) estimators that utilize approximations to the optimal instruments in constructing the linear moments. We also consider GMM estimators that utilize both the linear moments and the quadratic moments that originate from the scores of the log-likelihood function. More specifically, the new estimators build on three lines of the literature. First, we show that the linear parts of the ML scores can be viewed as a set of estimator generating equations from which a generic form of instrumental variable (IV) estimators can be derived.
This result extends the relevant counterparts in Hausman (1975), Hendry (1976) and Prucha and Kelejian (1984) in the context of classical SEMs. Second, the new estimators incorporate the underlying ideas of the LIVE and the FIVE estimators proposed by Brundy and Jorgenson (1971) in the context of classical SEMs. The LIVE and the FIVE estimators can be viewed as specific forms of the limited information and the full information IV estimators, respectively. Compared to the 2SLS and the 3SLS estimators, the LIVE and the FIVE differ in how the expected values of the endogenous variables are estimated when constructing instruments, and they make better use of the nonlinear parameter restrictions implied by the reduced form model. The new estimators considered in this dissertation share this feature of the LIVE and the FIVE estimators. Finally, the estimation methodology considered in this dissertation also relates to the instrumental variable estimators with optimal instruments in the context of single-equation spatial autoregressive (SAR) models. Early contributions to this line of the literature include Lee (2003) and Kelejian, Prucha, and Yuzefovich (2004). Furthermore, our new GMM estimators that utilize both the linear and the quadratic moments remain robust to heteroskedasticity of unknown form and computationally feasible even when the sample size (i.e., the size of the network) becomes large. The Monte Carlo results show that the new GMM estimators outperform their existing counterparts, e.g., the 2SLS-type and 3SLS-type estimators considered in Drukker, Egger, and Prucha (2022), when the instruments are weak.

In the second part of this dissertation, we estimate the consumer demand for retail gasoline in the market of Vancouver, Canada. We employ a demand system with a spatial network component, utilizing the model and the estimation methods considered in the first part.
Demand elasticities for gasoline at the aggregate level are well documented in the literature, while estimates at the station level are relatively scarce. We estimate the station-level demand elasticities as well as the (spatial) elasticity of substitution under a variety of network structures based on different proximity measures. We collected station-level data on retail prices, sales volume and station characteristics of the 151 stations, as well as the characteristics of local markets, for September 2019 and March 2020. We obtained the sales volume data from Kalibrate. The price data were collected from Gasbuddy.com at daily frequency and then aggregated to monthly frequency.3 The data on the station and regional characteristics are provided by Kalibrate's survey and Census Canada 2016. To construct the network matrices, we also collected traffic and geographical information using the Google Maps API. To deal with the endogeneity of prices, existing works typically exploit variation in the characteristics of each station's direct competitors. We argue that in a geographically continuous market, this strategy may not be sufficient. Following Fan (2013)'s arguments, we additionally exploit variation in the characteristics of the competitors of each station's competitors (indirect competitors). We find that the own-price demand elasticity is between −12 and −6 while the cross-station price elasticity is in general between 0.6 and 6, depending on the specific construction of the network matrices that govern the degree of competition. These estimates are largely consistent with economic theory, but of smaller magnitude than those reported in some recent works, e.g., Houde (2012). One possible explanation is that we build networks based on different sources of information.
Houde (2012) relies heavily on local traffic patterns and road structure and thus allows a station to compete with another located far away if the two are located on a main commute route or a segment of highway. Our approach to constructing the spatial network matrices is in line with the majority of the existing literature.4 Our proximity measures exploit information on stations' neighborhoods or the pairwise travel distance between neighboring stations. Thus, we implicitly assume that competition is largely local. We also compute the impact measures that provide interpretations of the estimated coefficients of the exogenous variables in the context of spatial network models. We find that the availability of service stations has in general contributed positively to sales at the station level. In general, a station located within a neighborhood with more drivers faces stronger demand.

The organization of the dissertation is as follows. Chapter 2 proposes the new class of GMM estimators for the simultaneous equations model with network dependence. Chapter 3 illustrates the empirical relevance of the first chapter by estimating the demand system with network dependence for the retail gasoline market in Vancouver, Canada. We relegate the proofs and technical details of Chapter 2 to Chapter A. Additional results for Chapter 3 are documented in Chapter B.

3 In Chapter 3, we show that there is no significant cyclical pattern in prices during the sample periods.

4 See, e.g., Pennerstorfer (2009), Pennerstorfer and Weiss (2013), Pinkse, Slade, and Brett (2002), among others.

Chapter 2: LIVE and FIVE Estimation of the Simultaneous Equation Models with Higher Order Spatial/Social Interactions

In this chapter, we introduce a new class of GMM estimators for simultaneous equation models (SEM) of cross-sectional data with network interactions in the dependent variables, the exogenous variables and the disturbances.
The model and the proposed estimators are applicable to a wide class of networks, including spatial networks and social networks.

2.1 Introduction

In spatial autoregressive (SAR) models, cross-sectional network interactions are modeled through Cliff-Ord type spatial lags, which are weighted averages of the model variables. For a network consisting of n cross-sectional units, a (first-order) spatial lag in the endogenous variable yn (n × 1) is Wnyn, where Wn (n × n) is often referred to as the spatial weights matrix.1 The weights reflect the relative importance of the links between cross-sectional units in generating network interactions. In the context of geographical networks, the elements of Wn are typically based on some measure of distance between units, e.g., the inverse of the Euclidean distance. We emphasize, however, that the measure of proximity is not necessarily indexed by the locations of the units, and thus the notion of distance is not confined to geographical distance. The flexibility in constructing the measures of proximity makes the model capable of describing a wide variety of networks. For example, in the context of social networks, proximity measures could depend on the number of friends that each person has in the sample.

1 In the (social) network literature, this Wn matrix is sometimes referred to as the adjacency matrix.

Spatial econometrics has a long history in regional science, urban economics and development; see, e.g., Anselin (1988). The development of econometric methods of estimation and inference for Cliff-Ord type models has also attracted considerable attention.2 While most theoretical works in the spatial econometrics literature have focused on single-equation models, the literature on the estimation of simultaneous systems of spatially interrelated cross-sectional equations is relatively scarce. In economics, different outcome variables are frequently determined jointly within a system of equations.
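To make the notion of a spatial lag concrete, the following sketch constructs a small weights matrix Wn from hypothetical locations, using inverse Euclidean distance with row normalization (one common convention among several), and computes the spatial lag Wnyn; the coordinates and outcomes are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
coords = rng.uniform(0, 10, size=(n, 2))  # hypothetical locations of n units

# Inverse Euclidean distance between distinct units; the diagonal stays zero
# (a unit is not its own neighbor).
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
W = np.zeros((n, n))
off = ~np.eye(n, dtype=bool)
W[off] = 1.0 / dist[off]

# Row-normalize so each row sums to one (a common convention).
W = W / W.sum(axis=1, keepdims=True)

y = rng.normal(size=n)
spatial_lag = W @ y  # i-th entry: weighted average of the other units' outcomes
```

Because each row of W is nonnegative, sums to one, and has a zero diagonal, each entry of the spatial lag is a convex combination of the other units' outcomes.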
In the context of a network SEM, the simultaneous nature of the outcomes can stem both from the interactions between different endogenous economic variables (i.e., the classical simultaneity effect) and from the interactions between cross-sectional units (i.e., the network-generated interdependence). Extending the methodology developed in Kelejian and Prucha (1998, 1999) for single-equation SAR models, Kelejian and Prucha (2004) provide an early development of GMM estimators for a SE-SARAR model with first-order spatial lags (i.e., a model that allows for one spatial weights matrix Wn). Liu (2014, 2019, 2020), Cohen-Cole, Liu, and Zenou (2018) and Liu and Saraiva (2019) build and extend on Kelejian and Prucha (2004) in estimation methodology and identification conditions within the context of social interaction models with first-order spatial lags and cross-sectionally independent disturbances. More specifically, Liu (2014) focuses on IV-based linear estimators and shows that the Bonacich centrality (Bonacich (1987)) provides additional information to identify peer effects and can be used as an instrumental variable. Liu (2020) extends on Liu (2014) and considers one-step GMM estimation methods that utilize both the linear and the quadratic moment conditions. Liu (2014, 2020) also provide bias correction procedures with many instruments. Liu and Saraiva (2019) consider one-step GMM estimation based on both the linear and quadratic moments and allow for heteroskedasticity of unknown form. Cohen-Cole, Liu, and Zenou (2018) consider the identification and estimation of social interaction effects in the context of multivariate choices. Liu (2019) proposes an estimation methodology for a simultaneous system of equations with binary outcomes generated from an incomplete information network game.

2 See, e.g., Anselin (2010) for a review of the development of spatial econometric methods.
Drukker, Egger, and Prucha (2022) further extend the methodology of Kelejian and Prucha (2004) to a more general model specification that allows for higher-order spatial lags in the dependent variables, the exogenous variables and the disturbances. In addition to 2SLS- and 3SLS-type estimators, they also consider one-step limited and full information GMM estimators which utilize both linear and quadratic moments and remain robust against heteroskedasticity. The model specification in the current chapter is the same as that in Drukker, Egger, and Prucha (2022), which, to the best of our knowledge, is more general than the existing cross-sectional SEMs with network interactions in the literature. Other recent contributions to the literature on spatial simultaneous equation models include Baltagi and Deng (2015), who consider an extension of a two-equation system with first-order spatial lags to panels. Wang, Li, and Wang (2014) analyze the quasi maximum likelihood (QML) estimator for a two-equation system with first-order spatial lags in the cross section. Yang and Lee (2017) consider the quasi maximum likelihood estimator for a multi-equation system with a first-order spatial lag in the dependent variable. Yang and Lee (2019) provide an extension to dynamic panel data models allowing for multiple weight matrices. In contrast to this dissertation, these papers either consider only first-order spatial lags in the dependent variable, and/or do not allow for spatial spillovers in the disturbance process. In Section 2.2, we explain that allowing for higher-order spatial lags brings considerable flexibility and robustness to the model. As will also be explained below, this dissertation differs from the above-cited literature in terms of estimation methodology. In particular, we consider an alternative way of constructing the IVs and the moment conditions, and consider both two-step estimation procedures and one-step GMM estimators.
The new class of GMM estimators considered in this chapter builds on three lines of the literature. First, in the context of the classical SEM, Hausman (1975) shows that the ML estimator can be written so as to carry an IV interpretation in which the instruments embody all the a priori restrictions. Hendry (1976) and Prucha and Kelejian (1984) demonstrate that the normal equations of ML estimators can be viewed as estimator generating equations and that IV estimators can be viewed as numerical approximations to the solutions of this nonlinear system. These insights carry over to the present simultaneous equation system with network dependence. In Section 2.3, we extend these insights and show that the linear parts of the ML scores can also be viewed as a set of estimator generating equations from which IV estimators can be derived.

The new estimators also incorporate the underlying ideas of the LIVE and the FIVE estimators proposed by Brundy and Jorgenson (1971) in the context of the classical SEM, which can be viewed as specific forms of the limited information and the full information IV estimators, respectively. Compared to the 2SLS and the 3SLS estimators, the LIVE and the FIVE differ in how the expected values of the endogenous variables are estimated when constructing instruments. In the first stage, the 2SLS (and 3SLS) estimator estimates the expected values of the endogenous variables implied by the reduced form model by OLS, without imposing parameter restrictions. Alternatively, Brundy and Jorgenson (1971) utilize the specific form of the reduced form model and compute the reduced form parameters from some consistent initial estimates. In this way, their instruments exploit the underlying nonlinear structure of the parameters in the reduced form while the 2SLS and the 3SLS do not. In finite samples, the LIVE and the FIVE estimators may be more efficient than the 2SLS and the 3SLS, respectively.
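The contrast between the two first stages can be sketched in a single-equation SAR setting, which serves as a simplified stand-in for the system treated in this chapter. In the sketch below, the data-generating process, the weights matrix, and the "initial estimates" (true values perturbed slightly, standing in for consistent first-round estimates) are all illustrative assumptions, not the chapter's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam, beta = 200, 0.4, 1.5

# Hypothetical row-normalized weights matrix: circular network, two neighbors each.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

X = rng.normal(size=(n, 1))
u = rng.normal(size=n)
# DGP: y = lam*W y + X beta + u  =>  y = (I - lam W)^{-1} (X beta + u)
y = np.linalg.solve(np.eye(n) - lam * W, X[:, 0] * beta + u)

# Unrestricted first stage (2SLS-style): project the spatial lag Wy
# onto low-order spatial powers of X, ignoring parameter restrictions.
H = np.column_stack([X, W @ X, W @ (W @ X)])
Wy_hat_unres = H @ np.linalg.lstsq(H, W @ y, rcond=None)[0]

# Restricted first stage (LIVE-style): impose the reduced-form mean
# W E[y] = W (I - lam W)^{-1} X beta, evaluated at initial estimates.
lam0, beta0 = 0.38, 1.45
Wy_hat_res = W @ np.linalg.solve(np.eye(n) - lam0 * W, X[:, 0] * beta0)
```

Both fitted values can serve as instruments for the spatial lag; the restricted version uses the nonlinear dependence of the reduced form on (lam, beta), which is the feature the text attributes to the LIVE/FIVE approach.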
The 2SLS- and 3SLS-type estimators considered in Kelejian and Prucha (2004) and Drukker, Egger, and Prucha (2022) also estimate the expected values of the endogenous variables in an unrestricted fashion. While numerically convenient, this procedure ignores the underlying nonlinear structure of the parameters in the reduced form and hence loses efficiency. This motivates the exploration of estimators with more carefully constructed instruments for this class of network systems. To indicate the links to the LIVE and the FIVE estimators in the context of classical SEMs, we refer to the new limited and full information IV estimators for SEMs with network dependence as the Generalized Spatial LIVE (GSLIVE) estimator and the Generalized Spatial FIVE (GSFIVE) estimator, respectively. Finally, the estimation methodology considered in this chapter relates to instrumental variable estimators with optimal instruments in the context of single-equation spatial autoregressive (SAR) models. As an early contribution to the estimation methodology for this class of models, Kelejian and Prucha (1998) introduce 2SLS- and 3SLS-type estimators, noting that the mean of the endogenous variable $y_n$ depends on $X_n$, $W_nX_n$, $W_n^2X_n$, and so on. To obtain the instruments, they regress the right-hand side variables on the collection of linearly independent columns in $\left[X_n, W_nX_n, W_n^2X_n, \ldots, W_n^SX_n\right]$ for some finite order $S$. This resembles the first stage of the classical 2SLS estimator and thus, in light of the above discussion, also ignores the nonlinear structure of the reduced form model. Alternatively, Lee (2003) constructs the IVs by estimating the optimal instruments for the endogenous variables (i.e., $Ey_n$) implied by the reduced form model with some consistent initial estimates.³ However, these instruments involve inversions of $n \times n$ matrices and thus become computationally infeasible in large samples. To cope with this challenge, Kelejian, Prucha, and Yuzefovich (2004) approximate the inverses embedded in $Ey_n$ with corresponding geometric sums of finite order.⁴ The estimators considered in this chapter share the same spirit as Kelejian, Prucha, and Yuzefovich (2004) in constructing the instruments for the endogenous regressors, i.e., they approximate $Ey_n$ implied by the reduced form with consistent initial estimates and geometric series of finite order. On a more general level, the IV estimators considered in Brundy and Jorgenson (1971), Lee (2003), and Kelejian, Prucha, and Yuzefovich (2004) all utilize the nonlinear structure of the parameters in $Ey_n$ implied by the reduced form when constructing the instruments. In addition to the GSLIVE and the GSFIVE estimators that utilize the linear moments, we also consider one-step GMM estimators that utilize both the linear and the quadratic moments that originate from the scores of the log-likelihood function. The spatial literature has long recognized the value of quadratic moments in identifying spatial parameters. As suggested in Kelejian and Prucha (1998), the linear moment conditions may fail to identify the regression parameters.⁵ An extreme case of such a scenario is when the true parameter values on the exogenous variables are all zeros. In these cases, the ML estimator, whose scores consist of both linear and quadratic components, may still remain consistent in estimating the regression parameters. We note that the quadratic components of the scores represent valid quadratic moment conditions.⁶
³Specifically, they considered initial estimates obtained by the GS2SLS estimator in Kelejian and Prucha (1998).
⁴Note that Kelejian, Prucha, and Yuzefovich (2004) allow the order of approximation in the geometric sums to go to infinity, and thus their estimator is asymptotically equivalent to that considered in Lee (2003).
Building on Kelejian and Prucha (1999), Kelejian and Prucha (2004)
suggest a GMM procedure with quadratic moments to estimate the spatial autoregressive parameters in the disturbance process when the disturbances are correlated across equations.⁷ In this chapter, we also extend the GSFIVE estimator to the Linear-Quadratic Generalized Spatial FIVE (LQ-GSFIVE) estimator by complementing the linear moments with the moments implied by the quadratic components of the scores. Analogously, we extend the GSLIVE estimator to the Linear-Quadratic Generalized Spatial LIVE (LQ-GSLIVE) estimator by incorporating quadratic moments while ignoring the cross-equation error structure. One major difference relative to the existing estimators is that the quadratic moments utilized in this dissertation explicitly take into account the nonlinear structure of the structural parameters implied by the quadratic components of the scores of the log-likelihood function. However, while ML is asymptotically efficient under normality, it is in general inconsistent under heteroskedasticity. This drawback passes on to the quadratic moments implied by the quadratic scores. To cope with this drawback, we further modify the quadratic moments and derive their heteroskedasticity-robust counterparts. As such, our GMM estimators that utilize both the linear and the quadratic moments remain consistent under heteroskedastic disturbances.
⁵For example, Kelejian and Prucha (1998) suggest a case in which the weight matrix is row normalized and the only exogenous regressor is a constant.
⁶Specifically, the quadratic components of the scores can be shown to be of the form $\varepsilon_n' A_n \varepsilon_n$, where $\varepsilon_n$ is an $n \times 1$ vector of disturbances and $A_n$ is nonstochastic. Under homoskedasticity ($E\varepsilon_n\varepsilon_n' = \sigma^2 I_n$), imposing $\operatorname{tr}(A_n) = 0$ implies $E\varepsilon_n' A_n \varepsilon_n = \sigma^2 \operatorname{tr}(A_n) = 0$. Under heteroskedasticity, one needs to further assume $A_n$ to have a zero diagonal for these quadratic moments to be valid. See Proposition 3 and Lemma 1 for more discussion.
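The zero-diagonal requirement in footnote 6 can be spelled out with a short derivation (our own elaboration, not part of the original text; $\Sigma_n$ denotes a hypothetical diagonal matrix of unit-specific variances):

```latex
% Validity of a quadratic moment E[eps' A eps] under heteroskedasticity.
% With E[eps_n] = 0 and E[eps_n eps_n'] = Sigma_n = diag(sigma_{1,n}^2, ..., sigma_{n,n}^2):
\mathbb{E}\,\varepsilon_n' A_n \varepsilon_n
  = \operatorname{tr}\!\left(A_n \Sigma_n\right)
  = \sum_{i=1}^{n} a_{ii,n}\,\sigma_{i,n}^2 .
% Under homoskedasticity (sigma_{i,n}^2 = sigma^2 for all i), tr(A_n) = 0 suffices.
% Under heteroskedasticity the sigma_{i,n}^2 are arbitrary, so the sum vanishes
% for every variance profile only if each a_{ii,n} = 0, i.e., A_n must have a
% zero diagonal.
```

This is why the heteroskedasticity-robust moments discussed below replace trace restrictions with zero-diagonal weighting matrices.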
Relative to the ML estimator, another advantage of the new estimators is that they remain feasible when the sample size gets large. As the scores of the log-likelihood function depend on the inverses of $n \times n$ matrices, where $n$ is the sample size, ML procedures are not feasible when $n$ becomes large. Note that $n$ also determines the size of the network in spatial/social network models. For instance, there are over 3,000 counties and 70,000 census tracts in the U.S. To address this difficulty, we approximate the inverses embedded in the instruments (and moment conditions) with geometric series of finite order. In the context of single-equation SAR models, this idea is utilized by Kelejian, Prucha, and Yuzefovich (2004) to cope with the large-$n$ problem inherent in the 2SLS-type estimator with optimal instruments considered in Lee (2003). The rest of this chapter is organized as follows. In Section 2.2, we specify the considered simultaneous equation system with cross-sectional network interactions and discuss the model assumptions. In Section 2.3, we derive the scores of the log-likelihood function and show that the linear and quadratic parts of the scores represent valid moment conditions.
⁷Acknowledging that the quadratic moments can help identify spatial autoregressive parameters, Lee (2007) and Lin and Lee (2010) both consider GMM estimators with a single vector consisting of both the linear and quadratic moments for single-equation SAR models. Kuersteiner and Prucha (2020) build on this idea and show that it can help with the weak identification problem of linear moments. For a simultaneous system, Liu and Saraiva (2019) and Drukker, Egger, and Prucha (2022) consider GMM estimators utilizing both linear and quadratic moments that remain consistent when the linear moments are not sufficient for identification. Their quadratic moments exploit the covariance structure of the model disturbances both within and across equations.
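The geometric-series device can be illustrated numerically. The following sketch (our own illustration, with an arbitrary row-normalized weight matrix and arbitrary parameter values) approximates an inverse of the kind $(I_n - \lambda W_n)^{-1}$ that appears in $Ey_n$ by a finite Neumann sum, which requires only matrix-vector style multiplications rather than an $n \times n$ inversion:

```python
import numpy as np

# Illustrative sketch (not from the dissertation): approximating the inverse
# (I_n - lambda * W_n)^{-1} that enters E[y_n] by a finite geometric (Neumann)
# series, valid when the norm of lambda * W_n is below one.
n, lam, S = 200, 0.4, 25

# A simple row-normalized "two nearest neighbours" weight matrix, zero diagonal.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

exact = np.linalg.inv(np.eye(n) - lam * W)

# Finite-order geometric series: sum_{s=0}^{S} (lam * W)^s,
# built by repeated multiplication (no n x n inversion needed).
approx = np.eye(n)
term = np.eye(n)
for _ in range(S):
    term = lam * (term @ W)
    approx += term

err = np.max(np.abs(exact - approx))
print(err)  # truncation error decays like lam^(S+1); here below 1e-9
```

The approximation error is controlled by the spatial parameter: since the row sums of $\lambda W_n$ equal $0.4$ here, the omitted tail is of order $0.4^{S+1}$, which is why a modest order $S$ already suffices.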
We also show that the linear moments carry an IV interpretation, which in turn motivates the generic form of the linear IV estimators. In Section 2.4, we present the heteroskedasticity-robust limited and full information linear and quadratic moment conditions and their approximated versions with the inverse matrices replaced by the corresponding geometric series. In Section 2.5, we define and discuss the implementation details of the GSLIVE and the GSFIVE estimators, as well as the LQ-GSLIVE and the LQ-GSFIVE estimators. We also provide a brief discussion of the identification of the model in Section 2.6, which motivates the strong and the weak identification scenarios that appear in our Monte Carlo simulations. Section 2.7 documents the Monte Carlo results and Section 2.8 concludes. All proofs and auxiliary discussions are relegated to Chapter A.

Notations: We adopt the following notations throughout the dissertation. Let $(A_n)_{n \in \mathbb{N}}$ be some sequence of $n \times m$ matrices; then we denote the $(i,j)$-th element of $A_n$ by $a_{ij,n}$, and the $i$-th row and $j$-th column by $a_{i.,n}$ and $a_{.j,n}$, respectively. The $j$-th column of the $n \times n$ identity matrix $I_n$ will be denoted as $i_{j,n}$. If $A_n$ is nonsingular, we denote the $(i,j)$-th element of $A_n^{-1}$ by $a^{ij}_n$. More generally, if $A_n$ is a blocked matrix of size $nG \times nP$ with $G \times P$ blocks, we denote the $(g,h)$-th $n \times n$ block as $A_{gh,n}$ with $g = 1, \ldots, G$ and $h = 1, \ldots, P$. Borrowing from standard matrix notations, we let $A_{g.,n}$ denote the $g$-th $n \times nP$ block of $A_n$ and let $A_{.h,n}$ denote the $h$-th $nG \times n$ block of $A_n$. Moreover, if $G = P$, we denote the $(g,h)$-th block of $A_n^{-1}$ as $A^{gh}_n$ with $g, h = 1, \ldots, G$. We further let $A^{g.}_n$ denote the $g$-th $n \times nG$ block of $A_n^{-1}$ and let $A^{.h}_n$ denote the $h$-th $nG \times n$ block of $A_n^{-1}$. For any $n \times n$ matrix $A_n = (a_{ij,n})$ we denote with $\operatorname{diag}_{i=1}^{n}\{a_{ii,n}\}$ the $n \times n$ diagonal matrix with the $(i,i)$-th element being $a_{ii,n}$. Analogously, for some $nG \times nG$ matrix $A_n = (A_{gh,n})$ we denote with $\operatorname{diag}_{g=1}^{G}\{A_{gg,n}\}$ the block diagonal matrix with the $(g,g)$-th matrix block being $A_{gg,n}$. Let $A_n$ be of dimension $m_n \times m_n$ with elements that potentially depend on the sample size $n$; then the maximum column sum and row sum matrix norms of $A_n$ are, respectively, defined as
$$\|A_n\|_1 = \max_{1 \le j \le m_n} \sum_{i=1}^{m_n} |a_{ij,n}| \quad \text{and} \quad \|A_n\|_\infty = \max_{1 \le i \le m_n} \sum_{j=1}^{m_n} |a_{ij,n}|.$$
If $\|A_n\|_1 \le c$ and $\|A_n\|_\infty \le c$ for some finite constant $c$ which does not depend on $n$, then we say that the row and column sums of the sequence of matrices $A_n$ are uniformly bounded in absolute value.

2.2 Model

In this section we specify our simultaneous system of $G$ equations for $G$ endogenous variables observed for $n$ cross-sectional units. We consider the same model as Drukker, Egger, and Prucha (2022), but will develop new estimation methods for that model. We note that our presentation and discussion of the model follows very closely and/or duplicates that of Drukker, Egger, and Prucha (2022). In our model, simultaneity can stem from two sources. The first source is the classical simultaneity across equations that captures the dependence of the $g$-th endogenous variable for the $i$-th unit on the other endogenous variables for the $i$-th unit. The other source stems from Cliff and Ord (1973, 1981) type cross-sectional network interactions between cross-sectional units $i$ and $j$, which are modeled in the form of "spatial lags". The model specification is fairly general and allows for multiple network structures, in the form of higher-order spatial lags in the endogenous variables, in the exogenous variables, as well as in the disturbances.

2.2.1 Structural Form Model

We assume that the cross-sectional data of $n$ units are generated by the following system ($g = 1, \ldots, G$):
$$y_{g,n} = \sum_{l=1}^{G} b_{lg}\, y_{l,n} + \sum_{k=1}^{K} c_{kg}\, x_{k,n} + \sum_{l=1}^{G} \sum_{p=1}^{P} \lambda_{lg,p}\, W_{p,n}\, y_{l,n} + u_{g,n}, \tag{2.1}$$
$$u_{g,n} = \sum_{q=1}^{Q} \rho_{g,q}\, M_{q,n}\, u_{g,n} + \varepsilon_{g,n},$$
where $y_{g,n}$ is the $n \times 1$ vector of cross-sectional observations on the dependent variable in the $g$-th equation, $x_{k,n}$ is the $n \times 1$ vector of cross-sectional observations on the $k$-th exogenous variable, $u_{g,n}$ is the $n \times 1$ disturbance vector in the $g$-th equation, and $\varepsilon_{g,n}$ is the $n \times 1$ vector of innovations entering the disturbance process for the $g$-th equation. With $b_{lg}$ and $c_{kg}$ we denote the scalar parameters corresponding to the $l$-th endogenous and $k$-th exogenous variables, respectively, of the $g$-th equation. In general, the structural model parameters are not identified without certain restrictions. Those restrictions will be introduced in the next subsection.⁸ With the above notations, recall that a classical SEM can be formulated as ($g = 1, \ldots, G$)
$$y_{g,n} = \sum_{l=1}^{G} b_{lg}\, y_{l,n} + \sum_{k=1}^{K} c_{kg}\, x_{k,n} + u_{g,n}, \tag{2.2}$$
where the $b_{lg}$ represent the classical simultaneity effects. The difference between the network SEM (2.1) and the classical SEM (2.2) is that the former contains the additional explanatory regressors $W_{p,n} y_{l,n}$ with associated parameters $\lambda_{lg,p}$. The terms $\lambda_{lg,p} W_{p,n} y_{l,n}$ capture the spatial/cross-sectional dependence among agents. Note that in the classical SEM, the equilibrium outcome of the $i$-th agent depends only on the other outcome variables, the exogenous characteristics, and the unobservables of the agent itself. In contrast, in a simultaneous equation model with network dependence, the equilibrium outcome of the $i$-th agent also depends on the outcomes of the other agents. In other words, the equilibrium outcomes are determined simultaneously by all agents. Consistent with the usual terminology for Cliff-Ord type network interactions, we refer to the nonstochastic $n \times n$ matrices $W_{p,n}$ and $M_{q,n}$ as spatial weights matrices and to $y_{l,p,n} = W_{p,n}\, y_{l,n}$ and $u_{g,q,n} = M_{q,n}\, u_{g,n}$ as spatial lags of $y_{l,n}$ and of $u_{g,n}$. Correspondingly, the (scalar) parameters $\lambda_{lg,p}$ and $\rho_{g,q}$ are referred to as the spatial autoregressive parameters.
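The simultaneity described above can be made concrete with a small simulation. The following sketch (our own illustration, not from the dissertation) generates data from the $G = 1$, $P = Q = 1$ special case of (2.1), i.e., $y = \lambda W y + Xc + u$ with $u = \rho M u + \varepsilon$, by solving the reduced form; the weight matrix and parameter values are arbitrary choices for illustration:

```python
import numpy as np

# Minimal simulation sketch of the G = 1, P = Q = 1 special case of (2.1):
#   y = lam * W y + X c + u,   u = rho * M u + eps,
# generated from its reduced form. W = M is a row-normalized circulant
# "two nearest neighbours" matrix; all parameter values are illustrative.
rng = np.random.default_rng(0)
n, lam, rho = 100, 0.3, 0.2

W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
M = W

X = np.column_stack([np.ones(n), rng.standard_normal(n)])
c = np.array([1.0, 0.5])
eps = rng.standard_normal(n)

# Reduced form: u = (I - rho*M)^{-1} eps,  y = (I - lam*W)^{-1} (X c + u).
u = np.linalg.solve(np.eye(n) - rho * M, eps)
y = np.linalg.solve(np.eye(n) - lam * W, X @ c + u)

# Through the inverse, each y_i depends on the shocks of all other units:
# equilibrium outcomes are determined simultaneously by all agents.
print(y.shape)  # (100,)
```

Note that the simulation solves the linear systems directly; for large $n$ this is exactly the step that the geometric-series approximation discussed in the introduction is designed to avoid.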
Observe that the $i$-th elements of $y_{l,p,n}$ and $u_{g,q,n}$ are given by
$$y_{il,p,n} = \sum_{j=1}^{n} w_{ij,p,n}\, y_{jl,n} \quad \text{and} \quad u_{ig,q,n} = \sum_{j=1}^{n} m_{ij,q,n}\, u_{jg,n}.$$
⁸At minimum, $b_{gg}$ for $g = 1, \ldots, G$ are restricted to zeros in the model.
The weights matrices contain the information on the links between units and on the relative weight of those links. For example, in spatial settings, the elements of the weights matrices are often some function of the inverse distance between units. The spatial autoregressive parameters describe the strength of the spillovers. It is worth noting that Cliff-Ord type network models do not require indexing the observations by locations. They only rely on some measure of proximity between units in the formation of the spatial weights. Although originally introduced for spatial networks, the notion of network in these models goes beyond geographic ones and thus can accommodate a wide class of networks, as remarked in the introduction. For example, in the context of social network models, one common specification has been to assign to each of the $i$-th individual's friends a weight of $1/n_i$, where $n_i$ denotes the total number of friends of $i$, while assigning zero weights to individuals who are not friends of $i$.⁹ As is often the case in the literature, the elements of the spatial weight matrices are allowed to depend on the sample size. This permits normalizations of these matrices where the normalization factor(s) depend on the sample size, which in turn implies that the model parameters depend on the sample size.¹⁰ As seen from the above, the $i$-th element of, say, $y_{l,p,n}$ is given by $\sum_{j=1}^{n} w_{ij,p,n}\, y_{jl,n}$. In light of this, even if the elements of the spatial weights matrices do not depend on the sample size, the elements of the spatial lag $y_{l,p,n}$ and, analogously, the elements of $u_{g,q,n}$ will generally depend on the sample size. This in turn implies that the elements of $y_{g,n}$ and $u_{g,n}$ will generally depend on the sample size, or in other words, form triangular arrays.
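The $1/n_i$ friendship weighting can be sketched for a toy network (our own illustration; the friendship lists are made up, and nominations need not be mutual):

```python
import numpy as np

# Illustrative sketch (not from the dissertation): the 1/n_i friendship
# weighting for a toy social network of 4 individuals. friends[i] lists the
# friends nominated by individual i; the resulting W is row normalized with
# a zero diagonal, matching the weights-matrix conventions in the text.
friends = {0: [1, 2], 1: [0], 2: [0, 1, 3], 3: [2]}
n = len(friends)

W = np.zeros((n, n))
for i, fr in friends.items():
    for j in fr:
        W[i, j] = 1.0 / len(fr)  # weight 1/n_i for each friend of i

print(W.sum(axis=1))  # each row sums to 1
print(np.diag(W))     # zero diagonal: no self-links
```

Because nominations can be one-sided (here individual 2 names 1, but not vice versa), such matrices are generally asymmetric, which the Cliff-Ord framework accommodates.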
In allowing the elements of $x_{k,n}$ to depend on the sample size, we implicitly also allow for some of the exogenous variables to be spatial lags of exogenous variables. For example, the elements of $x_{k,n}$ could be of the form $x_{ik,n} = \sum_{j=1}^{n} w_{ij,p,n}\,\xi_j$, where $\xi_j$ is some basic exogenous variable. Thus the model allows for, as remarked above, cross-sectional interactions in the endogenous variables, the exogenous variables, and the disturbances.
⁹See, e.g., Cohen-Cole, Liu, and Zenou (2018), among others.
¹⁰See Kelejian and Prucha (2010) for further discussions regarding normalizations.
The above model generalizes the spatial simultaneous equation model considered in Kelejian and Prucha (2004) in allowing for higher-order spatial lags. One attraction of this feature is that higher-order lags can capture different forms of proximity between units. For example, in the context of social networks, different matrices may refer to different circles of friends, e.g., one matrix may contain the very close friends, and a second matrix the other friends. As remarked in Drukker, Egger, and Prucha (2022), an estimation theory that allows for multiple spatial weights matrices can also be used to accommodate certain parameterizations of the spatial weights. We borrow their discussion of the motivating example in the following. Spatial weights are often specified as functions of some distance measure. Let $W = (w_{ij})$ be the basic spatial weights matrix, let $d_{ij}$ denote some distance measure between units $i$ and $j$, and let $d^*_{ij}$ be some contiguity measure taking on values of one or zero. Then, the researcher may specify the weights as the product of the contiguity measure and a polynomial in $d_{ij}$, treating the coefficients of the polynomial as unknown parameters:
$$w_{ij} = d^*_{ij}\left[\alpha_1 d_{ij} + \ldots + \alpha_P d_{ij}^P\right].$$
Now, suppose the researcher models $y_g$ as a function of, say, $\lambda_{lg} W y_l$; then it follows that
$$\lambda_{lg} W y_l = \lambda_{lg}\left[\sum_{p=1}^{P} \alpha_p W_p\right] y_l = \sum_{p=1}^{P} \lambda_{lg,p} W_p\, y_l,$$
with $\lambda_{lg,p} = \lambda_{lg}\alpha_p$, $W_p = (w_{ij,p})$, and $w_{ij,p} = d^*_{ij}\, d_{ij}^p$. In allowing for higher-order spatial lags, model (2.1) covers this specification as a special case. Note further that the above specification can accommodate other basis functions (e.g., alternative basis functions) and more general measures of distance and contiguity. The same ideas also apply to the modeling of the disturbance process. Following Drukker, Egger, and Prucha (2022), we refer to this model as a simultaneous spatial autoregressive model of order $P$ with spatially autoregressive disturbances of order $Q$, for short, a simultaneous SARAR($P$,$Q$) model, consistent with the terminology of Anselin and Florax (1995). In order to distinguish the true parameters from other possible values in the parameter space, we will denote in the following the true parameters as $b_{lg,0}$, $c_{kg,0}$, $\lambda_{lg,p,0}$ and $\rho_{g,q,0}$. Model (2.1) can be written more compactly as
$$Y_n = Y_n B_0 + X_n C_0 + \bar{Y}_n \Lambda_0 + U_n, \tag{2.3}$$
$$U_n = \bar{U}_n P_0 + E_n$$
with
$$Y_n = (y_{1,n}, \ldots, y_{G,n})_{n \times G}, \qquad X_n = (x_{1,n}, \ldots, x_{K,n})_{n \times K},$$
$$U_n = (u_{1,n}, \ldots, u_{G,n})_{n \times G}, \qquad E_n = (\varepsilon_{1,n}, \ldots, \varepsilon_{G,n})_{n \times G},$$
$$\bar{Y}_n = (y_{1,1,n}, \ldots, y_{1,P,n}, \ldots, y_{G,1,n}, \ldots, y_{G,P,n})_{n \times PG},$$
$$\bar{U}_n = (u_{1,1,n}, \ldots, u_{Q,1,n}, \ldots, u_{1,G,n}, \ldots, u_{Q,G,n})_{n \times QG},$$
where the parameter matrices $B_0 = (b_{lg,0})_{G \times G}$, $C_0 = (c_{kg,0})_{K \times G}$, $\Lambda_0 = (\lambda_{lg,p,0})_{PG \times G}$, and $P_0 = (\rho_{g,q,0})_{QG \times G}$ are defined conformably.¹¹

2.2.2 Reduced Form and Structural Model with Exclusion Restrictions

To derive the reduced form of the above model, we denote the vectorized model variables as $y_n = \operatorname{vec}(Y_n)$, $x_n = \operatorname{vec}(X_n)$, $u_n = \operatorname{vec}(U_n)$, $\varepsilon_n = \operatorname{vec}(E_n)$. We also denote with $\mathbf{W}_n$ and $\mathbf{M}_n$ the stacked spatial weight matrices $\mathbf{W}_n = [W_{1,n}', \ldots, W_{P,n}']'$ and $\mathbf{M}_n = [M_{1,n}', \ldots, M_{Q,n}']'$. Observing that $\operatorname{vec}(\bar{Y}_n) = (I_G \otimes \mathbf{W}_n) y_n$ and $\operatorname{vec}(\bar{U}_n) = (I_G \otimes \mathbf{M}_n) u_n$, and that for any two conformable matrices $A_1$ and $A_2$, $\operatorname{vec}(A_1 A_2') = (A_2 \otimes I)\operatorname{vec}(A_1)$, it is readily seen that the spatial simultaneous equation system (2.1) can be re-written in stacked notation as
$$y_n = B_0^{*\prime} y_n + C_0^{*\prime} x_n + u_n, \tag{2.4}$$
$$u_n = P_0^{*\prime} u_n + \varepsilon_n,$$
where $B_0^{*\prime} = (B_0' \otimes I_n) + (\Lambda_0' \otimes I_n)(I_G \otimes \mathbf{W}_n)$, $C_0^{*\prime} = C_0' \otimes I_n$, and $P_0^{*\prime} = (P_0' \otimes I_n)(I_G \otimes \mathbf{M}_n)$.
¹¹Under this formulation, the $g$-th columns of $\Lambda_0$ and $P_0$ are, respectively, $[\lambda_{1g,1,0}, \ldots, \lambda_{1g,P,0}, \ldots, \lambda_{Gg,1,0}, \ldots, \lambda_{Gg,P,0}]'$ and $[0, \ldots, 0, \rho_{g,1,0}, \ldots, \rho_{g,Q,0}, 0, \ldots, 0]'$. For normalization, the diagonal elements of $B_0$ are restricted to zeros.
To facilitate the discussion, we denote
$$S_n(\beta_0, \lambda_0) = I_{nG} - B_0^{*\prime}, \qquad R_n(\rho_0) = I_{nG} - P_0^{*\prime}.$$
The reduced form of the system is given by
$$y_n = (I_{nG} - B_0^{*\prime})^{-1}(C_0^{*\prime} x_n + u_n) = S_n^{-1}(\beta_0, \lambda_0)\,(C_0^{*\prime} x_n + u_n), \tag{2.5}$$
$$u_n = (I_{nG} - P_0^{*\prime})^{-1} \varepsilon_n = R_n^{-1}(\rho_0)\,\varepsilon_n,$$
assuming that the inverses of $I_{nG} - B_0^{*\prime}$ and $I_{nG} - P_0^{*\prime}$ exist. In general, the structural parameters of the spatial simultaneous equation system (2.1) and (2.3) are not identified without imposing exclusion restrictions. Let $\beta_{g,0}$, $\gamma_{g,0}$, $\lambda_{g,0}$ and $\rho_{g,0}$ denote the $m_{g,\beta} \times 1$, $m_{g,\gamma} \times 1$, $m_{g,\lambda} \times 1$ and $m_{g,\rho} \times 1$ vectors of non-zero elements of the $g$-th columns of $B_0$, $C_0$, $\Lambda_0$ and $P_0$, respectively. Let $Y_{g,n}$, $X_{g,n}$, $\bar{Y}_{g,n}$, $\bar{U}_{g,n}$ and $E_{g,n}$ be the corresponding matrices of observations on the endogenous variables, exogenous variables, spatially lagged endogenous variables, spatially lagged disturbances and disturbances appearing in the structural equation for the $g$-th endogenous variable. Then model (2.3) can be expressed as ($g = 1, \ldots, G$):
$$y_{g,n} = Z_{g,n}\,\delta_{g,0} + u_{g,n}, \tag{2.6}$$
$$u_{g,n} = \bar{U}_{g,n}\,\rho_{g,0} + \varepsilon_{g,n}, \tag{2.7}$$
where $Z_{g,n} = [Y_{g,n}, \bar{Y}_{g,n}, X_{g,n}]$ and $\delta_{g,0} = [\beta_{g,0}', \lambda_{g,0}', \gamma_{g,0}']'$. For purposes of estimation, it proves helpful to apply a spatial Cochrane-Orcutt transformation to the model. In particular, premultiplying (2.6) by $R_{g,n}(\rho_{g,0}) = I_n - \sum_{q \in I_{g,\rho}} \rho_{g,q,0} M_{q,n}$ yields
$$y^*_{g,n} = Z^*_{g,n}\,\delta_{g,0} + \varepsilon_{g,n}, \tag{2.8}$$
with $y^*_{g,n} = y^*_{g,n}(\rho_{g,0}) = R_{g,n}(\rho_{g,0})\, y_{g,n}$ and $Z^*_{g,n} = Z^*_{g,n}(\rho_{g,0}) = R_{g,n}(\rho_{g,0})\, Z_{g,n}$. Stacking the transformed equations yields
$$y^*_n = Z^*_n\,\delta_0 + \varepsilon_n, \tag{2.9}$$
with $y^*_n = [y^{*\prime}_{1,n}, \ldots, y^{*\prime}_{G,n}]'$, $Z^*_n = \operatorname{diag}_{g=1}^{G}\{Z^*_{g,n}\}$, and $\delta_0 = [\delta_{1,0}', \ldots, \delta_{G,0}']'$.
It then follows from (2.6) and the above transformed form of the model (2.8) that
$$\varepsilon_{g,n} = \Big(I_n - \sum_{q \in I_{g,\rho}} \rho_{g,q,0}\, M_{q,n}\Big)\big[y_{g,n} - Z_{g,n}\,\delta_{g,0}\big] = R_{g,n}(\rho_{g,0})\, u_{g,n}. \tag{2.10}$$
We can stack (2.6) and (2.10) over the $G$ equations to write the whole system as
$$y_n = Z_n\,\delta_0 + u_n, \tag{2.11}$$
$$\varepsilon_n = R_n(\rho_0)\, u_n,$$
where $Z_n = \operatorname{diag}_{g=1}^{G}\{Z_{g,n}\}$, $R_n(\rho_0) = \operatorname{diag}_{g=1}^{G}\{R_{g,n}(\rho_{g,0})\}$, $\delta_0 = [\delta_{1,0}', \ldots, \delta_{G,0}']'$ denotes the vector of all parameters in the structural equations, and $\rho_0 = [\rho_{1,0}', \ldots, \rho_{G,0}']'$ is the vector of all spatial autoregressive parameters in the disturbance process. To facilitate the expressions in the next subsection, we let $\theta_{g,0} = [\delta_{g,0}', \rho_{g,0}']'$ be the vector of parameters in the $g$-th equation of the full model and $\theta_0 = [\theta_{1,0}', \ldots, \theta_{G,0}']'$ be the vector of all model parameters.

2.2.3 Model Assumptions

Given that we consider the same model specification as in Drukker, Egger, and Prucha (2022), the following assumptions regarding the data generating process (DGP) are also the same as those maintained in their paper.

Assumption 1. For $p = 1, \ldots, P$ and $q = 1, \ldots, Q$: (a) All diagonal elements of $W_{p,n}$ and $M_{q,n}$ are zero. (b) $\|W_{p,n}\|_1 \le c$ and $\|M_{q,n}\|_1 \le c$ for some finite constant $c$ which does not depend on $n$, and $\|W_{p,n}\|_\infty = 1$, $\|M_{q,n}\|_\infty = 1$.

Assumption 2. (a) The matrix $S_n(\beta_0, \lambda_0) = I_{nG} - B_0^{*\prime}$ is nonsingular. (b) The spatial autoregressive parameters satisfy $\sup_n \sum_{q \in I_{g,\rho}} |\rho_{g,q,0}| < 1$ for $g = 1, \ldots, G$, where $I_{g,\rho} = \{q_{g,1}, \ldots, q_{g,q_g}\} \subseteq \{1, \ldots, Q\}$ denotes the set of indices associated with the elements of $\rho_{g,0}$.¹² (c) The row and column sums of $S_n^{-1}(\beta_0, \lambda_0)$ are uniformly bounded in absolute value.

The above assumptions are in line with the spatial literature. Assumption 1(a) entails a normalization rule and embodies the fact that unit $i$ is not treated as a neighbor of itself. Assumption 1(b) implies that the row and column sums of the matrices $W_{p,n}$ and $M_{q,n}$ are uniformly bounded in absolute value.
As pointed out in Kelejian and Prucha (2010), $\|W_{p,n}\|_\infty = 1$ and $\|M_{q,n}\|_\infty = 1$ imply a normalization of the parameters. This normalization can always be achieved by appropriately re-scaling the elements of the spatial weight matrices, along with correspondingly redefined spatial autoregressive parameters.¹³
¹²Note that the index set $I_{g,\rho}$ varies with $g$ and hence embodies the different exclusion restrictions imposed on the spatial autoregressive parameters in the different disturbance equations.
¹³See Kelejian and Prucha (2010) or Drukker, Egger, and Prucha (2022) for a more detailed discussion.
Assumption 2(a) ensures that the first equation of the expression for the reduced form (2.5) is well defined.¹⁴ Observe that $R_n(\rho_0) = \operatorname{diag}_{g=1}^{G}\{R_{g,n}(\rho_{g,0})\}$ with $R_{g,n}(\rho_{g,0}) = I_n - \sum_{q \in I_{g,\rho}} \rho_{g,q,0} M_{q,n}$. In light of this, it follows from Assumptions 1(b) and 2(b) that $\|I_{nG} - R_n(\rho_0)\|_\infty \le \max_g \sum_{q \in I_{g,\rho}} |\rho_{g,q,0}| < 1$, which in turn implies that $R_n(\rho_0)$ is nonsingular (Horn and Johnson (1985), p. 301). Consequently, the second equation of the reduced form in (2.5) is also well defined, and thus $y_n$ is uniquely defined by the model. Assumptions 1(b) and 2(b) even imply that $\sup_n \|P_0^{*\prime}\|_\infty < 1$, which in turn implies that the row sums of the matrices $R_{g,n}^{-1}(\rho_{g,0})$ are uniformly bounded in absolute value. To see this, observe that $\|R_{g,n}^{-1}(\rho_{g,0})\|_\infty \le 1/\big[1 - \|P_0^{*\prime}\|_\infty\big] \le 1/\big[1 - \sup_n \|P_0^{*\prime}\|_\infty\big] < \infty$; see Horn and Johnson (1985), p. 301.

Assumption 3. The matrix of (nonstochastic) exogenous regressors $X_n$ has full column rank. Furthermore, the elements of $X_n$ are uniformly bounded in absolute value by some finite constant.

The above assumption is standard in the spatial literature. In particular, in treating $X_n$ as nonstochastic, the analysis in this dissertation should be viewed as conditional on $X_n$. We next state the assumptions maintained for the disturbance $\varepsilon_n$. Let $V_n = [v_{1,n}, \ldots, v_{G,n}]$ be an $n \times G$ matrix of basic innovations and let $v_n = \operatorname{vec}(V_n)$. Then assume

Assumption 4.
The innovations $\varepsilon_n$ are generated as follows:
$$\varepsilon_n = (\Sigma_0^{*} \otimes I_n)\, v_n, \tag{2.12}$$
where $\Sigma_0^{*}$ is a nonsingular $G \times G$ matrix and the random variables $\{v_{ig,n} : i = 1, \ldots, n,\; g = 1, \ldots, G\}$ are, for each $n$, identically and independently distributed with zero mean, unit variance, and finite $4 + \eta$ moments for some $\eta > 0$, and their distribution does not depend on $n$. Furthermore, let $\Sigma_0 = \Sigma_0^{*}\Sigma_0^{*\prime}$; then the diagonal elements of $\Sigma_0$ are bounded by some finite constant.
¹⁴One common yet more restrictive condition in the literature is to assume $\|B_0^{*}\| < 1$ for some induced matrix norm.
The above assumption on the innovation process is in line with the specification of the disturbance terms for a classical simultaneous equation system. Let $\varepsilon_n(i)$ denote the $i$-th row of $E_n$; then equation (2.12) implies $E_n = V_n \Sigma_0^{*\prime}$ since $\varepsilon_n = \operatorname{vec}(E_n)$. It is then readily seen that the innovation vectors $\{\varepsilon_n(i) : 1 \le i \le n\}$ are i.i.d. with zero mean and VC matrix $\Sigma_0$. With respect to the stacked innovation vector, the above assumption implies that $E\varepsilon_n = 0$ and $E\varepsilon_n\varepsilon_n' = \Sigma_0 \otimes I_n$. For notational convenience, let $\sigma_0$ be the column vector consisting of the nonzero upper diagonal elements of $\Sigma_0$, the $G \times G$ variance-covariance matrix of the innovation vectors $\{\varepsilon_n(i) : 1 \le i \le n\}$. Following the aforementioned notations, $\sigma_{g.,0}$ and $\sigma_{.g,0}$ denote the $g$-th row and the $g$-th column of $\Sigma_0$, respectively. Similarly, $\sigma^{g.}_0$ and $\sigma^{.g}_0$ denote the $g$-th row and the $g$-th column of $\Sigma_0^{-1}$, respectively. We note that $\sigma_{g.,0}' = \sigma_{.g,0}$ and $\sigma^{g.\prime}_0 = \sigma^{.g}_0$ by the symmetry of $\Sigma_0$. From the reduced form model (2.5) and Assumption 4, we see that the VC matrices of $u_n$ and $y_n$ are given by, respectively,
$$\Omega_{u,n} = R_n^{-1}(\rho_0)\,(\Sigma_0 \otimes I_n)\, R_n^{-1}(\rho_0)', \tag{2.13}$$
$$\Omega_{y,n} = S_n^{-1}(\beta_0, \lambda_0)\,\Omega_{u,n}\, S_n^{-1}(\beta_0, \lambda_0)'. \tag{2.14}$$
Assumptions 2 and 4 imply that the row and column sums of the VC matrix of $u_n$ and that of $y_n$ are uniformly bounded in absolute value, thus limiting the degree of correlation between, respectively, the elements of $u_n$ and of $y_n$.
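The covariance structure implied by Assumption 4 rests on the mixed-product property of the Kronecker product, which can be verified numerically (our own check, with an arbitrary lower-triangular choice of $\Sigma_0^{*}$):

```python
import numpy as np

# Sketch (our own check) of the covariance structure implied by Assumption 4:
# eps_n = (Sigma0* kron I_n) v_n with E[v v'] = I_{nG} gives
# E[eps eps'] = (Sigma0* kron I_n)(Sigma0*' kron I_n)
#             = (Sigma0* Sigma0*') kron I_n = Sigma0 kron I_n.
n, G = 4, 2
S_star = np.array([[1.0, 0.0], [0.5, 1.0]])  # a nonsingular G x G "square root"
Sigma0 = S_star @ S_star.T

lhs = np.kron(S_star, np.eye(n)) @ np.kron(S_star.T, np.eye(n))
rhs = np.kron(Sigma0, np.eye(n))
print(np.allclose(lhs, rhs))  # True: mixed-product property of the Kronecker product
```

The same identity is what delivers the sandwich forms (2.13) and (2.14) once the reduced-form inverses are applied.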
2.3 Maximum Likelihood Estimation and Estimator Generating Equations

Hendry (1976) and Prucha and Kelejian (1984) show that the normal equations of the ML estimator can be utilized as a set of estimator generating equations, and that IV estimators can be viewed as numerical approximations to its solution. Hausman (1975) shows explicitly that the full information maximum likelihood estimator of a classical simultaneous system can be written in a form that permits an IV interpretation where the instruments embody all the underlying parameter restrictions. Extending these results, one contribution of the dissertation is showing that the linear components of the ML scores for the simultaneous system with network dependence can also be re-written to carry IV interpretations. In this section, we first derive the scores of the ML estimator of the network SEM model in equation (2.3). The scores are seen to be composed of weighted averages of linear and quadratic forms. We then show that both the linear and the quadratic forms represent valid moment conditions, and that the linear moments can be written in a form defining the generic IV estimator.

2.3.1 Scores of the Log-likelihood Function

In light of the reduced form model (2.5), we see that $y_n \sim N(\mu_y, \Omega_y)$ under normality, where $\mu_y = (I_{nG} - B_0^{*\prime})^{-1} C_0^{*\prime} x_n$ and $\Omega_y$ is defined in equation (2.14). In order to obtain a more elegant expression for the scores, it proves helpful to reparameterize the model with $\Sigma^{-1}$ instead of $\Sigma$, observing that there is a one-to-one correspondence between the elements of $\Sigma$ and $\Sigma^{-1}$. In the following, let $\tau$ denote the upper diagonal elements of $\Sigma^{-1}(\tau)$. Assuming now that the elements of $\varepsilon$ are distributed i.i.d. normal, the reparameterized log-likelihood function is then given by¹⁵
$$\ln L_n(\theta, \tau) = -\frac{nG}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma(\tau)| + \ln|S_n(\beta, \lambda)| + \ln|R_n(\rho)| - \frac{1}{2}\left[S_n(\beta, \lambda)y_n - C_n^{*\prime}(\gamma)x_n\right]' R_n(\rho)'\left(\Sigma^{-1}(\tau) \otimes I_n\right) R_n(\rho)\left[S_n(\beta, \lambda)y_n - C_n^{*\prime}(\gamma)x_n\right]. \tag{2.15}$$
¹⁵With $\Omega_{y,n}$ in (2.14) and $y_n^e(\theta) = S_n^{-1}(\beta, \lambda)\, C_n^{*\prime}(\gamma)\, x_n$, note that $(y_n - y_n^e(\theta))'\,\Omega_{y,n}^{-1}\,(y_n - y_n^e(\theta)) = [S_n(\beta, \lambda)y_n - C_n^{*\prime}(\gamma)x_n]'\, R_n(\rho)'\,(\Sigma^{-1}(\tau) \otimes I_n)\, R_n(\rho)\,[S_n(\beta, \lambda)y_n - C_n^{*\prime}(\gamma)x_n]$.
To help with the presentation of the scores, define $L_{\beta,g}$ as the $G \times m_{g,\beta}$ selection matrix on $B_0$ for equation $g$ such that $\beta_{g,0} = L_{\beta,g}'\, b_{.g,0}$, where $b_{.g,0}$ denotes the $g$-th column of $B_0$, and recall that $\beta_{g,0}$ denotes the vector of non-zero elements in $b_{.g,0}$. The selection matrices $L_{\gamma,g}$, $L_{\lambda,g}$ and $L_{\rho,g}$ are defined analogously. We also denote the selection matrix associated with $\tau$ as $L_\tau$.¹⁶ The proofs of the propositions in this section are given in Chapter A.2.
¹⁶To be clear, $L_{\lambda,g}$, $L_{\gamma,g}$ and $L_{\rho,g}$ are of dimension $PG \times m_{g,\lambda}$, $K \times m_{g,\gamma}$ and $Q \times m_{g,\rho}$, respectively. For the full system, e.g., $L_\beta = \operatorname{diag}_{g=1}^{G}\{L_{\beta,g}\}$, which is of dimension $G^2 \times \sum_{g=1}^{G} m_{g,\beta}$. Consistent with the definition of $\tau$, the selection matrix $L_\tau$ is of dimension $G^2 \times G(G+1)/2$.

Proposition 1. The scores of (2.15) with respect to the parameters of interest in equation $g$, for $g = 1, \ldots, G$, are:
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \beta_g} = Y_{g,n}'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) - \alpha_{\beta,g,n}(\beta, \lambda), \tag{2.16}$$
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \gamma_g} = X_{g,n}'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta), \tag{2.17}$$
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \lambda_g} = \bar{Y}_{g,n}'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) - \alpha_{\lambda,g,n}(\beta, \lambda), \tag{2.18}$$
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \rho_g} = \bar{U}_{g,n}(\delta_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) - \alpha_{\rho,g,n}(\rho_g), \tag{2.19}$$
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \tau} = \frac{1}{2}\, L_\tau'\left[n\,\operatorname{vec}(\Sigma(\tau)) - \operatorname{vec}\!\left(E_n(\theta)'\, E_n(\theta)\right)\right], \tag{2.20}$$
where
$$\alpha_{\beta,g,n}(\beta, \lambda) = L_{\beta,g}'\left[\operatorname{tr}(S_n^{1g}(\beta, \lambda)), \ldots, \operatorname{tr}(S_n^{Gg}(\beta, \lambda))\right]', \tag{2.21}$$
$$\alpha_{\lambda,g,n}(\beta, \lambda) = L_{\lambda,g}'\big[\operatorname{tr}(S_n^{1g}(\beta, \lambda) W_{1,n}), \ldots, \operatorname{tr}(S_n^{1g}(\beta, \lambda) W_{P,n}), \ldots, \operatorname{tr}(S_n^{Gg}(\beta, \lambda) W_{1,n}), \ldots, \operatorname{tr}(S_n^{Gg}(\beta, \lambda) W_{P,n})\big]', \tag{2.22}$$
$$\alpha_{\rho,g,n}(\rho_g) = L_{\rho,g}'\left[\operatorname{tr}(R_{g,n}^{-1}(\rho_g) M_{1,n}), \ldots, \operatorname{tr}(R_{g,n}^{-1}(\rho_g) M_{Q,n})\right]'. \tag{2.23}$$
Note that the endogenous variables and their spatial lags can be decomposed as
$$Y_n = Y_n^e(\theta_0) + V_n(\theta_0), \tag{2.24}$$
$$\bar{Y}_n = \bar{Y}_n^e(\theta_0) + \bar{V}_n(\theta_0), \tag{2.25}$$
with $Y_n^e(\theta_0) = EY_n$ and $EV_n(\theta_0) = 0$, as well as $\bar{Y}_n^e(\theta_0) = E\bar{Y}_n$ and $E\bar{V}_n(\theta_0) = 0$. Obviously, $Y_n^e(\theta_0)$ is the best instrument for $Y_n$ and $\bar{Y}_n^e(\theta_0)$ is the best instrument for $\bar{Y}_n$.
Explicit expressions for the columns of $Y_n^e(\theta_0)$, $\bar{Y}_n^e(\theta_0)$, $V_n(\theta_0)$ and $\bar{V}_n(\theta_0)$ are obtained from the reduced form of the model (2.5). Let $y_n^e(\theta_0) = \operatorname{vec}(Y_n^e(\theta_0))$ and $v_n(\theta_0) = \operatorname{vec}(V_n(\theta_0))$; it is readily seen that $y_n^e(\theta_0) = S_n^{-1}(\beta_0, \lambda_0)\, C_0^{*\prime} x_n$ and $v_n(\theta_0) = S_n^{-1}(\beta_0, \lambda_0)\, R_n^{-1}(\rho_0)\,\varepsilon_n$. Recalling that $\operatorname{vec}(\bar{Y}_n) = (I_G \otimes \mathbf{W}_n) y_n$, it is furthermore readily seen that $\bar{y}_n^e(\theta_0) = (I_G \otimes \mathbf{W}_n)\, y_n^e(\theta_0)$ and $\bar{v}_n(\theta_0) = (I_G \otimes \mathbf{W}_n)\, v_n(\theta_0)$. Note that both $v_n(\theta_0)$ and $\bar{v}_n(\theta_0)$ are linear in $\varepsilon_n$. For the $g$-th equation, the matrix $Y_{g,n}^e(\theta_0) = Y_n^e(\theta_0)\, L_{\beta,g}$ collects the means of the columns of $Y_{g,n}$, i.e., of the endogenous variables that appear on the right-hand side of the $g$-th equation. Similarly, $\bar{Y}_{g,n}^e(\theta_0) = \bar{Y}_n^e(\theta_0)\, L_{\lambda,g}$ collects the means of the columns of $\bar{Y}_{g,n}$, i.e., of the spatially lagged endogenous variables that appear on the right-hand side of the $g$-th equation. The stochastic components $V_{g,n}(\theta_0)$ and $\bar{V}_{g,n}(\theta_0)$ are defined analogously. We can then re-write the scores w.r.t. $\beta_g$ and $\lambda_g$ as
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \beta_g} = Y_{g,n}^e(\theta)'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) + \left[V_{g,n}(\theta)'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) - \alpha_{\beta,g,n}(\beta, \lambda)\right],$$
$$\frac{\partial \ln L_n(\theta, \tau)}{\partial \lambda_g} = \bar{Y}_{g,n}^e(\theta)'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) + \left[\bar{V}_{g,n}(\theta)'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right) R_n(\rho)\, u_n(\delta) - \alpha_{\lambda,g,n}(\beta, \lambda)\right],$$
from which we see that the scores in (2.16)-(2.19) are composed of linear, quadratic, or linear-quadratic forms in $\varepsilon_n(\theta) = R_n(\rho)\, u_n(\delta)$. The following proposition collects the linear components of the scores and shows that they represent valid moment conditions.

Proposition 2. In light of the above decomposition, the linear parts of the scores (2.16), (2.17) and (2.18) can be collected as
$$m_{g,n}^l(\theta) = \frac{1}{n}\begin{bmatrix} Y_{g,n}^e(\theta)'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right)\varepsilon_n(\theta) \\ \bar{Y}_{g,n}^e(\theta)'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right)\varepsilon_n(\theta) \\ X_{g,n}'\, R_{g,n}(\rho_g)'\left(\sigma^{g.}(\tau) \otimes I_n\right)\varepsilon_n(\theta) \end{bmatrix} = \frac{1}{n}\, Z_{g,n}^{e*}(\theta)'\left(\sigma^{g.}(\tau) \otimes I_n\right)\varepsilon_n(\theta), \tag{2.26}$$
where $Z_{g,n}^{e*}(\theta) = R_{g,n}(\rho_g)\, Z_{g,n}^e(\theta)$ and $Z_{g,n}^e(\theta) = \left[Y_{g,n}^e(\theta),\; \bar{Y}_{g,n}^e(\theta),\; X_{g,n}\right]$.
Under Assumption 4, $\mathrm{E}\, m_{g,n}^l(\theta_0,\sigma_0) = 0$, and hence the linear moments represent valid moment conditions.

In addition to the information carried by the linear parts of the scores, the ML estimator also exploits the quadratic parts of the scores when it comes to identification (and estimation). It is well known in the spatial literature that quadratic moment conditions can also help to identify spatial autoregressive parameters when linear moments are weak.$^{17}$ The following proposition collects the quadratic components of the scores and shows that they also represent valid moment conditions, in addition to the linear ones.

$^{17}$For a discussion, see, e.g., Kelejian and Prucha (1998), Lee (2007), Kuersteiner and Prucha (2020).

Proposition 3. In light of the above decomposition, the quadratic parts of the scores (2.16), (2.18) and (2.19) can be collected as
$$m_{g,n}^q(\theta,\sigma) = \begin{bmatrix} m_{g,n}^{q,\beta}(\theta,\sigma) \\ m_{g,n}^{q,\lambda}(\theta,\sigma) \\ m_{g,n}^{q,\rho}(\theta,\sigma) \end{bmatrix} = \frac{1}{n} \begin{bmatrix} V_{g,n}(\theta)' R_{g,n}(\rho_g)' (\sigma^{g\cdot}(\sigma) \otimes I_n)\varepsilon_n(\theta) - \Delta_{\beta,g,n}(\beta,\lambda) \\ \bar V_{g,n}(\theta)' R_{g,n}(\rho_g)' (\sigma^{g\cdot}(\sigma) \otimes I_n)\varepsilon_n(\theta) - \Delta_{\lambda,g,n}(\beta,\lambda) \\ \bar U_{g,n}(\rho_g)' (\sigma^{g\cdot}(\sigma) \otimes I_n)\varepsilon_n(\theta) - \Delta_{\rho,g,n}(\rho_g) \end{bmatrix} \tag{2.27}$$
$$= \frac{1}{n}\left[ \tilde V_{g,n}(\theta)' (\sigma^{g\cdot}(\sigma) \otimes I_n)\varepsilon_n(\theta) - \Delta_{g,n}(\theta) \right],$$
where $\tilde V_{g,n}(\theta) = \left[ R_{g,n}(\rho_g)V_{g,n}(\theta), R_{g,n}(\rho_g)\bar V_{g,n}(\theta), \bar U_{g,n}(\rho_g) \right]$ and $\Delta_{g,n}(\theta) = \left[ \Delta_{\beta,g,n}(\beta,\lambda)', \Delta_{\lambda,g,n}(\beta,\lambda)', \Delta_{\rho,g,n}(\rho_g)' \right]'$. Under Assumption 4, $\mathrm{E}\, m_{g,n}^q(\theta_0,\sigma_0) = 0$ for $g = 1,\dots,G$, and hence the quadratic moments represent valid moment conditions.

2.3.2 IV Interpretation and Estimator Generating Equations

The implication of Proposition 2 is twofold. First, it shows that the linear part of the scores can be used to motivate a set of valid linear moments. Second, it implies that this set of linear moment conditions can be re-written to motivate IV estimators. To see the latter, let
$$m_n^l(\theta_0,\sigma_0) = \begin{bmatrix} m_{1,n}^l(\theta_0,\sigma_0) \\ \vdots \\ m_{G,n}^l(\theta_0,\sigma_0) \end{bmatrix}$$
denote the sample moment vector obtained from stacking the $m_{g,n}^l(\theta,\sigma)$ in (2.26) over the $G$ equations, and correspondingly let $Z_n^e(\theta) = \mathrm{diag}_{g=1}^G\{Z_{g,n}^e(\theta)\}$.
Then
$$\mathrm{E}\, m_n^l(\theta_0,\sigma_0) = \frac{1}{n}\mathrm{E}\, Z_n^e(\theta_0)' R_n(\rho_0)' (\Sigma^{-1}(\sigma_0) \otimes I_n) R_n(\rho_0)(y_n - Z_n\delta_0) \tag{2.28}$$
$$= \frac{1}{n}\mathrm{E}\, \tilde Z_n^e(\theta_0)' (\Sigma^{-1}(\sigma_0) \otimes I_n)(\breve y_n(\rho_0) - \breve Z_n(\rho_0)\delta_0) = 0,$$
with $\tilde Z_n^e(\theta_0) = R_n(\rho_0) Z_n^e(\theta_0)$, and where the other Cochrane-Orcutt transformed matrices and vectors $\breve Z_n(\rho_0) = R_n(\rho_0) Z_n$ and $\breve y_n(\rho_0) = R_n(\rho_0) y_n$ are as defined in the previous section.

In the spirit of Hendry (1976) and Prucha and Kelejian (1984), the sample moment analogue of equation (2.28) can be viewed as an estimator generating equation in the following sense: given some initial estimates $\tilde\theta_n$ and $\tilde\Sigma_n$, the sample analogue of (2.28) can be solved to yield
$$\hat\delta_n = \left[ \tilde Z_n^e(\tilde\theta_n)' (\tilde\Sigma_n^{-1} \otimes I_n)\breve Z_n(\tilde\rho_n) \right]^{-1}\tilde Z_n^e(\tilde\theta_n)' (\tilde\Sigma_n^{-1} \otimes I_n)\breve y_n(\tilde\rho_n), \tag{2.29}$$
which defines the generic form of the IV estimators. Limited information estimators are obtained by setting $\tilde\Sigma_n = I_G$. This result extends the results of Hausman (1975) and Prucha and Kelejian (1984) in the context of classical SEMs. Within the context of the classical SEM, Hendry (1976) and Prucha and Kelejian (1984) observe that the 2SLS/3SLS and LIVE/FIVE estimators can be viewed as special cases of the class of estimators defined by the estimator generating equations corresponding to the ML scores. Those estimators are also special cases of the generic form of the IV estimators discussed in Hausman (1975). Analogously, our generic IV equation (2.29) also motivates the GS2SLS/GS3SLS estimators considered in Drukker, Egger, and Prucha (2022) as well as the GSLIVE/GSFIVE estimators considered in this dissertation. As will be discussed in the next section, their difference lies in the approach to approximating the instruments $Z_n^e(\theta_0)$.

2.3.3 Connection to LIVE/FIVE and Optimal IV Estimation

As remarked in the introduction, the GSLIVE and the GSFIVE estimators incorporate ideas underlying the LIVE and the FIVE estimators introduced by Brundy and Jorgenson (1971) in the context of a classical SEM.
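Looking back at the estimator generating equation (2.29), its sample analogue is a single weighted linear solve. The following minimal numpy sketch illustrates the mechanics with hypothetical ingredients (the matrices `Ze`, `Zc`, `yc` and `Sigma` stand in for $\tilde Z_n^e(\tilde\theta_n)$, $\breve Z_n(\tilde\rho_n)$, $\breve y_n(\tilde\rho_n)$ and $\tilde\Sigma_n$; all sizes and names are our own, not the dissertation's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, G, k = 200, 2, 3     # hypothetical sample size, number of equations, regressors

# Stand-ins: instruments Ze, Cochrane-Orcutt transformed regressors Zc and outcome yc
Ze = rng.normal(size=(n * G, k))
delta0 = np.array([1.0, -0.5, 0.25])
Zc = Ze + 0.1 * rng.normal(size=(n * G, k))   # regressors correlated with the instruments
yc = Zc @ delta0 + rng.normal(size=n * G)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])    # hypothetical innovation VC estimate

def iv_estimator(Ze, Zc, yc, Sigma, n):
    """Generic IV estimator of eq. (2.29):
    delta = [Ze' (Sigma^{-1} kron I_n) Zc]^{-1} Ze' (Sigma^{-1} kron I_n) yc."""
    W = np.kron(np.linalg.inv(Sigma), np.eye(n))
    return np.linalg.solve(Ze.T @ W @ Zc, Ze.T @ W @ yc)

delta_full = iv_estimator(Ze, Zc, yc, Sigma, n)      # full information version
delta_lim = iv_estimator(Ze, Zc, yc, np.eye(2), n)   # limited information: Sigma = I_G
```

For realistic $n$ the Kronecker weight matrix should of course never be formed densely; the sketch only makes the algebra of (2.29) explicit.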
To give additional background on the LIVE (FIVE) estimators, we briefly review how they differ from the 2SLS (3SLS) estimators in the construction of the instruments. The optimal instruments are given by the expected values of the endogenous variables, which are well known to be functions of the reduced form parameters. Since the reduced form parameters are unobserved, they have to be estimated. The difference between the classical IV estimators lies solely in the way the reduced form parameters are estimated. The LIVE (FIVE) estimators exploit the structure of the reduced form parameters implied by the reduced form model, while the 2SLS (3SLS) estimators do not. The LIVE (and the FIVE) estimators employ consistent initial estimates of the structural parameters to compute the (estimated) expected values of the endogenous variables, which in turn are used to construct the instruments. The 2SLS (and the 3SLS) estimators simply estimate the expected values of the endogenous variables by running an OLS regression of the endogenous variables on the exogenous variables.

As will become clear, similar differences distinguish the GS2SLS (GS3SLS) estimators considered in Drukker, Egger, and Prucha (2022) and the GSLIVE (GSFIVE) estimators considered in the current paper in the context of network SEMs. For network SEMs, the optimal instruments $Z_{g,n}^e(\theta_0)$ involve the inverse of $n \times n$ or even $nG \times nG$ matrices, depending on the structure of the model. A further difference is that, for numerical simplicity, the GS2SLS (GS3SLS) estimators do not fully exploit the structure of those inverse matrices. In that sense, the GSLIVE (GSFIVE) estimators considered in this paper are related in spirit to Lee (2003) and Kelejian, Prucha, and Yuzefovich (2004), who consider IV estimation with optimal instruments in the context of single-equation spatial autoregressive models.
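The two ways of estimating the optimal instruments can be made concrete in a small numpy sketch for a classical SEM (all parameter values and names below are our own illustrative choices): the 2SLS/3SLS route fits $\mathrm{E}Y_n$ by an unrestricted OLS regression of $Y_n$ on $X_n$, while the LIVE/FIVE route plugs initial structural estimates into the reduced form map $\Pi(B, C) = C(I_G - B)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, G, K = 500, 2, 3

# Hypothetical classical SEM: Y(I - B) = X C + U, so Y = X Pi0 + U (I - B)^{-1}
B = np.array([[0.0, 0.5],
              [0.3, 0.0]])              # simultaneity feedbacks; zero own-coefficients
C = rng.normal(size=(K, G))
X = rng.normal(size=(n, K))
U = rng.normal(size=(n, G))

Pi0 = C @ np.linalg.inv(np.eye(G) - B)  # reduced form coefficients: EY = X Pi0
Y = X @ Pi0 + U @ np.linalg.inv(np.eye(G) - B)

# 2SLS/3SLS-style instruments: unrestricted OLS of Y on X, ignoring the structure of Pi0
Pi_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
EY_ols = X @ Pi_ols

# LIVE/FIVE-style instruments: impose the reduced form structure, evaluated at
# perturbed parameters that mimic consistent first-round structural estimates
B_tilde, C_tilde = B + 0.01, C + 0.01
EY_live = X @ (C_tilde @ np.linalg.inv(np.eye(G) - B_tilde))
```

Both constructions approximate the same $\mathrm{E}Y_n = X_n\Pi_0$; they differ only in whether the nonlinear restrictions on $\Pi$ are imposed.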
Our estimators are closest in spirit to Kelejian, Prucha, and Yuzefovich (2004), in that we also use a series approximation of the inverse matrices to avoid computational issues with those matrices in large samples. In the following we provide a more formal discussion of these links between the new estimators considered in this dissertation and those previously considered in the literature.

2SLS and LIVE of Classical SEM

Recall from the reduced form (2.5) of our model that $\mathrm{E} y_n = \mathrm{vec}(\mathrm{E} Y_n) = (I_{nG} - \bar B_0)^{-1}\bar C_0 x_n$ with $\bar B_0 = B_0' \otimes I_n + (\Lambda_0' \otimes I_n)(I_G \otimes W_n)$ and $\bar C_0 = C_0' \otimes I_n$. For a classical SEM, $\Lambda_0 = 0$ and thus $\bar B_0 = B_0' \otimes I_n$. Therefore, the reduced form model becomes
$$\mathrm{E} y_n = (I_{nG} - B_0' \otimes I_n)^{-1}\bar C_0 x_n = \left[ (I_G - B_0')^{-1} \otimes I_n \right](C_0' \otimes I_n) x_n$$
and thus $\mathrm{E} Y_n = X_n \Pi_0$ with $\Pi_0 = C_0 (I_G - B_0)^{-1}$.$^{18}$

$^{18}$Recall that $\mathrm{vec}(A_{1,n}A_{2,n}) = (A_{2,n}' \otimes I_n)\mathrm{vec}(A_{1,n})$ for any two conformable matrices $A_{1,n}$ and $A_{2,n}$.

In the context of the classical SEM, the 2SLS and the 3SLS estimators estimate $\mathrm{E} Y_n$ by an OLS regression of $Y_n$ on $X_n$. Specifically, $\widehat{\mathrm{E}} Y_n = X_n\hat\Pi_n$ where $\hat\Pi_n = (X_n' X_n)^{-1} X_n' Y_n$.$^{19}$ The LIVE and FIVE estimators employ an initial consistent estimator of the structural parameters, say $\tilde\theta_n$. Given this initial estimator they estimate $\mathrm{E} Y_n$ by $\widehat{\mathrm{E}} Y_n = Y_n^e(\tilde\theta_n) = X_n\tilde\Pi_n$ where $\tilde\Pi_n = \tilde C_n(I_G - \tilde B_n)^{-1}$, and thus utilize explicitly the structure of the reduced form model. In vectorized form, we have$^{20}$
$$\widehat{\mathrm{E}} y_n = \tilde y_n^e(\tilde\theta_n) = (I_{nG} - \tilde B_n' \otimes I_n)^{-1}(\tilde C_n' \otimes I_n) x_n = \left[ (I_G - \tilde B_n')^{-1} \otimes I_n \right](\tilde C_n' \otimes I_n) x_n. \tag{2.30}$$

$^{19}$Note that this is equivalent to the vectorized form $\widehat{\mathrm{E}} y_n = (\hat\Pi_n' \otimes I_n)\mathrm{vec}(X_n)$.

$^{20}$The first equality of (2.30) involves inversion of an $nG \times nG$ matrix. This is not necessary for a classical SEM, as the second equality suggests; it is given only to ease the connection to the corresponding expression of $\widehat{\mathrm{E}} y_n$ for a SEM with network interactions in (2.32) below.

The GS2SLS and GS3SLS estimators considered in Drukker, Egger, and Prucha (2022) differ from the GSLIVE and the GSFIVE estimators considered in this paper in a closely related manner. To see this, observe that, provided $\Vert \bar B_0 \Vert < 1$ for some induced matrix norm $\Vert\cdot\Vert$, we have $(I_{nG} - \bar B_0)^{-1} = \sum_{s=0}^\infty (\bar B_0)^s$, and consequently the reduced form can be written as
$$\mathrm{E} y_n = (I_{nG} - \bar B_0)^{-1}\bar C_0 x_n = \sum_{s=0}^\infty (\bar B_0)^s \bar C_0 x_n.$$
In light of the structure of $\bar B_0$ and $\bar C_0$, we can then approximate $\mathrm{E} Y_n$ as follows with finite $S$:
$$\mathrm{E} Y_n \approx \sum_{l_1,\dots,l_P}\; \sum_{\substack{s_1,\dots,s_P \ge 0 \\ s_1+\cdots+s_P \le S}} W_{l_1,n}^{s_1} W_{l_2,n}^{s_2} \cdots W_{l_P,n}^{s_P} X_n \Pi_{(l_1,s_1),\dots,(l_P,s_P)}, \tag{2.31}$$
where $l_p \in \{1,\dots,P\}$ and the elements of the matrices $\Pi_{(l_1,s_1),\dots,(l_P,s_P)}$ are functions of the structural parameters. Depending on the model, not all combinations of products may appear in the approximation of $\mathrm{E} Y_n$; equivalently, some of the $\Pi_{(l_1,s_1),\dots,(l_P,s_P)}$ may be zero. As a motivating example, we give the explicit expression of the approximated $\mathrm{E} Y_n$ up to the second order for a two-equation system in Chapter A.1.

Analogously to 2SLS and 3SLS for the classical SEM, GS2SLS and GS3SLS do not exploit the structure of the above reduced form model. Adopting notation similar to Drukker, Egger, and Prucha (2022), let $\left( A_{s,n} \right)_{s=1}^S := \left[ A_{1,n}, A_{2,n}, \dots, A_{S,n} \right]$ for any conformable matrices $A_{1,n},\dots,A_{S,n}$, and define
$$X_{1,n} = \left( W_{j_1,n} X_n \right)_{j_1=1}^P,$$
$$X_{2,n} = \left( W_{j_1,n} W_{j_2,n} X_n \right)_{j_1,j_2=1}^P,$$
$$\vdots$$
$$X_{R,n} = \left( W_{j_1,n} W_{j_2,n} \cdots W_{j_R,n} X_n \right)_{j_1,j_2,\dots,j_R=1}^P,$$
and let $H_{R,n} = \left[ X_n, X_{1,n}, \dots, X_{R,n} \right]$. To approximate $\mathrm{E} Y_n$, the GS2SLS and GS3SLS estimators run an OLS regression of $Y_n$ against a collection of the linearly independent columns in
$$\left[ H_{R,n}, W_{1,n} H_{R,n}, \dots, W_{P,n} H_{R,n} \right].$$
Let $\hat\Pi_{(l_1,s_1),\dots,(l_P,s_P)}$ denote the OLS estimator of $\Pi_{(l_1,s_1),\dots,(l_P,s_P)}$ from such a regression; then we can express the approximation of $\mathrm{E} Y_n$ as
$$\widehat{\mathrm{E}} Y_n = \sum_{l_1,\dots,l_P}\; \sum_{\substack{s_1,\dots,s_P \ge 0 \\ s_1+\cdots+s_P \le S}} W_{l_1,n}^{s_1} W_{l_2,n}^{s_2} \cdots W_{l_P,n}^{s_P} X_n \hat\Pi_{(l_1,s_1),\dots,(l_P,s_P)}.$$
The advantage of this approximation is that it is readily computable, but it is not consistent. The above approach differs from the GSLIVE and the GSFIVE estimators, which explicitly exploit the structure of the parameters implied by the reduced form model. Given consistent initial estimates $\tilde\theta_n$, we estimate $\mathrm{E} Y_n$ via
$$\widehat{\mathrm{E}} y_n = \tilde y_n^e(\tilde\theta_n) = \left( I_{nG} - \bar{\tilde B}_n \right)^{-1}\bar{\tilde C}_n x_n, \tag{2.32}$$
$$\bar{\tilde B}_n = \tilde B_n' \otimes I_n + (\tilde\Lambda_n' \otimes I_n)(I_G \otimes W_n).$$

IV Estimators of SAR Models

The GSLIVE (GSFIVE) estimators considered in this paper are also related to Lee (2003) and Kelejian, Prucha, and Yuzefovich (2004), who consider IV estimators with optimal instruments in the context of a single-equation SAR model. To illustrate, consider the following SAR model with first order spatial lags in both $y_n$ and $u_n$:
$$y_n = \lambda W_n y_n + X_n\beta + u_n,$$
$$u_n = \rho M_n u_n + \varepsilon_n.$$
The reduced form of $y_n$ is given by $y_n = (I_n - \lambda W_n)^{-1}(X_n\beta + u_n)$, and thus the optimal instrument for $y_n$ is its mean $\mathrm{E} y_n = (I_n - \lambda W_n)^{-1} X_n\beta$. With $|\lambda| < 1$, we can express $(I_n - \lambda W_n)^{-1} = \sum_{s=0}^\infty \lambda^s W_n^s$. As noted in Kelejian and Prucha (1998), $\mathrm{E} y_n$ can thus be expressed as a weighted sum of the sequence of matrices $X_n, W_n X_n, W_n^2 X_n, \dots$. Therefore, they defined the IV matrix $H_n$ to consist of the linearly independent columns in $\left[ X_n, W_n X_n, \dots, W_n^S X_n \right]$ for finite $S$. Their first step IV estimator is $\hat\delta_n = (\hat Z_n'\hat Z_n)^{-1}\hat Z_n' y_n$ with $\hat Z_n = H_n(H_n' H_n)^{-1} H_n' Z_n$, and thus $\hat\delta_n$ is a 2SLS-type estimator. The GS2SLS and GS3SLS estimators of Drukker, Egger, and Prucha (2022) generalize this idea.

Lee (2003) alternatively estimates $\mathrm{E} y_n$ by $\hat y_n = (I_n - \tilde\lambda_n W_n)^{-1} X_n\tilde\beta_n$, assuming the availability of consistent initial estimates $\tilde\lambda_n$ and $\tilde\beta_n$. His first step IV estimator is thus $\hat\delta_n = (\hat Z_n'\hat Z_n)^{-1}\hat Z_n' y_n$, where $\hat Z_n = \left[ W_n\hat y_n, X_n \right]$. Lee (2003)'s estimator fully exploits the nonlinear structure of the parameters in the reduced form model, i.e., in $\mathrm{E} y_n$. Since the expression for $\hat y_n$ involves the inverse of an $n \times n$ matrix, $\hat Z_n$ may not be computable in large samples. To overcome this difficulty, Kelejian, Prucha, and Yuzefovich (2004) used the finite order approximation $\widehat{W_n\hat y_n} = \sum_{s=0}^S \tilde\lambda_n^s W_n^{s+1} X_n\tilde\beta_n$ in place of $W_n\hat y_n$, and their estimator remains feasible even when $n$ is large. Note that Kelejian, Prucha, and Yuzefovich (2004) allow the order $S$ of the series sum to grow with the sample size and approach infinity. Asymptotically, their estimator is equivalent to that considered in Lee (2003).

Recall that for our network SEM, $\widehat{\mathrm{E}} y_n$ given in (2.32) involves the inverse of $I_{nG} - \bar{\tilde B}_n$, which is an $nG \times nG$ matrix. Therefore, GSLIVE and GSFIVE could again be computationally infeasible when $n$ becomes large. LIVE and FIVE estimators in the context of a classical SEM do not suffer from this issue, since in that case $\widehat{\mathrm{E}} y_n = \left[ (I_G - \tilde B_n')^{-1} \otimes I_n \right]\bar{\tilde C}_n x_n$ and the dimension of $I_G - \tilde B_n'$ does not depend on the sample size $n$. To cope with the difficulty of computing $(I_{nG} - \bar{\tilde B}_n)^{-1}$ for the network SEM, we consider approximating $(I_{nG} - \bar{\tilde B}_n)^{-1}$ with a geometric series of finite order. We defer the details to Section 2.4.2, where we introduce the approximated moment conditions.

2.4 Moment Conditions

In this section, we present the heteroskedasticity-robust moment conditions based on Proposition 2 and Proposition 3. We also give the series-approximated versions of these moment conditions that will be used in defining the GMM estimators in Section 2.5.

2.4.1 Heteroskedasticity-robust Moment Conditions

Recall that by Assumption 4 we have $\mathrm{E}\varepsilon_n = 0$ and
$$\mathrm{E}\varepsilon_n\varepsilon_n' = \begin{bmatrix} \sigma_{11} I_n & \cdots & \sigma_{1G} I_n \\ \vdots & \ddots & \vdots \\ \sigma_{G1} I_n & \cdots & \sigma_{GG} I_n \end{bmatrix},$$
where we drop the subscript zero on the true parameters for notational convenience. Now consider the case where the innovations are heteroskedastic in the sense that
$$\mathrm{E}\varepsilon_n\varepsilon_n' = \begin{bmatrix} \Sigma_{11} & \cdots & \Sigma_{1G} \\ \vdots & \ddots & \vdots \\ \Sigma_{G1} & \cdots & \Sigma_{GG} \end{bmatrix},$$
where $\Sigma_{gh} = \mathrm{diag}_{i=1}^n\{\sigma_{ii,gh}\}$ denotes the true VC matrix block under heteroskedasticity.
In the following we introduce a modification of the moment conditions such that they remain valid under the above form of heteroskedasticity. For this discussion we assume the availability of some initial estimator $\tilde\theta_{g,n}$ such that
$$\sigma_{gh} = \operatorname*{plim}_{n\to\infty}\frac{1}{n}\tilde\varepsilon_{g,n}'\tilde\varepsilon_{h,n}$$
with $\tilde\varepsilon_{g,n} = R_{g,n}(\tilde\rho_{g,n})(y_{g,n} - Z_{g,n}\tilde\delta_{g,n})$. For this discussion we also assume that $\Sigma = (\sigma_{gh})$ is known, while noting that for the empirical implementation $\Sigma$ will be replaced by the corresponding estimator with $(g,h)$-th element given by $n^{-1}\tilde\varepsilon_{g,n}'\tilde\varepsilon_{h,n}$.

Heteroskedasticity-robust Linear Moment Conditions

In light of equation (2.28) and $\varepsilon_n = R_n(\rho_0) u_n(\delta_0)$, under heteroskedasticity, consider the following modified linear moments (at $\theta_0$):
$$m_n^l(\theta_0) = \frac{1}{n} Z_n^e(\theta_0)' R_n(\rho_0)' (\Sigma^{-1}(\sigma_0) \otimes I_n)\varepsilon_n. \tag{2.33}$$
It is straightforward to see that $\mathrm{E}\, m_n^l(\theta_0) = 0$ even under heteroskedasticity, and thus these represent valid full information moment conditions under heteroskedasticity. Of course, the limited information counterpart for the $g$-th equation,
$$m_{g,n}^l(\theta_0, g) = \frac{1}{n} Z_{g,n}^e(\theta_0)' R_{g,n}(\rho_{g,0})'\varepsilon_{g,n}, \tag{2.34}$$
also satisfies $\mathrm{E}\, m_{g,n}^l(\theta_0, g) = 0$ and thus represents valid limited information moment conditions under heteroskedasticity.

Heteroskedasticity-robust Quadratic Moment Conditions

Let $A_n = (A_{gh,n})$, for $g,h = 1,\dots,G$, be some non-stochastic $nG \times nG$ matrix whose blocks $A_{gh,n}$ are of dimension $n \times n$.$^{21}$ It is well known in the literature that to make a quadratic moment condition of the form
$$\mathrm{E}\,\frac{1}{n}\varepsilon_{g,n}' A_{gh,n}\varepsilon_{h,n} = 0$$
robust against heteroskedasticity, we can set the diagonal elements of the matrix in the middle of the quadratic form, $A_{gh,n}$, equal to zero. For the system case we consider moment conditions of the form $\frac{1}{n}\varepsilon_n' A_n\varepsilon_n = \frac{1}{n}\sum_{g=1}^G\sum_{h=1}^G \varepsilon_{g,n}' A_{gh,n}\varepsilon_{h,n}$, and thus $\mathrm{E}\,\frac{1}{n}\varepsilon_n' A_n\varepsilon_n = \frac{1}{n}\sum_{g=1}^G\sum_{h=1}^G \mathrm{tr}(A_{gh,n}\mathrm{E}\varepsilon_{h,n}\varepsilon_{g,n}')$. We see immediately that $\mathrm{E}\,\frac{1}{n}\varepsilon_n' A_n\varepsilon_n = 0$ for any $\mathrm{E}\varepsilon_{h,n}\varepsilon_{g,n}' = \mathrm{diag}_{i=1}^n\{\sigma_{ii,hg}\}$, provided that the diagonal elements of the $A_{gh,n}$ are zero.

$^{21}$Here we let $A_{gh,n}$ denote the $(g,h)$-th $n \times n$ block of the $nG \times nG$ matrix $A_n$.
For the following it proves convenient to introduce the notation
$$\mathrm{MAT}_d\left( B_{gh,n} \right) = B_{gh,n} - \mathrm{diag}(B_{gh,n}) \quad\text{and}\quad \mathrm{MAT}_D\left( B_n \right) = \left( \mathrm{MAT}_d\left( B_{gh,n} \right) \right).$$
That is, $\mathrm{MAT}_d(B_{gh,n})$ is the matrix obtained from $B_{gh,n}$ by setting all diagonal elements equal to zero, and thus $\mathrm{MAT}_D(B_n)$ is obtained from $B_n$ by setting the diagonal elements of each $n \times n$ block equal to zero.$^{22}$ Moreover, we note that the diagonal elements of each $n \times n$ block in
$$(\Sigma^{-1}(\sigma_0) \otimes I_n)\mathrm{MAT}_D\left( B_n \right) \quad\text{or}\quad \mathrm{MAT}_D\left( B_n \right)'(\Sigma^{-1}(\sigma_0) \otimes I_n) \tag{2.35}$$
are zero.$^{23}$ Hence, in light of the above discussion,
$$\frac{1}{n}\mathrm{E}\varepsilon_n'(\Sigma^{-1}(\sigma_0) \otimes I_n)\mathrm{MAT}_D\left( B_n \right)\varepsilon_n = \frac{1}{n}\mathrm{E}\varepsilon_n'\mathrm{MAT}_D\left( B_n \right)'(\Sigma^{-1}(\sigma_0) \otimes I_n)\varepsilon_n = 0.$$

$^{22}$Recall the notational convention that $B_n = (B_{gh,n})$, where $B_{gh,n}$ is the $(g,h)$-th $n \times n$ block of $B_n$.

$^{23}$To see this, note that each $n \times n$ block of $(\Sigma^{-1}(\sigma_0) \otimes I_n)$ is a diagonal matrix and the diagonal elements of each $n \times n$ block of $\mathrm{MAT}_D(B_n)$ are zero by construction. Let $A_n = (A_{gk,n})$ and $C_n = (C_{kh,n})$ be some $nG \times nG$ matrices, with the $(g,k)$-th $n \times n$ block of $A_n$ being $A_{gk,n}$ and the $(k,h)$-th $n \times n$ block of $C_n$ being $C_{kh,n}$. Furthermore, let $A_{gk,n}$ be a diagonal matrix and the diagonal elements of $C_{kh,n}$ be zero (or vice versa). Then the diagonal elements of $A_{gk,n}C_{kh,n}$ are zero. Note that the $(g,h)$-th $n \times n$ block of $A_n C_n$ is $\sum_{k=1}^G A_{gk,n}C_{kh,n}$, and thus the diagonal elements of $\sum_{k=1}^G A_{gk,n}C_{kh,n}$ are also zero. Since this holds for any $(g,h)$-th block of $A_n C_n$, the diagonal elements of each $n \times n$ block of $A_n C_n$ are zero.

To present our heteroskedasticity-robust quadratic moments, it proves helpful to first show that each element of the quadratic moments implied by (2.27) in Proposition 3 can indeed be expressed as a quadratic form in $\varepsilon_n$. In the vector of quadratic components of the scores (2.27), the columns of $V_{g,n}(\theta_0)$ and $\bar V_{g,n}(\theta_0)$ are given by ($n \times 1$ blocks of) $v_n(\theta_0) = S_n^{-1}(\beta_0,\lambda_0) R_n^{-1}(\rho_0)\varepsilon_n$ and $\bar v_n(\theta_0) = (I_G \otimes W_n) v_n(\theta_0)$, respectively, and thus the columns of $V_{g,n}(\theta)$ and $\bar V_{g,n}(\theta)$ are seen to be weighted sums of the $\varepsilon_{g,n}$ for $g = 1,\dots,G$. Also, recall that a typical column of $\bar U_{g,n}$ is of the form $M_{q,n} u_{g,n}$ and $u_{g,n} = R_{g,n}^{-1}(\rho_{g,0})\varepsilon_{g,n}$. In the following lemma we provide the details of how each individual element of $m_{g,n}^q(\theta,\sigma)$ can be expressed as a quadratic function of $\varepsilon_n(\theta)$.

Lemma 1. Let $S_n^{h\cdot}(\beta,\lambda)$ denote the $h$-th $n \times nG$ block of $S_n^{-1}(\beta,\lambda)$ and let $i_{g,G}$ denote the $g$-th column of the identity matrix of dimension $G$. Then the element of $m_{g,n}^{q,\beta}(\theta,\sigma)$ that is associated with the score w.r.t. $b_{hg}$ can, upon replacing $\theta_0$ with $\theta$, be written as
$$\frac{1}{n}\varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n) R_n(\rho)\left( i_{g,G} \otimes S_n^{h\cdot}(\beta,\lambda) \right) R_n^{-1}(\rho)\varepsilon_n(\theta) - \frac{1}{n}\mathrm{tr}(S_n^{hg}(\beta,\lambda)), \tag{2.36}$$
the element of $m_{g,n}^{q,\lambda}(\theta,\sigma)$ that is associated with the score w.r.t. $\lambda_{hg,p}$ can be written as
$$\frac{1}{n}\varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n) R_n(\rho)\left( i_{g,G} \otimes W_{p,n} S_n^{h\cdot}(\beta,\lambda) \right) R_n^{-1}(\rho)\varepsilon_n(\theta) - \frac{1}{n}\mathrm{tr}(W_{p,n} S_n^{hg}(\beta,\lambda)), \tag{2.37}$$
and the element of $m_{g,n}^{q,\rho}(\theta,\sigma)$ that is associated with the score w.r.t. $\rho_{g,q}$ can be written as
$$\frac{1}{n}\varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\left( i_{g,G} i_{g,G}' \otimes M_{q,n} R_{g,n}^{-1}(\rho_g) \right)\varepsilon_n(\theta) - \frac{1}{n}\mathrm{tr}(M_{q,n} R_{g,n}^{-1}(\rho_g)). \tag{2.38}$$

The proof of Lemma 1 is presented in Chapter A.2. Note that under heteroskedasticity of the $\varepsilon_n$, i.e., $\mathrm{E}\varepsilon_n\varepsilon_n' \neq (\Sigma \otimes I_n)$, the expected values of the moments defined in (2.36)-(2.38) are generally not zero. For example, under heteroskedasticity it is readily seen that in general
$$\frac{1}{n}\mathrm{E}\,\varepsilon_n'(\Sigma^{-1} \otimes I_n) R_n(\rho_0)\left( i_{g,G} \otimes S_n^{h\cdot}(\beta_0,\lambda_0) \right) R_n^{-1}(\rho_0)\varepsilon_n - \frac{1}{n}\mathrm{tr}(S_n^{hg}(\beta_0,\lambda_0)) \neq 0,$$
observing that under homoskedasticity we would have
$$\frac{1}{n}\mathrm{E}\,\varepsilon_n'(\Sigma^{-1} \otimes I_n) R_n(\rho_0)\left( i_{g,G} \otimes S_n^{h\cdot}(\beta_0,\lambda_0) \right) R_n^{-1}(\rho_0)\varepsilon_n = \frac{1}{n}\mathrm{tr}\left( i_{g,G} \otimes S_n^{h\cdot}(\beta_0,\lambda_0) \right) = \frac{1}{n}\mathrm{tr}(S_n^{hg}(\beta_0,\lambda_0)),$$
where the inequality above follows since under heteroskedasticity $\mathrm{E}\varepsilon_n\varepsilon_n' \neq (\Sigma \otimes I_n)$. This implies that in general $\mathrm{E}\, m_{g,n}^q(\theta_0,\sigma_0) \neq 0$, and hence the moment vector does not yield a set of valid moment conditions under heteroskedasticity.
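The zero-diagonal device is purely mechanical, so it is easy to verify numerically. The following sketch implements $\mathrm{MAT}_d$ and $\mathrm{MAT}_D$ and checks that $\mathrm{E}\,\varepsilon_n' A_n \varepsilon_n = \sum_{g,h}\mathrm{tr}(A_{gh,n}\mathrm{E}\varepsilon_{h,n}\varepsilon_{g,n}')$ vanishes exactly for any diagonal cross-covariance blocks once every block of $A_n$ has a zero diagonal (the variance profile `sig` is an arbitrary hypothetical choice of ours):

```python
import numpy as np

def MAT_d(B):
    """Zero out the diagonal of a square block (MAT_d in the text)."""
    return B - np.diag(np.diag(B))

def MAT_D(Bn, n, G):
    """Apply MAT_d to every n-by-n block of the nG-by-nG matrix Bn (MAT_D)."""
    out = Bn.copy()
    for g in range(G):
        for h in range(G):
            out[g*n:(g+1)*n, h*n:(h+1)*n] = MAT_d(out[g*n:(g+1)*n, h*n:(h+1)*n])
    return out

rng = np.random.default_rng(2)
n, G = 50, 2
An = MAT_D(rng.normal(size=(n*G, n*G)), n, G)

# With cross-sectionally independent but heteroskedastic innovations,
# E[eps_h eps_g'] = diag(sigma_{ii,hg}); each tr(A_gh diag(.)) is then exactly zero
sig = rng.uniform(0.5, 2.0, size=(n, G, G))   # hypothetical sigma_{ii,gh} profile
expectation = sum(
    np.trace(An[g*n:(g+1)*n, h*n:(h+1)*n] @ np.diag(sig[:, h, g]))
    for g in range(G) for h in range(G)
)
```

The expectation is zero by construction, whatever the (diagonal) heteroskedasticity profile, which is exactly the robustness argument used in the text.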
Following the discussion at the beginning of this section, we can make the quadratic moment conditions robust to heteroskedasticity by restricting the diagonal elements of each of the matrices in the quadratic forms in (2.36), (2.37) and (2.38) to zero.

Full Information Quadratic Moments: In light of (2.36) in Lemma 1, the set of heteroskedasticity-robust quadratic moments originating from $m_{g,n}^{q,\beta}(\theta,\sigma)$ (i.e., the quadratic scores w.r.t. $\beta$ in (2.27)) can be written compactly as
$$m_{n,R}^{q,\beta}(\theta) = \frac{1}{n} L_\beta'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes S_n^{h\cdot}(\beta,\lambda)) R_n^{-1}(\rho) \right)\varepsilon_n(\theta) \right]_{h=1,g=1}^{G,G} \right), \tag{2.39}$$
with $L_\beta = \mathrm{diag}_{g=1}^G\{L_{\beta,g}\}$. Analogously, the set of heteroskedasticity-robust quadratic moments originating from the $m_{g,n}^{q,\lambda}(\theta,\sigma)$ is
$$m_{n,R}^{q,\lambda}(\theta) = \frac{1}{n}\begin{bmatrix} L_{\lambda,1}'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{1,G} \otimes W_{p,n} S_n^{h\cdot}(\beta,\lambda)) R_n^{-1}(\rho) \right)\varepsilon_n(\theta) \right]_{p=1,h=1}^{P,G} \right) \\ \vdots \\ L_{\lambda,G}'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{G,G} \otimes W_{p,n} S_n^{h\cdot}(\beta,\lambda)) R_n^{-1}(\rho) \right)\varepsilon_n(\theta) \right]_{p=1,h=1}^{P,G} \right) \end{bmatrix}. \tag{2.40}$$
Finally, the set of heteroskedasticity-robust quadratic moments originating from $m_{g,n}^{q,\rho}(\theta,\sigma)$ is
$$m_{n,R}^{q,\rho}(\theta) = \frac{1}{n} L_\rho'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( i_{g,G} i_{g,G}' \otimes M_{q,n} R_{g,n}^{-1}(\rho_g) \right)\varepsilon_n(\theta) \right]_{q=1,g=1}^{Q,G} \right), \tag{2.41}$$
with $L_\rho = \mathrm{diag}_{g=1}^G\{L_{\rho,g}\}$.

Note that each element of $m_{n,R}^{q,\beta}(\theta)$ is of the generic form $\frac{1}{n}\varepsilon_n(\theta)' A_n\varepsilon_n(\theta)$, where the diagonal elements of each of the $n \times n$ blocks of $A_n = (\Sigma^{-1}(\sigma_0) \otimes I_n)\mathrm{MAT}_D(B_n)$ are zero, as discussed before Lemma 1. Hence, at $\theta_0$, we have $\mathrm{E}\frac{1}{n}\varepsilon_n' A_n\varepsilon_n = \frac{1}{n}\sum_{g=1}^G\sum_{h=1}^G \mathrm{tr}(A_{gh,n}\mathrm{diag}_{i=1}^n\{\sigma_{ii,hg}\}) = 0$, where $\sigma_{ii,hg} = \mathrm{E}\varepsilon_{i,h,n}\varepsilon_{i,g,n}$. Thus $\mathrm{E}\, m_{n,R}^{q,\beta}(\theta_0) = 0$ holds under heteroskedasticity and represents valid quadratic moment conditions. By analogous arguments we also have $\mathrm{E}\, m_{n,R}^{q,\lambda}(\theta_0) = 0$ and $\mathrm{E}\, m_{n,R}^{q,\rho}(\theta_0) = 0$, and hence all represent valid moment conditions. Stacking $m_{n,R}^{q,\beta}(\theta)$, $m_{n,R}^{q,\lambda}(\theta)$ and $m_{n,R}^{q,\rho}(\theta)$ together, we obtain the vector of heteroskedasticity-robust (full information) quadratic moments for the whole system as
$$m_{n,R}^q(\theta) = \left[ m_{n,R}^{q,\beta}(\theta)', m_{n,R}^{q,\lambda}(\theta)', m_{n,R}^{q,\rho}(\theta)' \right]'.$$

Limited Information Quadratic Moments: If one ignores the cross-equation error structure in $m_n^q(\theta)$, we can obtain the following limited information moment vector for the $g$-th equation:
$$m_{g,n,R}^q(\theta,g) = \frac{1}{n}\begin{bmatrix} L_{\beta,g}'\,\mathrm{vec}\left( \left[ \varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d\left( R_{g,n}(\rho_g) S_n^{hg}(\beta,\lambda) R_{g,n}^{-1}(\rho_g) \right)\varepsilon_{g,n}(\theta_g) \right]_{h=1}^{G} \right) \\ L_{\lambda,g}'\,\mathrm{vec}\left( \left[ \varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d\left( R_{g,n}(\rho_g) W_{p,n} S_n^{hg}(\beta,\lambda) R_{g,n}^{-1}(\rho_g) \right)\varepsilon_{g,n}(\theta_g) \right]_{p=1,h=1}^{P,G} \right) \\ L_{\rho,g}'\,\mathrm{vec}\left( \left[ \varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d\left( M_{q,n} R_{g,n}^{-1}(\rho_g) \right)\varepsilon_{g,n}(\theta_g) \right]_{q=1}^{Q} \right) \end{bmatrix}. \tag{2.42}$$
Observing that each element of $m_{g,n,R}^q(\theta,g)$ is of the generic form $\frac{1}{n}\varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d(B_{gg,n})\varepsilon_{g,n}(\theta_g)$, at $\theta_{g,0}$ we have $\mathrm{E}\frac{1}{n}\varepsilon_{g,n}'\mathrm{MAT}_d(B_{gg,n})\varepsilon_{g,n} = \frac{1}{n}\mathrm{tr}(\mathrm{MAT}_d(B_{gg,n})\mathrm{diag}_{i=1}^n\{\sigma_{ii,gg}\}) = 0$, where $\sigma_{ii,gg} = \mathrm{E}\varepsilon_{i,g,n}\varepsilon_{i,g,n}$. Hence $\mathrm{E}\, m_{g,n,R}^q(\theta_0,g) = 0$, and the limited information quadratic moments thus represent valid moment conditions under heteroskedasticity.

As is known in the literature, GMM estimators based on both the linear and quadratic moments may remain consistent when the linear moments alone are insufficient to identify the parameters. When identification through the linear moments is weak, GMM estimators based on both the linear and the quadratic moments may outperform those that utilize only the linear moments. As remarked, to ensure that our GMM estimators remain feasible when the sample size $n$ gets large, we consider approximated versions of these moments, as discussed in the following section.

2.4.2 Approximated Moments

Recall that $\widehat{\mathrm{E}} y_n$ in (2.32) involves inversion of the $nG \times nG$ matrix $I_{nG} - \bar{\tilde B}_n$, and thus may not be computable when $n$ is large. This issue also plagues ML estimators, in light of the scores (2.16)-(2.19). In general, estimators that involve inversion of matrices whose dimensions depend on the sample size $n$ can be computationally infeasible when $n$ becomes large.
To cope with this problem, we approximate the inverse by a geometric series of finite order $S$, in the spirit of Kelejian, Prucha, and Yuzefovich (2004), and obtain approximated moment conditions from the heteroskedasticity-robust linear and quadratic moments derived above. We adopt these approximated versions of the linear and quadratic moment conditions to construct the GMM estimators, in order to overcome the computational issue involved with inversion of matrices whose dimensions depend on the sample size $n$.

Since $S_n^{-1}(\beta,\lambda) = \left( I_{nG} - \bar B_n \right)^{-1}$ with $\bar B_n = B_n' \otimes I_n + (\Lambda_n' \otimes I_n)(I_G \otimes W_n)$, one can approximate $S_n^{-1}(\beta,\lambda)$ with $S_n^A(\beta,\lambda) = I_{nG} + \bar B_n + \dots + (\bar B_n)^S$, for finite $S$.$^{24}$ Following our notational conventions, we let $S_n^{hg,A}(\beta,\lambda)$ denote the $(h,g)$-th $n \times n$ block of $S_n^A(\beta,\lambda)$ and $S_n^{h\cdot,A}(\beta,\lambda)$ the $h$-th $n \times nG$ block of $S_n^A(\beta,\lambda)$, for $h,g = 1,\dots,G$. Thus, the series approximation to $\mathrm{E} y_n = S_n^{-1}(\beta,\lambda)\mathrm{vec}(X_n C)$ used in defining the approximated GSLIVE and GSFIVE estimators is expressed as
$$\tilde y_n^{e,A}(\theta) = \sum_{s=0}^S (\bar B_n)^s\,\mathrm{vec}(X_n C).$$
Note that, by definition, $Y_{g,n}^e(\theta)$ and $\bar Y_{g,n}^e(\theta)$ in the matrix of instruments $Z_{g,n}^e(\theta) = [Y_{g,n}^e(\theta), \bar Y_{g,n}^e(\theta), X_{g,n}]$ depend on $\tilde y_n^e(\theta)$. Thus, we denote the approximated versions of $Y_{g,n}^e(\theta)$ and $\bar Y_{g,n}^e(\theta)$ as $Y_{g,n}^{e,A}(\theta)$ and $\bar Y_{g,n}^{e,A}(\theta)$, the approximated version of $Z_{g,n}^e(\theta)$ as $Z_{g,n}^{e,A}(\theta) = [Y_{g,n}^{e,A}(\theta), \bar Y_{g,n}^{e,A}(\theta), X_{g,n}]$, and let $Z_n^{e,A}(\theta) = \mathrm{diag}_{g=1}^G\{Z_{g,n}^{e,A}(\theta)\}$.

$^{24}$We find that a relatively low order $S$ already renders rather accurate approximations. In our Monte Carlo simulations, setting $S = 15$ renders approximation errors on the order of $10^{-6}$.

We also approximate $R_{g,n}^{-1}(\rho) = \left( I_n - \sum_{q\in\mathcal I_{g,\rho}}\rho_{q,g} M_{q,n} \right)^{-1}$ with a finite matrix polynomial:
$$R_{g,n}^A(\rho) = I_n + \Big( \sum_{q\in\mathcal I_{g,\rho}}\rho_{q,g} M_{q,n} \Big) + \dots + \Big( \sum_{q\in\mathcal I_{g,\rho}}\rho_{q,g} M_{q,n} \Big)^S.$$

Approximated Linear Moments

In light of the linear moment conditions in Proposition 2 and equation (2.34), the (limited information) linear moments with approximated instruments for the $g$-th equation can be written as
$$m_{g,n}^{l,A}(\theta, g) = \frac{1}{n} Z_{g,n}^{e,A}(\theta)' R_{g,n}(\rho_g)'\varepsilon_{g,n}(\theta_g). \tag{2.43}$$
Moreover, in light of equation (2.33), the vector of (full information) linear moments with approximated instruments is then
$$m_n^{l,A}(\theta) = \frac{1}{n} Z_n^{e,A}(\theta)' R_n(\rho)' (\Sigma^{-1} \otimes I_n)\varepsilon_n(\theta). \tag{2.44}$$
As will be shown in Section 2.5, $m_{g,n}^{l,A}(\theta, g)$ and $m_n^{l,A}(\theta)$ will be used to construct the approximated GSLIVE and the approximated GSFIVE estimators.

Approximated Limited Information Quadratic Moments

For the limited information quadratic moments $m_{g,n,R}^q(\theta,g)$ in (2.42), the approximated version is expressed as
$$m_{g,n,R}^{q,A}(\theta,g) = \frac{1}{n}\begin{bmatrix} L_{\beta,g}'\,\mathrm{vec}\left( \left[ \varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d\left( R_{g,n}(\rho_g) S_n^{hg,A}(\beta,\lambda) R_{g,n}^A(\rho_g) \right)\varepsilon_{g,n}(\theta_g) \right]_{h=1}^{G} \right) \\ L_{\lambda,g}'\,\mathrm{vec}\left( \left[ \varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d\left( R_{g,n}(\rho_g) W_{p,n} S_n^{hg,A}(\beta,\lambda) R_{g,n}^A(\rho_g) \right)\varepsilon_{g,n}(\theta_g) \right]_{p=1,h=1}^{P,G} \right) \\ L_{\rho,g}'\,\mathrm{vec}\left( \left[ \varepsilon_{g,n}(\theta_g)'\mathrm{MAT}_d\left( M_{q,n} R_{g,n}^A(\rho_g) \right)\varepsilon_{g,n}(\theta_g) \right]_{q=1}^{Q} \right) \end{bmatrix}. \tag{2.45}$$

Approximated Full Information Quadratic Moments

To obtain the approximated version of, say,
$$\varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes S_n^{h\cdot}(\beta,\lambda)) R_n^{-1}(\rho) \right)\varepsilon_n(\theta)$$
in $m_{n,R}^{q,\beta}(\theta)$, we replace $\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes S_n^{h\cdot}(\beta,\lambda)) R_n^{-1}(\rho) \right)$ with
$$\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes S_n^{h\cdot,A}(\beta,\lambda)) R_n^A(\rho) \right),$$
where $R_n^A(\rho)$ and $S_n^{h\cdot,A}(\beta,\lambda)$ are the approximated versions of $R_n^{-1}(\rho)$ and $S_n^{h\cdot}(\beta,\lambda)$ introduced above. The approximated version of $m_{n,R}^{q,\beta}(\theta)$ in (2.39) is then given by
$$m_{n,R}^{q,\beta,A}(\theta) = \frac{1}{n} L_\beta'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes S_n^{h\cdot,A}(\beta,\lambda)) R_n^A(\rho) \right)\varepsilon_n(\theta) \right]_{h=1,g=1}^{G,G} \right). \tag{2.46}$$
Analogously, to obtain the approximated version of, e.g.,
$$\varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes W_{p,n} S_n^{h\cdot}(\beta,\lambda)) R_n^{-1}(\rho) \right)\varepsilon_n(\theta)$$
in $m_{n,R}^{q,\lambda}(\theta)$, we replace the "middle matrix" with
$$\mathrm{MAT}_D\left( R_n(\rho)(i_{g,G} \otimes W_{p,n} S_n^{h\cdot,A}(\beta,\lambda)) R_n^A(\rho) \right).$$
The approximated version of $m_{n,R}^{q,\lambda}(\theta)$ in (2.40) is then given by
$$m_{n,R}^{q,\lambda,A}(\theta) = \frac{1}{n}\begin{bmatrix} L_{\lambda,1}'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{1,G} \otimes W_{p,n} S_n^{h\cdot,A}(\beta,\lambda)) R_n^A(\rho) \right)\varepsilon_n(\theta) \right]_{p=1,h=1}^{P,G} \right) \\ \vdots \\ L_{\lambda,G}'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( R_n(\rho)(i_{G,G} \otimes W_{p,n} S_n^{h\cdot,A}(\beta,\lambda)) R_n^A(\rho) \right)\varepsilon_n(\theta) \right]_{p=1,h=1}^{P,G} \right) \end{bmatrix}. \tag{2.47}$$
Finally, for the element $\varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( i_{g,G} i_{g,G}' \otimes M_{q,n} R_{g,n}^{-1}(\rho) \right)\varepsilon_n(\theta)$ in $m_{n,R}^{q,\rho}(\theta)$, we replace the "middle matrix" with
$$\mathrm{MAT}_D\left( i_{g,G} i_{g,G}' \otimes M_{q,n} R_{g,n}^A(\rho) \right),$$
and the approximated version of $m_{n,R}^{q,\rho}(\theta)$ in (2.41) is then given by
$$m_{n,R}^{q,\rho,A}(\theta) = \frac{1}{n} L_\rho'\,\mathrm{vec}\left( \left[ \varepsilon_n(\theta)'(\Sigma^{-1} \otimes I_n)\mathrm{MAT}_D\left( i_{g,G} i_{g,G}' \otimes M_{q,n} R_{g,n}^A(\rho) \right)\varepsilon_n(\theta) \right]_{q=1,g=1}^{Q,G} \right). \tag{2.48}$$
Stacking $m_{n,R}^{q,\beta,A}(\theta)$, $m_{n,R}^{q,\lambda,A}(\theta)$ and $m_{n,R}^{q,\rho,A}(\theta)$ together, we obtain the vector of approximated heteroskedasticity-robust (full information) quadratic moments for the whole system as
$$m_{n,R}^{q,A}(\theta) = \left[ m_{n,R}^{q,\beta,A}(\theta)', m_{n,R}^{q,\lambda,A}(\theta)', m_{n,R}^{q,\rho,A}(\theta)' \right]'.$$

2.5 LIVE and FIVE Estimators for Network SEM

In this section, we define the limited and full information estimators based on the approximated linear moment conditions (2.43) and (2.44) as well as the approximated quadratic moment conditions derived in the previous section. In particular, we define and present the implementation steps of our GSLIVE and GSFIVE estimators for $\delta_0$ (along with the efficient GMM estimators for $\rho_0$), as well as the One-Step GMM estimators LQ-GSLIVE and LQ-GSFIVE for $\theta_0$.

Generic Forms of the GSLIVE and the GSFIVE Estimators

Recall the generic moment conditions defined in (2.28). The generic full information moment conditions are given by
$$\frac{1}{n}\mathrm{E}\, Z_n^e(\theta_0)' R_n(\rho_0)' (\Sigma^{-1}(\sigma_0) \otimes I_n) R_n(\rho_0)(y_n - Z_n\delta_0) = 0. \tag{2.49}$$
The generic limited information moment conditions are obtained by replacing $\Sigma^{-1}(\sigma_0)$ with the identity matrix:
$$\frac{1}{n}\mathrm{E}\, Z_{g,n}^e(\theta_0)' R_{g,n}(\rho_0)' R_{g,n}(\rho_0)(y_{g,n} - Z_{g,n}\delta_{0,g}) = 0, \quad g = 1,\dots,G. \tag{2.50}$$
As discussed in Section 2.3.2, the generic forms of the full information and limited information IV estimators can then be derived by solving the sample analogues of equations (2.49) and (2.50), respectively.$^{25}$

$^{25}$Specifically, the resulting generic forms of the limited information and full information IV estimators are $\hat\delta_{g,n} = [\tilde Z_{g,n}^e(\tilde\theta_n)'\breve Z_{g,n}(\tilde\rho_{g,n})]^{-1}\tilde Z_{g,n}^e(\tilde\theta_n)'\breve y_{g,n}(\tilde\rho_{g,n})$ and $\hat\delta_n = [\tilde Z_n^e(\tilde\theta_n)'(\tilde\Sigma_n^{-1} \otimes I_n)\breve Z_n(\tilde\rho_n)]^{-1}\tilde Z_n^e(\tilde\theta_n)'(\tilde\Sigma_n^{-1} \otimes I_n)\breve y_n(\tilde\rho_n)$.

Our GSLIVE and GSFIVE estimators are special cases of the generic limited and full information IV estimators obtained in this fashion. They are obtained by solving the estimator generating equations implied by the heteroskedasticity-robust linear moment conditions in (2.43) and (2.44), respectively. Specifically, the generic form of the GSLIVE estimator is given by
$$\hat\delta_{g,n} = \left[ \tilde Z_{g,n}^{e,A}(\theta_0)'\breve Z_{g,n}(\rho_{g,0}) \right]^{-1}\tilde Z_{g,n}^{e,A}(\theta_0)'\breve y_{g,n}(\rho_{g,0}), \tag{2.51}$$
and the generic form of the GSFIVE estimator is given by
$$\hat\delta_n = \left[ \tilde Z_n^{e,A}(\theta_0)'(\Sigma_0^{-1} \otimes I_n)\breve Z_n(\rho_0) \right]^{-1}\tilde Z_n^{e,A}(\theta_0)'(\Sigma_0^{-1} \otimes I_n)\breve y_n(\rho_0). \tag{2.52}$$

2.5.1 Limited Information Estimators

We now discuss, in a sequence of steps, the implementation details of the GSLIVE estimator of $\delta_{g,0}$ and a GMM estimator of $\rho_{g,0}$ based on the first stage residuals. We will then also define the One-Step GMM estimator of $\theta_{g,0}$ (i.e., the LQ-GSLIVE estimator) that utilizes both the (limited information) linear and quadratic moments.

Step 1a: GSLIVE estimator of $\delta_g$

Let $\tilde\theta_n$ be some consistent initial estimate of $\theta_0$, e.g., the GS2SLS estimator considered in Drukker, Egger, and Prucha (2022). We can then compute the Cochrane-Orcutt transformed matrices $\breve Z_{g,n}(\tilde\rho_{g,n}) = R_{g,n}(\tilde\rho_{g,n}) Z_{g,n}$, $\tilde Z_{g,n}^{e,A}(\tilde\theta_n) = R_{g,n}(\tilde\rho_{g,n}) Z_{g,n}^{e,A}(\tilde\theta_n)$ and $\breve y_{g,n}(\tilde\rho_{g,n}) = R_{g,n}(\tilde\rho_{g,n}) y_{g,n}$. In light of the generic form (2.51), our GSLIVE estimator for $\delta_{g,0}$ is defined as
$$\hat\delta_{g,n} = \left[ \tilde Z_{g,n}^{e,A}(\tilde\theta_n)'\breve Z_{g,n}(\tilde\rho_{g,n}) \right]^{-1}\tilde Z_{g,n}^{e,A}(\tilde\theta_n)'\breve y_{g,n}(\tilde\rho_{g,n}). \tag{2.53}$$
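A minimal single-equation sketch of this estimator generating step, with a hypothetical disturbance weight matrix $M_n$, an assumed initial estimate of $\rho_g$, and stand-in approximated instruments (all names and values are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 150
rho_hat = 0.3                      # hypothetical initial estimate of rho_g

# Hypothetical row-normalized disturbance weight matrix M_n
M = rng.random((n, n)) * (rng.random((n, n)) < 0.05)
np.fill_diagonal(M, 0.0)
M = M / np.maximum(M.sum(axis=1, keepdims=True), 1e-12)

R = np.eye(n) - rho_hat * M        # R_{g,n}(rho_hat), first-order for simplicity

Zg = rng.normal(size=(n, 2))       # regressors of equation g
Ze = Zg + 0.05 * rng.normal(size=(n, 2))   # stand-in for the instruments Z^{e,A}_{g,n}
delta_g0 = np.array([0.7, -1.2])
yg = Zg @ delta_g0 + rng.normal(size=n)

# Cochrane-Orcutt transform, then solve the GSLIVE-style generating equation:
# delta = [Ze_t' Z_t]^{-1} Ze_t' y_t, the single-equation analogue of (2.53)
Ze_t, Z_t, y_t = R @ Ze, R @ Zg, R @ yg
delta_hat = np.linalg.solve(Ze_t.T @ Z_t, Ze_t.T @ y_t)

# First-stage residual variance, used later in the VC estimator
resid = y_t - Z_t @ delta_hat
sigma2_hat = resid @ resid / n
```

The transform and the solve are the entire first step; everything model-specific enters through how `Ze` is constructed.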
We shall also utilize the following estimator for the variance of the limiting distribution of $\hat\delta_{g,n}$:
$$\hat\Psi_{gg,n}^{\delta\delta}(g) = \hat\sigma_{gg,n}\left[ n^{-1}\tilde Z_{g,n}^{e,A}(\tilde\theta_n)'\tilde Z_{g,n}^{e,A}(\tilde\theta_n) \right]^{-1},$$
where $\hat\sigma_{gg,n} = \frac{1}{n}\hat\varepsilon_{g,n}'\hat\varepsilon_{g,n}$ with $\hat\varepsilon_{g,n} = \breve y_{g,n}(\tilde\rho_{g,n}) - \breve Z_{g,n}(\tilde\rho_{g,n})\hat\delta_{g,n}$.

Step 1b: Efficient GMM estimator of $\rho_g$ based on GSLIVE residuals

Let $\hat u_{g,n} = y_{g,n} - Z_{g,n}\hat\delta_{g,n}$ denote the residuals of the $g$-th equation based on the GSLIVE estimate $\hat\delta_{g,n}$ obtained in Step 1a. In light of the moment vector $m_{g,n,R}^{q,A}(\theta,g)$ in (2.45), we can write the (limited information) sample moments with first stage residuals $\hat u_{g,n}$ as
$$\check m_n(\rho_g, \tilde\rho_{g,n}, \hat\delta_{g,n}, g) = \frac{1}{n} L_{\rho,g}'\,\mathrm{vec}\left( \left[ \hat u_{g,n}' R_{g,n}(\rho_g)'\mathrm{MAT}_d\left( M_{q,n} R_{g,n}^A(\tilde\rho_{g,n}) \right) R_{g,n}(\rho_g)\hat u_{g,n} \right]_{q=1}^{Q} \right). \tag{2.54}$$
The efficient GMM estimator of $\rho_g$ is then defined as
$$\hat\rho_{g,n} = \operatorname*{argmin}_{\rho_g}\; \check m_n(\rho_g, \tilde\rho_{g,n}, \hat\delta_{g,n}, g)'\left( \hat\Psi_{gg,n}^{\rho\rho}(g) \right)^{-1}\check m_n(\rho_g, \tilde\rho_{g,n}, \hat\delta_{g,n}, g), \tag{2.55}$$
where $\hat\Psi_{gg,n}^{\rho\rho}(g)$ is an estimator of the VC matrix of the limiting distribution of the normalized sample moments $\check m_n(\rho_g, \tilde\rho_{g,n}, \hat\delta_{g,n}, g)$. For $r, s \in \mathcal I_{g,\rho}$ (where $\mathcal I_{g,\rho}$ is the index set defined in Assumption 2), the $rs$-th element of $\hat\Psi_{gg,n}^{\rho\rho}(g)$ is given by
$$\hat\Psi_{rs,gg,n}^{\rho\rho}(g) = \frac{\hat\sigma_{gg,n}^2}{n}\,\mathrm{tr}\left( \mathrm{MAT}_d\left( M_{r,n} R_{g,n}^A(\tilde\rho_{g,n}) \right)\left[ \mathrm{MAT}_d\left( M_{s,n} R_{g,n}^A(\tilde\rho_{g,n}) \right) + \mathrm{MAT}_d\left( M_{s,n} R_{g,n}^A(\tilde\rho_{g,n}) \right)' \right] \right) + \hat\alpha_{g,r,n}'\hat\Psi_{gg,n}^{\delta\delta}(g)\hat\alpha_{g,s,n}, \tag{2.56}$$
with
$$\hat\alpha_{g,r,n} = -\left[ \frac{1}{n}\tilde Z_{g,n}^{e,A}(\tilde\theta_n)'\left( \mathrm{MAT}_d\left( M_{r,n} R_{g,n}^A(\tilde\rho_{g,n}) \right) + \mathrm{MAT}_d\left( M_{r,n} R_{g,n}^A(\tilde\rho_{g,n}) \right)' \right) R_{g,n}(\tilde\rho_{g,n})\hat u_{g,n} \right].$$
The above expression for $\hat\Psi_{rs,gg,n}^{\rho\rho}(g)$ is derived in light of Theorem 2 (and Theorem 5) in Drukker, Egger, and Prucha (2022). Let $\Psi_{gg,n}^{\rho\rho}(g)$ denote the asymptotic VC matrix of the sample moment vector $\check m_n(\rho_g, \tilde\rho_{g,n}, \hat\delta_{g,n}, g)$. As remarked in Drukker, Egger, and Prucha (2022), the second term in (2.56) stems from the fact that the sample moment vector depends on the estimated residuals $\hat u_{g,n}$.
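The trace expressions entering the VC matrix of the quadratic moments are mechanical to compute. The following sketch builds one such block with hypothetical $M_{q,n}$ matrices and a low-order polynomial $R_{g,n}^A$, following the generic pattern of the elements displayed above (the scale factor and all inputs are our own reconstruction, not the dissertation's exact formulas):

```python
import numpy as np

rng = np.random.default_rng(5)
n, Q = 60, 3

def MAT_d(B):
    """Zero out the diagonal of a square matrix."""
    return B - np.diag(np.diag(B))

# Hypothetical disturbance weight matrices M_q (row-normalized, zero diagonal)
Ms = []
for _ in range(Q):
    M = rng.random((n, n)) * (rng.random((n, n)) < 0.1)
    np.fill_diagonal(M, 0.0)
    Ms.append(M / np.maximum(M.sum(axis=1, keepdims=True), 1e-12))

rho = np.array([0.2, 0.1, 0.05])           # hypothetical rho estimates
P = sum(r * M for r, M in zip(rho, Ms))
RA = np.eye(n) + P + P @ P                 # R^A_{g,n} with truncation order S = 2

sigma2 = 1.3                               # hypothetical sigma_hat_{gg,n}

def psi_rs(r, s):
    """One trace-based VC element: (sigma^4/n) tr(MAT_d[M_r R^A](MAT_d[M_s R^A] + its transpose))."""
    Ar, As = MAT_d(Ms[r] @ RA), MAT_d(Ms[s] @ RA)
    return sigma2**2 / n * np.trace(Ar @ (As + As.T))

Psi = np.array([[psi_rs(r, s) for s in range(Q)] for r in range(Q)])
```

By the cyclic and transpose properties of the trace, the resulting matrix is symmetric with non-negative diagonal, as a VC estimate should be.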
The LQ-GSLIVE Estimator

Complementing the approximated limited information linear moments (2.43) with the approximated quadratic moments \(m^{q,A}_{g,n,R}(\theta, g)\) based on equation (2.45), and letting \(\tilde{\theta}_n\) denote some consistent initial estimate of \(\theta_0\), the vector of linear-quadratic sample moments can then be formed as
\[
m_n(\theta_g, \tilde{\theta}_n, g) = \frac{1}{n}
\begin{bmatrix}
\hat{Z}^{e,A}_{g,n}(\tilde{\theta}_n)' R_{g,n}(\tilde{\rho}_{g,n})' \varepsilon_{g,n}(\theta_g) \\[4pt]
\Big( L_{b,g}' \operatorname{vec}\!\big[ \varepsilon_{g,n}(\theta_g)' \operatorname{MAT}_d\!\big[ R_{g,n}(\tilde{\rho}_{g,n}) S^{hg,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n) R^A_{g,n}(\tilde{\rho}_{g,n}) \big] \varepsilon_{g,n}(\theta_g) \big] \Big)_{h=1}^{G} \\[4pt]
\Big( L_{\lambda,g}' \operatorname{vec}\!\big[ \varepsilon_{g,n}(\theta_g)' \operatorname{MAT}_d\!\big[ R_{g,n}(\tilde{\rho}_{g,n}) W_{p,n} S^{hg,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n) R^A_{g,n}(\tilde{\rho}_{g,n}) \big] \varepsilon_{g,n}(\theta_g) \big] \Big)_{p=1,h=1}^{P,G} \\[4pt]
\Big( L_{\rho,g}' \operatorname{vec}\!\big[ \varepsilon_{g,n}(\theta_g)' \operatorname{MAT}_d\!\big[ M_{q,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \big] \varepsilon_{g,n}(\theta_g) \big] \Big)_{q=1}^{Q}
\end{bmatrix}.
\]
The corresponding efficient GMM estimator can be defined as
\[
\hat{\theta}_{g,LQ,n} = \operatorname*{argmin}_{\theta_g} \; m_n(\theta_g, \tilde{\theta}_n, g)' \left( \hat{\Psi}_{gg,n}(g) \right)^{-1} m_n(\theta_g, \tilde{\theta}_n, g), \tag{2.57}
\]
where \(\hat{\Psi}_{gg,n}(g)\) denotes an estimator for the asymptotic variance-covariance matrix of the normalized sample moments \(\sqrt{n}\, m_n(\theta_g, \tilde{\theta}_n, g)\). Specifically, let the residuals be \(\hat{\varepsilon}_{g,n} = \tilde{y}_{g,n}(\tilde{\rho}_{g,n}) - \tilde{Z}_{g,n}(\tilde{\rho}_{g,n}) \hat{\delta}_{g,n}\) and \(\hat{\sigma}_{gg,n} = n^{-1} \hat{\varepsilon}_{g,n}'\hat{\varepsilon}_{g,n}\). The estimated VCV matrix is then
\[
\hat{\Psi}_{gg,n}(g) = \begin{bmatrix} \hat{\Psi}^{l}_{gg,n}(g) & 0 \\ 0 & \hat{\Psi}^{q}_{gg,n}(g) \end{bmatrix},
\]
where
\[
\hat{\Psi}^{l}_{gg,n}(g) = \frac{\hat{\sigma}_{gg,n}}{n} \hat{Z}^{e,A}_{g,n}(\tilde{\theta}_n)' R_{g,n}(\tilde{\rho}_{g,n})' R_{g,n}(\tilde{\rho}_{g,n}) \hat{Z}^{e,A}_{g,n}(\tilde{\theta}_n)
\]
is the block corresponding to the linear moments and
\[
\hat{\Psi}^{q}_{gg,n}(g) = \begin{bmatrix}
\hat{\Psi}^{bb}_{gg,n}(g) & \hat{\Psi}^{b\lambda}_{gg,n}(g) & \hat{\Psi}^{b\rho}_{gg,n}(g) \\
\hat{\Psi}^{\lambda b}_{gg,n}(g) & \hat{\Psi}^{\lambda\lambda}_{gg,n}(g) & \hat{\Psi}^{\lambda\rho}_{gg,n}(g) \\
\hat{\Psi}^{\rho b}_{gg,n}(g) & \hat{\Psi}^{\rho\lambda}_{gg,n}(g) & \hat{\Psi}^{\rho\rho}_{gg,n}(g)
\end{bmatrix}
\]
is the block corresponding to the quadratic moments. For completeness, the explicit expressions for each block of \(\hat{\Psi}^{q}_{gg,n}(g)\) are given in Chapter A.3. Due to limited space, we only present the general forms of the individual elements of \(\hat{\Psi}^{q}_{gg,n}(g)\) below. A typical element of \(\hat{\Psi}^{bb}_{gg,n}(g)\) is of the form
\[
\frac{\hat{\sigma}^2_{gg,n}}{n} \operatorname{tr}\!\Big( \operatorname{MAT}_d\!\big[ R_{g,n}(\tilde{\rho}_{g,n}) S^{r_1 g,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n) R^A_{g,n}(\tilde{\rho}_{g,n}) \big] \big( \operatorname{MAT}_d\!\big[ R_{g,n}(\tilde{\rho}_{g,n}) S^{r_2 g,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n) R^A_{g,n}(\tilde{\rho}_{g,n}) \big] + \operatorname{MAT}_d\!\big[ R_{g,n}(\tilde{\rho}_{g,n}) S^{r_2 g,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n) R^A_{g,n}(\tilde{\rho}_{g,n}) \big]' \big) \Big),
\]
where \(r_1, r_2 \in \{1, \ldots, G\}\). A typical element of \(\hat{\Psi}^{b\lambda}_{gg,n}(g)\) is of the same form with \(S^{r_2 g,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n)\) replaced by \(W_{p,n} S^{r_2 g,A}_{n}(\tilde{\delta}_n, \tilde{\theta}_n)\), where \(r_1, r_2 \in \{1, \ldots, G\}\), \(p \in \{1, \ldots, P\}\); a typical element of \(\hat{\Psi}^{b\rho}_{gg,n}(g)\) replaces the second and third \(\operatorname{MAT}_d\) arguments by \(M_{q,n} R^A_{g,n}(\tilde{\rho}_{g,n})\), where \(r \in \{1, \ldots, G\}\), \(q \in \{1, \ldots, Q\}\); a typical element of \(\hat{\Psi}^{\lambda\lambda}_{gg,n}(g)\) uses \(W_{p_1,n} S^{r_1 g,A}_{n}\) and \(W_{p_2,n} S^{r_2 g,A}_{n}\), where \(r_1, r_2 \in \{1, \ldots, G\}\), \(p_1, p_2 \in \{1, \ldots, P\}\); a typical element of \(\hat{\Psi}^{\lambda\rho}_{gg,n}(g)\) uses \(W_{p,n} S^{rg,A}_{n}\) in the first argument and \(M_{q,n} R^A_{g,n}(\tilde{\rho}_{g,n})\) in the second and third, where \(r \in \{1, \ldots, G\}\), \(p \in \{1, \ldots, P\}\), \(q \in \{1, \ldots, Q\}\); and a typical element of \(\hat{\Psi}^{\rho\rho}_{gg,n}(g)\) uses \(M_{q_1,n} R^A_{g,n}(\tilde{\rho}_{g,n})\) and \(M_{q_2,n} R^A_{g,n}(\tilde{\rho}_{g,n})\), where \(q_1, q_2 \in \{1, \ldots, Q\}\).

2.5.2 Full Information Estimators

We now define, in a sequence of steps, the implementation details of the GSFIVE estimator of \(\delta_0\) and a GMM estimator of \(\rho_0\) based on the first stage residuals. We will then also define the One-Step GMM estimator of \(\theta_0\) (i.e., the LQ-GSFIVE estimator) that utilizes both the (full information) linear and quadratic moments.

Step 2a: GSFIVE estimator of \(\delta\)

As above in defining the limited information estimators, let \(\tilde{\theta}_n\) be some consistent initial estimate of \(\theta_0\).
We can then compute the Cochrane-Orcutt transformed matrices \(\tilde{Z}_n(\tilde{\rho}_n) = R_n(\tilde{\rho}_n) Z_n\), \(\tilde{Z}^{e,A}_n(\tilde{\theta}_n) = R_n(\tilde{\rho}_n) \hat{Z}^{e,A}_n(\tilde{\theta}_n)\) and \(\tilde{y}_n(\tilde{\rho}_n) = R_n(\tilde{\rho}_n) y_n\). In light of the generic form (2.52), our GSFIVE estimator for \(\delta_0\) can be defined as
\[
\hat{\hat{\delta}}_n = \left[ \tilde{Z}^{e,A}_n(\tilde{\theta}_n)' (\tilde{\Sigma}_n^{-1} \otimes I_n) \tilde{Z}_n(\tilde{\rho}_n) \right]^{-1} \tilde{Z}^{e,A}_n(\tilde{\theta}_n)' (\tilde{\Sigma}_n^{-1} \otimes I_n) \tilde{y}_n(\tilde{\rho}_n), \tag{2.58}
\]
where the gh-th element of \(\tilde{\Sigma}_n\) is given by \(\tilde{\sigma}_{gh,n} = n^{-1} \hat{\varepsilon}_{g,n}'\hat{\varepsilon}_{h,n}\) with \(\hat{\varepsilon}_{g,n} = \tilde{y}_{g,n}(\tilde{\rho}_{g,n}) - \tilde{Z}_{g,n}(\tilde{\rho}_{g,n}) \hat{\delta}_{g,n}\). We shall also utilize the following estimator for the variance of the limiting distribution of \(\hat{\hat{\delta}}_n\):
\[
\hat{\hat{\Psi}}^{\delta\delta}_n = \left[ n^{-1} \tilde{Z}^{e,A}_n(\tilde{\theta}_n)' (\hat{\hat{\Sigma}}_n^{-1} \otimes I_n) \tilde{Z}^{e,A}_n(\tilde{\theta}_n) \right]^{-1},
\]
with the gh-th element of \(\hat{\hat{\Sigma}}_n\) being \(\hat{\hat{\sigma}}_{gh,n} = n^{-1} \hat{\hat{\varepsilon}}_{g,n}'\hat{\hat{\varepsilon}}_{h,n}\), where \(\hat{\hat{\varepsilon}}_{g,n} = \tilde{y}_{g,n}(\tilde{\rho}_{g,n}) - \tilde{Z}_{g,n}(\tilde{\rho}_{g,n}) \hat{\hat{\delta}}_{g,n}\). We denote the (g,h)-th block of \(\hat{\hat{\Psi}}^{\delta\delta}_n\) as \(\hat{\hat{\Psi}}^{\delta\delta}_{gh,n}\).

Step 2b: Efficient GMM estimator of \(\rho\) based on GSFIVE residuals

Let \(\hat{\hat{u}}_{g,n} = y_{g,n} - Z_{g,n}\hat{\hat{\delta}}_{g,n}\) denote the residuals of the g-th equation based on the GSFIVE estimate \(\hat{\hat{\delta}}_n\) obtained in Step 2a, and \(\hat{\hat{\varepsilon}}_{g,n}(\rho_g) = R_{g,n}(\rho_g)\hat{\hat{u}}_{g,n}\). In light of the moment vector (2.45), we can then write the (limited information) sample moments with first stage residuals \(\hat{\hat{u}}_{g,n}\) as
\[
\hat{\hat{m}}_n(\rho_g, \hat{\hat{\delta}}_{g,n}, \tilde{\theta}_n, g) = \left( L_{\rho,g}' \operatorname{vec}\!\left[ \hat{\hat{u}}_{g,n}' R_{g,n}(\rho_g)' \operatorname{MAT}_d\!\left[ M_{q,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \right] R_{g,n}(\rho_g) \hat{\hat{u}}_{g,n} \right] \right)_{q=1}^{Q}. \tag{2.59}
\]
The efficient GMM estimator of \(\rho_g\) is then defined as
\[
\hat{\hat{\rho}}_{g,n} = \operatorname*{argmin}_{\rho_g} \; \hat{\hat{m}}_n(\rho_g, \hat{\hat{\delta}}_{g,n}, \tilde{\theta}_n, g)' \big( \hat{\hat{\Psi}}^{\rho\rho}_{gg,n}(g) \big)^{-1} \hat{\hat{m}}_n(\rho_g, \hat{\hat{\delta}}_{g,n}, \tilde{\theta}_n, g), \tag{2.60}
\]
where \(\hat{\hat{\Psi}}^{\rho\rho}_{gg,n}(g)\) is an estimator of the VC matrix of the limiting distribution of the normalized sample moments \(\hat{\hat{m}}_n(\rho_g, \hat{\hat{\delta}}_{g,n}, \tilde{\theta}_n, g)\). For \(r, s \in I_{g,\rho}\) (where \(I_{g,\rho}\) is the index set defined in Assumption 2), the rs-th element of \(\hat{\hat{\Psi}}^{\rho\rho}_{gh,n}(g)\) is given by
\[
\hat{\hat{\psi}}_{rs,gh,n}(g) = \frac{2\hat{\hat{\sigma}}^2_{gh,n}}{n} \operatorname{tr}\!\Big( \operatorname{MAT}_d\!\big[ M_{r,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \big] \big( \operatorname{MAT}_d\!\big[ M_{s,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \big] + \operatorname{MAT}_d\!\big[ M_{s,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \big]' \big) \Big) + \hat{\hat{\alpha}}_{g,r,n}' \hat{\hat{\Psi}}^{\delta\delta}_{gh,n} \hat{\hat{\alpha}}_{h,s,n}, \tag{2.61}
\]
with
\[
\hat{\hat{\alpha}}_{g,r,n} = -\frac{1}{n} \left[ \tilde{Z}^{e,A}_{g,n}(\tilde{\theta}_n)' \left( \operatorname{MAT}_d\!\big[ M_{r,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \big] + \operatorname{MAT}_d\!\big[ M_{r,n} R^A_{g,n}(\tilde{\rho}_{g,n}) \big]' \right) R_{g,n}(\tilde{\rho}_{g,n}) \hat{\hat{u}}_{g,n} \right].
\]

The LQ-GSFIVE Estimator

Complementing the approximated full information linear moments (2.44) with the approximated quadratic moments \(m^{q,\cdot,A}_{n,R}(\theta)\) based on (2.46) - (2.48) in Section 2.4.2, the vector of linear-quadratic sample moments can then be formed as
\[
m_n(\theta, \tilde{\theta}_n) = \left[ m^{l}_n(\theta, \tilde{\theta}_n)',\; m^{\lambda}_n(\theta, \tilde{\theta}_n)',\; m^{\bar{\lambda}}_n(\theta, \tilde{\theta}_n)',\; m^{\rho}_n(\theta, \tilde{\theta}_n)' \right]',
\]
where \(\tilde{\theta}_n\) denotes some consistent initial estimate of \(\theta_0\). The vector of (approximated) linear sample moments is
\[
m^{l}_n(\theta, \tilde{\theta}_n) = \frac{1}{n} \hat{Z}^{e,A}_n(\tilde{\theta}_n)' R_n(\tilde{\rho}_n)' (\tilde{\Sigma}_n^{-1} \otimes I_n) \varepsilon_n(\theta),
\]
and the vectors of (approximated) quadratic sample moments are
\[
m^{\lambda}_n(\theta, \tilde{\theta}_n) = \frac{1}{n} \left( L_{b}' \operatorname{vec}\!\left[ \varepsilon_n(\theta)' (\tilde{\Sigma}_n^{-1} \otimes I_n) \operatorname{MAT}_D\!\big[ R_n(\tilde{\rho}_n)(i_{g,G} \otimes S^{h\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)) R^A_n(\tilde{\rho}_n) \big] \varepsilon_n(\theta) \right] \right)_{g=1,h=1}^{G,G},
\]
where the gh-th element of \(\tilde{\Sigma}_n\) is given by \(\tilde{\sigma}_{gh,n} = n^{-1} \hat{\varepsilon}_{g,n}'\hat{\varepsilon}_{h,n}\) with \(\hat{\varepsilon}_{g,n} = \tilde{y}_{g,n}(\tilde{\rho}_n) - \tilde{Z}_{g,n}(\tilde{\rho}_n) \hat{\delta}_{g,n}\);
\[
m^{\bar{\lambda}}_n(\theta, \tilde{\theta}_n) = \frac{1}{n} \left( L_{\lambda}' \operatorname{vec}\!\left[ \varepsilon_n(\theta)' (\tilde{\Sigma}_n^{-1} \otimes I_n) \operatorname{MAT}_D\!\big[ R_n(\tilde{\rho}_n)(i_{g,G} \otimes W_{p,n} S^{h\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)) R^A_n(\tilde{\rho}_n) \big] \varepsilon_n(\theta) \right] \right)_{p=1,h=1}^{P,G},
\]
and
\[
m^{\rho}_n(\theta, \tilde{\theta}_n) = \frac{1}{n} \left( L_{\rho}' \operatorname{vec}\!\left[ \varepsilon_n(\theta)' (\tilde{\Sigma}_n^{-1} \otimes I_n) \operatorname{MAT}_D\!\big[ i_{g,G} i_{g,G}' \otimes M_{q,n} R^A_{g,n}(\tilde{\rho}_n) \big] \varepsilon_n(\theta) \right] \right)_{g=1,q=1}^{G,Q}.
\]
The corresponding efficient GMM estimator can be defined as
\[
\hat{\theta}_{LQ,n} = \operatorname*{argmin}_{\theta} \; m_n(\theta, \tilde{\theta}_n)' \hat{\Psi}_n^{-1} m_n(\theta, \tilde{\theta}_n), \tag{2.62}
\]
where \(\hat{\Psi}_n\) denotes an estimator for the asymptotic variance-covariance matrix of the normalized sample moments \(\sqrt{n}\, m_n(\theta, \tilde{\theta}_n)\). Specifically,
\[
\hat{\Psi}_n = \begin{bmatrix} \hat{\Psi}^{l}_n & 0 \\ 0 & \hat{\Psi}^{q}_n \end{bmatrix},
\]
where
\[
\hat{\Psi}^{l}_n = \frac{1}{n} \hat{Z}^{e,A}_n(\tilde{\theta}_n)' R_n(\tilde{\rho}_n)' (\tilde{\Sigma}_n^{-1} \otimes I_n) R_n(\tilde{\rho}_n) \hat{Z}^{e,A}_n(\tilde{\theta}_n)
\]
is the block corresponding to the linear moments, and the block corresponding to the quadratic moments is of the form
\[
\hat{\Psi}^{q}_n = \begin{bmatrix}
\hat{\Psi}^{\lambda\lambda}_n & \hat{\Psi}^{\lambda\bar{\lambda}}_n & \hat{\Psi}^{\lambda\rho}_n \\
\hat{\Psi}^{\bar{\lambda}\lambda}_n & \hat{\Psi}^{\bar{\lambda}\bar{\lambda}}_n & \hat{\Psi}^{\bar{\lambda}\rho}_n \\
\hat{\Psi}^{\rho\lambda}_n & \hat{\Psi}^{\rho\bar{\lambda}}_n & \hat{\Psi}^{\rho\rho}_n
\end{bmatrix},
\]
with each sub-matrix consisting of \(G \times G\) sub-blocks.
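Computationally, the GSFIVE step in (2.58) stacks the G transformed equations and weights them by the inverse covariance matrix via a Kronecker product. The following numpy sketch is illustrative only (hypothetical placeholder arrays; `Sigma` stands in for the estimated G x G cross-equation covariance matrix), showing the weighted system IV formula:

```python
import numpy as np

def system_iv(Z_hat, Z_tilde, y_tilde, Sigma):
    """Full-information IV: weight the stacked system by Sigma^{-1} kron I_n."""
    G = Sigma.shape[0]
    n = Z_hat.shape[0] // G
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))
    A = Z_hat.T @ Omega_inv @ Z_tilde
    b = Z_hat.T @ Omega_inv @ y_tilde
    return np.linalg.solve(A, b)

# Toy check: with Sigma = I the weighting drops out, and the estimator
# coincides with the unweighted (limited information) IV estimator.
rng = np.random.default_rng(1)
n, G, k = 50, 2, 4
Z = rng.normal(size=(G * n, k))
delta0 = rng.normal(size=k)
y = Z @ delta0
print(system_iv(Z, Z, y, np.eye(G)))
```

For large n one would avoid forming the \(nG \times nG\) Kronecker matrix explicitly and instead apply \(\tilde{\Sigma}_n^{-1}\) blockwise; the dense version above is only meant to mirror the formula.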
For completeness, the explicit expressions for each block of \(\hat{\Psi}^{q}_n\) are given in Chapter A.3. Due to limited space, we only present the general forms of the individual elements of \(\hat{\Psi}^{q}_n\) below. A typical element of the gh-th block of \(\hat{\Psi}^{\lambda\lambda}_n\) is of the form
\[
\frac{1}{n} \operatorname{tr}\!\Big( \operatorname{MAT}_D\!\big[ R_n(\tilde{\rho}_n)(i_{g,G} \otimes S^{r_1\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)) R^A_n(\tilde{\rho}_n) \big] \big( \operatorname{MAT}_D\!\big[ R_n(\tilde{\rho}_n)(i_{h,G} \otimes S^{r_2\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)) R^A_n(\tilde{\rho}_n) \big] + (\tilde{\Sigma}_n \otimes I_n) \operatorname{MAT}_D\!\big[ R_n(\tilde{\rho}_n)(i_{h,G} \otimes S^{r_2\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)) R^A_n(\tilde{\rho}_n) \big]' (\tilde{\Sigma}_n^{-1} \otimes I_n) \big) \Big),
\]
for \(r_1, r_2 \in \{1, \ldots, G\}\). A typical element of the gh-th block of \(\hat{\Psi}^{\lambda\bar{\lambda}}_n\) is of the same form with \(i_{h,G} \otimes S^{r_2\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)\) replaced by \(i_{h,G} \otimes W_{p,n} S^{r_2\cdot,A}_n(\tilde{\delta}_n, \tilde{\theta}_n)\), for \(r_1, r_2 \in \{1, \ldots, G\}\), \(p \in \{1, \ldots, P\}\); a typical element of the gh-th block of \(\hat{\Psi}^{\lambda\rho}_n\) replaces the second and third \(\operatorname{MAT}_D\) arguments by \(i_{h,G} i_{h,G}' \otimes M_{q,n} R^A_{h,n}(\tilde{\rho}_{h,n})\), for \(r \in \{1, \ldots, G\}\), \(q \in \{1, \ldots, Q\}\); a typical element of the gh-th block of \(\hat{\Psi}^{\bar{\lambda}\bar{\lambda}}_n\) uses \(i_{g,G} \otimes W_{p_1,n} S^{r_1\cdot,A}_n\) and \(i_{h,G} \otimes W_{p_2,n} S^{r_2\cdot,A}_n\), for \(r_1, r_2 \in \{1, \ldots, G\}\), \(p_1, p_2 \in \{1, \ldots, P\}\); a typical element of the gh-th block of \(\hat{\Psi}^{\bar{\lambda}\rho}_n\) uses \(i_{g,G} \otimes W_{p,n} S^{r\cdot,A}_n\) in the first argument and \(i_{h,G} i_{h,G}' \otimes M_{q,n} R^A_{h,n}(\tilde{\rho}_{h,n})\) in the second and third, for \(r \in \{1, \ldots, G\}\), \(p \in \{1, \ldots, P\}\), \(q \in \{1, \ldots, Q\}\); and a typical element of the gh-th block of \(\hat{\Psi}^{\rho\rho}_n\) uses \(i_{g,G} i_{g,G}' \otimes M_{q_1,n} R^A_{g,n}(\tilde{\rho}_{g,n})\) and \(i_{h,G} i_{h,G}' \otimes M_{q_2,n} R^A_{h,n}(\tilde{\rho}_{h,n})\), for \(q_1, q_2 \in \{1, \ldots, Q\}\).

2.6 Identification Condition

To discuss the identification conditions via the linear moment conditions, we consider the following general system of G equations with spatial lags in both the endogenous variables and the disturbance process. Some parts of the discussion follow closely the relevant discussion in Appendix F of Drukker, Egger, and Prucha (2022). Recall from (2.4) that the stacked system can be written as
\[
y_n = B^*_0 y_n + C^*_0 x_n + u_n, \qquad u_n = P^*_0 u_n + \varepsilon_n,
\]
where \(B^*_0 = (B_0' \otimes I_n) + (\Lambda_0' \otimes I_n)(I_G \otimes W_n)\), \(C^*_0 = C_0' \otimes I_n\), and \(P^*_0 = (P_0' \otimes I_n)(I_G \otimes M_n)\).^{26} Assuming invertibility of \(S_n(\lambda_0, b_0) = I_{nG} - B^*_0\) and \(R_n(\rho_0) = I_{nG} - P^*_0\), the reduced form of the system is given by
\[
y_n = S_n^{-1}(\lambda_0, b_0)\left( C^*_0 x_n + u_n \right), \qquad u_n = R_n^{-1}(\rho_0)\varepsilon_n.
\]
Since the assumptions on \(\varepsilon_n\) imply \(E\varepsilon_n = 0\) and \(E\varepsilon_n\varepsilon_n' = \Sigma_0 \otimes I_n\), the expected value and the variance of \(y_n\) are given by
\[
E y_n = S_n^{-1}(\lambda_0, b_0) C^*_0 x_n, \tag{2.63}
\]
\[
\operatorname{Var}(y_n) = S_n^{-1}(\lambda_0, b_0) R_n^{-1}(\rho_0) (\Sigma_0 \otimes I_n) R_n^{-1}(\rho_0)' S_n^{-1}(\lambda_0, b_0)'. \tag{2.64}
\]
Recall that \(Z_{g,n} = [Y_{g,n}, \bar{Y}_{g,n}, X_{g,n}]\), and thus the best instrument for \(Z_{g,n}\) is \(E Z_{g,n} = [E Y_{g,n}, E \bar{Y}_{g,n}, X_{g,n}]\). In light of the linear moments \(m^{l}_{g,n}(\theta)\) in (2.26), as well as those in Drukker, Egger, and Prucha (2022), we let \(H_{g,n}\) denote (generically) the matrix of instruments for the linear moments corresponding to the g-th equation. The linear moment conditions then take the general form \(E H_{g,n}' u_{g,n} = 0\).

^{26} As before, we let \(y_n = [y_{1,n}', \ldots, y_{G,n}']'\), \(u_n = [u_{1,n}', \ldots, u_{G,n}']'\) and \(\varepsilon_n = [\varepsilon_{1,n}', \ldots, \varepsilon_{G,n}']'\) be the stacked vectors of the endogenous variables, the disturbances of the structural equations, and the innovations, respectively. Let \(x_n = [x_{1,n}', \ldots, x_{K,n}']'\) denote the stacked vector of the K exogenous variables.
The parameter matrices \(B_0 = (b_{lg,0})_{G \times G}\) (simultaneous effects), \(\Lambda_0 = (\lambda_{lg,s,0})_{PG \times G}\) (spatial autoregressive parameters), \(C_0 = (c_{lg,0})_{K \times G}\) (parameters on the exogenous variables), and \(P_0 = (\rho_{g,q,0})_{QG \times G}\) (spatial autoregressive parameters in the disturbance process) are defined conformably.

Recall that \(u_{g,n} = y_{g,n} - Z_{g,n}\delta_{g,0}\), where \(\delta_{g,0}\) denotes the vector of (true) structural parameters appearing in equation g. It then follows that the linear moments can be written as
\[
E H_{g,n}' u_{g,n}(\delta_g) = E H_{g,n}'(y_{g,n} - Z_{g,n}\delta_g) = E H_{g,n}'\big(u_{g,n} - Z_{g,n}(\delta_g - \delta_{g,0})\big) = H_{g,n}' E Z_{g,n} (\delta_{g,0} - \delta_g).
\]
In line with Kelejian and Prucha (1998), identification based on the linear moment conditions for the g-th equation requires \(H_{g,n}' E Z_{g,n}\) to have full column rank, which in turn requires \(E Z_{g,n}\) to be of full column rank.^{27} To provide guidance on where this condition may fail, we next discuss two such scenarios.

2.6.1 Scenario I

In this scenario, let us first consider the extreme case in which the model does not contain any exogenous variables, i.e., \(C_0 = 0\). Then \(E y_n = 0\) by (2.63), and thus \(E Y_{g,n} = 0\) and \(E \bar{Y}_{g,n} = 0\). It follows that \(E Z_{g,n} = [E Y_{g,n}, E \bar{Y}_{g,n}, X_{g,n}]\) is not of full column rank. This in turn implies that \(H_{g,n}' E Z_{g,n}\) is not of full column rank, and thus the linear moments cannot identify all structural parameters \(\delta_{g,0}\). Apart from a complete failure of identification by linear moments under this scenario, we expect estimators based only on linear moment conditions to perform poorly when the parameters on the exogenous variables are "small". Since the values of the elements of \(C_0\) depend on the chosen units of measurement of the exogenous variables, "small" is best interpreted as corresponding to a small ratio of the variance (signal) stemming from the exogenous variables to the variance (noise) of the disturbances. As such, we expect identification via the linear moments to be strong (weak) when the signal-to-noise ratio is large (small).
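The rank failure in Scenario I can be verified numerically: with \(C_0 = 0\), equation (2.63) gives \(E y_n = 0\), so every instrument column built from expected endogenous variables vanishes. The sketch below is a hypothetical single-equation SAR example (circulant weight matrix, not the design used in the simulations) illustrating this mechanism:

```python
import numpy as np

n = 30
# Row-normalized circulant weight matrix: each unit's two ring neighbours.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

lam = 0.4
S_inv = np.linalg.inv(np.eye(n) - lam * W)  # reduced-form multiplier
x = np.ones(n)

for c in (1.0, 0.0):
    Ey = S_inv @ (c * x)   # E y = S^{-1} C0 x, cf. (2.63)
    EWy = W @ Ey           # expected spatial lag, an instrument column
    print(c, np.max(np.abs(EWy)))
```

When `c = 0` the instrument column \(E[Wy]\) is identically zero, so any instrument block containing it is rank deficient; for small but nonzero `c` the column is close to zero relative to the disturbance noise, which is the weak identification case studied in the simulations.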
In addition to the linear moments, the quadratic moments may help identify (some of) the parameters. To see this, note that \(\operatorname{Var}(y_n)\) in (2.64) is still a function of the structural parameters even if \(C_0 = 0\). For contributions on identification with the help of quadratic moment conditions see, e.g., Lee (2007) and Kuersteiner and Prucha (2020) within a single equation framework, and, e.g., Liu (2014, 2019, 2020), Liu and Saraiva (2019), and Yang and Lee (2017, 2019) for contributions within a systems framework.

^{27} To avoid under-identification, we assume implicitly that \(\operatorname{rank}(H_{g,n}) \ge \operatorname{rank}(Z_{g,n})\).

2.6.2 Scenario II

In addition to the low signal-to-noise ratio described in Scenario I, one may also encounter the following weak identification scenario in empirical applications when allowing for spatial lags on \(X_n\). For ease of illustration, we consider the following two-equation model without spatial lags in the first equation:
\[
y_{1,n} = b_{21} y_{2,n} + c_{11} x_{1,n} + c_{21} x_{2,n} + c_{31} x_{3,n} + \varepsilon_{1,n}, \tag{2.65}
\]
\[
y_{2,n} = b_{12} y_{1,n} + \left[ \lambda_{22,1} W_{1,n} + \lambda_{22,2} W_{2,n} \right] y_{2,n} + c_{72} x_{4,n} + c_{82} x_{5,n} + c_{92} x_{6,n} + \varepsilon_{2,n}.
\]
Pre-multiplying both sides of the first equation by \((I_n - \lambda_{11,1} W_{1,n})\) and collecting terms, we obtain
\[
y_{1,n} = b_{21} y_{2,n} + \lambda_{11,1} W_{1,n} y_{1,n} + \lambda_{21,1} W_{1,n} y_{2,n} + c_{11} x_{1,n} + c_{21} x_{2,n} + c_{31} x_{3,n} + c_{41} W_{1,n} x_{1,n} + c_{51} W_{1,n} x_{2,n} + c_{61} W_{1,n} x_{3,n} + v_{1,n}, \tag{2.66}
\]
with \(v_{1,n} = (I_n - \lambda_{11,1} W_{1,n}) \varepsilon_{1,n}\) and the following parameter restrictions satisfied simultaneously:
\[
\lambda_{21,1} = -\lambda_{11,1} b_{21}, \quad c_{41} = -\lambda_{11,1} c_{11}, \quad c_{51} = -\lambda_{11,1} c_{21}, \quad c_{61} = -\lambda_{11,1} c_{31}. \tag{2.67}
\]
The first equation of model (2.65) and equation (2.66) are observationally equivalent given the above parameter restrictions, and hence the spatial parameters \(\lambda_{11,1}\) and \(\lambda_{21,1}\) are not identifiable. In connection with our discussion at the beginning of Section 2.6, note that under model (2.66) the best instrument for \(Z_{1,n}\) is
\[
E Z_{1,n} = \left[ E y_{2,n},\; W_{1,n} E y_{1,n},\; W_{1,n} E y_{2,n},\; X_{1,n},\; W_{1,n} X_{1,n} \right],
\]
where \(X_{1,n} = [x_{1,n}, x_{2,n}, x_{3,n}]\). Under (2.65), \(W_{1,n} E y_{1,n} = b_{21} W_{1,n} E y_{2,n} + W_{1,n} X_{1,n} \gamma_1\) where \(\gamma_1 = [c_{11}, c_{21}, c_{31}]'\), and thus \(E Z_{1,n}\) is in general not of full column rank.

In light of the above discussion, any point in the parameter space satisfying the parameter restrictions in (2.67) constitutes a "non-identification" point. It is of interest to explore the finite sample performance of the estimators under both the strong and the weak identification cases, i.e., when the true parameter values are far away from and close to these "non-identification" points. To do so, let
\[
\lambda_{21,1} = -(\lambda_{11,1} + \phi) b_{21}, \quad c_{41} = -(\lambda_{11,1} + \phi) c_{11}, \quad c_{51} = -(\lambda_{11,1} + \phi) c_{21}, \quad c_{61} = -(\lambda_{11,1} + \phi) c_{31},
\]
where the parameter \(\phi\) governs the size of the deviation from the corresponding "non-identification" point. A larger value of \(\phi\) corresponds to a point in the parameter space further away from the "non-identification" point, and hence identification through the linear moments based on \(X_n\) and the spatial lags of \(X_n\) is expected to be stronger than in cases with smaller values of \(\phi\). Specifically, \(\lambda_{21,1} = -(\lambda_{11,1} + \phi) b_{21}\) indicates a point that deviates from the "non-identification" point in the "negative" direction of \(\lambda_{21,1}\) with size \(\phi b_{21}\); \(c_{k1} = -(\lambda_{11,1} + \phi) c_{k-3,1}\) indicates a point that deviates from the "non-identification" point in the "negative" direction of \(c_{k1}\) with size \(\phi c_{k-3,1}\), for k = 4, 5, 6. In general, the strength of identification via the linear moments depends on the size of the deviation and much less so on its sign.

2.7 Monte Carlo Simulations

We investigate the finite sample properties of the proposed estimators and compare their performance with several existing estimators in the literature, under both the strong and weak identification cases. Corresponding to the discussion in Section 2.6, we consider two scenarios in which weak identification issues can arise. Scenario I is designed to compare the performance of the estimators under high and low signal-to-noise ratios. We do so by varying the size of the parameters on the exogenous variables \(X_n\).
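The Scenario II designs can be generated mechanically from the restrictions around (2.67): given the free parameters and a deviation size (denoted generically as `phi` here, standing in for the deviation parameter in the text), the restricted coefficients follow. A small illustrative helper, with hypothetical names:

```python
def scenario2_coefs(lam11_1, b21, c1, phi):
    """Coefficients implied by deviating from the non-identification
    point by phi (cf. the restrictions around (2.67)); c1 holds the
    first equation's exogenous coefficients (c11, c21, c31)."""
    lam21_1 = -(lam11_1 + phi) * b21
    c4, c5, c6 = (-(lam11_1 + phi) * ck for ck in c1)
    return lam21_1, (c4, c5, c6)

# At phi = 0 the restrictions in (2.67) hold exactly, i.e. the design
# sits on a non-identification point; larger |phi| moves it away.
print(scenario2_coefs(0.3, 0.15, (1.0, 1.0, 1.0), 0.0))
```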
Note that Scenario I is in line with the weak IV cases that arise in empirical applications. Scenario II extends the case considered in, e.g., Kuersteiner and Prucha (2020), in which identification becomes weak near a "singular point" in the parameter space. Due to space limitations, in this section we focus on the strong and weak identification cases under homoskedasticity, as well as the strong identification case with heteroskedastic disturbances. In Chapter A.4, we document results of additional robustness tests, including cases with an alternative design of the spatial weight matrices as well as correlated exogenous variables.

For the purpose of comparison, in addition to the GSLIVE, GSFIVE, LQ-GSLIVE and LQ-GSFIVE estimators considered in this dissertation, we also report the finite sample performance of the (quasi-)maximum likelihood estimator (MLE), the generalized spatial 2SLS (GS2SLS) first introduced in Kelejian and Prucha (1998), and the generalized spatial 3SLS (GS3SLS) first introduced in Kelejian and Prucha (2004) and extended to the simultaneous equation SARAR model with higher order spatial lags in Drukker, Egger, and Prucha (2022). We also implement the linear-quadratic GS2SLS (LQ-GS2SLS) and the linear-quadratic GS3SLS (LQ-GS3SLS) considered in Drukker, Egger, and Prucha (2022).

2.7.1 Data Generation Process

We next describe the way we generate the exogenous variables \(x_{k,n}\), the disturbance vector \(\varepsilon_n\), and the spatial weight matrices.

Exogenous Matrix: Let n denote the sample size. We generate an \(n \times 6\) exogenous matrix \(X_n = [x_{1,n}, x_{2,n}, \ldots, x_{6,n}]\) as follows. The elements of \(x_{j,n}\) are generated as i.i.d. normal with mean \(\mu_x = 1\) and variance \(\sigma_x^2 = 1\) for j = 1, ..., 6, and the columns of \(X_n\) are uncorrelated. \(X_n\) is generated once for all Monte Carlo experiments.

Disturbances: In the Monte Carlo simulations, we consider two-equation systems (i.e., G = 2).
We generate the \(n \times 2\) disturbance matrix \(E_n = [\varepsilon_{1,n}, \varepsilon_{2,n}]\) as follows. Let \(\varepsilon_{i\cdot,n} = [\varepsilon_{i1,n}, \varepsilon_{i2,n}]\) be the i-th row of \(E_n\). The \(\varepsilon_{i\cdot,n}\) are generated i.i.d. normal in i, with mean \(\mu_\varepsilon = 0\) and variance \(\sigma_\varepsilon^2 = 1\). In addition, we set the covariance between the i-th elements of \(\varepsilon_{1,n}\) and \(\varepsilon_{2,n}\) to 0.5 (i.e., \(\operatorname{cov}(\varepsilon_{i1}, \varepsilon_{i2}) = 0.5\) for i = 1, ..., n). We generated 500 different matrices \(E_n = [\varepsilon_{1,n}, \varepsilon_{2,n}]\), one for each of the 500 Monte Carlo trials.

Spatial Weight Matrices: We consider the north-east modified-rook design of Arraiz, Drukker, Kelejian, and Prucha (2010), adapted to the current model with second order spatial lags. For the convenience of the reader, we repeat the details of their design in the following. First, consider a matrix in terms of a square grid with both the x and y coordinates only taking on the discrete values \(1, 1.5, 2, 2.5, \ldots, \bar{m}\). Let the units in the northeast quadrant of this matrix be at the indicated discrete coordinates \(m \le x \le \bar{m}\) and \(m \le y \le \bar{m}\), where m can be seen as a cut-off value. Let the remaining units be located only at integer values of the coordinates: \(x = 1, 2, \ldots, \bar{m} - 1\) and \(y = 1, 2, \ldots, \bar{m} - 1\). Under this construction, the number of units located in the northeast quadrant is inversely related to the cut-off value m. As such, we refer to this matrix as a north-east modified rook matrix. For clarity, we illustrate such a matrix for the case in which m = 2 and \(\bar{m}\) = 5, with the units indicated by stars:

[Figure: grid of unit locations for the illustrative case; both axes run from 1.0 to 5.0 in steps of 0.5, with stars marking the units (half-step coordinates in the northeast quadrant, integer coordinates elsewhere).]

To generate the spatial weight matrices \(W_{1,n}\) and \(W_{2,n}\), we first define the Euclidean distance between any two units \(i_1\) and \(i_2\), with coordinates \((x_1, y_1)\) and \((x_2, y_2)\) respectively, as
\[
d(i_1, i_2) = \left[ (x_1 - x_2)^2 + (y_1 - y_2)^2 \right]^{1/2}.
\]
Given this distance measure, we define the (i, j)-th element of our row-normalized weight matrix \(W_{1,n}\) as
\[
w_{ij,1} = w^*_{ij} \Big/ \sum_{l=1}^{n} w^*_{il},
\]
where \(w^*_{ij} = 1\) if \(0 < d(i, j) \le 1\) and \(w^*_{ij} = 0\) otherwise; and the (i, j)-th element of our row-normalized weight matrix \(W_{2,n}\) as
\[
w_{ij,2} = w^{**}_{ij} \Big/ \sum_{l=1}^{n} w^{**}_{il},
\]
where \(w^{**}_{ij} = 1\) if \(1 < d(i, j) \le 2\) and \(w^{**}_{ij} = 0\) otherwise. We further set \(M_{1,n} = W_{1,n}\) and \(M_{2,n} = W_{2,n}\). For our experiment, we consider the case in which m = 5 and \(\bar{m}\) = 15, and thus the sample size is n = 486. In this specification, the North-East (NE) sector accounts for about 75% of the units in the sample.

2.7.2 Implemented Estimators

To avoid confusion, we list the estimators considered in the experiments along with the notation that appears in Table 2.1 - Table 2.7:

1. ML estimator, \(\hat{\theta}_{ML}\);
2. GS2SLS \(\hat{\delta}\) and GMM estimator \(\hat{\rho}\) considered in Drukker, Egger, and Prucha (2022), denoted as \(\hat{\theta}_{GS2SLS} = [\hat{\delta}', \hat{\rho}']'\);
3. GS3SLS \(\hat{\hat{\delta}}\) and GMM estimator \(\hat{\hat{\rho}}\) considered in Drukker, Egger, and Prucha (2022), denoted as \(\hat{\theta}_{GS3SLS} = [\hat{\hat{\delta}}', \hat{\hat{\rho}}']'\);
4. Approximated GSLIVE \(\hat{\theta}_{GSLIVE}\) in (2.53) and the GMM estimator for \(\rho_g\) in (2.55);
5. Approximated GSFIVE \(\hat{\theta}_{GSFIVE}\) in (2.58) and the GMM estimator for \(\rho_g\) in (2.60);
6. LQ-GS2SLS \(\hat{\theta}_{LQ-GS2SLS}\) considered in Drukker, Egger, and Prucha (2022);
7. LQ-GS3SLS \(\hat{\theta}_{LQ-GS3SLS}\) considered in Drukker, Egger, and Prucha (2022);
8. Approximated Linear-Quadratic GSLIVE \(\hat{\theta}_{LQ-GSLIVE}\) in (2.57);
9. Approximated Linear-Quadratic GSFIVE \(\hat{\theta}_{LQ-GSFIVE}\) in (2.62).

As discussed, the (LQ-)GSLIVE and (LQ-)GSFIVE estimators considered in this dissertation require some initial estimates. In the Monte Carlo simulations, we use the GS2SLS estimates as the initial estimates for the GSLIVE, and then use the GSLIVE estimates as the initial estimates when implementing the GSFIVE. Analogously, we use the LQ-GS2SLS estimates as the initial estimates for the LQ-GSLIVE, which in turn serve as the initial estimates for the LQ-GSFIVE.
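The weight-matrix construction described in Section 2.7.1 can be sketched as follows. This is an illustrative implementation under one reading of the design (half-step coordinates in the northeast quadrant, integer coordinates elsewhere), not the authors' code, so the resulting unit count depends on how the boundary of the quadrant is interpreted; the small case below uses a cut-off of 3 on a grid running to 5 purely for illustration.

```python
import numpy as np

def build_units(m, m_bar):
    """Units: half-step grid in the NE quadrant, integer grid elsewhere."""
    units = set()
    half = [v / 2 for v in range(2 * m, 2 * m_bar + 1)]  # m, m+0.5, ..., m_bar
    for x in half:
        for y in half:
            units.add((x, y))
    for x in range(1, m_bar + 1):
        for y in range(1, m_bar + 1):
            if not (x >= m and y >= m):   # outside the NE quadrant
                units.add((float(x), float(y)))
    return sorted(units)

def weight_matrix(units, lo, hi):
    """Row-normalized W with w*_ij = 1 if lo < d(i, j) <= hi."""
    pts = np.array(units)
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    W = ((d > lo) & (d <= hi)).astype(float)
    rs = W.sum(axis=1, keepdims=True)
    rs[rs == 0] = 1.0                     # guard against isolated units
    return W / rs

units = build_units(3, 5)                 # small illustrative case
W1 = weight_matrix(units, 0.0, 1.0)       # first-order neighbours
W2 = weight_matrix(units, 1.0, 2.0)       # second-order ring
print(len(units), W1.sum(axis=1).max())
```

Row normalization guarantees that the maximum absolute row sums of \(W_1\) and \(W_2\) equal one, which is what the stability restrictions on the parameter space in Section 2.7.3 rely on.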
2.7.3 Performance of Estimators under Strong and Weak Identification: Scenario I

In light of our discussion in Section 2.6.1, it is of interest to compare the performance of the proposed estimators under both the strong and weak identification set-ups. We suppress the subscript n in the following where not necessary.

Model Specification: The model considered in the experiment is given by
\[
y_1 = b_{21} y_2 + \left[ \lambda_{11,1} W_1 + \lambda_{11,2} W_2 \right] y_1 + c_{11} x_1 + c_{21} x_2 + c_{31} x_3 + u_1,
\]
\[
y_2 = b_{12} y_1 + \left[ \lambda_{22,1} W_1 + \lambda_{22,2} W_2 \right] y_2 + c_{42} x_4 + c_{52} x_5 + c_{62} x_6 + u_2,
\]
\[
u_g = \left[ \rho_{g1} W_1 + \rho_{g2} W_2 \right] u_g + \varepsilon_g, \quad g = 1, 2.
\]
For simplicity, we denote \(c_{\cdot 1} = [c_{11}, c_{21}, c_{31}]'\) and \(c_{\cdot 2} = [c_{42}, c_{52}, c_{62}]'\). In the simulations, we experiment with both \(c_{\cdot 1} = c_{\cdot 2} = [1, 1, 1]'\) and \(c_{\cdot 1} = c_{\cdot 2} = [0.4, 0.4, 0.4]'\), corresponding to the high and low signal-to-noise ratio cases. We expect the identification power of the linear moments to be weakened in the latter case. We therefore refer to the former as the strong identification case and to the latter as the weak identification case in Table 2.1 - Table 2.3 below.

Parameter Space: Recall that \(S(\lambda, b) = I_{2n} - B^*\), where
\[
B^* = \begin{bmatrix} \lambda_{11,1} W_1 + \lambda_{11,2} W_2 & b_{21} I_n \\ b_{12} I_n & \lambda_{22,1} W_1 + \lambda_{22,2} W_2 \end{bmatrix},
\]
and \(R(\rho) = I_{2n} - P^*\), where
\[
P^* = \begin{bmatrix} \rho_{11} W_1 + \rho_{12} W_2 & 0 \\ 0 & \rho_{21} W_1 + \rho_{22} W_2 \end{bmatrix}.
\]
For the reduced form (2.5) to exist, we need to ensure the existence of \(S(\lambda, b)^{-1} = \sum_{h=0}^{\infty} (B^*)^h\) and \(R(\rho)^{-1} = \sum_{h=0}^{\infty} (P^*)^h\), which in turn requires \(\|B^*\| < 1\) and \(\|P^*\| < 1\) for some induced matrix norm. Row-normalized \(W_1\) and \(W_2\) imply \(\|W_1\|_\infty = 1\) and \(\|W_2\|_\infty = 1\), where \(\|A\|_\infty\) denotes the maximum absolute row sum of A. To ensure \(\|B^*\|_\infty < 1\), we need to restrict
\[
\|b_{21} I_n + \lambda_{11,1} W_1 + \lambda_{11,2} W_2\|_\infty \le |b_{21}| \|I_n\|_\infty + |\lambda_{11,1}| \|W_1\|_\infty + |\lambda_{11,2}| \|W_2\|_\infty = |b_{21}| + |\lambda_{11,1}| + |\lambda_{11,2}| < 1,
\]
\[
\|b_{12} I_n + \lambda_{22,1} W_1 + \lambda_{22,2} W_2\|_\infty \le |b_{12}| \|I_n\|_\infty + |\lambda_{22,1}| \|W_1\|_\infty + |\lambda_{22,2}| \|W_2\|_\infty = |b_{12}| + |\lambda_{22,1}| + |\lambda_{22,2}| < 1.
\]
This is equivalent to requiring
\[
\max\{ |b_{21}| + |\lambda_{11,1}| + |\lambda_{11,2}|,\; |b_{12}| + |\lambda_{22,1}| + |\lambda_{22,2}| \} < 1.
\]
(2.68) The inequalities follows from triangular inequalities for matrix norms. Similarly, to ensure ||P ?||? < 1, we need to restrict ||?11W1 + ?12W2||? ? |?11|||W1||? + |?12|||W2||? = |?11|+ |?12| < 1 and ||?21W1 + ?22W2||? ? |?21|||W1||? + |?22|||W2||? = |?21|+ |?22| < 1 This is equivalent to require max{|?11|+ |?12|, |?21|+ |?22|} < 1. (2.69) In light of above, we consider the following parameter combinations 72 1. b21 ? {?0.15, 0.15}, b12 = 0.3 2. ?11,1 ? {?0.3, 0, 0.3, 0.5}, ?11,2 ? {?0.2, 0, 0.2} 3. ?22,1 = 0.3, ?22,2 = 0.15 4. ?11 ? {?0.2, 0.2}, ?12 = 0.1 5. ?21 = 0.1, ?22 = 0 Note that the inequalities (2.68) and (2.69) are satisfied with above parameter sets. These parameter choices constitute 48 different experiment settings for each of the weak and the strong identification cases and thus 96 experiments in total. Due to the limited space, we report the Monte Carlo results for each case of c.1 = c.2 = [1, 1, 1] ? (i.e., the strong identification case) and c ?.1 = c.2 = [0.4, 0.4, 0.4] (i.e., the weak identification case) with the three parameter constella- tions in the tables below Table 2.1 Table 2.2 Table 2.3 b21 = 0.15 b12 = 0.3 b21 = 0.15 b12 = 0.3 b21 = 0.15 b12 = 0.3 ?11,1 = 0.3 ?11,2 = 0.2 ?11,1 = 0.5 ?11,2 = 0.2 ?11,1 = ?0.3 ?11,2 = ?0.2 ?22,1 = 0.3 ?22,2 = 0.15 ?22,1 = 0.3 ?22,2 = 0.15 ?22,1 = 0.3 ?22,2 = 0.15 ?11 = 0.2 ?12 = 0.1 ?11 = 0.2 ?12 = 0.1 ?11 = 0.2 ?12 = 0.1 ?21 = 0.1 ?22 = 0 ?21 = 0.1 ?22 = 0 ?21 = 0.1 ?22 = 0 We report the Median and the RMSE of the obtained estimates based on 500 Monte Carlo repetitions. Following, e.g., Kelejian and Prucha (1998), the RMSE is calculated as RMSE = 73 [ ]2 bias2 + IQ/1.35 , where the bias is the absolute difference between the median of the em- pirical distribution and the true parameter value, and IQ is an inter-quantile range. That is, IQ = c1 ? c2 where c1 is the 0.75 quantile and c2 is the 0.25 quantile. 
Note that if the dis- tribution is normal, the median is equal to the mean and IQ/1.35 is approximately equal to the standard deviation. As mentioned in Kelejian, Prucha, and Yuzefovich (2004), an important fea- ture of this modified RMSE measure is that it is based on quantiles which always exist. The standard measure of the root mean square error is based on the first and second moments which, as pointed out by Kelejian and Prucha (1999) among others, may not always exist. Thus the standard measure may not be well defined. 74 75 Table 2.1: Median and RMSE of Scenario I, homoskedasticity, Parameter Constellation 1 Strong i.d. TRUE ??ML ??GS2SLS ??GS3SLS ??GSLIV E ??GSFIV E ??LQ?GS2SLS ??LQ?GS3SLS ??LQ?GSLIV E ??LQ?GSFIV E Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE b21 0.150 0.149 0.024 0.165 0.029 0.153 0.026 0.150 0.026 0.150 0.025 0.151 0.026 0.147 0.025 0.151 0.025 0.151 0.025 ?11,1 0.300 0.301 0.052 0.307 0.067 0.303 0.061 0.303 0.057 0.300 0.053 0.309 0.065 0.300 0.059 0.304 0.057 0.302 0.054 ?11,2 0.200 0.201 0.050 0.186 0.068 0.197 0.061 0.198 0.057 0.201 0.052 0.191 0.060 0.206 0.058 0.198 0.057 0.200 0.052 ?11 0.200 0.190 0.099 0.182 0.112 0.191 0.110 0.187 0.106 0.193 0.102 0.175 0.110 0.190 0.105 0.192 0.109 0.195 0.100 ?12 0.100 0.069 0.123 0.101 0.128 0.092 0.128 0.067 0.131 0.073 0.128 0.096 0.123 0.091 0.130 0.097 0.136 0.094 0.118 b12 0.300 0.298 0.021 0.312 0.024 0.299 0.022 0.299 0.021 0.299 0.022 0.298 0.022 0.298 0.022 0.300 0.021 0.302 0.021 ?22,1 0.300 0.299 0.052 0.295 0.068 0.298 0.057 0.293 0.061 0.297 0.052 0.300 0.067 0.292 0.057 0.295 0.063 0.296 0.054 ?22,2 0.150 0.157 0.056 0.145 0.066 0.152 0.063 0.157 0.063 0.156 0.057 0.151 0.063 0.160 0.058 0.155 0.065 0.155 0.056 ?21 0.100 0.089 0.096 0.084 0.121 0.092 0.113 0.087 0.106 0.085 0.104 0.086 0.117 0.096 0.117 0.097 0.118 0.099 0.102 ?22 0.000 -0.043 0.150 -0.018 0.162 -0.026 0.163 -0.043 0.161 -0.043 0.165 -0.016 0.162 -0.020 0.168 -0.022 0.168 
-0.021 0.144 Weak i.d. TRUE ??ML ??GS2SLS ??GS3SLS ??GSLIV E ??GSFIV E ??LQ?GS2SLS ??LQ?GS3SLS ??LQ?GSLIV E ??LQ?GSFIV E Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE b21 0.150 0.149 0.059 0.227 0.093 0.180 0.069 0.153 0.064 0.154 0.064 0.157 0.060 0.138 0.067 0.163 0.060 0.156 0.062 ?11,1 0.300 0.308 0.125 0.347 0.214 0.347 0.235 0.307 0.142 0.307 0.137 0.325 0.166 0.320 0.162 0.305 0.138 0.305 0.129 ?11,2 0.200 0.197 0.123 0.089 0.216 0.141 0.222 0.200 0.146 0.191 0.132 0.154 0.152 0.186 0.147 0.180 0.142 0.188 0.127 ?11 0.200 0.183 0.149 0.120 0.239 0.132 0.241 0.183 0.180 0.191 0.166 0.127 0.191 0.156 0.184 0.190 0.174 0.195 0.153 ?12 0.100 0.077 0.163 0.194 0.219 0.140 0.212 0.061 0.195 0.074 0.182 0.127 0.184 0.097 0.191 0.108 0.172 0.102 0.165 b12 0.300 0.296 0.052 0.366 0.082 0.314 0.061 0.299 0.053 0.302 0.056 0.302 0.054 0.296 0.064 0.308 0.055 0.306 0.054 ?22,1 0.300 0.294 0.122 0.296 0.196 0.307 0.213 0.284 0.152 0.291 0.137 0.298 0.148 0.292 0.151 0.286 0.153 0.288 0.124 ?22,2 0.150 0.158 0.131 0.095 0.191 0.120 0.203 0.169 0.150 0.158 0.146 0.145 0.141 0.164 0.140 0.148 0.157 0.158 0.133 ?21 0.100 0.092 0.163 0.078 0.229 0.078 0.260 0.100 0.213 0.091 0.185 0.092 0.196 0.108 0.204 0.107 0.210 0.102 0.167 ?22 0.000 -0.051 0.194 0.018 0.208 -0.009 0.220 -0.053 0.228 -0.050 0.228 -0.017 0.192 -0.044 0.205 -0.024 0.222 -0.026 0.193 1 Results are based on 500 Monte Carlo trials with sample size n = 486; ?? = 1. 76 Table 2.2: Median and RMSE of Scenario I, homoskedasticity, Parameter Constellation 2 Strong i.d. 
TRUE ??ML ??GS2SLS ??GS3SLS ??GSLIV E ??GSFIV E ??LQ?GS2SLS ??LQ?GS3SLS ??LQ?GSLIV E ??LQ?GSFIV E Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE b21 0.150 0.149 0.025 0.165 0.029 0.153 0.026 0.150 0.026 0.150 0.025 0.151 0.026 0.147 0.025 0.151 0.025 0.151 0.025 ?11,1 0.500 0.501 0.050 0.521 0.067 0.513 0.062 0.503 0.057 0.503 0.051 0.520 0.065 0.507 0.060 0.502 0.055 0.502 0.052 ?11,2 0.200 0.202 0.047 0.171 0.068 0.187 0.057 0.201 0.054 0.200 0.049 0.180 0.061 0.196 0.054 0.198 0.055 0.200 0.047 ?11 0.200 0.191 0.099 0.160 0.122 0.177 0.116 0.184 0.111 0.196 0.102 0.163 0.116 0.181 0.112 0.193 0.116 0.197 0.099 ?12 0.100 0.070 0.122 0.110 0.124 0.098 0.125 0.070 0.129 0.072 0.127 0.107 0.123 0.099 0.125 0.097 0.128 0.091 0.117 b12 0.300 0.299 0.021 0.310 0.023 0.298 0.021 0.298 0.021 0.299 0.021 0.299 0.021 0.298 0.022 0.300 0.021 0.302 0.020 ?22,1 0.300 0.298 0.053 0.295 0.070 0.302 0.057 0.295 0.062 0.298 0.051 0.300 0.067 0.294 0.058 0.296 0.063 0.297 0.053 ?22,2 0.150 0.155 0.055 0.142 0.066 0.149 0.060 0.159 0.060 0.154 0.055 0.148 0.064 0.157 0.058 0.153 0.062 0.154 0.054 ?21 0.100 0.087 0.098 0.081 0.120 0.085 0.116 0.084 0.109 0.084 0.103 0.086 0.117 0.094 0.121 0.097 0.116 0.099 0.104 ?22 0.000 -0.045 0.148 -0.015 0.160 -0.021 0.163 -0.042 0.163 -0.040 0.167 -0.014 0.158 -0.019 0.166 -0.016 0.166 -0.024 0.151 Weak i.d. 
TRUE ??ML ??GS2SLS ??GS3SLS ??GSLIV E ??GSFIV E ??LQ?GS2SLS ??LQ?GS3SLS ??LQ?GSLIV E ??LQ?GSFIV E Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE Med RMSE b21 0.150 0.149 0.061 0.226 0.092 0.179 0.065 0.151 0.064 0.155 0.065 0.136 0.061 0.142 0.063 0.164 0.062 0.156 0.065 ?11,1 0.500 0.504 0.119 0.631 0.219 0.633 0.227 0.505 0.152 0.516 0.131 0.542 0.176 0.535 0.162 0.507 0.136 0.503 0.121 ?11,2 0.200 0.197 0.111 0.018 0.241 0.051 0.220 0.205 0.146 0.186 0.134 0.166 0.176 0.162 0.151 0.184 0.129 0.190 0.114 ?11 0.200 0.187 0.152 0.025 0.265 0.039 0.262 0.184 0.200 0.186 0.175 0.148 0.231 0.149 0.216 0.187 0.187 0.192 0.157 ?12 0.100 0.070 0.150 0.223 0.203 0.180 0.185 0.062 0.186 0.082 0.162 0.105 0.169 0.113 0.168 0.101 0.158 0.105 0.155 b12 0.300 0.298 0.051 0.356 0.074 0.307 0.057 0.297 0.054 0.303 0.055 0.287 0.051 0.294 0.060 0.307 0.055 0.305 0.051 ?22,1 0.300 0.300 0.121 0.298 0.201 0.332 0.221 0.278 0.159 0.297 0.139 0.312 0.149 0.299 0.153 0.288 0.159 0.294 0.127 ?22,2 0.150 0.155 0.125 0.073 0.192 0.094 0.201 0.176 0.160 0.155 0.154 0.153 0.143 0.152 0.145 0.148 0.154 0.151 0.127 ?21 0.100 0.092 0.161 0.062 0.234 0.048 0.269 0.109 0.226 0.082 0.190 0.076 0.199 0.096 0.208 0.106 0.211 0.104 0.168 ?22 0.000 -0.051 0.190 0.034 0.206 0.008 0.215 -0.065 0.229 -0.050 0.222 -0.020 0.185 -0.031 0.203 -0.022 0.219 -0.024 0.186 1 Results are based on 500 Monte Carlo trials with sample size n = 486; ?? = 1. 77 Table 2.3: Median and RMSE of Scenario I, homoskedasticity, Parameter Constellation 3 Strong i.d. 
(entries are Med/RMSE)
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.147/0.023   0.165/0.028   0.153/0.024   0.150/0.025   0.148/0.024   0.150/0.025   0.145/0.025   0.151/0.024   0.149/0.023
λ11,1    -0.300   -0.297/0.055  -0.321/0.063  -0.313/0.062  -0.298/0.059  -0.301/0.055  -0.304/0.061  -0.306/0.058  -0.298/0.057  -0.299/0.056
λ11,2    -0.200   -0.200/0.059  -0.198/0.064  -0.185/0.063  -0.203/0.065  -0.198/0.060  -0.192/0.067  -0.179/0.065  -0.204/0.063  -0.202/0.060
ρ11      0.200    0.189/0.090   0.216/0.104   0.208/0.101   0.184/0.100   0.191/0.102   0.190/0.101   0.199/0.100   0.191/0.102   0.195/0.092
ρ12      0.100    0.069/0.125   0.079/0.146   0.072/0.145   0.067/0.139   0.066/0.132   0.074/0.134   0.076/0.139   0.095/0.140   0.086/0.130
b12      0.300    0.297/0.022   0.313/0.025   0.301/0.022   0.298/0.022   0.299/0.022   0.299/0.022   0.300/0.024   0.301/0.023   0.300/0.022
λ22,1    0.300    0.296/0.053   0.293/0.067   0.293/0.060   0.294/0.061   0.295/0.052   0.295/0.068   0.291/0.059   0.296/0.062   0.294/0.052
λ22,2    0.150    0.157/0.056   0.154/0.067   0.157/0.059   0.157/0.062   0.156/0.054   0.153/0.067   0.164/0.059   0.155/0.064   0.155/0.056
ρ21      0.100    0.091/0.092   0.094/0.118   0.099/0.116   0.088/0.102   0.086/0.101   0.089/0.115   0.102/0.113   0.099/0.116   0.098/0.095
ρ22      0.000    -0.045/0.151  -0.023/0.162  -0.034/0.170  -0.046/0.164  -0.043/0.165  -0.019/0.162  -0.021/0.171  -0.027/0.164  -0.025/0.144
Weak i.d.
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.143/0.057   0.228/0.093   0.185/0.069   0.152/0.064   0.152/0.060   0.154/0.059   0.137/0.063   0.162/0.057   0.154/0.058
λ11,1    -0.300   -0.291/0.127  -0.421/0.203  -0.417/0.201  -0.297/0.154  -0.320/0.136  -0.331/0.146  -0.324/0.148  -0.311/0.140  -0.307/0.129
λ11,2    -0.200   -0.202/0.124  -0.203/0.176  -0.148/0.187  -0.218/0.162  -0.187/0.155  -0.166/0.159  -0.170/0.168  -0.222/0.147  -0.218/0.133
ρ11      0.200    0.176/0.146   0.295/0.197   0.281/0.192   0.184/0.180   0.210/0.160   0.230/0.153   0.220/0.149   0.200/0.154   0.193/0.148
ρ12      0.100    0.067/0.182   0.050/0.216   0.020/0.230   0.064/0.231   0.048/0.217   0.048/0.204   0.050/0.221   0.093/0.197   0.095/0.190
b12      0.300    0.293/0.059   0.378/0.092   0.328/0.065   0.298/0.058   0.302/0.056   0.304/0.056   0.306/0.065   0.311/0.056   0.302/0.053
λ22,1    0.300    0.291/0.126   0.305/0.187   0.280/0.201   0.284/0.152   0.283/0.139   0.286/0.140   0.280/0.149   0.279/0.159   0.286/0.128
λ22,2    0.150    0.163/0.131   0.135/0.188   0.164/0.201   0.160/0.155   0.163/0.141   0.159/0.140   0.169/0.144   0.160/0.160   0.162/0.129
ρ21      0.100    0.096/0.162   0.090/0.222   0.123/0.247   0.089/0.200   0.102/0.177   0.106/0.189   0.121/0.199   0.113/0.219   0.104/0.164
ρ22      0.000    -0.056/0.200  -0.008/0.218  -0.046/0.243  -0.054/0.231  -0.050/0.224  -0.028/0.199  -0.050/0.212  -0.026/0.227  -0.031/0.200
¹ Results are based on 500 Monte Carlo trials with sample size n = 486; σ_ε = 1.

2.7.4 Performance of Estimators under Strong and Weak Identification: Scenario II

Model Specification: In light of our discussion in Section 2.6.2, we consider the following model for Scenario II:

y1 = b21 y2 + λ11,1 W1 y1 + λ21,1 W1 y2 + c11 x1 + c21 x2 + c31 x3 + c41 W1 x1 + c51 W1 x2 + c61 W1 x3 + ε1,   (2.70)
y2 = b12 y1 + (λ22,1 W1 + λ22,2 W2) y2 + c72 x4 + c82 x5 + c92 x6 + ε2,

with the following parameter restrictions:

λ21,1 = -(λ11,1 + δ) b21,    c41 = -(λ11,1 + δ) c11,    (2.71)
c51 = -(λ11,1 + δ) c21,      c61 = -(λ11,1 + δ) c31,

where δ is the deviation parameter introduced below.
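To make the restrictions in (2.71) concrete, the following Python sketch (function name hypothetical; δ denotes the deviation parameter) computes the restricted parameters. With δ = 1, λ11,1 = 0.3, b21 = 0.15 and c11 = c21 = c31 = 1, it reproduces, up to floating point, the TRUE values λ21,1 = -0.195 and c41 = c51 = c61 = -1.3 reported in Table 2.4 for the strong identification case.

```python
# Restricted parameters implied by (2.71):
#   lam21_1 = -(lam11_1 + delta) * b21,  c41 = -(lam11_1 + delta) * c11,  etc.
def restricted_params(lam11_1, b21, c11, c21, c31, delta):
    """Return (lam21_1, c41, c51, c61) implied by the restrictions in (2.71)."""
    scale = -(lam11_1 + delta)
    return scale * b21, scale * c11, scale * c21, scale * c31

# Strong identification (delta = 1), Parameter Constellation 1:
print(restricted_params(0.3, 0.15, 1.0, 1.0, 1.0, delta=1.0))
# Weak identification (delta = 0.4):
print(restricted_params(0.3, 0.15, 1.0, 1.0, 1.0, delta=0.4))
```

The weak identification case δ = 0.4 likewise yields λ21,1 = -0.105 and c41 = c51 = c61 = -0.7, matching the TRUE column of the weak identification panel.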
Parameter Space: To test parameter values close to and away from the "non-identification" points, we consider δ = 0.4, 0.8, 1, where δ = 0.4 corresponds to the case closest to the "non-identification" point and thus to weak identification, and δ = 1 corresponds to strong identification. Recall that S = I_2n - B*, where

B* = | λ11,1 W1    b21 I_n + λ21,1 W1   |
     | b12 I_n     λ22,1 W1 + λ22,2 W2  |.

Again, for the reduced form to exist, we need to ensure the existence of S^{-1} = Σ_{h=0}^∞ (B*)^h, which in turn requires ||B*|| < 1 for some induced matrix norm. Again, row-normalized W1 and W2 imply ||W1||_∞ = 1 and ||W2||_∞ = 1. To ensure ||B*||_∞ < 1, we restrict

|b21| ||I_n||_∞ + |λ11,1| ||W1||_∞ + |λ21,1| ||W1||_∞ = |b21| + |λ11,1| + |λ21,1| < 1,
|b12| ||I_n||_∞ + |λ22,1| ||W1||_∞ + |λ22,2| ||W2||_∞ = |b12| + |λ22,1| + |λ22,2| < 1.

This is equivalent to requiring

max{ |b21| + |λ11,1| + |λ21,1|,  |b12| + |λ22,1| + |λ22,2| } < 1.   (2.72)

The inequalities follow from the triangle inequality for matrix norms. In light of this, we consider the following parameter combinations:

1. deviation parameter δ ∈ {0.4, 0.8, 1};
2. b21 ∈ {-0.15, 0.15}, b12 = 0.3;
3. λ11,1 ∈ {-0.3, 0, 0.3, 0.5}; λ21,1 depends on λ11,1, b21 and δ as specified in (2.71);
4. λ22,1 = 0.3, λ22,2 = 0.15;
5. c11 = c21 = c31 = 1; c41, c51, c61 depend on c11, c21, c31, respectively, as well as on λ11,1 and δ, as specified in (2.71);
6. c72 = c82 = c92 = 1.

Note that inequality (2.72) is satisfied by the above parameter sets. These parameter combinations constitute 24 different experiment settings. Due to limited space, we report Monte Carlo results for the following three parameter constellations:

Table 2.4: b21 = 0.15, b12 = 0.3, λ11,1 = 0.3,  λ22,1 = 0.3, λ22,2 = 0.15
Table 2.5: b21 = 0.15, b12 = 0.3, λ11,1 = 0.5,  λ22,1 = 0.3, λ22,2 = 0.15
Table 2.6: b21 = 0.15, b12 = 0.3, λ11,1 = -0.3, λ22,1 = 0.3, λ22,2 = 0.15

In all three cases, (1) λ21,1 depends on λ11,1, b21 and δ
as specified in (2.71); (2) c11 = c21 = c31 = 1, and c41, c51, c61 depend on c11, c21, c31, respectively, as well as on λ11,1 and δ, as specified in (2.71); (3) c72 = c82 = c92 = 1. For each parameter constellation, we report results for δ = 1 (the strong identification case) and δ = 0.4 (the weak identification case).

Table 2.4: Median and RMSE of Scenario II, homoskedasticity, Parameter Constellation 1
Strong i.d. (entries are Med/RMSE)
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.150/0.026   0.164/0.030   0.155/0.027   0.150/0.026   0.150/0.027   0.153/0.026   0.156/0.026   0.152/0.026   0.152/0.026
λ11,1    0.300    0.295/0.048   0.328/0.081   0.319/0.070   0.302/0.072   0.300/0.069   0.297/0.056   0.311/0.053   0.300/0.053   0.298/0.048
λ21,1    -0.195   -0.198/0.038  -0.194/0.044  -0.191/0.042  -0.197/0.046  -0.198/0.045  -0.199/0.044  -0.198/0.038  -0.200/0.039  -0.199/0.038
c41      -1.300   -1.298/0.090  -1.307/0.105  -1.307/0.098  -1.299/0.107  -1.297/0.099  -1.286/0.115  -1.303/0.097  -1.298/0.108  -1.300/0.092
c51      -1.300   -1.295/0.101  -1.315/0.113  -1.304/0.102  -1.303/0.118  -1.296/0.104  -1.303/0.127  -1.300/0.102  -1.300/0.119  -1.299/0.105
c61      -1.300   -1.292/0.088  -1.301/0.094  -1.303/0.092  -1.288/0.102  -1.295/0.095  -1.287/0.117  -1.298/0.089  -1.291/0.098  -1.294/0.089
b12      0.300    0.299/0.025   0.314/0.028   0.306/0.025   0.300/0.025   0.300/0.025   0.301/0.025   0.307/0.027   0.302/0.025   0.303/0.024
λ22,1    0.300    0.294/0.043   0.299/0.061   0.299/0.056   0.294/0.062   0.295/0.057   0.302/0.049   0.300/0.045   0.299/0.043   0.297/0.043
λ22,2    0.150    0.152/0.043   0.164/0.059   0.153/0.053   0.158/0.057   0.152/0.054   0.151/0.050   0.155/0.044   0.153/0.043   0.152/0.044
Weak i.d.
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.149/0.026   0.165/0.030   0.156/0.027   0.149/0.026   0.149/0.026   0.153/0.026   0.156/0.026   0.152/0.026   0.152/0.026
λ11,1    0.300    0.288/0.068   0.414/0.207   0.382/0.175   0.286/0.186   0.295/0.171   0.294/0.078   0.317/0.079   0.297/0.073   0.296/0.068
λ21,1    -0.105   -0.106/0.034  -0.112/0.039  -0.108/0.037  -0.109/0.040  -0.108/0.036  -0.110/0.044  -0.114/0.036  -0.110/0.039  -0.108/0.034
c41      -0.700   -0.690/0.108  -0.787/0.164  -0.767/0.139  -0.695/0.183  -0.707/0.155  -0.689/0.134  -0.706/0.107  -0.690/0.121  -0.694/0.109
c51      -0.700   -0.685/0.111  -0.788/0.178  -0.759/0.151  -0.683/0.183  -0.700/0.159  -0.700/0.139  -0.711/0.109  -0.701/0.119  -0.691/0.110
c61      -0.700   -0.687/0.095  -0.781/0.167  -0.761/0.143  -0.687/0.190  -0.696/0.159  -0.682/0.117  -0.712/0.100  -0.684/0.106  -0.691/0.097
b12      0.300    0.299/0.023   0.311/0.025   0.304/0.023   0.300/0.023   0.300/0.024   0.300/0.024   0.306/0.025   0.302/0.024   0.304/0.023
λ22,1    0.300    0.296/0.041   0.289/0.063   0.299/0.055   0.294/0.061   0.295/0.054   0.299/0.047   0.297/0.044   0.298/0.043   0.298/0.043
λ22,2    0.150    0.153/0.041   0.160/0.058   0.148/0.054   0.156/0.060   0.153/0.055   0.151/0.048   0.152/0.043   0.152/0.044   0.152/0.041
¹ Results are based on 500 Monte Carlo trials with sample size n = 486; σ_ε = 1.

Table 2.5: Median and RMSE of Scenario II, homoskedasticity, Parameter Constellation 2
Strong i.d.
(entries are Med/RMSE)
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.150/0.025   0.164/0.030   0.155/0.027   0.150/0.026   0.149/0.027   0.153/0.026   0.156/0.027   0.152/0.027   0.152/0.026
λ11,1    0.500    0.495/0.038   0.526/0.059   0.518/0.050   0.501/0.055   0.502/0.048   0.500/0.046   0.511/0.042   0.500/0.041   0.498/0.039
λ21,1    -0.225   -0.229/0.040  -0.219/0.047  -0.214/0.047  -0.230/0.047  -0.226/0.048  -0.229/0.045  -0.225/0.039  -0.231/0.040  -0.230/0.040
c41      -1.500   -1.500/0.093  -1.492/0.098  -1.502/0.098  -1.498/0.105  -1.501/0.100  -1.486/0.116  -1.502/0.093  -1.497/0.103  -1.501/0.094
c51      -1.500   -1.498/0.104  -1.503/0.108  -1.503/0.103  -1.500/0.114  -1.499/0.105  -1.499/0.127  -1.499/0.104  -1.496/0.119  -1.500/0.106
c61      -1.500   -1.493/0.092  -1.488/0.100  -1.496/0.090  -1.489/0.105  -1.495/0.089  -1.489/0.121  -1.495/0.094  -1.489/0.105  -1.494/0.089
b12      0.300    0.299/0.023   0.317/0.029   0.309/0.026   0.302/0.026   0.301/0.026   0.302/0.025   0.309/0.026   0.303/0.024   0.304/0.023
λ22,1    0.300    0.294/0.044   0.302/0.061   0.304/0.058   0.297/0.063   0.298/0.059   0.304/0.049   0.303/0.046   0.299/0.044   0.298/0.044
λ22,2    0.150    0.151/0.044   0.170/0.063   0.155/0.055   0.155/0.062   0.155/0.055   0.152/0.057   0.157/0.048   0.150/0.046   0.153/0.046
Weak i.d.
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.150/0.026   0.166/0.030   0.157/0.027   0.149/0.027   0.150/0.026   0.153/0.026   0.157/0.026   0.152/0.026   0.152/0.025
λ11,1    0.500    0.486/0.057   0.617/0.168   0.586/0.141   0.489/0.140   0.501/0.125   0.500/0.064   0.522/0.066   0.497/0.056   0.494/0.055
λ21,1    -0.135   -0.137/0.035  -0.132/0.036  -0.130/0.038  -0.139/0.040  -0.136/0.039  -0.139/0.043  -0.143/0.036  -0.141/0.039  -0.140/0.034
c41      -0.900   -0.891/0.097  -0.974/0.133  -0.956/0.114  -0.892/0.145  -0.906/0.123  -0.885/0.127  -0.905/0.095  -0.895/0.111  -0.895/0.100
c51      -0.900   -0.890/0.110  -0.971/0.146  -0.947/0.121  -0.892/0.156  -0.899/0.124  -0.901/0.130  -0.907/0.104  -0.900/0.115  -0.893/0.106
c61      -0.900   -0.885/0.094  -0.963/0.124  -0.954/0.119  -0.892/0.148  -0.899/0.124  -0.886/0.113  -0.908/0.094  -0.888/0.104  -0.891/0.093
b12      0.300    0.298/0.023   0.313/0.027   0.306/0.026   0.300/0.024   0.300/0.024   0.301/0.024   0.308/0.026   0.303/0.024   0.305/0.023
λ22,1    0.300    0.296/0.042   0.292/0.062   0.304/0.056   0.295/0.061   0.295/0.055   0.300/0.048   0.298/0.045   0.299/0.043   0.298/0.043
λ22,2    0.150    0.153/0.041   0.160/0.059   0.145/0.054   0.156/0.058   0.153/0.055   0.151/0.049   0.152/0.044   0.151/0.043   0.152/0.042
¹ Results are based on 500 Monte Carlo trials with sample size n = 486; σ_ε = 1.

Table 2.6: Median and RMSE of Scenario II, homoskedasticity, Parameter Constellation 3
Strong i.d.
(entries are Med/RMSE)
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.151/0.025   0.161/0.028   0.154/0.027   0.150/0.026   0.150/0.026   0.153/0.026   0.155/0.026   0.152/0.027   0.152/0.026
λ11,1    -0.300   -0.304/0.061  -0.314/0.092  -0.315/0.078  -0.299/0.101  -0.304/0.084  -0.313/0.070  -0.310/0.063  -0.302/0.065  -0.300/0.060
λ21,1    -0.105   -0.107/0.035  -0.115/0.040  -0.109/0.038  -0.107/0.041  -0.106/0.038  -0.108/0.044  -0.111/0.035  -0.111/0.039  -0.110/0.036
c41      -0.700   -0.692/0.102  -0.699/0.134  -0.691/0.113  -0.697/0.132  -0.694/0.117  -0.687/0.134  -0.692/0.106  -0.695/0.118  -0.696/0.103
c51      -0.700   -0.696/0.106  -0.702/0.127  -0.689/0.115  -0.698/0.130  -0.702/0.117  -0.702/0.138  -0.694/0.105  -0.702/0.115  -0.699/0.110
c61      -0.700   -0.692/0.091  -0.686/0.118  -0.682/0.114  -0.695/0.129  -0.691/0.117  -0.676/0.121  -0.691/0.096  -0.686/0.107  -0.695/0.091
b12      0.300    0.298/0.023   0.310/0.025   0.302/0.023   0.299/0.023   0.300/0.023   0.300/0.023   0.305/0.024   0.301/0.022   0.302/0.024
λ22,1    0.300    0.295/0.042   0.294/0.057   0.289/0.055   0.293/0.060   0.295/0.051   0.300/0.047   0.296/0.043   0.296/0.043   0.297/0.041
λ22,2    0.150    0.153/0.041   0.159/0.056   0.160/0.052   0.156/0.059   0.156/0.053   0.152/0.050   0.156/0.043   0.153/0.043   0.151/0.041
Weak i.d.
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.150/0.026   0.161/0.028   0.154/0.026   0.150/0.027   0.150/0.026   0.153/0.026   0.155/0.026   0.152/0.026   0.152/0.026
λ11,1    -0.300   -0.306/0.079  -0.355/0.201  -0.368/0.185  -0.302/0.238  -0.319/0.198  -0.318/0.092  -0.318/0.090  -0.305/0.086  -0.302/0.081
λ21,1    -0.015   -0.015/0.036  -0.020/0.039  -0.017/0.037  -0.019/0.040  -0.018/0.039  -0.017/0.046  -0.018/0.037  -0.019/0.041  -0.018/0.034
c41      -0.100   -0.094/0.119  -0.058/0.216  -0.057/0.176  -0.102/0.243  -0.088/0.213  -0.082/0.146  -0.086/0.123  -0.091/0.133  -0.091/0.121
c51      -0.100   -0.091/0.117  -0.054/0.213  -0.043/0.184  -0.090/0.248  -0.094/0.207  -0.091/0.150  -0.087/0.120  -0.096/0.126  -0.097/0.119
c61      -0.100   -0.096/0.102  -0.050/0.206  -0.051/0.183  -0.093/0.238  -0.086/0.208  -0.074/0.130  -0.085/0.110  -0.085/0.120  -0.093/0.103
b12      0.300    0.298/0.023   0.309/0.023   0.302/0.022   0.299/0.022   0.300/0.022   0.300/0.023   0.304/0.024   0.302/0.022   0.302/0.024
λ22,1    0.300    0.296/0.040   0.288/0.061   0.286/0.057   0.295/0.061   0.294/0.053   0.299/0.047   0.293/0.045   0.297/0.043   0.296/0.042
λ22,2    0.150    0.153/0.041   0.161/0.058   0.161/0.053   0.157/0.060   0.155/0.052   0.153/0.050   0.155/0.044   0.152/0.042   0.152/0.042
¹ Results are based on 500 Monte Carlo trials with sample size n = 486; σ_ε = 1.

2.7.5 Heteroskedasticity

The ML estimator is in general inconsistent under heteroskedasticity and can exhibit considerable bias. The moment conditions of the GMM estimators considered in this dissertation are robust to heteroskedasticity, and hence the GMM estimators remain consistent. To document this advantage of the considered GMM estimators over the ML estimator, we conduct the following experiments.
Model Specification: We consider a model specification similar to that of Scenario I, but with one fewer exogenous variable in each of equations 1 and 2:

y1 = b21 y2 + (λ11,1 W1 + λ11,2 W2) y1 + c11 x1 + c21 x2 + u1,
y2 = b12 y1 + (λ22,1 W1 + λ22,2 W2) y2 + c32 x4 + c42 x5 + u2,
ug = (ρg1 W1 + ρg2 W2) ug + εg,   g = 1, 2.

The x_k's (k = 1, 2, 4, 5) are columns of the matrix of exogenous variables (X) that we generated before.

Spatial Weights Matrices: We adopt the "dumbbell-shaped" design of the weights matrix considered in Arraiz, Drukker, Kelejian, and Prucha (2010) with n = 500, adapted to the current model with second-order spatial lags. To do so, we first generate a matrix Wn in which (approximately) the first 100 and the last 100 units each have 10 neighbors ahead and 10 neighbors behind, while the middle (approximately) 300 units are 3-ahead and 3-behind. We then adopt the "ring" design and generate W1,n in which the first 100 and the last 100 units each have 5 neighbors ahead and 5 neighbors behind, with the middle (approximately) 300 units 1-ahead and 1-behind. To generate W2,n, we net out W1,n from Wn; thus each of the first and last 100 units in W2,n has the 6th to 10th units ahead and behind as neighbors, while each of the middle 300 units has the 2nd and 3rd units ahead and behind as neighbors. For clarity, we present an example of a "dumbbell-shaped"
W1,n with n = 30 as follows:

[Row-normalized 30×30 example matrix omitted from this extraction: each unit's neighbors receive equal weight, the reciprocal of its number of neighbors (0.1, 0.2, or 0.5), so that units in the first and last blocks have many low-weight neighbors while units in the middle have few high-weight neighbors.]

Disturbances: To generate heteroskedastic disturbances, we take the i-th element of the innovation vector ε_g,n as

ε_{g,n,i} = σ_{n,i} ξ_{g,n,i},    σ_{n,i} = σ · d_{n,i} / (n^{-1} Σ_{j=1}^{n} d_{n,j}),

where ξ_{g,n,i} is i.i.d. N(0, 1) for g = 1, 2 and the correlation between ξ_{1,n} and ξ_{2,n} is 0.5. We denote by d_{n,i} the number of neighbors of the i-th unit, which depends on the relative position of the i-th unit in the network. We note that the average standard deviation of the elements of ε_{g,n} is σ. By setting σ = 1, we maintain the average standard deviation of ε_{g,n,i} at 1 for i = 1, ..., n and g = 1, 2, the same as σ_ε in the homoskedastic case.

Parameter Space: We report the Monte Carlo results for the case [c11, c21]' = [1, 1]' and [c32, c42]' = [1, 1]'
(i.e., the strong identification case) with the four parameter constellations below:

Para 1: b21 = 0.15, b12 = 0.3; λ11,1 = 0.3, λ22,1 = 0.3; λ11,2 = 0,   λ22,2 = 0.15; ρ11 = -0.2, ρ21 = 0.1; ρ12 = 0.1, ρ22 = 0
Para 2: b21 = 0.15, b12 = 0.3; λ11,1 = 0.3, λ22,1 = 0.3; λ11,2 = 0.2, λ22,2 = 0.15; ρ11 = -0.2, ρ21 = 0.1; ρ12 = 0.1, ρ22 = 0
Para 3: b21 = 0.15, b12 = 0.3; λ11,1 = 0.3, λ22,1 = 0.3; λ11,2 = 0,   λ22,2 = 0.15; ρ11 = 0.2,  ρ21 = 0.1; ρ12 = 0.1, ρ22 = 0
Para 4: b21 = 0.15, b12 = 0.3; λ11,1 = 0.3, λ22,1 = 0.3; λ11,2 = 0.2, λ22,2 = 0.15; ρ11 = 0.2,  ρ21 = 0.1; ρ12 = 0.1, ρ22 = 0

To save space, we report only the estimates of the parameters in the first equation of the model. We document a sample of Monte Carlo results under heteroskedasticity in Table 2.7 below.

2.7.6 Remarks

Under homoskedasticity and strong identification, all estimators considered in the study perform reasonably closely. As expected, the MLE is the most efficient estimator and shows the smallest RMSE under nearly all scenarios and parameter constellations. Both LQ-GS3SLS and LQ-GSFIVE perform very close to the MLE in these scenarios. In addition, under strong identification, estimators that utilize only the linear moments also perform close to their counterparts that utilize both the linear and the quadratic moments. Specifically, GS3SLS and GSFIVE show similar RMSEs in comparison to, e.g., LQ-GS3SLS and LQ-GSFIVE. Finally, since the disturbance terms ε_g,n and ε_h,n are correlated across equations (i.e., g ≠ h), full information estimators in general outperform their limited information counterparts throughout the experiments. Thus, we see that GSFIVE and LQ-GSFIVE outperform GSLIVE and LQ-GSLIVE, respectively. As remarked, the MLE is in general inconsistent under heteroskedastic disturbances. The MLE shows significant bias when estimating the spatial autoregressive parameters in the regression equation (i.e., the λ's), as well as the spatial autoregressive parameters in the disturbances (i.e., the ρ's).
Since the other estimators implemented are robust to heteroskedasticity, they show much smaller bias than the MLE. Under weak identification, LQ-GSFIVE performs very close to the MLE in both Scenarios I and II. In particular, we note that LQ-GSFIVE shows considerable efficiency gains over LQ-GS3SLS, and that LQ-GSLIVE outperforms LQ-GS2SLS under weak identification as well. In general, similar observations hold when we compare GS2SLS with GSLIVE and GS3SLS with GSFIVE. These results highlight the advantage of using approximated optimal instruments when constructing the moments, which better exploit the underlying structure of the parameters in the reduced form model. In addition, we see that LQ-GSLIVE (LQ-GSFIVE) in general outperforms GSLIVE (GSFIVE) under weak identification. This observation is in line with the existing literature: quadratic moments can help with identification when the linear moments are weak. Not surprisingly, this finite sample efficiency gain is not significant under strong identification. In Chapter A.4, we document additional simulation results for settings with the "dumbbell-shaped" design of spatial weights matrices and correlated xn's. In general, the above observations also hold in those cases.

Table 2.7: Median and RMSE of Scenario I under Heteroskedasticity, Parameter Constellations 1-4
Strong i.d.
Para 1 (entries are Med/RMSE)
Param    TRUE     ML            GS2SLS        GS3SLS        GSLIVE        GSFIVE        LQ-GS2SLS     LQ-GS3SLS     LQ-GSLIVE     LQ-GSFIVE
b21      0.150    0.162/0.028   0.163/0.029   0.154/0.025   0.152/0.028   0.152/0.026   0.162/0.030   0.155/0.028   0.155/0.028   0.154/0.025
λ11,1    0.300    0.227/0.078   0.303/0.045   0.307/0.043   0.302/0.047   0.301/0.039   0.298/0.046   0.300/0.042   0.301/0.046   0.300/0.038
λ11,2    0.000    0.042/0.053   -0.015/0.039  -0.010/0.034  -0.003/0.035  -0.003/0.032  -0.011/0.037  -0.006/0.035  -0.006/0.035  -0.005/0.032
ρ11      -0.200   -0.089/0.119  -0.233/0.144  -0.234/0.145  -0.211/0.097  -0.212/0.096  -0.219/0.127  -0.218/0.139  -0.218/0.136  -0.210/0.122
ρ12      0.100    0.035/0.109   0.099/0.123   0.092/0.121   0.085/0.132   0.085/0.125   0.090/0.134   0.091/0.141   0.095/0.134   0.095/0.121
Para 2
b21      0.150    0.164/0.030   0.163/0.030   0.155/0.026   0.152/0.028   0.152/0.027   0.162/0.030   0.155/0.030   0.154/0.029   0.154/0.026
λ11,1    0.300    0.233/0.072   0.304/0.045   0.307/0.043   0.303/0.044   0.301/0.038   0.299/0.043   0.300/0.041   0.302/0.044   0.301/0.039
λ11,2    0.200    0.242/0.051   0.185/0.039   0.190/0.036   0.196/0.036   0.197/0.031   0.190/0.039   0.194/0.035   0.194/0.036   0.195/0.033
ρ11      -0.200   -0.092/0.117  -0.233/0.147  -0.233/0.145  -0.211/0.097  -0.213/0.097  -0.219/0.130  -0.217/0.138  -0.216/0.139  -0.208/0.124
ρ12      0.100    0.042/0.105   0.097/0.125   0.090/0.119   0.084/0.131   0.084/0.124   0.088/0.133   0.090/0.144   0.099/0.133   0.100/0.123
Para 3
b21      0.150    0.155/0.032   0.165/0.035   0.155/0.033   0.152/0.035   0.152/0.033   0.164/0.036   0.155/0.037   0.156/0.035   0.154/0.034
λ11,1    0.300    0.269/0.042   0.302/0.040   0.305/0.037   0.301/0.040   0.300/0.037   0.300/0.041   0.302/0.037   0.299/0.040   0.299/0.037
λ11,2    0.000    0.018/0.039   -0.018/0.044  -0.010/0.041  -0.003/0.042  -0.003/0.040  -0.015/0.043  -0.008/0.042  -0.005/0.043  -0.004/0.041
ρ11      0.200    0.150/0.076   0.177/0.121   0.178/0.119   0.205/0.099   0.199/0.098   0.183/0.111   0.189/0.123   0.193/0.121   0.198/0.103
ρ12      0.100    0.062/0.089   0.104/0.110   0.095/0.107   0.109/0.109   0.096/0.110   0.099/0.109   0.102/0.118   0.098/0.121   0.095/0.106
Para 4
b21      0.150    0.159/0.034   0.166/0.036   0.155/0.035   0.152/0.035   0.152/0.034   0.165/0.037   0.156/0.039   0.156/0.036   0.154/0.034
λ11,1    0.300    0.275/0.037   0.303/0.040   0.305/0.038   0.302/0.040   0.300/0.036   0.301/0.039   0.303/0.038   0.300/0.041   0.299/0.036
λ11,2    0.200    0.210/0.035   0.183/0.045   0.190/0.041   0.197/0.042   0.197/0.040   0.185/0.044   0.191/0.041   0.194/0.043   0.196/0.039
ρ11      0.200    0.146/0.078   0.177/0.122   0.179/0.122   0.205/0.102   0.199/0.097   0.183/0.113   0.189/0.124   0.196/0.119   0.199/0.106
ρ12      0.100    0.072/0.086   0.102/0.109   0.094/0.105   0.108/0.108   0.094/0.107   0.097/0.109   0.101/0.118   0.101/0.118   0.100/0.112
¹ Results are based on 500 Monte Carlo trials with sample size n = 486; σ_ε = 1.

2.8 Concluding Remarks

In this chapter, we proposed a new class of generalized method of moments (GMM) estimators for simultaneous equation models (SEMs) with higher-order network interdependence. In essence, these estimators utilize approximations to the optimal instruments in constructing the linear moments, in the same spirit as Lee (2003) and Kelejian, Prucha, and Yuzefovich (2004). We also considered GMM estimators that utilize both the linear moments and the quadratic moments that originate from the scores of the log-likelihood function.
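For intuition, the idea of approximated optimal instruments can be illustrated in the simplest single-equation SAR setting y = λWy + Xβ + u, where the optimal instrument for the spatial lag Wy is its conditional mean W E[y] = W(I_n - λW)^{-1}Xβ evaluated at preliminary estimates. The sketch below (illustrative only; it is not the full-system implementation used in this chapter, and all names are hypothetical) computes such an instrument:

```python
import numpy as np

def optimal_instrument(W, X, lam_hat, beta_hat):
    """Approximated optimal instrument for the spatial lag W y in a SAR model
    y = lam * W y + X beta + u: the predicted mean W E[y] = W (I - lam W)^{-1} X beta,
    evaluated at preliminary (e.g., 2SLS) estimates lam_hat, beta_hat."""
    n = W.shape[0]
    # Solve (I - lam W) Ey = X beta instead of inverting or truncating the series
    Ey = np.linalg.solve(np.eye(n) - lam_hat * W, X @ beta_hat)
    return W @ Ey

# Small row-normalized "ring" weights matrix: 1-ahead, 1-behind neighbors
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
rng = np.random.default_rng(0)
X = rng.standard_normal((n, 2))
Z_opt = optimal_instrument(W, X, lam_hat=0.3, beta_hat=np.array([1.0, 0.5]))
```

The full-information LIVE/FIVE estimators apply the same principle system-wide, with the reduced form of the entire network SEM playing the role of (I_n - λW)^{-1}; solving the linear system keeps the computation feasible in large samples.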
Towards deriving the estimators, we showed that (1) the linear parts of the ML scores can be viewed as a set of estimator generating equations from which a generic form of instrumental variable (IV) estimators can be derived. This result extends its relevant counterparts in Hausman (1975), Hendry (1976) and Prucha and Kelejian (1984) in the context of classical SEMs; and (2) the new estimators incorporate the underlying ideas of the LIVE and the FIVE estimators proposed by Brundy and Jorgenson (1971) in the context of classical SEMs. In constructing the instruments, the new estimators take into account the nonlinear a priori parameter restrictions in the reduced form when estimating the expected value of the endogenous components. Furthermore, our new GMM estimators that utilize both the linear and the quadratic moments remain robust to heteroskedasticity of unknown form and computationally feasible even when the sample size (i.e., the size of the network) becomes large. The Monte Carlo results show that the new GMM estimators outperform their existing counterparts, e.g., the 2SLS-type and 3SLS-type estimators considered in Drukker, Egger, and Prucha (2022), when the instruments are weak.

Chapter 3: Empirical Application: Demand Estimation for Retail Gasoline Markets with Network Dependence

3.1 Introduction

In this chapter, we illustrate the empirical relevance of the considered model with spatial/network interactions and of the estimation methods with approximated optimal instruments, i.e., the GSLIVE and the GSFIVE estimators. Specifically, we estimate a demand system with a spatial network component. The example we choose is the retail gasoline market of several subregions of Greater Vancouver, Canada. We seek to estimate the station-level demand elasticities as well as the (spatial) elasticity of substitution under a variety of popular network structures based on different proximity measures.
Demand elasticities for gasoline at the aggregate level are well documented in the literature, while estimates at the station level are relatively scarce, perhaps due to data limitations.¹ We also compute the impact measures, in the spirit of Anselin et al. (2001), that interpret the estimates of the coefficients on the exogenous regressors in network models. Changes in gasoline prices attract a lot of attention from consumers and regulators for several reasons. First, households spend a considerable share of their income on gasoline products. According to the U.S. Energy Information Administration (EIA), in 2017 the average U.S. household expenditure on gasoline was $1,977, equivalent to about 4% of annual household expenditure. For Canada, gasoline (including diesel) also accounts for about 50% of total energy expenditure per household. Second, pricing of gasoline in the retail market is relatively transparent but changes frequently. It is common to observe intra-day fluctuations in retail prices of over 5%. Classical spatial competition models, e.g., the Hotelling model, typically assume that the cost of switching stations is related to the physical location of firms. Thus firm locations are often treated as the unique aspect of product differentiation in these models. Our application fits the context of spatial competition models reasonably well. Regular retail gasoline is a nearly homogeneous good in terms of chemical content. However, gas stations differ in geographical location as well as in station attributes, e.g., retail brand and the menu of services offered at the station, and these differences create product differentiation. In the context of retail gasoline markets, consumers face travel costs or search costs when switching between gasoline stations.

¹ For some of the recent works on aggregate demand elasticity, see Hughes et al. (2008), Park and Zhao (2010), and Levin et al. (2017), among others.
This allows retailers to exercise local market power and generates price dispersion. In other words, these costs lead consumers to consider nearby gas stations as close substitutes after controlling for station characteristics or perceived product quality (e.g., brand name). Given that spatially differentiated gas stations compete with neighboring stations in price, we think that the equilibrium prices and sales volumes of all stations are simultaneously determined in a competitive system. Therefore, it is desirable to account for the network structure explicitly when modeling the demand system for retail gasoline markets. We consider a theoretical model of spatial competition based on Pinkse and Slade (2004), in which sellers are downstream firms and buyers are households or individuals. This model is simple yet flexible enough to allow for spatial differentiation among individual stations and thus captures the main feature of spatial competition in this market. We then deduce the econometric specification for the demand equation that maps to the simultaneous equation model with spatial dependence (SE-SARAR) considered in the previous chapters of the dissertation. For the study of competition in the retail gasoline market, properly defining the extent of the market is important for obtaining consistent estimates of the structural parameters. For our application, this issue is equivalent to properly specifying the spatial network matrix Wn that appears in the demand equation. However, as discussed in the theoretical chapters of this dissertation, construction of the spatial network matrix often requires prior knowledge of the market as well as certain assumptions about the nature of competition. For example, in the literature, empirical researchers typically assume that individual stations compete only with their immediate neighbors and not with stations located beyond a certain threshold distance, e.g., a 2-mile radius.
In this application, we consider six different metrics for constructing the spatial network matrices, based on different measures of closeness. For example, we consider measures such as common boundaries, same-street dummies, travel distance, nearest neighbors, and other hybrid measures. We find that metrics based on related measures often yield similar results for the own-price elasticity as well as the spatial elasticity of substitution. As noted in, e.g., Anselin et al. (2001), interpreting the estimated coefficients of exogenous regressors is less direct in the presence of the network structures Wn than in regressions without network interactions at the individual level. For example, if a change in the k-th exogenous variable of station i has an effect via spatial correlation on other stations, and they in turn feed back to station i, the corresponding coefficient, say βk, does not denote the total effect of a unit change in xk on the endogenous variable, e.g., sales volume. Therefore, we compute the direct and indirect impact measures that have been adopted in the spatial literature for consistent interpretation of the coefficients of station and market characteristics in our spatial network model. The rest of this chapter is organized as follows. In Section 3.2, we discuss the literature on price and spatial competition in retail gasoline markets; we highlight how our application relates to and differs from these existing works. In Section 3.3, we present our adaptation of the theoretical model considered in Pinkse and Slade (2004) as well as the econometric specification that will be used for estimation. We also describe in detail the six different measures of closeness and their corresponding spatial network matrices in this section. Due to limited space, however, we focus on the common boundary measure in the main text and document results associated with the other metrics in Chapter B.
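To fix ideas on how such proximity-based network matrices can be constructed, the following Python sketch (with hypothetical station coordinates, not our actual data) builds row-normalized Wn's from two of the closeness measures discussed above, a travel-distance threshold and nearest neighbors:

```python
import numpy as np

def row_normalize(A):
    """Scale each row of a 0/1 adjacency matrix to sum to one; rows with no
    neighbors (isolated stations) are left as zeros."""
    s = A.sum(axis=1, keepdims=True)
    return np.divide(A, s, out=np.zeros_like(A, dtype=float), where=s > 0)

def w_distance_threshold(coords, radius):
    """Stations i and j are neighbors if their Euclidean distance is below `radius`."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = ((d < radius) & (d > 0)).astype(float)  # exclude self-neighbors
    return row_normalize(A)

def w_nearest_neighbors(coords, k):
    """Each station's neighbors are its k closest stations."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a station is not its own neighbor
    A = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :k]
    np.put_along_axis(A, idx, 1.0, axis=1)
    return row_normalize(A)

# Hypothetical station coordinates (km); the last station is remote
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.2], [5.0, 5.0]])
W_dist = w_distance_threshold(coords, radius=2.0)
W_knn = w_nearest_neighbors(coords, k=2)
```

Row normalization makes each row of Wn a set of averaging weights over a station's competitors, which also delivers the ||Wn||_∞ = 1 property used when bounding the spatial parameters in Chapter 2.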
In Section 3.4, we describe the data set that we assembled from various sources, including the online platform GasBuddy.com, the marketing agency Kalibrate, and the 2016 Canadian Census. In Section 3.5, we discuss the instruments that we use to address the endogeneity of prices. Section 3.6 documents the estimation results as well as the impact measures computed based on these estimates. Section 3.7 concludes.

3.2 Related Literature

The literature on price competition in retail gasoline markets is vast.[2] One major line of the literature concerns the observed dynamics of retail prices. Eckert (2003), Noel (2007) and Doyle, Muehlegger, and Samphantharak (2010) find that Edgeworth cycles in retail gasoline prices are associated with lower market concentration and a greater presence of independent stations that operate convenience stores. Noel (2007) and Atkinson (2009) use high-frequency station-level retail price data and find that price cuts in an Edgeworth cycle are typically initiated by smaller stations, while price restorations are led by larger brands. Barron, Umbeck, and Waddell (2008) and Slade (1992) focus on dynamic price responses between neighboring stations. Atkinson, Eckert, and West (2014) study price patterns and volatility changes in Canadian cities with high-frequency price data obtained from GasBuddy.com. Compared to these studies, our empirical study abstracts from the dynamic aspect of stations' pricing strategies, with an emphasis instead on the network-generated cross-sectional interdependence in demand. As discussed below, our study also differs from the existing empirical works that focus on spatial price competition and/or strategic pricing behaviors among individual stations.

[2] We only provide a partial review here due to limited space. For a more comprehensive review, please refer to, e.g., Eckert (2013).
Many works have studied the determinants of price dispersion and uniformity utilizing station-level price data. With city-level data, Sen (2003, 2005) reveals that wholesale prices and the market share of smaller firms play major roles in determining retail price levels. There are mixed results about the effect of local concentration measures on retail prices. Van Meerbeeck (2003), Barron, Taylor, and Umbeck (2004), Eckert and West (2004) and Götz and Gugler (2006), among others, find that higher station density, measured by the number of stations within a certain travel-distance radius, is associated with lower prices and a lower level of price dispersion. Hosken, McMillan, and Taylor (2008), in contrast, report no association between local station density and price when all brands are included in the regression, although distance to the closest stations enters with a positive sign on price. With a spatial lag model, Pennerstorfer (2009) finds that the competition-increasing effect of independent retailers is muted by a "composition effect", which implies that branded stations can charge higher prices when a local market is populated by unbranded stations that are perceived to be of lower quality. There are also a number of works that adopt concentration measures not necessarily related to travel distance but based on spatial adjacency. Pennerstorfer and Weiss (2013) document that an increase in the spatial clustering of same-brand stations reduces the degree of competition between firms and increases equilibrium prices. Clemenz and Gugler (2006) find evidence that concentration within a station's ZIP code is associated with higher margins. Using price data in Sheffield, England, Ning and Haining (2003) also find evidence of a positive association between a station's price and the prices of stations in the same local cluster. These works motivate the specifications of the spatial network matrices Wn in our empirical study.
In general, we construct our Wn's with measures of closeness based on the presence of immediately adjacent stations (common boundary), a radius of 2 miles of travel distance, or the presence of other stations on the same street. Overall, we find that the spatial parameter is positive and significant for the specifications based on the common boundary or same-street measures, but tends to be smaller for measures based on travel distance.[3] We interpret these results as evidence that stations tend to compete more vigorously with their neighbors or with stations located along a main commuter route, rather than with all stations within an arbitrary area. However, as discussed, the econometric model we adopt differs from those in these works, so this should not be interpreted as a direct comparison. Eckert and West (2004, 2005), Ning and Haining (2003), Barron, Taylor, and Umbeck (2004) and Hosken, McMillan, and Taylor (2008), among others, also find evidence of a relationship between the services offered at retail stations and price levels. We control for these factors as station characteristics in this empirical study.

Instead of aggregate demand elasticities, we focus on demand estimation at the level of individual stations. Results at the station level are scarce, possibly due to data limitations.[4] Houde (2012) used a relatively long panel of station-level price and sales volume data for Quebec City to estimate a structural spatial model taking into account the road network and the commuting patterns of residents.[5][6] In the current study, we complement the price data with sales data, which enables us to estimate the price elasticities. In comparison to Houde (2012), we use a much simpler framework with the network structure being explicitly specified.

[3] To see this, one may compare the results in Table 3.5 and Table B.7 in Chapter B.4.
[4] For example, it is often costly to obtain accurate data on sales volume at the station level even at monthly frequency.
We note that our results imply that demand is less elastic than in Houde (2012). One possible explanation is that his multi-address model allows consumers to purchase gasoline from stations along their commuting routes in addition to those near their homes, so that each station in the market faces more competition. From a modeling perspective, his model allows for a more flexible substitution pattern. In contrast, we adopt, e.g., the common boundary measure when constructing the network matrix W, and thus the underlying assumption is more aligned with the single-address model in Houde (2012)'s terms. However, one merit of the current framework is that it is much less demanding in terms of data on road structure and auxiliary information, e.g., traffic flows, commuting patterns of residents, etc., that may be unavailable for many markets of interest. The current model is also much less computationally expensive. Therefore, the class of network models considered in this paper may still be appealing in the early stages of policy analysis on a differentiated-product market. We provide more details below after presenting our econometric model.

[5] For work involving the estimation of a structural model, see also Manuszak (2010), who uses data on volumes, prices and characteristics for stations in Maui and Kauai to estimate a model designed for the analysis of an upstream merger.
[6] Houde (2012) focuses primarily on the multi-address model mainly because it allows consumers to purchase gasoline from stations along commuter routes or shopping paths. It is thus reasonable to view the multi-address model as more realistic than the traditional single-address model, which assumes consumers purchase gasoline only from stations near their residence.
3.3 Model

3.3.1 Theoretical Motivation

Following Pinkse and Slade (2004), the demand model is based on a linear-quadratic indirect-utility function in which the prices of the differentiated products as well as individual incomes have been normalized by the price of the outside good. For ease of aggregation to station-level demands, we assume the individual indirect-utility functions are in Gorman polar form, so that aggregation does not depend on the distribution of (unobserved) consumer heterogeneity or income levels. In this setting, the aggregate-demand equation can be shown to be linear in both (log) normalized prices and income.[7] Demand for station i is then given by

    \ln(q_{i,n}) = a_{i,n} + \sum_j w_{ij,n} \ln(p_{j,n}) + \eta \ln(y_{i,n}),   i = 1, ..., n,   (3.1)

where w_{ij,n} is the ij-th element of the n x n matrix W_n, p_{j,n} is the j-th element of the (normalized) vector of prices p_n, and y_{i,n} denotes the (aggregate) income of the consumers within the census tract in which station i resides. For generality, it can be assumed that a_{i,n} is a linear function of station i's characteristics x_{i,n}, i.e., a_{i,n} = x_{i,n} \beta_x.

As suggested in, e.g., Pinkse et al. (2002) and Pinkse and Slade (2004), the diagonal elements of W_n, which can be interpreted as the own-price elasticities, are also assumed to depend on the station characteristics, i.e., w_{ii,n}(x_{i,n}). The off-diagonal elements of W_n represent cross-price elasticities and are assumed to be functions of a vector of measures of the distance between stations by some metric. In Pinkse et al. (2002)'s semi-parametric setting, w_{ij,n} is assumed to be a series of the form \sum_k \alpha_k d_{k,ij}, where d_{k,ij} is the k-th basis function of some distance measure and \alpha_k is its parameter to be estimated. Hence, to adapt their model to our parametric scheme, we normalize W_n = (w_{ij,n}) such that w_{ii,n} = b and w_{ij,n} = \lambda \tilde{w}_{ij,n} for i \neq j, where b and \lambda are parameters and \tilde{w}_{ij,n} is a (nonlinear) function of some distance measure. In other words, we decompose the term \sum_j w_{ij,n} \ln(p_{j,n}) into the sum of b \ln(p_{i,n}) and \lambda \sum_{j \neq i} \tilde{w}_{ij,n} \ln(p_{j,n}). Thus b represents the own-price elasticity of station i and \lambda is the spatial autoregressive parameter that represents the cross-station elasticity of substitution. In the following, we assume the elements of W_n are exogenous and, by abuse of notation, that w_{ii,n} = 0 for i = 1, ..., n.[8]

Collecting a_{i,n} = x_{i,n} \beta_x and \eta \ln(y_{i,n}) into X_{i,n} \beta, we can then stack the demand schedule (3.1) for individual stations over i and rewrite the demand system in matrix notation:

    \ln q_n(p_n) = b \ln(p_n) + \lambda W_n \ln(p_n) + X_n \beta + u_n,   (3.2)

where X_n includes a constant term and the set of station as well as local market characteristics. In addition, u_n captures unobserved station or regional characteristics. We note that u_n can be heteroskedastic and correlated across stations. To accommodate these possibilities, we assume that u_n follows a spatial autoregressive process:

    u_n = \rho_1 W_n u_n + \varepsilon_n,

where W_n u_n captures the cross-sectional correlation between stations. The estimation theory we proposed in the previous chapters accommodates this specification. We assume E[\varepsilon_{i,n} | X_n, W_n] = 0, which in turn implies that the unobserved characteristics u_n are mean independent of the observed characteristics, i.e., E[u_{i,n} | X_n, W_n] = 0.

For the supply side, we assume each station faces a marginal cost of the form

    c_i = \exp(x^c_{i,n} \gamma + v_i),   (3.3)

where x^c_{i,n} represents factors that shift the marginal cost of station i. In principle, we can allow for the marginal cost factors to be spatially correlated. In other words, if x_{k,n} is one "basic" factor, then one column of x^c_n could be W_n x_{k,n}.

[7] Please refer to Chapter B.1 for more details.
[8] Note that in (3.1), w_{ii,n} \neq 0 in general. In the decomposition of \sum_j w_{ij,n} \ln(p_{j,n}), \tilde{w}_{ii,n} = 0, and thus the W_n in (3.2) should in fact be denoted \tilde{W}_n. We drop the tilde for the presentation that follows.
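For later reference, the spatial autoregressive disturbance process, written as u_n = \rho_1 W_n u_n + \varepsilon_n, can be solved explicitly. A short derivation, under the standard assumption that I_n - \rho_1 W_n is nonsingular:

```latex
% Reduced form of the SAR disturbance process, assuming I_n - \rho_1 W_n is invertible
u_n = \rho_1 W_n u_n + \varepsilon_n
\quad\Longrightarrow\quad
(I_n - \rho_1 W_n)\, u_n = \varepsilon_n
\quad\Longrightarrow\quad
u_n = (I_n - \rho_1 W_n)^{-1} \varepsilon_n .
```

This makes explicit how an innovation at one station propagates to its neighbors, its neighbors' neighbors, and so on, through the powers of W_n in the expansion of the inverse.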
The above formulation allows for the existence of a Nash equilibrium in pure strategies, such that prices satisfy the first-order condition

    p_{i,n} = c_{i,n} - \frac{dp_{i,n}}{dq_{i,n}} q_{i,n},   (3.4)

which in turn can be derived as the price response function from a profit-maximization problem of standard form:

    \max_{p_{i,n}} \pi_i(p_n) = (p_{i,n} - c_{i,n}) q_{i,n}(p_n) - F_{i,n},

where the quantity demanded is given by

    q_{i,n}(p_n) = \exp( \alpha_i + b \ln(p_{i,n}) + \lambda \sum_{j \neq i} \tilde{w}_{ij,n} \ln(p_{j,n}) + x_{i,n} \beta + u_{i,n} ),

as discussed, and the term F_{i,n} denotes the fixed cost. Note that (dp_{i,n}/dq_{i,n}) q_{i,n} = p_{i,n}/b under our specification, and thus one may alternatively work with the log-transformed version of (3.4):

    \ln(p_{i,n}) = \kappa + \ln(c_{i,n}) = \kappa + x^c_{i,n} \gamma + v_{i,n},   (3.5)

where \kappa = -\ln(1 + 1/b).

3.3.2 Econometric Specification

In light of the above discussion and (3.2), we consider the following specification of the supply and demand system with a network component:

    \ln(q_n) = b \ln(p_n) + \lambda W_n \ln(p_n) + X_n \beta + u_n,   (3.6)
    u_n = \rho_1 W_n u_n + \varepsilon_n.

For ease of interpretation, we adopt a log-log specification of the demand equation. We let \ln(q_n) denote the (log) quantity demanded at price \ln(p_n); X_n collects the common regressors that are expected to shift the demand equation, and it includes the constant. The parameters of primary interest are b and \lambda. As in the classical system of supply and demand, b can be interpreted as the (market-average) own-price elasticity. As we explain below, \lambda captures the competition intensity in the market. In the spirit of Pinkse and Slade (2004), \lambda W_n can be viewed as an approximation of the cross-price elasticities between stations. As in most empirical IO studies, we rely on supply-side instruments to address the endogeneity of (log) prices, \ln(p_n).[9] Details of the instruments will be discussed in Section 3.5 below.[10]

[9] See, e.g., Berry, Levinsohn, and Pakes (1995) and Nevo (2001).
See also MacKay and Miller (2021) for a more detailed discussion of identification strategies via supply-side instruments, demand-side instruments, and the covariance structure of a supply and demand system.

The deviation from a classical supply and demand system comes with the inclusion of the spatial lag W_n \ln(p_n) and the spatial coefficient \lambda. As remarked before, the spatial weights matrix W_n is specified with some measure of proximity between units. We experiment with six different metrics that appear in the literature:[11]

1. A binary W_1 = (w_{1,ij}) with w_{1,ij} = 1 if i and j share a common boundary (explained below);

2. A binary W_2 = (w_{2,ij}) with w_{2,ij} = 1 if i and j are within 2 miles of travel distance, and w_{2,ij} = 0 otherwise;

3. A numerical W_3 = (w_{3,ij}) with w_{3,ij} being the reciprocal of the travel distance between station i and station j, within the census blocks adjacent to i. The implicit assumption is that stations that are geographically close compete more vigorously;

4. A binary W_4 = (w_{4,ij}) with w_{4,ij} = 1 if j is i's nearest neighbor, where closeness is measured by travel distance. Note that this is the most local measure and the relationship need not be symmetric. By construction, a station competes directly with only one rival;

5. A binary W_5 = (w_{5,ij}) with w_{5,ij} = 1 if stations i and j are on the same street and zero otherwise. This construction reflects the empirical finding that most competition occurs along main streets and commuter routes;[12]

[10] The price equation is modeled in a reduced-form fashion as \ln(p_n) = X_{1,n} \pi_1 + Z_{1,n} \pi_2 + v_n, where X_{1,n} denotes the included instruments (with a column of constants) and Z_{1,n} denotes the excluded instruments. Note that the supply schedules of individual firms are typically unobserved and cannot be specified. Hence, a supply equation in the traditional form \ln(q^s_n) = b^s \ln(p^s_n) + X_{1,n} \pi_1 + Z_{1,n} \pi_2 + v_n is not always justifiable in oligopoly settings, and thus b^s is not readily interpretable.
Hence, we abstract from modeling the supply side in a structural way.

[11] For example, Pinkse and Slade (1998) considered six different metrics based on both Euclidean distance and common street/boundary measures. The metrics considered in this application resemble those considered in their paper. However, we consider the travel distance between stations instead of the Euclidean distance, as the former better reflects local traffic conditions and road structures.
[12] See, e.g., Houde (2012) and Pennerstorfer (2009), among others.

6. A numerical W_6 = (w_{6,ij}) based on the combined measures of W_3 and W_5. Under this scheme, w_{6,ij} is the product of w_{3,ij} and w_{5,ij}, i.e., the reciprocal of the travel distance between stations i and j that are located on the same street with pairwise travel distance less than (or equal to) 2 miles. The underlying assumption concerning competition is that stations primarily compete with other stations located on the same street, and that the intensity of competition diminishes with travel distance.

For ease of presentation, we focus on the specification with W_{1,n} in the main text. Additional results associated with the other weight matrices are documented in Chapter B.4. Figure 3.1 below illustrates the first construction for the Vancouver market area. Each individual station is marked by a red cross. The edges are segments that bisect the distance between any two stations, provided there is no other station in between. Therefore, two stations sharing the same edge are treated as closest neighbors in the first specification of W_n. We color each sub-market area according to the (log of) local population density, measured by the number of residents per square kilometer. A polygon with a darker color indicates an area of higher population density. In the second specification of W_n, w_{ij,n} = 1 if the travel distance between stations i and j is smaller than 2 miles, and w_{ij,n} = 0 otherwise.
In the third specification, we consider a continuous measure and let the elements of W_n depend on the pairwise travel distance between stations located in adjacent census blocks. The fourth specification is the most local one, as we assume stations only compete with their closest neighbors; in other words, each station has only one competitor under this construction. The fifth specification requires additional information about the local road structure. In particular, we treat stations i and j located along the same street (up to 3 blocks away) as competitors and assign w_{ij,n} = 1, and we let w_{ij,n} = 0 otherwise. The last specification builds on W_3 and W_5. Specifically, we restrict competition to stations located on the same street as in W_5 but, in addition, allow such competition to decay with the travel distance between stations i and j. As is common practice in the empirical literature, we normalize each W_n by its maximum row sum when it comes to estimation. We emphasize that the W_n's need not be symmetric by construction.

Figure 3.1: Market Area based on Common Boundaries

The specification of the demand equation is in line with, and extends, the network demand equation specified in Pinkse and Slade (2004), where they analyze the effects of mergers on brand competition and pricing in the UK brewing industry. In their specification, the own-price and cross-price elasticities are captured by an n x n matrix B_n. The term B_n \ln(p_n) in their model is equivalent to b \ln(p_n) + \lambda W_n \ln(p_n) in model (3.6) above. They estimated the cross-product demand elasticity matrix semi-parametrically and utilized information on the proximity between brands based on product similarities (e.g., alcohol content, flavor, etc.). Our construction shares the same spirit in that W_n can be viewed as being constructed with some measure of inverse travel distance between stations.
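As a sketch of the weight-matrix constructions and the maximum-row-sum normalization described above, the following illustrates W_2 through W_6 on a toy five-station market; the distance matrix, the same-street indicators, and the adjacent-census-block restriction (omitted here for W_3) are all hypothetical:

```python
import numpy as np

n = 5
# Hypothetical pairwise travel distances in miles (the real study uses travel
# distances between the 151 Vancouver-area stations).
dist = np.array([
    [0.0, 1.2, 3.5, 0.8, 4.0],
    [1.2, 0.0, 1.9, 2.5, 3.1],
    [3.5, 1.9, 0.0, 2.2, 1.5],
    [0.8, 2.5, 2.2, 0.0, 2.8],
    [4.0, 3.1, 1.5, 2.8, 0.0],
])
# Hypothetical same-street indicator (symmetric, zero diagonal).
same_street = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])

off_diag = ~np.eye(n, dtype=bool)

# W2: binary, 1 if within 2 miles of travel distance.
W2 = ((dist <= 2.0) & off_diag).astype(float)

# W3: reciprocal of travel distance (adjacent-census-block restriction omitted).
W3 = np.where(off_diag, 1.0 / np.where(dist > 0, dist, np.inf), 0.0)

# W4: nearest neighbor by travel distance (need not be symmetric).
W4 = np.zeros((n, n))
masked = np.where(off_diag, dist, np.inf)
W4[np.arange(n), masked.argmin(axis=1)] = 1.0

# W5: same street.
W5 = same_street.astype(float)

# W6: reciprocal distance, restricted to same-street pairs within 2 miles.
W6 = W3 * W5 * (dist <= 2.0)

def normalize(W):
    """Normalize by the maximum row sum, as done before estimation."""
    return W / W.sum(axis=1).max()

W2n = normalize(W2)
```

The maximum-row-sum normalization (rather than row normalization) preserves the relative connectedness of stations across rows while bounding the spectral radius, which is the property needed for the stability conditions in the theoretical chapters.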
Note that this proximity measure is the key differentiating factor among products in the retail gasoline market, given that gasoline is largely homogeneous in terms of content. The difference is that we reparameterize the cross-station elasticity of substitution matrix as \lambda W_n, so that \lambda can be viewed as a measure of competition intensity between neighboring stations. Indeed, our construction is more restrictive than theirs, as b is a single parameter and thus captures only the average own-price elasticity of the entire market. Also, since W_n is pre-defined based on some distance measure, it is likely to be less flexible than the semi-parametric construction considered in Pinkse and Slade (2004) and may thus miss some of the substitution patterns. These observations suggest potential interest in developing semi-parametric or non-parametric estimation methods for this class of general network SEMs in future research.

3.4 Data

The data set consists of several components from different sources, and we now describe each in turn. We collect retail prices of regular gasoline at the station level from GasBuddy.com. Each gas station is associated with a unique ID number that enables us to collect price information at a set frequency with an algorithm. The gasoline prices are in Canadian cents. We collect daily data for 151 retail stations that belong to several sub-regions in the Greater Vancouver area, including Vancouver City and adjacent suburbs.[13] The sample covers two periods: 09/12/2019 to 10/16/2019 and 03/11/2020 to 04/08/2020. Approximately 91% of the stations appearing in our sample receive one or more price reports within a day.
The remaining 9% of stations are typically located on roads with less traffic, and thus price information may only be updated every few days.[14] The sales volume data is proprietary and was purchased from Kalibrate Ltd., a leading marketing and consulting firm.[15] The observations are the (monthly) sales volumes of the 151 stations for the survey periods of September 2019 and March 2020.

Existing works that rely on survey data often use the price information contained in the survey. Since these surveys are often conducted only at monthly frequencies, such price information is incomplete and may be misleading. The issue could be mitigated if one works with a long panel spanning many months, or if the distribution of prices among stations is unchanged in terms of ranking. To alleviate such concerns, we compute the average of the daily prices for stations in the sample, matched to the survey periods of the volume data. The final data set is a cross-sectional monthly price and volume data set for the 151 stations in the sample. Table 3.1 below provides summary statistics on the price and sales variables.

Since our price data is an average over the sampling period, one may worry about the effect of price movements on consumers' decisions and thus on demand. One such example is the Edgeworth cycle generated by price wars between gas stations in the local market. If consumers perceive that stations are actively undercutting their prices to match their competitors, they may delay purchases and wait for prices to fall. On the other hand, if consumers expect the price war to end soon, or prices to increase in the near future, they may stock up on some inventory while prices are low. In Figure 3.2 below, we plot the average station-level retail price in this market during the period of August 1st to November 30th, 2019, as well as the period of February 1st to May 31st, 2020. The shaded areas indicate the time periods of our empirical exercise, i.e., the periods for which we collect the sales volume data. During the two periods, although there are significant price trends, there is no significant pattern of the Edgeworth cycle that was widely documented in the literature during the late 1990s and early 2000s.[16] In Chapter B.2, we provide a more detailed discussion and additional results on margins.

As suggested in, e.g., Eckert (2003) and Noel (2007), a greater presence of independent stations should lead to more cycling activity and less sticky pricing, as independents often initiate an undercutting process and generate price movements akin to an Edgeworth cycle. In light of this, one possible explanation for the price pattern shown in Figure 3.2 is the declining share of independent stations in Vancouver since the 1990s. In the current sample, only 10% of the stations are of minor brands or independent.[17] In addition, since the upward trend is reasonably mild during the period, we assume that the intertemporal effect on consumers' purchases is of minor concern.

[13] The complete list of regions includes Vancouver City, Burnaby, New Westminster, North Vancouver, Port Coquitlam, Port Moody, and Richmond.
[14] Atkinson (2008) discussed sample selection issues with consumer-reported data (e.g., GasBuddy). In general, he finds that consumer-reported data are reliable for answering questions that require daily and major-brand station prices. However, it is risky to use them to study issues concerning within-day price dynamics.
[15] Kalibrate recently acquired Kent Marketing Ltd., a consulting firm specializing in the Canadian petroleum industry.
[16] For the relevant literature, see Eckert (2003), Noel (2007), Noel (2008), Atkinson (2009), among others.
[17] Table 3.1 documents the list of major brands that operate in Vancouver.
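The matching of daily price reports to the survey window of the volume data can be sketched as follows; the station IDs, dates, and prices here are hypothetical, while the September 2019 window dates are from the text:

```python
from datetime import date
from statistics import mean

# Hypothetical daily price reports: station id -> {date: price in Canadian cents}.
daily_prices = {
    "station_A": {date(2019, 9, 12): 155.9, date(2019, 9, 13): 154.9,
                  date(2019, 10, 20): 149.9},  # last report falls outside the window
    "station_B": {date(2019, 9, 15): 152.9, date(2019, 9, 20): 153.9},
}

# Survey window matched to the September 2019 volume data.
start, end = date(2019, 9, 12), date(2019, 10, 16)

def window_average(prices, start, end):
    """Average the daily prices that fall inside the matched survey window."""
    in_window = [p for d, p in prices.items() if start <= d <= end]
    return mean(in_window) if in_window else None

avg_price = {sid: window_average(p, start, end) for sid, p in daily_prices.items()}
```

The result is one average price per station per survey period, matching the cross-sectional structure of the monthly volume data.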
In constructing the instruments for prices discussed below, we also collect the rack prices of Suncor and Shell gasoline sold at local distribution terminals.[18] In the retail gasoline market of Vancouver, stations are typically supplied by major refineries under some type of contract.[19] For example, Petro-Canada stations are operated and supplied directly by Suncor. We assume most independent stations purchase supplies on the spot market.[20]

Table 3.1: Summary Statistics of Retail Prices and Sales Volume

  September 2019    Retail Price           Sales
                    Mean      Std Dev      Mean     Std Dev
  All               155.10    3.15         12.91    0.59
  Shell             155.84    2.80         13.11    0.53
  Suncor            155.38    2.23         12.92    0.41
  Chevron           156.33    1.14         12.89    0.57
  Husky             154.84    2.05         12.48    0.72
  Rack Price
    Suncor           90.35
    Shell            90.47

  March 2020        Retail Price           Sales
                    Mean      Std Dev      Mean     Std Dev
  All                97.30    3.68         12.49    0.73
  Shell              96.42    2.81         12.70    0.52
  Suncor             98.03    3.93         12.52    0.44
  Chevron            97.05    2.68         12.39    0.94
  Husky              96.97    4.16         12.26    0.88
  Rack Price
    Suncor           49.50
    Shell            49.55

  Note: Prices are in Canadian cents; sales volumes are in logged liters.

Figure 3.2: Dynamics of Average Retail Price. (a) August to November, 2019; (b) February to April, 2020.

We collect information on station characteristics, such as the availability of a car wash, a service station, and the size of the convenience store, from both GasBuddy.com and the data provided by Kent Marketing Ltd. Table 3.2 reports the fraction of stations, overall and for the four major brands in our sample, that offer each type of service. As suggested in the literature, these station characteristics could partially explain the price levels observed at individual stations. We control for car wash and service station in the regression as dummy variables, and for the size of the convenience store as a continuous variable. These three variables are treated as common regressors in both the demand equation and the price equation. We abstract from controlling for brand fixed effects, as they appear to be less important in another set of preliminary results and given the relatively small sample we are working with.

Table 3.2: Summary Statistics of Station Characteristics

                     All     Shell   Suncor   Chevron   Husky
  C-store            90.07   90.91   87.93    92.86     92.86
  Car Wash           12.58   22.73   20.69     0.00     14.29
  Service Station     9.27    9.09   17.24     0.00      0.00

  Note: Fraction of stations with each type of facility; numbers in percentages.

For the variables that potentially shift consumer demand, we collect data on the number of drivers (in units of 1,000 drivers per census tract), the median income of residents (in 1,000 Canadian dollars), the fraction of long-distance commuters, as well as a measure of transportation mode for each census tract, from the 2016 Canada Census Database. We note that there are 126 census tracts in the sample and at most 4 gasoline stations within a single census tract. Moreover, 71% of the census tracts in the data set have only one station, and 37% of the stations have no neighboring station within the same tract. Therefore, regional variables at the census tract level may provide some additional "quasi-station-specific" variation in the cross-section.

[18] Parkland is now the only operating refinery in this area, which supplies Chevron and Esso, along with several other minor brands. Unfortunately, rack prices for Parkland are not available.
[19] Slade (1998) lists four different vertical arrangements between a supplier and a station displaying its brand of gasoline in Canadian markets: (1) company-owned and -operated stations; (2) commissioned agent stations; (3) lessee dealer stations; and (4) dealer-owned stations.
[20] See Houde (2012) for a similar assumption.
In the data, there are four levels of commuting distance: (1) commuting within the census subdivision (CSD) of residence; (2) commuting to a different CSD within the census division (CD) of residence; (3) commuting to a different CSD and CD of residence; and (4) commuting to a different province or territory, with increasing distance of daily transportation. With the number of commuters of each type for each census tract (or census subdivision), we construct an index assigning a weight of 1 to the first type of commuter, 2 to the second type, and so on. The index can thus be viewed as a proxy for the average daily commuting distance of a census tract. The presumption is that long-distance commuters naturally demand more gasoline if they drive. A drawback of this argument is that commuters do not necessarily refuel their tanks near their place of residence, and thus a multi-address model would preferably be considered, as shown in Houde (2012). In principle, such additional information could be embodied in the network matrix W_n, or in a sequence of W_n's that represent different layers of networks. Unfortunately, due to data availability, we abstract from this extended analysis. Finally, we also compute the fraction of commuters that drive to work ("Travel Mode"), given that some commuters choose alternative modes of transportation and thus demand less gasoline (e.g., taking public transportation or walking). Another limitation of the data is that we do not observe the transportation mode taken by long-distance commuters. However, given that public transportation is usually available in densely populated areas, and such areas account for only a small fraction of the census tracts in the sample, we argue that this last concern is likely minor.
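The weighted commuting-distance index described above can be sketched as follows; the commuter counts are made up for illustration:

```python
def commute_index(counts):
    """Weighted average of commuter types for one census tract.

    counts: numbers of commuters of types (1) within-CSD, (2) cross-CSD
    within-CD, (3) cross-CD, (4) cross-province, in that order; the weights
    1 through 4 follow the increasing commuting distance of each type.
    """
    weights = (1, 2, 3, 4)
    total = sum(counts)
    return sum(w * c for w, c in zip(weights, counts)) / total if total else None

# Hypothetical tract: 800 within-CSD, 150 cross-CSD, 40 cross-CD, 10 cross-province.
idx = commute_index((800, 150, 40, 10))  # values near 1 indicate mostly short commutes
```

An index near 1 indicates a tract dominated by short-distance commuters, while values toward 4 indicate longer average daily commutes and, presumably, higher gasoline demand per driver.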
3.5 Instruments and Identification

In the model, we assume that the managers of gasoline stations know the station-specific tastes and cost factors that are unobservable (to econometricians) before they choose retail prices; prices are thus likely to be correlated with the unobservables. Instrumental variables are used to address this endogeneity concern. Price is determined by marginal costs and markups, and valid instruments may shift either component. In light of this, we consider two types of supply-side instruments.

The IVs belonging to the first type may be viewed as related to some measure of market power. Specifically, we include the number of competitors and the average size of competitors (measured by the number of pumps), following Houde (2012). As discussed in, e.g., Pennerstorfer and Weiss (2013), stations with fewer neighboring competitors tend to exhibit larger (local) market power and hence a higher ability to charge higher markups. In addition, inspecting the sample suggests that larger stations (measured by the number of pumps) tend to be vertically integrated and thus may have stronger local market power.[21] Moreover, as suggested in the 2019 market report by Kent Marketing Ltd. (now part of Kalibrate), larger stations also tend to be newer and have better amenities, which may also contribute to their local market power.[22]

In the context of the retail gasoline market, station i's direct rival(s) often compete with another set of stations simultaneously. The sub-market served by station i may also (partially) overlap with the sub-markets served by its direct rivals. As such, rivals' characteristics (potentially through interaction with their prices) may affect station i's demand through network interactions. For example, a high price set by a high-quality rival (e.g., a station with high brand loyalty) will have a different effect on station i than a high price set by a low-quality rival (e.g., an independent station).
The model we considered does not rule out this endogeneity and we need additional instruments. The instruments we consider is the station/sub-market characteristics of the indirect 21The average number of pumps for Shell, Chevron, Husky and independent brands in the sample are 8.5, 8.8, 7.4, and 6.6, respectively. 22Report can be purchased at https://kalibrate.com/insights/report/data-intelligence/ 2019-fuel-census/. 112 rivals of station i, i.e., the direct rivals of station i?s rivals that do not compete directly with station i. This approach follows closely Fan (2013) in the context of newspaper market, in which both prices and product characteristics are assumed to be endogenous. One main similarity between the retail gasoline market and the newspaper market considered in Fan (2013) is that, firm A has a (partially) overlapping sub-market with firm B, which in turn compete with its direct rivals in the other overlapping (sub)-markets. The intuition for why the characteristics of indirect rivals can be used as instruments is as follows. Consider three stations A, B and C, such that A compete directly with B and B compete directly with C, but A does not compete directly with C. The variation in station/sub-market characteristics of station B influence the demand for station B and thus affect its prices. Because station A and B are competitors, B?s decision on prices affects A?s decision. Station C?s characteristics would also affect B?s decision in an analogous fashion, but since C and A are not compete directly, C?s characteristics would not affect A?s demand. In summary, we assume that variation in C?s characteristics would shift B?s prices, in a way that should not affect A?s demand.23 Specifically, the instruments we consider include the station characteristics (dummies of car wash and service station) of A?s indirect rivals, as well as their count and average sizes. Table 3.3: Correlation of Station Characteristics in Neighboring Markets Car Wash Service No. 
Correlation   0.252      0.311     0.591    0.046

Note that the availability of a car wash and a service station at station i are included instruments. Table 3.3 reports the correlation between the included and the excluded instruments. Specifically, we report the correlation between the availability of a car wash and a service station at station i's indirect rivals and at station i itself, as well as the correlations between the count (and average size) of station i's indirect rivals and those of station i's direct rivals. In general, the included instruments are not highly correlated with the excluded instruments. Heuristically, these results give us some confidence that the excluded instruments affect endogenous prices differently than their included counterparts and thus affect sales volume primarily through prices.

23 In the spatial/social network literature, this type of instrument is often referred to as neighbor's-neighbor's characteristics.

The second type of supply-side IVs can be viewed as cost-based, as they are related to the rack prices at which independent stations could purchase inputs from local outlets operated by major brands. This set of instruments includes the interaction terms of the presence of branded stations with the rack prices of those brands, as well as an interaction term of brand and travel distance to the corresponding distribution terminals.24 Following Houde (2012), we also exploit the fact that there exists a small amount of cross-sectional dispersion in the posted rack prices of the Shell and Suncor outlets located in this area. We construct instrumental variables that focus only on Shell and Suncor rack prices, assuming unbranded retailers purchase gasoline on the spot market from the supplier with the lowest price. In particular, for each station i, we construct an instrumental variable that interacts Shell's rack price with a dummy variable equal to one if another Shell station is located within the same neighborhood.
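Returning to the first set of instruments: the indirect-rival set described above can be read off the network matrix itself. A minimal sketch, with a hypothetical five-station adjacency matrix and car-wash dummies (all numbers are illustrative, not from the sample):

```python
import numpy as np

# Hypothetical adjacency matrix: W[i, j] = 1 if stations i and j are direct rivals.
W = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])

# (W @ W)[i, j] > 0 iff some station competes directly with both i and j.
two_step = (W @ W) > 0

# Indirect rivals: rivals-of-rivals, excluding direct rivals and the station itself.
indirect = two_step & (W == 0) & ~np.eye(len(W), dtype=bool)

# Station characteristics (car-wash dummies); the instrument averages them over
# each station's indirect rivals, alongside the indirect-rival count.
car_wash = np.array([1, 0, 1, 0, 1])
counts = indirect.sum(axis=1)
iv_car_wash = (indirect.astype(int) @ car_wash) / np.maximum(counts, 1)
```

In this toy chain network, station 0's only indirect rival is station 2 (reached through station 1), so its instrument equals station 2's car-wash dummy.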
Note that the definition of the local market depends on the specification of the spatial matrices Wn; specifically, wij = 1 if stations i and j are located within the same neighborhood. A similar instrumental variable is constructed for Petro-Canada stations, which are operated by Suncor. These two sets of instruments capture two sources of variation that are correlated with price: the presence of Shell and other vertically integrated stations in the same locality, and the dispersion of their rack prices. Finally, the interaction term of brand and travel distance to the corresponding refinery exploits the cross-sectional variation in travel distance from refineries to retail stations. If a station belongs to a brand, it may obtain more favorable input prices than the independent stations, which are assumed to purchase inputs on the spot market. The source of lower costs could be either pre-planned and optimized transportation routes or other types of internal discounts that are unobserved to the econometrician.

24 Shell still operates a local terminal for distribution of petroleum products.

Table 3.4 documents the first-stage OLS results and F-statistics for the main specification for all six constructions of the Wn's. In the main specification, we include all of the supply-side instruments described above: the characteristics of station i's direct rivals, those of the direct rivals of i's direct rivals net of i's direct rivals (i.e., station i's indirect rivals), and the cost-based IVs discussed above.
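The rack-price interaction instrument can be assembled along the following lines; the brand flags, neighborhood matrix, and rack price below are all hypothetical placeholders, not values from the data:

```python
import numpy as np

# Hypothetical data for 4 stations: Shell flags and the neighborhood matrix W
# (W[i, j] = 1 if stations i and j are located within the same neighborhood).
is_shell = np.array([1, 0, 1, 0])
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])
shell_rack = 1.32  # posted Shell rack price (illustrative)

# Dummy: another Shell station is located within the same neighborhood.
near_shell = (W @ is_shell) > 0

# Instrument: Shell rack price interacted with the same-neighborhood dummy.
iv_shell = shell_rack * near_shell.astype(float)
```

The analogous Petro-Canada instrument would replace `is_shell` and `shell_rack` with the Suncor counterparts.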
In Chapter B.3, we also report the results for specification (2), which focuses on the characteristics of i's direct rivals and the cost-based IVs, as well as specification (3), which focuses on the characteristics of i's indirect rivals, along with results based on alternative specifications of the Wn's.25 In general, we see that the selected instruments are reasonably strong in light of the F-statistics.26,27

25 The same set of specifications is adopted in reporting the estimation results in the next section.
26 Note that in settings with a single endogenous variable the Kleibergen and Paap (2006) Wald statistic is equivalent to a heteroskedasticity-robust F-statistic. The explicit expression is given in Chapter B.3.
27 To the best of our knowledge, tests for over-identifying restrictions (e.g., Hansen's J-test) have not been formally developed for spatial SEMs at this moment.

3.6 Estimation Results and Impact Measures

Table 3.5 below documents the main estimation results based on the first specification of the spatial network matrix Wn discussed above. We compare the estimates with different sets of supply-side instruments, under the columns of specifications (1)-(3). Tables B.6-B.10 in Chapter B.4 document the estimation results based on the remaining specifications of the spatial networks listed above, i.e., W2 to W6. More detailed definitions of variables are reported

Table 3.4: First-stage OLS Regression and tests for IV power
                         W1        W2        W3        W4        W5        W6
No. Nb                 -0.041    -0.081**  -0.020**  -0.041    -0.041
                       (0.039)   (0.020)   (0.004)   (0.037)   (0.037)
Avg. Nb Size            0.030    -0.034**   0.012     0.065**  -0.013    -0.013
                       (0.043)   (0.005)   (0.011)   (0.020)   (0.027)   (0.027)
No. Indirect Nb        -0.142**  -0.050**   0.008*    0.010     0.010
                       (0.017)   (0.010)   (0.004)   (0.023)   (0.023)
Avg. Indirect Nb Size   0.104    -0.197**  -0.144     0.056*    0.151**   0.151**
                       (0.075)   (0.054)   (0.099)   (0.020)   (0.024)   (0.024)
Indirect Nb. Car Wash  -0.066**  -0.104**  -0.033**   0.034**
                       -0.016    -0.016
                       (0.022)   (0.012)   (0.004)   (0.008)   (0.011)   (0.011)
Indirect Nb. Service    0.024*    0.035**  -0.052**   0.026**  -0.203**  -0.203**
                       (0.009)   (0.005)   (0.011)   (0.003)   (0.033)   (0.033)
Suncor X Rack           0.163**   0.137**   0.034*    0.092**   0.189**   0.189**
                       (0.052)   (0.037)   (0.013)   (0.027)   (0.030)   (0.030)
Shell X Rack            0.055*    0.102**   0.104     0.157**  -0.028    -0.028
                       (0.024)   (0.026)   (0.062)   (0.033)   (0.023)   (0.023)
Dist Refinery           0.012**   0.044    -0.315     0.145**   0.141**   0.141**
                       (0.002)   (0.200)   (0.206)   (0.018)   (0.017)   (0.017)
Period Dummy            Yes       Yes       Yes       Yes       Yes       Yes
Weak IV (F-stat)        36.49     59.81     65.24     46.54     27.80     27.80
Chi-sq crit-val (5%)    20.53     20.53     20.53     19.86     20.53     20.53
Sample size             302       302       302       302       302       302
1 Heteroskedasticity-robust SEs are in parentheses; * (p < 0.05), ** (p < 0.01).
2 We report effective first-stage F-statistics based on Olea and Pflueger (2013) and Stock-Yogo weak-identification test critical values.

in Table B.11. In addition, we test for the presence of network spillovers in the data, based on the test proposed in Liu and Prucha (2018). The details of this test statistic and the relevant results are reported in Table B.13. In general, we find strong evidence of the presence of spatial interdependence in prices in the demand equation.

3.6.1 Main Estimation Results

With specification (1), the OLS estimate of the demand elasticity is biased downward in magnitude, and the spatial autoregressive parameter λ1 (on the term W ln(price)) in the demand system is not significant. The IV estimates of the demand elasticity and the spatial autoregressive parameters are close in magnitude and are in general significant. This gives additional confidence in the estimates, as both the LQ-GS2SLS and LQ-GSLIVE estimators are expected to yield similar results, at least when the IVs are reasonably strong. The demand elasticity is estimated to range from -5 to -6, depending on the set of IVs used in the estimation. These estimates are lower in magnitude than those reported by Houde (2012).
Recall that Houde (2012) reports an average store-level price elasticity of demand between -22 and -15 for the multi-address model, depending on IV choices. This could relate to two differences in the model specification. First, in constructing the network matrix W, we do not account explicitly for the possibility of purchasing gasoline along the route of a daily commute or shopping trip. In Houde (2012)'s terminology, the current model should be viewed as a variant of the 'single-address' model.28 Although this specification is popular in the literature, it is likely to bias the demand elasticity downward (in magnitude), since it only allows for substitution among the closest rivals and excludes those located further away but along a connected street or highway with high traffic flows. For the single-address model, Houde (2012) reports estimates ranging from -15 to -6, which are closer to our estimates. Second, Houde (2012) adopts a random-coefficients logit demand specification, which allows for a more flexible substitution pattern than the log-linear form we assume.29

Recall that in specification (3), we focused on the characteristics of indirect rivals when instrumenting for prices at own stations. One question is whether we have included the proper set of indirect rivals when constructing the instruments. With W1,n, such a concern could be reasonable, observing that the demand elasticity in specification (3) is somewhat smaller than that obtained under specification (2), in which we focus on the characteristics of direct rivals.

28 In essence, the 'single-address' model assumes a consumer's trip to gas stations always starts from his/her home. This construction omits the possibility of re-routing to a gas station located near a daily commute or shopping route.
29 To the best of our knowledge, accommodating flexible forms of the demand schedule in a spatial model has not been well explored in the literature.
One possible explanation is that the number of a station's indirect rivals under W1,n is larger than the number of stations that indirectly affect station i's demand in reality. In our sample, the average number of indirect rivals of each station is 14. Hence, the variation in the average characteristics of these indirect rivals may partially mask the variation that fits our story about using the characteristics of indirect rivals. In the context of the current model, such 'over-counting' could be driven by the fact that W1,n partially ignores the traffic and road conditions in local markets; e.g., i and j may share a common boundary by our construction yet not compete with each other because there is a highway/river between them. Such a hypothesis can be tested by varying the construction of the Wn's, which controls the scope of the direct and indirect neighbors of station i. We notice that under alternative constructions of Wn, the above observations can be reversed. For example, the scope of indirect competition under W3,n is more local than under W1,n, in the sense that it takes into account the travel distance between stations and thus puts more 'weight' on close neighbors in modeling the competition. Also, W4,n is constructed based on the closest neighbor only, and thus the scope of competition is much smaller, with the number of direct competitors being 1. Under these constructions, the aforementioned concern of over-counting indirect rivals is much alleviated. The results (see Tables B.7 and B.8) show that demand estimates using the characteristics of indirect rivals (i.e., specification (3)) are larger than those obtained using only the characteristics of direct rivals (i.e., specification (2)). The above discussion calls for a systematic way of identifying sub-markets. In the context of retail gasoline markets, recent contributions include Perdiguero and Borrell (2012), Bantle et al. (2018), and Ulrick et al. (2020).
However, to our knowledge, a universally accepted approach, especially in the spatial econometrics literature, has not been developed.

As remarked above, the parameter b can be interpreted as the market-average own-price elasticity. In the spatial literature, λ typically captures the intensity of spillovers among cross-sectional units. In model (3.6), the matrix λWn is equivalent to the cross-price elasticity matrix in Pinkse and Slade (2004). Furthermore, recall that we normalize Wn by its maximum row sum, so wij,n is in general not a binary entry. For example, if the maximum row sum of Wn is 10, then wij,n = 0.1 if i and j are neighbors by the selected metric (i.e., wij,n ≠ 0). Hence, λ can be viewed as a market weighted-average measure of the cross-price elasticity between stations. Also note that $w_{i\cdot,n} p_n$ gives the weighted average price of the stations neighboring station i. For W1,n, the typical nonzero ij-th element is wij,n = 0.083. Hence, an estimate of λ̂1 = 3.72 (e.g., the LQ-GS2SLS estimate in specification (1)) implies that a 1% price increase at a neighboring station j leads, on average, to a 0.31% (3.72 × 0.083) increase in sales at station i.

3.6.2 Impact Measures

Interpreting the estimates on non-price regressors is less direct in the presence of the network structure Wn than in regressions without network interactions at the individual level. Specifically, if a change in the k-th exogenous variable of station i has an effect, via spatial correlation, on sales at the other stations, which in turn feed back to station i, the corresponding coefficient, say βk, does not denote the total effect of a unit change in xk on the sales of station i. The final effect is sometimes referred to as the Average Total Direct Impact, following LeSage and Pace (2009). The direct impact measure on the (log) sales of station i associated with its k-th exogenous variable in X is computed as $\frac{1}{n}\sum_{i=1}^{n} \partial E[\ln(q_i)]/\partial x_{i,k}$.
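For a SAR-type demand system with reduced form $E[\ln q] = (I - \lambda W)^{-1} X\beta$, both impact measures are simple functionals of the matrix of partial derivatives. A minimal sketch with an illustrative row-normalized network and illustrative parameter values (none of the numbers below come from the estimates):

```python
import numpy as np

# Illustrative row-normalized 4-station network, spatial parameter, and
# coefficient on the k-th exogenous variable.
W = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.5, 0.0, 0.0, 0.5],
              [0.0, 0.5, 0.5, 0.0]])
lam, beta_k = 0.4, 0.05

n = W.shape[0]
# S[i, j] = dE[ln q_i] / dx_{j,k}, the full matrix of network-propagated effects.
S = np.linalg.inv(np.eye(n) - lam * W) * beta_k

direct = np.trace(S) / n  # Average Total Direct Impact: (1/n) sum_i dE[ln q_i]/dx_{i,k}
total = S.sum() / n       # Average Total Impact from an Observation: (1/n) sum_j sum_i ...
```

With a row-normalized W, the row sums of $(I - \lambda W)^{-1}$ equal $1/(1-\lambda)$, so the average total impact reduces to $\beta_k/(1-\lambda)$, which the sketch reproduces.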
Table 3.6 reports the estimated Average Total Direct Impact for the station/neighborhood characteristics that appear in the demand equation, for both W1,n and W2,n. In light of the formulae, it should be clear that the magnitudes of the impact measures depend on the specific construction of the Wn's. For interpretation, we focus on the results obtained with W1,n as an example. Under W1,n, adding a car wash at station i would decrease demand at station i by 3.8%-4.0% (depending on the estimator), and adding a service station is associated with 1.38%-1.51% lower demand. In addition, larger convenience stores seem to be correlated with lower sales volume, with about 4.3% lower sales per 100-square-meter (about 1,000 square feet) increase in C-store space. The direct impact of 1,000 more drivers in the census tract where station i is located is a 3.4%-4.4% increase in sales volume. In contrast, the direct impact of 1,000 more Canadian dollars of income in the census tract where station i is located implies a 0.6%-3.2% decrease in sales volume for station i. One possible explanation for this seemingly contradictory result is that the prices charged by stations located in higher-income neighborhoods tend to be higher, and hence some residents opt for stations located along their commuting routes for lower prices.30 In addition, one can compute the impact on sales at station i if xk of the j-th (j ≠ i) station changes and propagates through the network. Following LeSage and Pace (2009), we refer to this as the Average Total Impact from an Observation. In our context, one may interpret this measure as the average impact of a (unit) change in a station characteristic at station j on demand at station i. The impact on the (log) sales of station i associated with the k-th exogenous variable of station j is computed as $\frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{n} \partial E[\ln(q_i)]/\partial x_{j,k}$.
Under W1,n, when station i competes with a station j that offers car wash service, demand at station i is lower by about 10%-13%, taking into account network propagation. Similarly, adding a service station at j is associated with a 10%-15% reduction in demand at station i. In addition, demand at station i is 3.8%-4.0% lower if it competes with a station with a 100-square-meter-larger C-store. Sales volume at station i is 3.4%-4.4% higher if the census tract of its competitor has 1,000 more drivers, after accounting for propagation through the network. In contrast, the indirect impact of 1,000 more Canadian dollars of income in the census tracts where station i's competitors are located implies a 0.6%-3.2% decrease in sales volume for station i. As remarked, these estimated impact measures depend crucially on the construction of the network matrix Wn.

30 See Houde (2012) for a similar argument.

Table 3.5: Estimation Results with W based on common boundaries
                        (1)                        (2)                        (3)
                 OLS      LQ-GS2SLS  LQ-GSLIVE  LQ-GS2SLS  LQ-GSLIVE  LQ-GS2SLS  LQ-GSLIVE
ln(price)       -3.807**  -5.945*    -5.952*    -6.063*    -6.036*    -5.082*    -5.012*
                (1.242)   (2.432)    (2.382)    (2.714)    (2.689)    (2.768)    (2.429)
W*ln(price)      2.090     3.723*     3.734*     4.242      4.224*     3.190*     3.210*
                (1.361)   (1.811)    (1.725)    (2.179)    (1.847)    (1.437)    (1.429)
Car wash         0.038     0.026      0.048      0.028      0.075      0.047      0.052
                (0.111)   (0.126)    (0.157)    (0.127)    (0.150)    (0.116)    (0.139)
Service Station  0.051     0.077*     0.081*     0.082*     0.089*     0.060      0.057
                (0.037)   (0.038)    (0.036)    (0.038)    (0.037)    (0.038)    (0.036)
C-store Size     0.050     0.046      0.047      0.047      0.048      0.050      0.054
                (0.035)   (0.035)    (0.034)    (0.036)    (0.035)    (0.035)    (0.033)
No. Drivers      0.025*    0.044*     0.034*     0.043*     0.026      0.048*     0.050**
                (0.012)   (0.015)    (0.014)    (0.015)    (0.015)    (0.015)    (0.014)
Med Income       0.005    -0.032     -0.006     -0.035      0.000     -0.095     -0.025
                (0.233)   (0.246)    (0.281)    (0.247)    (0.287)    (0.246)    (0.276)
Commute Dist.   -1.877*
                          -1.769     -1.793     -1.762     -1.827     -1.897     -1.893
                (0.902)   (1.018)    (1.455)    (1.028)    (1.496)    (1.020)    (1.430)
Travel Mode     -0.016*   -0.017*    -0.019*    -0.016*    -0.014*    -0.020*    -0.022*
                (0.006)   (0.007)    (0.010)    (0.007)    (0.011)    (0.007)    (0.010)
λ1               0.261**   0.127      0.445      0.130      0.444      0.127      0.445
                (0.074)   (5.749)    (2.351)    (4.041)    (2.247)    (4.258)    (2.517)
Period Dummy     Yes       Yes                   Yes                   Yes
Sample size      302       302                   302                   302
R-square         0.15
1 Panels (1), (2), (3) correspond to different sets of supply-side IVs; SEs in parentheses are robust to heteroskedasticity; * (p < 0.05), ** (p < 0.01). wij = 0.13 if i and j are neighbors.

Table 3.6: Impact Measures
                         LQ-GS2SLS            LQ-GSLIVE
                      Direct    Indirect    Direct    Indirect
W1 Demand Side Var.
Car wash             -0.0401   -0.0988     -0.0378   -0.1260
Service Station      -0.0138   -0.1048     -0.0151   -0.1518
C-store Size         -0.0433   -0.0392     -0.0440   -0.0379
No. Drivers           0.0439    0.0439      0.0340    0.0340
Med Income           -0.0315   -0.0315     -0.0056   -0.0056
W2 Demand Side Var.
Car wash             -0.0926   -0.0924     -0.0671   -0.0669
Service Station       0.0132    0.0133     -0.0261   -0.0261
C-store Size         -0.0975   -0.0973     -0.1548   -0.1546
No. Drivers           0.0532    0.0532      0.0522    0.0522
Med Income           -0.1592   -0.1592     -0.1235   -0.1235
1 W1: network based on closest neighbors; W2: network based on neighbors within a 2-mile radius.
2 Estimates are based on regression specification (1).
3 'Direct' refers to the 'Average Total Direct Impact'; 'Indirect' refers to the 'Average Total Impact from an Observation'.

3.7 Concluding Remarks

In this application, we estimate a demand equation with network dependence for (a sub-region of) the retail gasoline market in Vancouver, Canada. The data set includes observations on both sales volume and station-level prices, along with station- and census-tract-level characteristics. We find that the own-price elasticity is between -12 and -4, depending on the set of IVs and the specific construction of the network matrices Wn, which governs the degree of competition to some extent. The cross-station price elasticity is in general between 0.6 and
6, depending on the network specification. These results are largely consistent with economic theory. However, Houde (2012)'s multi-address model reports an own-price demand elasticity within the range of -22 to -15, which is much more elastic than our estimates. Setting aside possible unobserved city-level or market-level fixed effects, one possible explanation is that we build networks based on different sources of information. Houde (2012) relies heavily on local traffic patterns and road structure and thus allows a station to compete with another located far away if both lie on a main commuting route or a segment of highway. Our approach to constructing the spatial network matrices is in line with the majority of the existing literature.31 Our proximity measures exploit information on stations' neighborhoods or the pairwise travel distance between nearby stations. Thus, we implicitly assume that competition is largely local. Allowing for a more flexible network structure in the current econometric framework is one possible direction for future work. We also computed the direct and indirect impact measures associated with the station and neighborhood characteristics. In general, the number of drivers in the local neighborhood and the share of residents who drive to work are important factors for retail gasoline demand. Finally, we do find that the estimation results are heavily influenced by the construction of the network matrices, a well-documented phenomenon in the empirical spatial literature.

31 See, e.g., Pennerstorfer (2009), Pennerstorfer and Weiss (2013), Pinkse et al. (2002), among others.

Appendix A: Appendix to Chapter 2

A.1 Appendix: Example expression of EYn

For ease of presentation, we drop the subscript n on matrices when appropriate and the subscript 0 on parameters in what follows.
Consider the following SE-SARAR model consisting of two equations, i.e., $G = 2$:
$$y_1 = b_{21} y_2 + [\lambda_{11,1} W_1 + \lambda_{11,2} W_2]\, y_1 + c_{11} x_1 + c_{21} x_2 + u_1,$$
$$y_2 = b_{12} y_1 + [\lambda_{22,1} W_1 + \lambda_{22,2} W_2]\, y_2 + c_{32} x_3 + c_{42} x_4 + u_2,$$
and thus
$$B^{*} = \begin{bmatrix} \bar{B}_{11} & b_{21} I_n \\ b_{12} I_n & \bar{B}_{22} \end{bmatrix}, \qquad C^{*} = \begin{bmatrix} c_{11} I_n & c_{21} I_n & 0 & 0 \\ 0 & 0 & c_{32} I_n & c_{42} I_n \end{bmatrix}.$$
We consider an approximation to $Ey = [Ey_1', Ey_2']'$ up to the second order:
$$\sum_{s=0}^{2} (B^{*})^{s} C^{*} x,$$
where $x = \mathrm{vec}(X)$ and $X = [x_1, x_2, x_3, x_4]$. In particular, note that
$$B^{*}C^{*} = \begin{bmatrix} c_{11}\bar{B}_{11} & c_{21}\bar{B}_{11} & b_{21}c_{32} I_n & b_{21}c_{42} I_n \\ b_{12}c_{11} I_n & b_{12}c_{21} I_n & c_{32}\bar{B}_{22} & c_{42}\bar{B}_{22} \end{bmatrix},$$
$$(B^{*})^{2}C^{*} = \begin{bmatrix} c_{11}\big(b_{21}b_{12} I_n + \bar{B}_{11}^{2}\big) & c_{21}\big(b_{21}b_{12} I_n + \bar{B}_{11}^{2}\big) & b_{21}c_{32}\big(\bar{B}_{11} + \bar{B}_{22}\big) & b_{21}c_{42}\big(\bar{B}_{11} + \bar{B}_{22}\big) \\ b_{12}c_{11}\big(\bar{B}_{11} + \bar{B}_{22}\big) & b_{12}c_{21}\big(\bar{B}_{11} + \bar{B}_{22}\big) & c_{32}\big(b_{21}b_{12} I_n + \bar{B}_{22}^{2}\big) & c_{42}\big(b_{21}b_{12} I_n + \bar{B}_{22}^{2}\big) \end{bmatrix},$$
with
$$\bar{B}_{11} = \lambda_{11,1} W_1 + \lambda_{11,2} W_2, \quad (A.1)$$
$$\bar{B}_{22} = \lambda_{22,1} W_1 + \lambda_{22,2} W_2, \quad (A.2)$$
$$\bar{B}_{11}^{2} = \lambda_{11,1}^{2} W_1^{2} + \lambda_{11,2}^{2} W_2^{2} + \lambda_{11,1}\lambda_{11,2}(W_1 W_2 + W_2 W_1), \quad (A.3)$$
$$\bar{B}_{22}^{2} = \lambda_{22,1}^{2} W_1^{2} + \lambda_{22,2}^{2} W_2^{2} + \lambda_{22,1}\lambda_{22,2}(W_1 W_2 + W_2 W_1).$$
Consequently,
$$D = \sum_{s=0}^{2} (B^{*})^{s} C^{*} = \begin{bmatrix} D_{11} & D_{12} & D_{13} & D_{14} \\ D_{21} & D_{22} & D_{23} & D_{24} \end{bmatrix}, \quad (A.4)$$
with
$$D_{11} = \Big(\sum_{s=0}^{2} \bar{B}_{11}^{s} + b_{12}b_{21} I_n\Big) c_{11} = \big((1+b_{12}b_{21})I_n + \lambda_{11,1}W_1 + \lambda_{11,2}W_2 + \lambda_{11,1}^{2}W_1^{2} + \lambda_{11,2}^{2}W_2^{2} + \lambda_{11,1}\lambda_{11,2}(W_1W_2 + W_2W_1)\big) c_{11},$$
$$D_{12} = \Big(\sum_{s=0}^{2} \bar{B}_{11}^{s} + b_{12}b_{21} I_n\Big) c_{21},$$
$$D_{13} = (I_n + \bar{B}_{11} + \bar{B}_{22})\, b_{21}c_{32} = \big(I_n + (\lambda_{11,1}+\lambda_{22,1})W_1 + (\lambda_{11,2}+\lambda_{22,2})W_2\big)\, b_{21}c_{32},$$
$$D_{14} = (I_n + \bar{B}_{11} + \bar{B}_{22})\, b_{21}c_{42},$$
$$D_{21} = (I_n + \bar{B}_{11} + \bar{B}_{22})\, b_{12}c_{11},$$
$$D_{22} = (I_n + \bar{B}_{11} + \bar{B}_{22})\, b_{12}c_{21},$$
$$D_{23} = \Big(\sum_{s=0}^{2} \bar{B}_{22}^{s} + b_{12}b_{21} I_n\Big) c_{32} = \big((1+b_{12}b_{21})I_n + \lambda_{22,1}W_1 + \lambda_{22,2}W_2 + \lambda_{22,1}^{2}W_1^{2} + \lambda_{22,2}^{2}W_2^{2} + \lambda_{22,1}\lambda_{22,2}(W_1W_2 + W_2W_1)\big) c_{32},$$
$$D_{24} = \Big(\sum_{s=0}^{2} \bar{B}_{22}^{s} + b_{12}b_{21} I_n\Big) c_{42}.$$
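The quality of the second-order truncation of $Ey = (I_{2n} - B^{*})^{-1} C^{*} x$ can be checked numerically. A small sketch under hypothetical parameter values and randomly generated weight matrices (every number and helper name here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

def row_norm(M):
    """Zero the diagonal and normalize each row to sum to one."""
    M = M - np.diag(np.diag(M))
    return M / M.sum(axis=1, keepdims=True)

W1 = row_norm(rng.random((n, n)))
W2 = row_norm(rng.random((n, n)))
I = np.eye(n)

# Hypothetical parameters, kept small so the Neumann series converges quickly.
b21, b12 = 0.1, 0.1
B11 = 0.10 * W1 + 0.05 * W2          # \bar B_11
B22 = 0.08 * W1 + 0.06 * W2          # \bar B_22
Bstar = np.block([[B11, b21 * I], [b12 * I, B22]])
Cstar = np.block([[1.0 * I, 0.5 * I, 0 * I, 0 * I],
                  [0 * I, 0 * I, 0.8 * I, 0.3 * I]])

X = rng.normal(size=(n, 4))
x = X.flatten(order="F")             # x = vec(X), column-major stacking

Ey_exact = np.linalg.solve(np.eye(2 * n) - Bstar, Cstar @ x)
Ey_approx = sum(np.linalg.matrix_power(Bstar, s) @ Cstar @ x for s in range(3))
err = np.max(np.abs(Ey_exact - Ey_approx))   # third-order truncation error
```

With the spectral radius of $B^{*}$ well below one, the omitted terms are of third and higher order, so the approximation error is small relative to the elements of $Ey$.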
Given the above results, and recalling that $x = \mathrm{vec}(X)$ with $X = [x_1, x_2, x_3, x_4]$, the second-order approximation to $Ey = [Ey_1', Ey_2']'$ can be expressed as
$$\sum_{s=0}^{2} (B^{*})^{s} C^{*} x = \begin{bmatrix} \big(\sum_{s=0}^{2} \bar{B}_{11}^{s} + b_{12}b_{21} I_n\big)(c_{11}x_1 + c_{21}x_2) + (I_n + \bar{B}_{11} + \bar{B}_{22})(b_{21}c_{32}x_3 + b_{21}c_{42}x_4) \\ (I_n + \bar{B}_{11} + \bar{B}_{22})(b_{12}c_{11}x_1 + b_{12}c_{21}x_2) + \big(\sum_{s=0}^{2} \bar{B}_{22}^{s} + b_{12}b_{21} I_n\big)(c_{32}x_3 + c_{42}x_4) \end{bmatrix}.$$
With the expressions in (A.4) and the expanded terms, we can express the approximated $Ey_1$ and $Ey_2$ as
$$\begin{aligned} Ey_1 &\approx \Big(\sum_{s=0}^{2} \bar{B}_{11}^{s} + b_{12}b_{21} I_n\Big)(c_{11}x_1 + c_{21}x_2) + (I_n + \bar{B}_{11} + \bar{B}_{22})(b_{21}c_{32}x_3 + b_{21}c_{42}x_4) \\ &= c_{11}(1 + b_{12}b_{21})x_1 + c_{21}(1 + b_{12}b_{21})x_2 + b_{21}c_{32}x_3 + b_{21}c_{42}x_4 \quad (A.5) \\ &\quad + c_{11}\big(\lambda_{11,1}W_1 + \lambda_{11,2}W_2\big)x_1 + c_{21}\big(\lambda_{11,1}W_1 + \lambda_{11,2}W_2\big)x_2 \\ &\quad + b_{21}c_{32}\big[(\lambda_{11,1}+\lambda_{22,1})W_1 + (\lambda_{11,2}+\lambda_{22,2})W_2\big]x_3 + b_{21}c_{42}\big[(\lambda_{11,1}+\lambda_{22,1})W_1 + (\lambda_{11,2}+\lambda_{22,2})W_2\big]x_4 \\ &\quad + c_{11}\big[\lambda_{11,1}^{2}W_1^{2} + \lambda_{11,1}\lambda_{11,2}(W_1W_2 + W_2W_1) + \lambda_{11,2}^{2}W_2^{2}\big]x_1 + c_{21}\big[\lambda_{11,1}^{2}W_1^{2} + \lambda_{11,1}\lambda_{11,2}(W_1W_2 + W_2W_1) + \lambda_{11,2}^{2}W_2^{2}\big]x_2, \end{aligned}$$
and
$$\begin{aligned} Ey_2 &\approx (I_n + \bar{B}_{11} + \bar{B}_{22})(b_{12}c_{11}x_1 + b_{12}c_{21}x_2) + \Big(\sum_{s=0}^{2} \bar{B}_{22}^{s} + b_{12}b_{21} I_n\Big)(c_{32}x_3 + c_{42}x_4) \\ &= b_{12}c_{11}x_1 + b_{12}c_{21}x_2 + c_{32}(1 + b_{12}b_{21})x_3 + c_{42}(1 + b_{12}b_{21})x_4 \quad (A.6) \\ &\quad + b_{12}c_{11}\big[(\lambda_{11,1}+\lambda_{22,1})W_1 + (\lambda_{11,2}+\lambda_{22,2})W_2\big]x_1 + b_{12}c_{21}\big[(\lambda_{11,1}+\lambda_{22,1})W_1 + (\lambda_{11,2}+\lambda_{22,2})W_2\big]x_2 \\ &\quad + c_{32}\big(\lambda_{22,1}W_1 + \lambda_{22,2}W_2\big)x_3 + c_{42}\big(\lambda_{22,1}W_1 + \lambda_{22,2}W_2\big)x_4 \\ &\quad + c_{32}\big[\lambda_{22,1}^{2}W_1^{2} + \lambda_{22,1}\lambda_{22,2}(W_1W_2 + W_2W_1) + \lambda_{22,2}^{2}W_2^{2}\big]x_3 + c_{42}\big[\lambda_{22,1}^{2}W_1^{2} + \lambda_{22,1}\lambda_{22,2}(W_1W_2 + W_2W_1) + \lambda_{22,2}^{2}W_2^{2}\big]x_4. \end{aligned}$$
In light of (A.5) and (A.6), we see that the second-order approximations for $Ey_1$ and $Ey_2$ are sums over terms of the generic form
$$\alpha(\theta)\, W_{l_1}^{s_1} W_{l_2}^{s_2} x_k, \quad \text{or equivalently} \quad W_{l_1}^{s_1} W_{l_2}^{s_2} x_k\, \alpha(\theta), \quad (A.7)$$
with $\alpha(\theta)$
denoting a (scalar) function of the structural parameters, $s_1, s_2 = 0, 1, 2$, $l_1, l_2 = 1, 2$ and $k = 1, \ldots, 4$. Recalling that $X = [x_1, x_2, x_3, x_4]$, the second-order approximation for $Ey_1$ in (A.5) can be written as
$$\begin{aligned} Ey_1 &\approx W_1^{0}W_2^{0}X\alpha_{1,(1,0),(2,0)} + W_1^{1}W_2^{0}X\alpha_{1,(1,1),(2,0)} + W_1^{0}W_2^{1}X\alpha_{1,(1,0),(2,1)} + W_1^{2}W_2^{0}X\alpha_{1,(1,2),(2,0)} \\ &\quad + W_1^{0}W_2^{2}X\alpha_{1,(1,0),(2,2)} + W_1^{1}W_2^{1}X\alpha_{1,(1,1),(2,1)} + W_2^{1}W_1^{1}X\alpha_{1,(2,1),(1,1)} \\ &= \sum_{l_1,l_2}\ \sum_{\substack{s_1=0,\, s_2=0 \\ s_1+s_2\le 2}}^{2} W_{l_1}^{s_1}W_{l_2}^{s_2}X\alpha_{1,(l_1,s_1),(l_2,s_2)}, \quad (A.8) \end{aligned}$$
with
$$\begin{aligned} \alpha_{1,(1,0),(2,0)} &= [(b_{12}b_{21}+1)c_{11},\ (b_{12}b_{21}+1)c_{21},\ b_{21}c_{32},\ b_{21}c_{42}]', \\ \alpha_{1,(1,1),(2,0)} &= [\lambda_{11,1}c_{11},\ \lambda_{11,1}c_{21},\ b_{21}c_{32}(\lambda_{11,1}+\lambda_{22,1}),\ b_{21}c_{42}(\lambda_{11,1}+\lambda_{22,1})]', \\ \alpha_{1,(1,0),(2,1)} &= [\lambda_{11,2}c_{11},\ \lambda_{11,2}c_{21},\ b_{21}c_{32}(\lambda_{11,2}+\lambda_{22,2}),\ b_{21}c_{42}(\lambda_{11,2}+\lambda_{22,2})]', \\ \alpha_{1,(1,2),(2,0)} &= [\lambda_{11,1}^{2}c_{11},\ \lambda_{11,1}^{2}c_{21},\ 0,\ 0]', \\ \alpha_{1,(1,0),(2,2)} &= [\lambda_{11,2}^{2}c_{11},\ \lambda_{11,2}^{2}c_{21},\ 0,\ 0]', \\ \alpha_{1,(1,1),(2,1)} &= [\lambda_{11,1}\lambda_{11,2}c_{11},\ \lambda_{11,1}\lambda_{11,2}c_{21},\ 0,\ 0]', \\ \alpha_{1,(2,1),(1,1)} &= [\lambda_{11,1}\lambda_{11,2}c_{11},\ \lambda_{11,1}\lambda_{11,2}c_{21},\ 0,\ 0]', \end{aligned}$$
and the other $\alpha_{1,(l_1,s_1),(l_2,s_2)}$'s restricted to zero. One can check that, in the above, the generic form of a summand is
$$W_{l_1}^{s_1}W_{l_2}^{s_2}X\alpha_{g,(l_1,s_1),(l_2,s_2)} = \sum_{k=1}^{4} W_{l_1}^{s_1}W_{l_2}^{s_2} x_k\, \alpha_{g,(l_1,s_1),(l_2,s_2),k},$$
where $\alpha_{g,(l_1,s_1),(l_2,s_2),k}$ denotes the $k$-th element of $\alpha_{g,(l_1,s_1),(l_2,s_2)}$, $s_1, s_2$ denote the powers of each $W$, and the index $g$ corresponds to the $y_g$ being approximated. We note that the summands $W_{l_1}^{s_1}W_{l_2}^{s_2} x_k\, \alpha_{g,(l_1,s_1),(l_2,s_2),k}$ appearing on the right-hand side of the above equation conform with the generic form given in (A.7). We also note that the above expression allows for both $W_1W_2 x_k \alpha_{g,(1,1),(2,1),k}$ and $W_2W_1 x_k \alpha_{g,(2,1),(1,1),k}$ as summands in the second-order approximations. Analogously, the second-order approximation to $Ey_2$ can then be expressed as
+ W 2W 01 2 2,(1,0),(2,1) 1 2X?2,(1,2),(2,0) l1=?1,l2=2 l1=?1,l2=2 + W 0 2 1 11W2X?2,(1,0),(2,2) + W1W2X?2,(1,1),(2,1) l1=?1,l2=2 l1=1,l2=2 + W 1 12W1X?2,(2,1),(1,1) ?l1=?2,l2=?1 = W s1 s2? ?? ? l W1 l X?2 2,(l1,s1),(l2,s2), (A.9)l1,l2 s1=0 s2=0 s1+s2?2 131 with [ ]? ?2,(1,0),(2,0) = [b12c11, b12c21, (b12b21 + 1)c32, (b12b21 + 1)c42 , ]? ?2,(1,1),(2,0) = [b12c11(?11,1 + ?22,1), b12c21(?11,1 + ?22,1), ?22,1c32, ?22,1c42] ,? ?2,(1,0),(2,1) = [b12c11(?11,2 + ?22,2),]b12c21(?11,2 + ?22,2), ?22,2c32, ?22,2c42 ,? ? 2 22,(1,2),(2,0) = [0, 0, ?22,1c32, ?22,1c42] ,? ? 2 22,(1,0),(2,2) = [0, 0, ?22,2c32, ?22,2c42 , ]? ?2,(1,1),(2,1) = [0, 0, ?22,1?22,2c32, ?22,1?22,2c42]? ?2,(2,1),(1,1) = 0, 0, ?22,1?22,2c32, ?22,1?22,2c42 , and other ?2,(l1,s1),(l2,s2)?s being restricted to zeros. Combing (A.8) and (A.9), the second order approximation to EY is then a special case of ?? ? EY ? W s1l W s2 l X?1 2 (l1,s1),(l2,s2), (A.10) l1,l2 ?s1=?0?s2=?0 s1+s2?2 with ?(l1,s1),(l2,s2) = [?1,(l1,s1),(l2,s2),?2,(l1,s1),(l2,s2)] and some of the ?(l1,s1),(l2,s2)?s are restricted to zeros. Note that above expression conforms with and is a special case of (2.31) as stated in Section 2.3.3. Since we can cover the cases where l1 = 1, l2 = 2 and l1 = 2, l2 = 1 with choices of s1 and s2, we can write (A.10) more explicitly by ? ? ? ? EY ? s1 s2 s1 s2n ? ?? ?W1 W2 X?(1,s1),(2,s2) +s1=0 s2=0 ?s1=?0?s2=?W2 W1 X?(2,s1),(1,s2).0 s1+s2?2 s1+s2?2 132 A.2 Proofs of Chapter 2 A.2.1 Preliminary Results In addition to the notations introduced in the main text, we introduce the following two types of matrices that are needed in deriving the scores of the log-likelihood functions. Commutation Matrix: For any matrix A of size n ? m, the commutation matrix Kn?m is defined by vec(A?n) = Kn?mvec(An). It is readily checked that ? ? ? ???? Im ? i ?1,n ?? K = ??? . ?n?m .? . ???? Im ? i?n,n observing that A? = [a? , . . . , a? ] and (I ? i? ?n 1.,n n.,n m j,n)vec(An) = aj.,n. 
For simplicity, we refer to this $K_{n \times m}$ as the commutation matrix on $A_n$.

Stacked Block Matrices: Let $A_n = (A_{gk,n})$ be an $nG \times nK$ matrix with blocks $A_{gk,n}$ of dimension $n \times n$, and let $B_n = [B_{1,n}', \ldots, B_{H,n}']'$ be an $nH \times n$ matrix with blocks $B_{h,n}$ of dimension $n \times n$. Then
$$\mathrm{vec}_n(A_n, B_n) = \big[\mathrm{tr}(A_{11,n}B_{1,n}), \ldots, \mathrm{tr}(A_{11,n}B_{H,n}), \ldots, \mathrm{tr}(A_{G1,n}B_{1,n}), \ldots, \mathrm{tr}(A_{G1,n}B_{H,n}), \ldots, \mathrm{tr}(A_{1K,n}B_{1,n}), \ldots, \mathrm{tr}(A_{1K,n}B_{H,n}), \ldots, \mathrm{tr}(A_{GK,n}B_{1,n}), \ldots, \mathrm{tr}(A_{GK,n}B_{H,n})\big]' \quad (A.11)$$
denotes the $GKH \times 1$ vector composed of the traces of the $n \times n$ blocks of the matrix
$$\begin{bmatrix} A_{11,n}B_{1,n} & \cdots & A_{1K,n}B_{1,n} \\ \vdots & \ddots & \vdots \\ A_{11,n}B_{H,n} & \cdots & A_{1K,n}B_{H,n} \\ \vdots & \ddots & \vdots \\ A_{G1,n}B_{1,n} & \cdots & A_{GK,n}B_{1,n} \\ \vdots & \ddots & \vdots \\ A_{G1,n}B_{H,n} & \cdots & A_{GK,n}B_{H,n} \end{bmatrix}.$$
As a special case, let $B_n = I_n$, and thus
$$\mathrm{vec}_n(A_n, I_n) = \mathrm{vec}_n(A_n) = [\mathrm{tr}(A_{11,n}), \ldots, \mathrm{tr}(A_{G1,n}), \ldots, \mathrm{tr}(A_{1K,n}), \ldots, \mathrm{tr}(A_{GK,n})]'$$
denotes the $GK \times 1$ vector composed of the traces of the $n \times n$ blocks. Note that for $n = 1$ we have $A_{gk,n} = a_{gk}$ and $\mathrm{vec}_1(A_n) = \mathrm{vec}(A_n)$, where $\mathrm{vec}(\cdot)$ denotes the standard vectorization operator. We also note that $\mathrm{vec}_n(A_n^{-1}, B_n)$ with $A_n^{-1} = (A_n^{gk})$ is covered as a special case of the above definition, with $A_{gk,n}$ replaced by $A_n^{gk}$. To save on notation, we suppress the subscript $n$ when not necessary.

We use the following lemmata in deriving the scores of the log-likelihood function.

Lemma 2. Let $A$ be a matrix whose elements depend on some vector $\theta = [\theta_1, \ldots, \theta_m]'$. Assume $A$ is nonsingular; then $\frac{\partial \ln|A|}{\partial \theta} = \mathrm{vec}(A^{-1\prime})' \frac{\partial \mathrm{vec}(A)}{\partial \theta}$.

Proof. By Corollary 29 of Dhrymes and Guerard (1978), $\frac{\partial |A|}{\partial \theta} = |A|\,\mathrm{vec}(A^{-1\prime})' \frac{\partial \mathrm{vec}(A)}{\partial \theta}$. Further noting that $\frac{\partial \ln|A|}{\partial \theta} = \frac{1}{|A|}\frac{\partial |A|}{\partial \theta}$, the statement of the lemma follows by combining these two results.

Lemma 3. Let $A$ be a matrix of size $p \times q$. Then
$$\frac{\partial \mathrm{vec}(A' \otimes I_n)}{\partial \mathrm{vec}(A)} = \big[I_p \otimes (K_{n \times q} \otimes I_n)(I_q \otimes \mathrm{vec}(I_n))\big] K_{p \times q}.$$

Proof. By the formula on page 55 of Magnus and Neudecker (2019),1 we can write $\mathrm{vec}(A' \otimes I_n) = [I_p \otimes$
$(K_{n \times q} \otimes I_n)(I_q \otimes \mathrm{vec}(I_n))]\,\mathrm{vec}(A') \quad (A.12)$
$$= \big[I_p \otimes (K_{n \times q} \otimes I_n)(I_q \otimes \mathrm{vec}(I_n))\big] K_{p \times q}\,\mathrm{vec}(A).$$
It then follows that
$$\frac{\partial \mathrm{vec}(A' \otimes I_n)}{\partial \mathrm{vec}(A)} = \big[I_p \otimes (K_{n \times q} \otimes I_n)(I_q \otimes \mathrm{vec}(I_n))\big] K_{p \times q},$$
as claimed.

1 The formula states: let $A$ be $p \times q$ and $B$ be $m \times n$. We then have $\mathrm{vec}(A \otimes B) = (I_q \otimes T)\,\mathrm{vec}(A) = (H \otimes I_m)\,\mathrm{vec}(B)$, where $T = (K_{n \times p} \otimes I_m)(I_p \otimes \mathrm{vec}(B))$ and $H = (I_q \otimes K_{n \times p})(\mathrm{vec}(A) \otimes I_n)$.

Lemma 4. With $\sigma$ being the column vector consisting of the nonzero upper-diagonal elements of $\Sigma^{-1}(\theta)$, we have
$$\frac{\partial\, u'(\theta)R'(\rho)(\Sigma^{-1}(\theta) \otimes I_n)R(\rho)u(\theta)}{\partial \sigma} = \mathrm{vec}(E(\theta)'E(\theta))' L_{\sigma},$$
where $E(\theta)$ is defined by $\varepsilon(\theta) = \mathrm{vec}(E(\theta))$ with $\varepsilon(\theta) = R(\rho)u(\theta)$.

Proof. We utilize the following two propositions in Dhrymes and Guerard (1978):
- Proposition 89: Let $A_1, A_2, A_3$ be suitably dimensioned matrices. Then $\mathrm{tr}(A_1A_2A_3) = \mathrm{vec}(A_1')'(A_3' \otimes I)\,\mathrm{vec}(A_2)$.
- Proposition 98: Let $A$ be $m \times n$ and $X$ be $n \times m$; then $\frac{\partial\, \mathrm{tr}(AX)}{\partial X} = A'$. If $X$ is a function of the elements of the vector $\theta$, then $\frac{\partial\, \mathrm{tr}(AX)}{\partial \theta} = \frac{\partial\, \mathrm{tr}(AX)}{\partial \mathrm{vec}(X)} \frac{\partial \mathrm{vec}(X)}{\partial \theta} = \mathrm{vec}(A')' \frac{\partial \mathrm{vec}(X)}{\partial \theta}$.

Applying Proposition 89 in Dhrymes and Guerard (1978), we can write
$$u'(\theta)R'(\rho)(\Sigma^{-1}(\theta) \otimes I_n)R(\rho)u(\theta) = \mathrm{vec}(E(\theta))'(\Sigma^{-1}(\theta) \otimes I_n)\,\mathrm{vec}(E(\theta)) = \mathrm{tr}(E(\theta)'E(\theta)\Sigma^{-1}(\theta)).$$
We next apply Proposition 98 in Dhrymes and Guerard (1978) to obtain
$$\frac{\partial\, \mathrm{tr}(E(\theta)'E(\theta)\Sigma^{-1}(\theta))}{\partial \sigma} = \frac{\partial\, \mathrm{tr}(E(\theta)'E(\theta)\Sigma^{-1}(\theta))}{\partial \mathrm{vec}(\Sigma^{-1}(\theta))} \frac{\partial \mathrm{vec}(\Sigma^{-1}(\theta))}{\partial \sigma} = \mathrm{vec}(E(\theta)'E(\theta))' L_{\sigma},$$
where by definition $L_{\sigma} = \frac{\partial \mathrm{vec}(\Sigma^{-1}(\theta))}{\partial \sigma}$.2 The statement of the lemma now follows.

2 Note that the selector matrix with respect to $\sigma$, $L_{\sigma}$, is of dimension $G^2 \times G(G+1)/2$.

Lemma 5. Let $A = (A_{ij})$ be an $nG \times nG$ matrix with blocks $A_{ij}$ of dimension $n \times n$, and let $B = (B_h)$ be an $nH \times n$ matrix with blocks $B_h$ of dimension $n \times n$. We then have
$$\begin{aligned} & K_{HG \times G}'\big[I_{HG} \otimes (I_G \otimes \mathrm{vec}(I_n)')(K_{n \times G}' \otimes I_n)\big]\big[(I_G \otimes B) \otimes I_{nG}\big]\mathrm{vec}(A') \\ &= \big[\mathrm{tr}(A_{11}B_1), \ldots, \mathrm{tr}(A_{11}B_H), \ldots, \mathrm{tr}(A_{G1}B_1), \ldots, \mathrm{tr}(A_{G1}B_H), \ldots, \mathrm{tr}(A_{1G}B_1), \ldots, \mathrm{tr}(A_{1G}B_H), \ldots, \mathrm{tr}(A_{GG}B_1), \ldots, \mathrm{tr}(A_{GG}B_H)\big]' \\ &= \mathrm{vec}_n(A, B), \end{aligned}$$
where $K_{HG \times G}$ is the commutation matrix on any $HG \times G$ matrix $T$, such that $\mathrm{vec}(T') = K_{HG \times G}\,\mathrm{vec}(T)$.

Proof. By the definition of the commutation matrix $K_{n \times G}$, we have
i ?1,n Kn?G = ???? . .. ???? ??? IG ? i?n,n where ii,n is the ith column of In. Hence, K ?n?G = [IG ? i1,n, ..., IG ? in,n]. Therefore, (IG ? vec(I )?n )(K ? ?n?G ? In) = (IG ? vec(In) )[IG ? i1,n ? In, ..., IG ? in,n ? In] = [IG ? vec(In)?(i1,n ? In), ..., IG ? vec(I )?n (in,n ? In)] = [IG ? i?1,n, ..., IG ? i?n,n] where the last equality follows because (i?i,n ? In)vec(In) = vec(Inii,n) = ii,n. 137 Also note that [IG ? i?1,n, ..., IG ? i?n,n](Bh ? InG) =[(I ? i?G 1,n)(b11,h ? InG) + (IG ? i?2,n)(b21,h ? InG) + ...+ (IG ? i?n,n)(bn1,h ? InG), ..., (IG ? i? ? ?1,n)(b1n,h ? InG) + (IG ? i2,n)(b2n,h ? InG) + ...+ (IG ? in,n)(bnn,h ? InG)] =[I ? ? ?G ? b.1,h, IG ? b.2,h, ..., IG ?B.n,h] where b.i,h denotes the i-th column of Bh. 138 It then follows that ? [ ] [ ]KHG?G IHG ? (I ? vec(I )? ?G n )(Kn?G ? In) (IG ?B)? InG v?ec(A?)? ??? B1 ? InG ??? ?? ??? ?? ?? . ?.. ??? ? ?? IG ? i ? 1,n, ..., IG ? i? ??n,n ?????? BH ? InG ?? =K ? . .HG?G ??? . ??????? ? . . . ???? ? vec(A?) ? ? ? ?? I ?G ? i1,n, ..., IG ? i? ?n,n ???? B ?1 ? InG ??. ? HG diagonal blocks ???? .. ???? ? ?? BH ? InG? G? ?diagonal blocks ??? I ?B? ... I ?? G 1.,1 G ?B ?n.,1 ????? .. . . .. ?? . . . ??? ? ? ? I ?B? ?? G ? ? 1.,H ... IG ?Bn.,H =K ? ? ? ? ?? ? . . ? HG G ? . ??? vec(A ? ? )???? IG ?B ? 1.,1 ... I ?B? ?G n.,1 ? ?? ?? .. . . ?? . . ... ??? ? ?? I ?B? ? ? G 1.,H ... IG ?Bn.,H ? ? ?G di?agonal blocks ? ??? tr(A?(i i?? 1,G 1,G ?B ? ? ? ? 1)) ???? ???? tr(A (i1,Gi1,G ?B1)) ????? .? . . ? ? ?? ? ??? ... ???? ?? tr(A? ? ? ?? (i1,Gi ? G,G ?B?1)) ???? ??? tr(A ?(i ? ?1,Gi1,G ?BH)) ? ?? ? ? ..? . ?? ???? ? ? ? ? ??? . ?? . . ???? tr(A?(i i? ?B? ? ? ?1,G ? ?1,G H)) tr1(A39(iG,Gi1,G ?B1)) ??? ??? ?? ? ?? .. ? ???? .. ? ?? ? . . ???? ? ? ?? ?? tr(A ?(i ? ? ? ? ?1,GiG,G ?BH)) ??? ??? tr(A (iG,Gi ?1,G ?BH)) ?? =K ?HG?G ?????? ... ?? ???? = ???? ..? . ?? ? ?? (A.13)?? tr(A ?(i ? ? ? ? ? ? ?G,Gi1,G ?B1)) ?? ?? tr(A (i1,GiG,G ?B1))? ??? ? .. ? . ??? ? ? ?? .. ?. ? ? tr(A?(i i? ?B?? )) ?? ?? ??? ?? ? ? ?? 
??G,G G,G 1 tr(A (i i? ?B? ?1,G G,G H))? ????? .. ?. ???? ?? ? ?? . ? .. ??? ??? ?? tr(A ?(iG,Gi ? 1,G ?B?H)) ? ???? ? ? ? ? ? ?? .. ?? ? ?? tr(A (i ?G,GiG,G ?B1)) ?? ? . ? ???? .. ?. ???? tr(A?(i i? ?B? ?G,G G,G H)) tr(A (iG,Gi? ?G,G ?BH)) The second to the last equality of above follows because, for i, j = 1, ..., G and h = 1, ..., H: [0, ..., i?? ? b?i,G .1,h, ?..?., i? ? ?i,G ? b.n,?h, ..., 0]vec(A ) jth 1? n2G block =vec(i i? ? ?i,G j,G ?Bh) vec(A ) =tr(A?(i ?j,Gii,G ?B?h)) where the last equality follows from, e.g., Proposition 88 in Dhrymes and Guerard (1978).3 Finally, note that tr(A?(ij,Gi?i,G ?B?h)) = tr((i ?i,Gij,G ?Bh)A) = tr(AjiBh), and thus K ? [ I ? (I ? vec(I )? ] [ ] H[G?G HG G n )(K ? n?G ? In) (IG ?B)? I ?nG vec(A ) = tr(A11B1), ..., tr(A11BH), ..., tr(AG1B1), ..., tr(AG1BH), ]? ..., tr(A1GB1), ..., tr(A1GBH), ..., tr(AGGB1), ..., tr(AGGBH) = vecn(A,B) as claimed. A.2.2 Proof of Proposition 1 Let x denote a n? 1 vector with elements being functions of the r-element vector ?. Let A be n? n matrix and elements in A are independent of ?. Then for B = x?Ax, ?B ? ?x= x (A? + A) . ?? ?? 3The Proposition 88 in Dhrymes and Guerard (1978) states that: Let A, B be suitably dimensioned matrices. Then tr(AB) = vec(A?)?vec(B) = vec(B?)?vec(A) 140 If A is symmetric, it then follows that ?B ? ?x= 2x A . ?? ?? Alternatively, if the elements of A depend on some vector ? and A is nonsingular, Lemma 2 shows that ?ln|A| = vec(A?1?)? ?vec(A) . ?? ?? Using these results, we first write the score vectors of the log-likelihood function (2.15) as ?lnL(?) = vec(S(?, ?)?1 ? ??vec(S(?, ?))) ? u? ?S(?, ?)y(?)R?(?)(??1(?)? In)R(?) ??g ??g ??g (A.14) ?lnL(?) ?C?(?)x = u?(?)R?(?)(??1(?)? In)R(?) (A.15) ??g ??g ?lnL(?) ?1? ??vec(S(?, ?)) ?S(?, ?)y= vec(S(?, ?) ) ? u?(?)R?(?)(??1(?)? In)R(?) ??g ??g ??g (A.16) ?lnL(?) ?1? ??vec(R(?)) ? ? ?R(?)u(?)= vec(R(?) ) u (?)R?(?)(??1(?)? In) (A.17) ??g ??g ??g ?lnL(?) n ??vec(? ?1(?)) ? 1 ?u ?(?)R?(?)(??1(?)? In)R(?)u(?) 
= vec(?(?)) (A.18) ?? 2 ?? 2 ?? We start off by simplifying the first terms in the score vectors. Recall that S(?, ?) = InG ? (B? ? In) ? (?? ? In)(IG ? W ) and R(?) = InG ? (P ? ? In)(IG ?M). Apply Proposition 86 of Dhrymes and Guerard (1978) 4, we can obtain 4The Proposition 86 of Dhrymes and Guerard (1978) states: Let A, B be n ? m, m ? q respectively. Then vec(AB) = (B? ? In)vec(A) = (Iq ?A)vec(B). 141 [ ] vec[(?? ? In)(IG ?W )] = [(IG ?W ?)? I ?nG] vec(? ? In) (A.19) vec[(P ? ? I ? ?n)(IG ?M)] = (IG ?M )? InG vec(P ? In) (A.20) Also, by the definition of selection matrices, ?vec(B) ?vec(?) ?vec(P ) = ig,G ? L?,g, = ig,G ? L?,g, and = ig,G ? L?,g. (A.21) ??g ??g ??g In light of (A.19), (A.20), (A.21) and apply Lemma 3, we can obtain the following ?vec(S(?, ?)) ?vec(B? ? In) ?vec(B) = ? ??g [ ?vec(B) ??g ] = ? IG ? (Kn?G ? In)(IG ? vec(In)) KG?G(ig,G ? L?,g) (A.22) ?vec(S(?, ?)) ???vec[(? ? In)(IG ?W )] ?vec(?)= ??g [ ?vec(?) ] ??g ? ? ? ? ?vec(? ? ? In) ?vec(?) = (IG W ) InG [ ] [ ?vec(?) ??g ] = ? (IG ?W ?)? InG IPG ? (Kn?G ? In)(IG ? vec(In)) KPG?G(ig,G ? L?,g) (A.23) ?vec(R(?)) ?vec[(P ?? ? In)(IG ?M)] ?vec(P )= ??g [ ?vec(P ) ] ??g ? ? ? ? ?vec(P ? ? In) ?vec(P ) = (IG M ) InG [ ] [ ?vec(P ) ??g ] = ? (I ?G ?M )? InG IQG ? (Kn?G ? In)(IG ? vec(In)) KQG?G(ig,G ? L?,g) (A.24) where KG?G, KPG?G and KQG?G are the commutation matrices on B, ? and P , respectively. Now, to further simplify the above expressions, we[take the followin]g steps: First, we apply Lemma 5 taking A = S?1 and B = In and, noting that (IG ? In)? InG = In2G2 , to obtain 142 ? [ ] v[ec(S(?, ?) ?1 )? IG ? (Kn?G ? In)(IG ? vec(In)) KG?G(ig,G ? L?,g)] = tr(S(?, ?)11), ..., tr(S(?, ?)G1), ..., tr(S(?, ?)1G), ..., tr(S(?, ?)GG) (ig,G ? L?,g) =vec (S?1n (?, ?)) ?(ig,G ? L?,g) =vecn(S(?, ?) .g)?L?,g, (A.25) where S(?, ?)ij denotes the ijth n? n block of S?1(?, ?). Next, take A = S?1 and B = W , it then follows from Lemma 5 that ? [ vec(S(?, ?)?1 )? (I ?W ? ] [ ] [ G )? InG IPG ? (Kn?G ? In)(IG ? 
vec(In)) KPG?G(ig,G ? L?,g) = tr(S(?, ?)11W1), ..., tr(S(?, ?) 11WP ), ..., tr(S(?, ?) G1W1), ..., tr(S(?, ?) G1WP ), ] ..., tr(S(?, ?)1GW1), ..., tr(S(?, ?) 1GWP ), ..., tr(S(?, ?) GGW1), ..., tr(S(?, ?) GGWP ) (ig,G ? L?,g) =vec (S?1n (?, ?),W ) ?(ig,G ? L?,g) =vecn(S(?, ?) .g,W )?L?,g. (A.26) Next, let A = R?1 and B = M ,it then follows from Lemma 5 that ? [ v?ec(R(?)?1 )? (I ?M ? ] [ ] G )? InG IQG ? (Kn?G ? In)(IG ? vec(In)) KQG?G(ig,G ? L?,g) ? ? =?? ?t?r(R?1 ?11 (?)M1), ..., tr??(R1 (?)MQ), 0, ..., 0?, ..., 0?, ..., 0, tr(R?1 ?1G (?)M?1?), ..., tr(RG (?)MQ?)?? (ig,G ? L?,g) 1st 1?QG block Gth 1?QG block =vecn(R ?1(?),M)?(ig,G ? L?,g) =vec ?1n(Rg (?g),M) ?L?,g. (A.27) Now equations (A.25) - (A.27) represents the simplified first terms of the scores w.r.t. ?g, ?g and 143 ?g. We now simplifying the second terms in the scores. First, we note that ?C?(?)x ?(IG ?X)vec(C) ?vec(C) = = (IG ?X)(ig,G ? L?,g) = ig,G ?Xg (A.28) ??g ?vec(C) ??g Next, recalling that vec(Y ) = (IG?W )vec(Y ) and vec(U) = (IG?M)vec(U) and applying again Proposition 86 of Dhrymes and Guerard (1978), we have (B? ? In)vec(Y ) = (IG ? Y )vec(B) (A.29) (?? ? In)(IG ?W )vec(Y ) = (?? ? In)vec(Y ) = vec(Y ?) = (IG ? Y )vec(?), (A.30) (P ? ? In)(I ?G ?M)vec(U) = (P ? In)vec(U) = vec(UP ) = (IG ? U)vec(P ). Consequently ?S(?, ?)y ?[InG ?B? ?n]y ?(B ? In)vec(Y )= = = ?(IG ? Y ) (A.31) ?vec(B) ?vec(B) ?vec(B) ?S(?, ?)y ?[I ? ?nG ?Bn]y ??[(?n ? In)(IG ?Wn)]vec(Y )= = = ?(IG ? Y ), (A.32) ?vec(?) ?vec(?) ?vec(?) ?R(?)u ?[I ? P ?nG n ]u ??[(P ? n ? In)(IG ?Mn)]vec(U)= = = ?(IG ? U) ?vec(P ) ?vec(P ) ?vec(P ) 144 and thus ?S(?, ?)y ??S(?, ?)y ?vec(B)= = ?(IG ? Y )(ig,G ? L?,g) = ig,G ? Yg, (A.33) ??g ?vec(B) ??g ?S(?, ?)y ??S(?, ?)y ?vec(?)= = ?(IG ? Y )(ig,G ? L?,g) = ig,G ? Y g, (A.34) ??g ?vec(?) ??g ?R(?)u ? ?R(?)u ?vec(P )= = ?(IG ? U)(ig,G ? L?,g) = ig,G ? U g. (A.35) ??g ?vec(P ) ??g Substituting (A.28) - (A.35) into the second terms of the scores and observe that (??1(?)? 
In)R(?)(ig,G ?X ) = (?.gg ? In)Rg(?g)Xg, (??1(?)? In)R(?)(i .gg,G ? Yg) = (? ? In)Rg(?g)Yg, (??1(?)? In)R(?)(ig,G ? Y g) = (?.g ? In)Rg(?g)Y g, (??1(?)? In)(ig,G ? U g) = (?.g ? In)U g, the second terms of the scores can then be written as u?(?)R?(?)(?.g ? In)Rg(?g)Xg, (A.36) u?(?)R?(?)(?.g ? In)Rg(?g)Yg, (A.37) u?(?)R?(?)(?.g ? In)Rg(?g)Y g, (A.38) u?(?)R?(?)(?.g ? In)U g. (A.39) 145 Finally, note that ?vec(? ?1(?)) = L? by definition and recall Lemma 4 shows that?? ?u?(?)R?(?)(??1(?)? In)R(?)u(?) ( )? = vec E(?)?E(?) L?. ?? Substituting these results into (A.18), we obtain obtain ?lnL(?) 1 [ ? ( ? ) ]?= nvec(?(?)) ? vec E(?) E(?) L?. (A.40) ?? 2 We combine above results (A.25), (A.26), (A.27) for simplifying first terms of the scores, (A.36) - (A.39) for simplifying second terms of the scores, as well as (A.40). We can then re-write scores in (A.14) - (A.18) as ?lnL(?) =? vecn(S(?, ?).g)?L ? ? .g?,g + u (?)R (?)(? ? In)Rg(?g)Yg, ??g ?lnL(?) =u?(?)R?(?)(?.g ? In)Rg(?g)Xg, ??g ?lnL(?) =? vec (S(?, ?).g,W )?L + u?n ?,g (?)R?(?)(?.g ? In)Rg(?g)Y g, ??g ?lnL(?) =? vec (R?1n g (?g),M)?L ??,g + u (?)R?(?)(?.g ? In)U g,??g ?lnL(?) 1 [ ( ) ]? = nvec(?(?))? ? vec E(?)?E(?) L?. ?? 2 146 Recall the notation stacked blocked matrices defined in Chapter A.2.1, we have vec (S(?, ?).g)?L = [tr(S1gn ?,g n (?, ?)), . . . , tr(S Gg n (?, ?))]L?,g, vecn(S(?, ?) .g,W )?L 1g?,g = [tr(Sn (?, ?)W1,n), . . . , tr(S 1g n (?, ?)WP,n), . . . , tr(SGgn (?, ?)W1,n), . . . , tr(S Gg n (?, ?)WP,n)]L?,g, vec (R?1n g (?g),M) ?L?,g = [tr(R ?1 ?1 g,n(?g)M1,n, . . . , tr(Rg,n(?g)MQ,n)]L?,g. Taking transposes of above expressions of scores, it then follows ?lnLn(?, ?) =Y ? R (? )?(?g.? g,n g,n g (?)? In)Rn(?)un(?)? L ? ?,gvecn(S(?, ?) .g), ??g ?lnLn(?, ?) ? ? g. ? =Xg,nRg,n(?g) (? (?)? In)Rn(?)un(?),??g ?lnLn(?, ?) ? ? =Y g,nRg,n(? ? g. g) (? (?)? In)Rn(?)u (?)? L?n ?,gvecn(S(?, ?).g,W ),??g ?lnLn(?, ?) ? =U g,n(?g) ?(?g.(?)? In)Rn(?)un(?)? L? ?1 ?? ?,g vecn(Rg (?g),M), g ?lnLn(?, ?) 1 ? [ ] ? = L? 
nvec(?(?))? vec(En(?) ?En(?)) , ?? 2 The statement of the proposition now follows. A.2.3 Proof of Proposition 2 Under Assumption 4 and note that ml 1 e ? g.g(?0) = Z?g(?0) (?0 ? In)? is linear in ?. Thusn Emlg(?0) = 0 as the proposition claimed. 147 A.2.4 Proof of Proposition 3 For the quadratic moments, note that at ?0, we have ?g.(?0) = EE ?g,0E0 =?[?g1,0, ..., ?gG,0]. Next, observe that v = S?1(?0, ? )u = S?10 (?0, ? )R?10 (?0)? and thus vl = G lr ?1 r=1 S Rr ?r. Consequently, for the l-th vector in V ? ? g.g (?)Rg(?g)(? (?) ? In)R(?)u(?) at the true parameter values we have ?G ?G Ev?R? (?g. ? I )Ru = E ??R??1 lr? ? gsl g 0 n r r S Rg ?0 ?s (A.41) r?=1 ? s=1G G = tr[R??1Slr? ?r Rg? gs 0 E?s? ? r] (A.42) ?r=?1 s=1G G = tr[R??1Slr? ?r Rg? gs 0 ?sr,0] (A.43) ?r=1 s=1G ?G = tr[R??1Slr?R? ] ?gsr g 0 ?sr,0 = tr[R ??1 lg? ? lg g S Rg] = tr(S ) r=1 s=1 ? ? observing that G ?gs G gss=1 0 ?sr,0 = 0 for r ?= g and s=1 ?0 ?sr,0 = 1 for r = g. This proves that EV ?g (?0)R ? g. g(?0,g)(?0 ? In)R(?0)u(?0) = ??,g(?0, ?0) ? Next, since v = (IG ? W )v and thus v = W G lr ?1l,p p r=1 S Rr ?r for l = 1, , , .G and p = 148 1, ..., P . In light of the proof above, we the have ?G ?G Ev? R? (?g. ? I )Ru =E ??R??1Slr? ?l,p g 0 n r r WpR? ? gs g 0 ?s ?r=?1 s=1G G = tr[R??1Slr?r W ?R? ?gsp g 0 E? ? s?r] ?r=1 s=1G ?G = tr[R??1Slr?W ?R? gsr p g?0 ?sr,0] ?r=1 s=1G ?G = tr[R??1r S lr?W ? ? gs ??1 lg? ? lgpRg] ?0 ?sr,0 = tr[Rg S Rg] = tr(S Wp). r=1 s=1 ? This shows that EV ? g.g(?0)Rg(?0,g)(?0 ? In)R(?0)u(?0) = ??,g(?0, ?0) For the moment conditions w.r.t. ?g, recall that U g = [M1ug, ...,MQug] and hence Eu? M ?(?g.g q 0 ? In)Ru = ?? R?1?M ? g. g g q(?0 ? In)??G = E?? R?1?M ? ?gsg g q 0 ?s ?s=1G = tr(R?1?M ? ?gsE? ??g q 0 s g) ?s=1G = tr(R?1?M ? gsg q ?0 ?gs,0) s=1 = tr(R?1g Mq) ? ? where the last equality follows because G ?gss=1 0 ?gs,0 = 1. This proves that EU g(? g. 0 ? In)Ru = ??,g(?g,0). Putting above results together, the claim of the proposition follows. 
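Each step of the proof above rests on the same fact: the expectation of a quadratic form equals the trace of the quadratic-form matrix times the covariance matrix of the disturbances. The numpy sketch below verifies this identity on arbitrary stand-in matrices (Q and L here are illustrative placeholders, not the model matrices R, S or W):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# An arbitrary quadratic-form matrix Q and a covariance Omega = L L'.
Q = rng.standard_normal((n, n))
L = rng.standard_normal((n, n))
Omega = L @ L.T

# For eps = L z with z ~ iid(0, 1): E[eps' Q eps] = E[z' (L'QL) z] = tr(L'QL),
# which equals tr(Q Omega) by the cyclic property of the trace.
lhs = np.trace(L.T @ Q @ L)
rhs = np.trace(Q @ Omega)
assert np.isclose(lhs, rhs)

# Monte Carlo confirmation of the same expectation (agrees up to simulation noise).
z = rng.standard_normal((n, 200_000))
eps = L @ z
mc = np.einsum('in,ij,jn->n', eps, Q, eps).mean()
print(mc, rhs)
```

In the proposition, the role of Omega is played by E ε_s ε'_r = σ_{sr,0} I_n, which is what collapses the double sums to single traces such as tr(S^{lg}W_p).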
A.2.5 Proof of Lemma 1

Recall from equation (2.27) that the set of quadratic moments

m^q_{g,n}(θ) = (1/n) [ V_{g,n}(θ)'R_{g,n}(ρ_g)'(σ^{g.}_n(σ) ⊗ I_n)ε_n(θ) − α_{β,g,n}(λ, β) ;
                      V̄_{g,n}(θ)'R_{g,n}(ρ_g)'(σ^{g.}_n(σ) ⊗ I_n)ε_n(θ) − α_{λ,g,n}(λ, β) ;
                      Ū_{g,n}(δ_g)'(σ^{g.}_n(σ) ⊗ I_n)ε_n(θ) − α_{ρ,g,n}(ρ_g) ],

for g = 1, ..., G, is based on the score of the (log-)likelihood function, where

α_{β,g,n}(λ, β) = L'_{β,g}[tr(S_n^{1g}(λ, β)), ..., tr(S_n^{Gg}(λ, β))]',   (A.44)
α_{λ,g,n}(λ, β) = L'_{λ,g}[tr(S_n^{1g}(λ, β)W_{1,n}), ..., tr(S_n^{1g}(λ, β)W_{P,n}), ..., tr(S_n^{Gg}(λ, β)W_{1,n}), ..., tr(S_n^{Gg}(λ, β)W_{P,n})]',   (A.45)–(A.46)
α_{ρ,g,n}(ρ_g) = L'_{ρ,g}[tr(R_{g,n}^{-1}(ρ_g)M_{1,n}), ..., tr(R_{g,n}^{-1}(ρ_g)M_{Q,n})]'.

Let v_{h,n}(θ) be the h-th n × 1 block in V_n(θ), and assume v_{h,n}(θ) is one column in V_{g,n}(θ). In light of m^q_{g,n}(θ) and the expression for V_{g,n}(θ) deduced in Section 2.3.1, we see that

v_{h,n}(θ) = Σ_{l=1}^G S_n^{hl}(λ, β)R_{l,n}^{-1}(ρ_l)ε_{l,n}(θ).

The (full information) quadratic moment associated with the score w.r.t., e.g., b_{hg} in the g-th equation is thus given by

(1/n) Σ_{l=1}^G ε_{l,n}(θ)'R_{l,n}^{-1}(ρ_l)'S_n^{hl}(λ, β)'R_{g,n}(ρ_g)'(σ^{g.} ⊗ I_n)ε_n(θ) − (1/n) tr(S_n^{hg}(λ, β)).   (A.47)

Recall that S_n^{h.}(λ, β) denotes the h-th n × nG block of S_n^{-1}(λ, β) and i_{g,G} denotes the g-th column of the identity matrix of dimension G. We further note that (A.47) is equivalent to

(1/n) ε_n(θ)'R_n^{-1}(ρ)'(i'_{g,G} ⊗ S_n^{h.}(λ, β)')R'_n(ρ)(Σ^{-1} ⊗ I_n)ε_n(θ) − (1/n) tr(S_n^{hg}(λ, β)).   (A.48)

To see this, note that i'_{g,G} ⊗ S_n^{h.}(λ, β)' is in fact an nG × nG matrix of zeros except for its g-th nG × n (column) block, which stacks the blocks S_n^{h1}(λ, β)', ..., S_n^{hG}(λ, β)'. Thus the g-th column block of R_n^{-1}(ρ)'(i'_{g,G} ⊗ S_n^{h.}(λ, β)')R'_n(ρ) stacks the blocks R_{l,n}^{-1}(ρ_l)'S_n^{hl}(λ, β)'R_{g,n}(ρ_g)', l = 1, ..., G, while (Σ^{-1} ⊗ I_n)ε_n(θ) stacks the blocks (σ^{l.} ⊗ I_n)ε_n(θ). Carrying out the multiplication therefore yields

(1/n) Σ_{l=1}^G ε_{l,n}(θ)'R_{l,n}^{-1}(ρ_l)'S_n^{hl}(λ, β)'R_{g,n}(ρ_g)'(σ^{g.} ⊗ I_n)ε_n(θ)

as desired. With entirely analogous steps, one can write the element in V̄_{g,n}(θ)'R_{g,n}(ρ_g)'(σ^{g.}_n(σ) ⊗ I_n)ε_n(θ) − α_{λ,g,n}(λ, β) that is associated with the score w.r.t., e.g., λ_{hg,p}, as

(1/n) ε_n(θ)'R_n^{-1}(ρ)'(i'_{g,G} ⊗ S_n^{h.}(λ, β)'W'_{p,n})R_n(ρ)'(Σ^{-1} ⊗ I_n)ε_n(θ) − (1/n) tr(S_n^{hg}(λ, β)W_{p,n}),   (A.49)

and the element in Ū_{g,n}(δ_g)'(σ^{g.}_n(σ) ⊗ I_n)ε_n(θ) − α_{ρ,g,n}(ρ_g) that is associated with the score w.r.t., e.g., ρ_{g,q}, as

(1/n) ε_n(θ)'(i_{g,G}i'_{g,G} ⊗ R_{g,n}^{-1}(ρ_g)'M'_{q,n})(Σ^{-1} ⊗ I_n)ε_n(θ) − (1/n) tr(R_{g,n}^{-1}(ρ_g)M_{q,n}).   (A.50)

Upon replacing θ with θ̃ and taking transposes of the expressions (A.48), (A.49) and (A.50), the claim of Lemma 1 now follows.

A.3 Explicit Expressions of VCV Matrices

A.3.1 Explicit Expression of Ψ̃^q_{gg,n}(g)

Recall that the block corresponding to the quadratic moments is of the form

Ψ̃^q_{gg,n}(g) = [ Ψ̃^{ββ}_{gg,n}(g)  Ψ̃^{βλ}_{gg,n}(g)  Ψ̃^{βρ}_{gg,n}(g) ;
                  Ψ̃^{λβ}_{gg,n}(g)  Ψ̃^{λλ}_{gg,n}(g)  Ψ̃^{λρ}_{gg,n}(g) ;
                  Ψ̃^{ρβ}_{gg,n}(g)  Ψ̃^{ρλ}_{gg,n}(g)  Ψ̃^{ρρ}_{gg,n}(g) ],

and recall the general forms of the individual elements in a sub-matrix of Ψ̃^q_{gg,n}(g), say Ψ̃^{ββ}_{gg,n}(g), presented in Section 2.5.1. For ease of presentation, we define the following matrices:

Ā^{ψ,l,A}_{r,g} = MAT_d[R_{g,n}(ρ̃_{g,n}) S_n^{rg,A}(λ̃_n, β̃_n) R^A_{g,n}(ρ̃_{g,n})],
Ā^{ψ,l,A}_{r,p,g} = MAT_d[R_{g,n}(ρ̃_{g,n}) W_{p,n} S_n^{rg,A}(λ̃_n, β̃_n) R^A_{g,n}(ρ̃_{g,n})],
Ā^{ψ,l,A}_{q,g} = MAT_d[M_{q,n} R^A_{g,n}(ρ̃_{g,n})].

The superscript l is used to highlight that these matrices are associated with the limited information moments, in contrast to the full information matrices, e.g., Ā^{ψ,f,A}_{r,g}, defined in the next section. Recall that β_g, λ_g and ρ_g are m_{g,β} × 1, m_{g,λ} × 1 and m_{g,ρ} × 1 vectors of parameters. Throughout this subsection, write D(X) ≡ X + X' as shorthand. Then:

• The m_{g,β} × m_{g,β} block Ψ̃^{ββ}_{gg,n}(g) is (2σ̃²_{gg,n}/n) L'_{β,g} T^{ββ} L_{β,g}, where T^{ββ} is the G × G matrix with (r, s) element tr(Ā^{ψ,l,A}_{r,g} D(Ā^{ψ,l,A}_{s,g})).

• The m_{g,β} × m_{g,λ} block Ψ̃^{βλ}_{gg,n}(g) is (2σ̃²_{gg,n}/n) L'_{β,g} T^{βλ} L_{λ,g}, where T^{βλ} is the G × GP matrix with (r, (s, p)) element tr(Ā^{ψ,l,A}_{r,g} D(Ā^{ψ,l,A}_{s,p,g})). In accordance with the order of the quadratic moments m^{q,A}_{g,n,R}(θ̃, g) in defining the LQ-GSLIVE, the columns are ordered with p = 1, ..., P running fastest within each s; e.g., the first row (ignoring the selection matrices) is [tr(Ā^{ψ,l,A}_{1,g} D(Ā^{ψ,l,A}_{1,1,g})), ..., tr(Ā^{ψ,l,A}_{1,g} D(Ā^{ψ,l,A}_{1,P,g})), ..., tr(Ā^{ψ,l,A}_{1,g} D(Ā^{ψ,l,A}_{G,1,g})), ..., tr(Ā^{ψ,l,A}_{1,g} D(Ā^{ψ,l,A}_{G,P,g}))].

• The m_{g,β} × m_{g,ρ} block Ψ̃^{βρ}_{gg,n}(g) is (2σ̃²_{gg,n}/n) L'_{β,g} T^{βρ} L_{ρ,g}, where T^{βρ} is the G × Q matrix with (r, q) element tr(Ā^{ψ,l,A}_{r,g} D(Ā^{ψ,l,A}_{q,g})).

• The m_{g,λ} × m_{g,λ} block Ψ̃^{λλ}_{gg,n}(g) is (2σ̃²_{gg,n}/n) L'_{λ,g} T^{λλ} L_{λ,g}, where T^{λλ} is the GP × GP matrix with ((r, p), (s, p')) element tr(Ā^{ψ,l,A}_{r,p,g} D(Ā^{ψ,l,A}_{s,p',g})), rows and columns each ordered with the W-index (p, resp. p') running fastest, as in the ordering of m^{q,A}_{g,n,R}(θ̃, g).

• The m_{g,λ} × m_{g,ρ} block Ψ̃^{λρ}_{gg,n}(g) is (2σ̃²_{gg,n}/n) L'_{λ,g} T^{λρ} L_{ρ,g}, where T^{λρ} is the GP × Q matrix with ((r, p), q) element tr(Ā^{ψ,l,A}_{r,p,g} D(Ā^{ψ,l,A}_{q,g})).

• Lastly, the m_{g,ρ} × m_{g,ρ} block Ψ̃^{ρρ}_{gg,n}(g) is (2σ̃²_{gg,n}/n) L'_{ρ,g} T^{ρρ} L_{ρ,g}, where T^{ρρ} is the Q × Q matrix with (q, q') element tr(Ā^{ψ,l,A}_{q,g} D(Ā^{ψ,l,A}_{q',g})).

The remaining blocks follow by symmetry of Ψ̃^q_{gg,n}(g).

A.3.2 Explicit Expression of Ψ̃^q_n

For ease of exposition, we define

Ā^{ψ,f,A}_{r,g} = MAT_D[R_n(ρ̃_n)(i_{g,G} ⊗ S_n^{r.,A}(λ̃_n, β̃_n))R^A_n(ρ̃_n)],
Ā^{ψ,f,A}_{r,p,g} = MAT_D[R_n(ρ̃_n)(i_{g,G} ⊗ W_{p,n}S_n^{r.,A}(λ̃_n, β̃_n))R^A_n(ρ̃_n)],
Ā^{ψ,f,A}_{q,g} = MAT_D[i_{g,G}i'_{g,G} ⊗ M_{q,n}R^A_{g,n}(ρ̃_{g,n})].

The superscript f is used to highlight that these matrices are associated with the full information moments, in contrast to the limited information matrices, e.g., Ā^{ψ,l,A}_{r,g}, defined in the previous section. Recall that the block corresponding to the quadratic moments is of the form

Ψ̃^q_n = [ Ψ̃^{ββ}_n  Ψ̃^{βλ}_n  Ψ̃^{βρ}_n ;
           Ψ̃^{λβ}_n  Ψ̃^{λλ}_n  Ψ̃^{λρ}_n ;
           Ψ̃^{ρβ}_n  Ψ̃^{ρλ}_n  Ψ̃^{ρρ}_n ],

with each sub-matrix consisting of G × G sub-blocks, and recall the general forms of the individual elements in each sub-block, say Ψ̃^{ββ}_{gh,n}, presented in Section 2.5.2. Writing D̃(X) ≡ X + (Σ̃_n ⊗ I_n)X'(Σ̃^{-1}_n ⊗ I_n) as shorthand, the blocks are, explicitly:

• The gh-th block of Ψ̃^{ββ}_n (of size m_{g,β} × m_{h,β}) is (1/n) L'_{β,g} T^{ββ}_{gh} L_{β,h}, where T^{ββ}_{gh} is the G × G matrix with (r, s) element tr(Ā^{ψ,f,A}_{r,g} D̃(Ā^{ψ,f,A}_{s,h})).

• The gh-th block of Ψ̃^{βλ}_n (of size m_{g,β} × m_{h,λ}) is (1/n) L'_{β,g} T^{βλ}_{gh} L_{λ,h}, where T^{βλ}_{gh} is the G × GP matrix with (r, (s, p)) element tr(Ā^{ψ,f,A}_{r,g} D̃(Ā^{ψ,f,A}_{s,p,h})). To clarify the ordering, the first row in the bracket is [tr(Ā^{ψ,f,A}_{1,g} D̃(Ā^{ψ,f,A}_{1,1,h})), ..., tr(Ā^{ψ,f,A}_{1,g} D̃(Ā^{ψ,f,A}_{1,P,h})), ..., tr(Ā^{ψ,f,A}_{1,g} D̃(Ā^{ψ,f,A}_{G,1,h})), ..., tr(Ā^{ψ,f,A}_{1,g} D̃(Ā^{ψ,f,A}_{G,P,h}))].

• The gh-th block of Ψ̃^{βρ}_n (of size m_{g,β} × m_{h,ρ}) is (1/n) L'_{β,g} T^{βρ}_{gh} L_{ρ,h}, where T^{βρ}_{gh} is the G × Q matrix with (r, q) element tr(Ā^{ψ,f,A}_{r,g} D̃(Ā^{ψ,f,A}_{q,h})).

• The gh-th block of Ψ̃^{λλ}_n (of size m_{g,λ} × m_{h,λ}) is (1/n) L'_{λ,g} T^{λλ}_{gh} L_{λ,h}, where T^{λλ}_{gh} is the GP × GP matrix with ((r, p), (s, p')) element tr(Ā^{ψ,f,A}_{r,p,g} D̃(Ā^{ψ,f,A}_{s,p',h})), rows and columns each ordered with the W-index running fastest, as above.

• The gh-th block of Ψ̃^{λρ}_n (of size m_{g,λ} × m_{h,ρ}) is (1/n) L'_{λ,g} T^{λρ}_{gh} L_{ρ,h}, where T^{λρ}_{gh} is the GP × Q matrix with ((r, p), q) element tr(Ā^{ψ,f,A}_{r,p,g} D̃(Ā^{ψ,f,A}_{q,h})).

• Lastly, the gh-th block of Ψ̃^{ρρ}_n (of size m_{g,ρ} × m_{h,ρ}) is (1/n) L'_{ρ,g} T^{ρρ}_{gh} L_{ρ,h}, where T^{ρρ}_{gh} is the Q × Q matrix with (q, q') element tr(Ā^{ψ,f,A}_{q,g} D̃(Ā^{ψ,f,A}_{q',h})).

The remaining blocks follow by symmetry of Ψ̃^q_n.

A.4 Additional Monte Carlo Results

In this section, we report additional Monte Carlo results utilizing the "dumbbell-shaped" weights matrix considered in Section 2.7.5.
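The exact "dumbbell-shaped" design is specified in Section 2.7.5 and is not reproduced here. Purely as an illustration of the kind of weights matrix involved, the sketch below builds a generic dumbbell graph — two densely connected clusters joined by a short path — and row-normalizes it; the cluster and bridge sizes are assumptions for illustration only, not the values used in the simulations:

```python
import numpy as np

def dumbbell_weights(m=10, bridge=4):
    """Row-normalized weights for two fully connected clusters of size m
    joined by a path of `bridge` units. Illustrative stand-in only; the
    actual design used in the experiments is given in Section 2.7.5."""
    n = 2 * m + bridge
    A = np.zeros((n, n))
    A[:m, :m] = 1.0                        # first cluster: fully connected
    A[-m:, -m:] = 1.0                      # second cluster: fully connected
    np.fill_diagonal(A, 0.0)               # no self-neighbors
    for i in range(m - 1, m + bridge):     # path linking the two clusters
        A[i, i + 1] = A[i + 1, i] = 1.0
    return A / A.sum(axis=1, keepdims=True)  # row-normalization

W = dumbbell_weights()
assert np.allclose(W.sum(axis=1), 1.0)     # each row sums to one
assert np.allclose(np.diag(W), 0.0)        # zero diagonal, as required of W_n
```

Row-normalization keeps the spectral radius of W at one, which is the usual normalization ensuring the parameter space for the spatial-lag coefficients is well defined.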
A.4.1 Results with Alternative W_n's

For the robustness check, we experiment with the same model specification as in Section 2.7 for each scenario, but with a subset of the parameter constellations described in the main text. Specifically, for Scenario I, we experiment with

Eqn 1:  b21 = 0.15,  λ11,1 = 0.3,  λ11,2 = 0.2,  ρ11 = 0.2,  ρ12 = 0.1
Eqn 2:  b12 = 0.3,   λ22,1 = 0.3,  λ22,2 = 0.15, ρ21 = 0.1,  ρ22 = 0

We consider both the case with c.1 = c.2 = [1, 1, 1]' and the case with c.1 = c.2 = [0.4, 0.4, 0.4]' in this scenario.

Scenario II. We experiment with

Eqn 1:  b21 = 0.15,  λ11,1 = 0.3,  λ21,1 depends on λ11,1, b21 and the deviation parameter
Eqn 2:  b12 = 0.3,   λ22,1 = 0.3,  λ22,2 = 0.15

For the parameters on the exogenous variables, we let

• Equation 1: c11, c21, c31 = 1; c41, c51, c61 depend on c11, c21, c31, respectively, as well as on λ11,1 and the deviation parameter,
• Equation 2: c72, c82, c92 = 1,

and we set the deviation parameter to 1 and 0.4 (corresponding to the strong identification and the weak identification cases, respectively).

Scenario I

Table A.1: Median and RMSE of Scenario I, homoskedasticity, alternative W_n's
(first column: true value; remaining entries Med/RMSE for θ̂_ML, θ̂_GS2SLS, θ̂_GS3SLS, θ̂_GSLIVE, θ̂_GSFIVE, θ̂_LQ-GS2SLS, θ̂_LQ-GS3SLS, θ̂_LQ-GSLIVE, θ̂_LQ-GSFIVE)

Strong i.d.
b21     0.150   0.151/0.023   0.164/0.028   0.154/0.025   0.152/0.024   0.152/0.024   0.151/0.024   0.151/0.024   0.154/0.024   0.154/0.025
λ11,1   0.300   0.303/0.031   0.307/0.036   0.311/0.035   0.301/0.034   0.303/0.032   0.311/0.036   0.309/0.032   0.301/0.036   0.301/0.032
λ11,2   0.200   0.196/0.032   0.178/0.044   0.188/0.037   0.194/0.036   0.196/0.033   0.186/0.040   0.191/0.037   0.193/0.036   0.193/0.035
ρ11     0.200   0.191/0.062   0.182/0.072   0.185/0.072   0.191/0.066   0.194/0.069   0.183/0.071   0.189/0.071   0.196/0.069   0.196/0.062
ρ12     0.100   0.095/0.070   0.111/0.078   0.105/0.078   0.092/0.075   0.092/0.078   0.100/0.077   0.100/0.077   0.098/0.080   0.105/0.068
b12     0.300   0.301/0.022   0.313/0.025   0.302/0.023   0.303/0.022   0.302/0.021   0.302/0.023   0.301/0.021   0.304/0.022   0.303/0.023
λ22,1   0.300   0.300/0.029   0.306/0.037   0.308/0.033   0.299/0.034   0.298/0.030   0.304/0.037   0.305/0.033   0.299/0.035   0.298/0.030
λ22,2   0.150   0.151/0.030   0.132/0.041   0.140/0.036   0.149/0.036   0.148/0.032   0.143/0.039   0.144/0.033   0.149/0.036   0.148/0.031
ρ21     0.100   0.097/0.064   0.088/0.069   0.090/0.071   0.096/0.069   0.103/0.069   0.088/0.070   0.096/0.068   0.098/0.072   0.102/0.069
ρ22     0.000  -0.014/0.075   0.002/0.085  -0.006/0.087  -0.017/0.090  -0.017/0.089  -0.007/0.089  -0.009/0.090  -0.007/0.092  -0.010/0.078

Weak i.d.
b21     0.150   0.152/0.060   0.223/0.090   0.181/0.069   0.155/0.060   0.158/0.061   0.161/0.058   0.158/0.067   0.169/0.063   0.162/0.063
λ11,1   0.300   0.311/0.080   0.356/0.116   0.367/0.121   0.298/0.086   0.308/0.084   0.355/0.107   0.352/0.101   0.302/0.089   0.306/0.080
λ11,2   0.200   0.187/0.077   0.081/0.154   0.108/0.134   0.189/0.089   0.182/0.087   0.130/0.114   0.145/0.105   0.175/0.092   0.177/0.082
ρ11     0.200   0.183/0.096   0.114/0.157   0.108/0.163   0.190/0.116   0.183/0.110   0.133/0.132   0.142/0.126   0.186/0.117   0.188/0.098
ρ12     0.100   0.107/0.096   0.225/0.174   0.190/0.152   0.101/0.117   0.108/0.108   0.163/0.131   0.146/0.128   0.116/0.125   0.117/0.099
b12     0.300   0.304/0.050   0.360/0.079   0.318/0.062   0.308/0.057   0.309/0.053   0.310/0.054   0.308/0.056   0.317/0.056   0.308/0.056
λ22,1   0.300   0.297/0.071   0.347/0.110   0.359/0.116   0.291/0.088   0.300/0.078   0.329/0.093   0.328/0.092   0.294/0.092   0.295/0.075
λ22,2   0.150   0.148/0.074   0.050/0.134   0.070/0.125   0.148/0.089   0.137/0.078   0.103/0.104   0.104/0.098   0.135/0.089   0.139/0.072
ρ21     0.100   0.107/0.102   0.025/0.144   0.019/0.153   0.106/0.113   0.101/0.112   0.057/0.123   0.065/0.119   0.106/0.119   0.107/0.106
ρ22     0.000  -0.017/0.105   0.089/0.154   0.071/0.142  -0.018/0.128  -0.008/0.117   0.028/0.125   0.022/0.121  -0.004/0.126  -0.008/0.107

¹Results are based on 500 Monte Carlo trials with sample size n = 486; σ = 1.

Scenario II

Table A.2: Median and RMSE of Scenario II, homoskedasticity, alternative W_n's
(first column: true value; remaining entries Med/RMSE for θ̂_ML, θ̂_GS2SLS, θ̂_GS3SLS, θ̂_GSLIVE, θ̂_GSFIVE, θ̂_LQ-GS2SLS, θ̂_LQ-GS3SLS, θ̂_LQ-GSLIVE, θ̂_LQ-GSFIVE)

Strong i.d.
b21     0.150   0.148/0.024   0.163/0.027   0.155/0.025   0.150/0.024   0.149/0.024   0.166/0.029   0.156/0.026   0.152/0.024   0.150/0.024
λ11,1   0.300   0.300/0.027   0.324/0.050   0.320/0.045   0.304/0.045   0.301/0.043   0.317/0.036   0.314/0.033   0.305/0.032   0.303/0.030
λ21,1  -0.195  -0.195/0.027  -0.198/0.032  -0.191/0.032  -0.196/0.031  -0.195/0.030  -0.205/0.031  -0.197/0.029  -0.199/0.029  -0.196/0.027
c41    -1.300  -1.302/0.065  -1.312/0.075  -1.314/0.065  -1.300/0.074  -1.300/0.067  -1.303/0.070  -1.308/0.062  -1.298/0.070  -1.303/0.064
c51    -1.300  -1.300/0.070  -1.306/0.078  -1.309/0.074  -1.297/0.080  -1.301/0.075  -1.301/0.081  -1.309/0.070  -1.298/0.082  -1.303/0.070
c61    -1.300  -1.291/0.074  -1.311/0.082  -1.304/0.071  -1.295/0.085  -1.292/0.070  -1.302/0.078  -1.300/0.074  -1.295/0.079  -1.294/0.074
b12     0.300   0.300/0.023   0.312/0.026   0.307/0.024   0.301/0.024   0.301/0.024   0.314/0.027   0.308/0.024   0.303/0.023   0.305/0.024
λ22,1   0.300   0.299/0.027   0.311/0.037   0.307/0.036   0.299/0.038   0.299/0.035   0.309/0.030   0.306/0.029   0.303/0.029   0.301/0.027
λ22,2   0.150   0.149/0.026   0.145/0.035   0.145/0.033   0.149/0.037   0.151/0.032   0.145/0.029   0.146/0.028   0.147/0.028   0.149/0.026

Weak i.d.
b21     0.150   0.149/0.024   0.165/0.029   0.157/0.026   0.149/0.024   0.149/0.024   0.165/0.029   0.156/0.025   0.151/0.024   0.150/0.024
λ11,1   0.300   0.299/0.041   0.414/0.162   0.390/0.129   0.302/0.110   0.299/0.106   0.323/0.052   0.323/0.051   0.305/0.045   0.302/0.044
λ21,1  -0.105  -0.104/0.029  -0.116/0.032  -0.109/0.030  -0.107/0.028  -0.106/0.031  -0.119/0.032  -0.112/0.031  -0.109/0.030  -0.106/0.029
c41    -0.700  -0.696/0.066  -0.791/0.145  -0.769/0.117  -0.705/0.127  -0.702/0.116  -0.715/0.076  -0.717/0.069  -0.698/0.074  -0.700/0.063
c51    -0.700  -0.696/0.074  -0.790/0.148  -0.776/0.121  -0.703/0.127  -0.710/0.112  -0.718/0.086  -0.715/0.078  -0.699/0.085  -0.698/0.077
c61    -0.700  -0.685/0.076  -0.789/0.147  -0.767/0.122  -0.698/0.120  -0.694/0.113  -0.713/0.086  -0.704/0.076  -0.695/0.082  -0.690/0.076
b12     0.300   0.300/0.025   0.314/0.027   0.307/0.025   0.302/0.025   0.302/0.024   0.316/0.029   0.307/0.026   0.304/0.025   0.305/0.025
λ22,1   0.300   0.299/0.026   0.305/0.033   0.307/0.032   0.299/0.033   0.297/0.033   0.305/0.028   0.304/0.027   0.301/0.027   0.300/0.027
λ22,2   0.150   0.150/0.025   0.143/0.036   0.140/0.033   0.149/0.037   0.151/0.032   0.142/0.028   0.144/0.026   0.147/0.027   0.149/0.025

¹Results are based on 500 Monte Carlo trials with sample size n = 486; σ = 1.

A.4.2 Correlated x.k's

Recall that we generated each column in X_n = [x_{1,n}, x_{2,n}, ..., x_{6,n}] as i.i.d. normal with mean μ_x = 1 and variance σ_x = 1. Let x_{i.,n} = [x_{i1,n}, x_{i2,n}, ..., x_{i6,n}] be the i-th row of X_n. We now set the covariance between x_{ij,n} and x_{ik,n} to 0.25 for j ≠ k, so that the variance-covariance matrix of the elements of x_{i.,n} is the 6 × 6 matrix

[ 1     0.25  ...  0.25
  0.25  1     ...  0.25
  ...   ...   ...  ...
  0.25  ...   0.25 1    ]

for i = 1, ..., n. X_n is generated once for all Monte Carlo experiments.
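A draw of X_n with the equicorrelated covariance above can be obtained through a Cholesky factor of the variance-covariance matrix. A minimal numpy sketch (the seed is arbitrary; as in the text, X_n would be drawn once and held fixed across replications):

```python
import numpy as np

k, n = 6, 486
# Unit variances with 0.25 off-diagonal covariances, as specified in the text.
Sigma = np.full((k, k), 0.25) + 0.75 * np.eye(k)
L = np.linalg.cholesky(Sigma)            # Sigma = L L'

rng = np.random.default_rng(486)         # arbitrary seed
# Each row: mean-1 normal vector with covariance Sigma.
X = 1.0 + rng.standard_normal((n, k)) @ L.T

assert np.allclose(L @ L.T, Sigma)       # the factorization reproduces Sigma
assert X.shape == (n, k)
```

The equicorrelation matrix is positive definite (its eigenvalues are 0.75 and 2.25 here), so the Cholesky factorization always exists for this design.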
The model specifications and parameter choices are the same as those in Section A.4.1, but here we focus only on the strong-identification cases.

Scenario I

Table A.3: Median and RMSE of Scenario I, homoskedasticity, correlated Xn, strong i.d.

Param      TRUE     ML           GS2SLS       GS3SLS       GSLIVE       GSFIVE       LQ-GS2SLS    LQ-GS3SLS    LQ-GSLIVE    LQ-GSFIVE
                    Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE
b21         0.150   0.151 0.023  0.165 0.027  0.155 0.024  0.152 0.023  0.151 0.023  0.152 0.023  0.150 0.025  0.153 0.024  0.152 0.025
λ11,1       0.300   0.302 0.026  0.302 0.029  0.307 0.028  0.301 0.028  0.302 0.025  0.304 0.029  0.305 0.027  0.301 0.027  0.302 0.027
λ11,2       0.200   0.197 0.028  0.184 0.033  0.189 0.029  0.196 0.028  0.197 0.027  0.193 0.029  0.194 0.028  0.194 0.029  0.195 0.027
ρ11         0.200   0.192 0.059  0.189 0.068  0.187 0.067  0.193 0.065  0.191 0.066  0.192 0.065  0.194 0.066  0.198 0.064  0.197 0.057
ρ12         0.100   0.094 0.065  0.105 0.075  0.101 0.078  0.093 0.073  0.098 0.075  0.096 0.073  0.098 0.076  0.099 0.077  0.101 0.067
b12         0.300   0.300 0.020  0.313 0.023  0.301 0.020  0.301 0.021  0.300 0.019  0.301 0.020  0.302 0.020  0.302 0.020  0.302 0.021
λ22,1       0.300   0.299 0.023  0.300 0.025  0.306 0.026  0.299 0.025  0.298 0.024  0.300 0.026  0.302 0.025  0.299 0.025  0.300 0.024
λ22,2       0.150   0.151 0.024  0.140 0.027  0.145 0.025  0.150 0.024  0.151 0.023  0.147 0.024  0.148 0.023  0.150 0.025  0.149 0.024
ρ21         0.100   0.097 0.061  0.095 0.063  0.095 0.066  0.098 0.063  0.099 0.064  0.097 0.063  0.100 0.065  0.102 0.064  0.102 0.062
ρ22         0.000  -0.017 0.076 -0.005 0.085 -0.009 0.086 -0.015 0.089 -0.017 0.090 -0.012 0.085 -0.012 0.089 -0.007 0.087 -0.008 0.076
1 Results are based on 500 Monte Carlo trials with sample size n = 486; σ² = 1.

Scenario II

Table A.4: Median and RMSE of Scenario II, homoskedasticity, correlated Xn, strong i.d.

Param      TRUE     ML           GS2SLS       GS3SLS       GSLIVE       GSFIVE       LQ-GS2SLS    LQ-GS3SLS    LQ-GSLIVE    LQ-GSFIVE
                    Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE   Med   RMSE
b21         0.150   0.148 0.023  0.161 0.026  0.153 0.024  0.150 0.023  0.149 0.022  0.164 0.027  0.155 0.023  0.152 0.024  0.150 0.023
λ11,1       0.300   0.301 0.025  0.315 0.039  0.312 0.035  0.301 0.035  0.301 0.033  0.313 0.031  0.312 0.028  0.304 0.027  0.304 0.026
λ21,1      -0.195  -0.197 0.026 -0.195 0.032 -0.190 0.032 -0.195 0.033 -0.195 0.030 -0.202 0.030 -0.196 0.028 -0.200 0.029 -0.197 0.028
c41        -1.300  -1.301 0.074 -1.311 0.085 -1.313 0.072 -1.301 0.086 -1.300 0.074 -1.302 0.081 -1.307 0.070 -1.298 0.082 -1.301 0.072
c51        -1.300  -1.301 0.079 -1.304 0.089 -1.309 0.083 -1.299 0.096 -1.302 0.082 -1.300 0.090 -1.308 0.079 -1.297 0.090 -1.301 0.080
c61        -1.300  -1.292 0.080 -1.307 0.095 -1.304 0.082 -1.294 0.094 -1.292 0.081 -1.300 0.088 -1.300 0.081 -1.296 0.089 -1.293 0.082
b12         0.300   0.300 0.021  0.313 0.026  0.307 0.024  0.302 0.025  0.302 0.025  0.314 0.026  0.307 0.023  0.303 0.021  0.304 0.022
λ22,1       0.300   0.299 0.026  0.312 0.033  0.308 0.031  0.300 0.031  0.299 0.029  0.311 0.029  0.307 0.026  0.302 0.026  0.302 0.026
λ22,2       0.150   0.150 0.020  0.147 0.026  0.147 0.025  0.150 0.026  0.151 0.025  0.147 0.024  0.147 0.021  0.149 0.024  0.149 0.020
1 Results are based on 500 Monte Carlo trials with sample size n = 486; σ² = 1.

Appendix B: Appendix to Chapter 3

B.1 Theoretical Motivation for the Demand Equation

We consider an indirect utility function of the Gorman polar form:

    v_i(p, y) = (y_i - f_i(p)) / g(p),

where y_i denotes consumer income (or wealth) and both f_i(p) and g(p) are homogeneous of degree one in p. In our context, i denotes the index of stations. Thus we implicitly work with "aggregated" indirect utility and demand functions; the aggregation is at the level of the census tract in which station i resides. In what follows, we view g(p) as a price index for normalization.
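For this Gorman polar form, Roy's identity gives the demand schedule h_i = -(∂v_i/∂p_i)/(∂v_i/∂y_i) = ∂f_i/∂p_i + (y_i - f_i(p))(∂g/∂p_i)/g(p). The sketch below checks this numerically with finite differences; the particular functions f and g are hypothetical illustrations (both homogeneous of degree one), not taken from the text.

```python
# Illustrative Gorman polar form v(p, y) = (y - f(p)) / g(p), with
# hypothetical f(p) = 2*p1 + 3*p2 and g(p) = p1^0.4 * p2^0.6.
f = lambda p1, p2: 2.0 * p1 + 3.0 * p2
g = lambda p1, p2: p1 ** 0.4 * p2 ** 0.6
v = lambda p1, p2, y: (y - f(p1, p2)) / g(p1, p2)

def num_deriv(fun, x, h=1e-6):
    """Central finite-difference derivative."""
    return (fun(x + h) - fun(x - h)) / (2.0 * h)

p1, p2, y = 1.5, 2.0, 50.0

# Roy's identity: demand for good 1 is -(dv/dp1)/(dv/dy).
dv_dp1 = num_deriv(lambda t: v(t, p2, y), p1)
dv_dy = num_deriv(lambda t: v(p1, p2, t), y)
roy = -dv_dp1 / dv_dy

# Closed form implied by the Gorman structure:
# df/dp1 + (y - f(p)) * (dg/dp1) / g(p).
df_dp1 = num_deriv(lambda t: f(t, p2), p1)
dg_dp1 = num_deriv(lambda t: g(t, p2), p1)
closed = df_dp1 + (y - f(p1, p2)) * dg_dp1 / g(p1, p2)

assert abs(roy - closed) < 1e-4
```

The agreement of the two expressions is what licenses writing the demand schedule directly in terms of ∂f_i/∂p_i and ∂g/∂p_i in the derivation that follows.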
Roy's identity implies that the demand schedule can be written as

    h_i(p, y) = - (∂v_i(p, y)/∂p_i) / (∂v_i(p, y)/∂y_i) = ∂f_i(p)/∂p_i + (y_i - f_i(p)) (∂g(p)/∂p_i) / g(p).

Let ∂f_i(p)/∂p_i = α_i*, ∂g(p)/∂p_i = α*, and view g(p) = p_0 as a price index that does not vary across consumers. Then the above demand equation can be written equivalently as

    h_i(p, y) = α_i* + α f_i*(p) + α* y_i,                                        (B.1)

where f_i*(p) denotes the "normalized" version of f_i(p), which is linear in p. The above model accommodates a log-log transformation in the following sense. Note that

    ln h_i(p, y) = ln( α_i* + α f_i*(p) + α* y_i ),

and a first-order Taylor series expansion of the above demand function at (p_0, y_0) can be written as

    ln h_i(p, y) ≈ ln h_i(p_0, y_0) + [∂h_i(p, y)/∂p_i |_{p_0}] (p_i - p_{i,0}) / h_i(p_0, y_0) + [∂h_i(p, y)/∂y_i |_{y_0}] (y_i - y_0) / h_i(p_0, y_0).

Noting further that the first-order linear approximation ln(x) - ln(x_0) ≈ (x - x_0)/x_0 holds, we then have

    ln h_i(p, y) ≈ ln h_i(p_0, y_0) + [∂h_i(p, y)/∂p_i |_{p_0}] (ln p_i - ln p_{i,0}) / (h_i(p_0, y_0)/p_{i,0}) + [∂h_i(p, y)/∂y_i |_{y_0}] (ln y_i - ln y_{i,0}) / (h_i(p_0, y_0)/y_{i,0}).        (B.2)

In the spirit of Pinkse and Slade (2004)'s modeling assumptions, we also allow [∂h_i(p, y)/∂p_i |_{p_0}] p_{i,0}/h_i(p_0, y_0) to be a nonlinear function of p. For simplicity, we assume such nonlinear functions can be approximated by a series function Σ_{t=1}^T γ_t W_t p.¹ In the context of spatial markets, one may take w_ij,t to denote some distance measure between firms i and j. In the Pinkse and Slade (2004) context, i is the brand index in the sample and the distance measure could be a function of the alcohol content of each product/brand. Since ∂h_i(p, y)/∂y = α*, we let [∂h_i(p, y)/∂y_i |_{y_0}] y_{i,0}/h_i(p_0, y_0) = α* y_{i,0}/h_i(p_0, y_0). For simplicity, we assume α* y_{i,0}/h_i(p_0, y_0) = γ to be a constant. Thus the coefficient on (log) aggregate income ln(y_i) does not depend on proximity measures or on individual stations.

¹ This formulation shares the spirit of semi-parametric estimators. See Pinkse et al. (2002) for an example on spatial competition, among others.

In summary, in light of (B.2), we can formulate the log-demand equation for station i as

    ln(q_i) = α_i + β Σ_j w_ij ln(p_j) + γ ln(y_i),

where α_i may vary between stations and thus reflects station-specific effects/characteristics.

B.2 Edgeworth Cycle

B.2.1 Retail Margins

Panel (a) of Figures B.1 and B.2 plots the retail margin, defined as the station-level retail price minus the rack price of the corresponding wholesale outlet.² The margin plots do not show strong evidence of the presence of an Edgeworth cycle during this period. Panels (b) and (c) of Figures B.1 and B.2 plot the retail margin calculated as the retail price on day t minus the spot rack price on day t - 5 and day t - 10, respectively, since stations often maintain inventory that was purchased/ordered one to two weeks earlier. These figures still do not clearly show the asymmetric price fluctuations that mark an Edgeworth cycle.

² For stations without a supplier agreement with either Shell or Suncor, we use the average rack price of these two outlets.

[Figure B.1: Average Retail Margin Computed with Spot Rack Price and Rack Prices of 5 and 10 Days' Lead, Aug-Nov 2019; panels (a)-(c).]

[Figure B.2: Average Retail Margin Computed with Spot Rack Price and Rack Prices of 5 and 10 Days' Lead, Feb-Apr 2020; panels (a)-(c).]

B.2.2 Markov Switching Regression (MSR)

To further diagnose the possible presence of Edgeworth cycles, we fit the daily prices and margins of the Vancouver market to a dynamic Markov switching model with two states, in the spirit of Noel (2007a,b). Tables B.1 and B.2 document the estimation results for the Markov switching model and the estimated expected durations of the relenting phase (State 1) and the undercutting phase (State 2). Table B.3 reports the switching probabilities between State 1 and State 2 for both periods.
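In a two-state Markov switching model, the expected duration of a state with self-transition probability p_ss is 1/(1 - p_ss). A minimal sketch, using the September 2019 switching probabilities of specification (1) in Table B.3, roughly reproduces the expected durations reported in Table B.1 (small gaps arise from rounding of the reported probabilities):

```python
def expected_duration(p_stay):
    """Expected sojourn time (in days) of a Markov state with
    self-transition probability p_stay."""
    return 1.0 / (1.0 - p_stay)

# Table B.3, specification (1), September 2019: p11 = 0.966, p22 = 0.960.
d1 = expected_duration(0.966)  # relenting phase; Table B.1 reports 29.522
d2 = expected_duration(0.960)  # undercutting phase; Table B.1 reports 25.229
```

The near-equality of the two durations is another way of seeing the symmetry noted in the text: an Edgeworth cycle would instead imply a short relenting phase and a long undercutting phase.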
Generically, the model can be written as

    y_t = μ_{s_t} + X_t β + Z_t γ_{s_t} + ε_t,

where y_t is the dependent variable, X_t is the matrix of exogenous variables with state-invariant coefficients β, Z_t is the matrix of exogenous variables with state-dependent coefficients γ_{s_t}, and μ_{s_t} is the state-dependent intercept. In the current context, the dependent variable is set to be the average station price (in Canadian cents) at time t (specification (1)), the average station-level retail margin computed with spot rack prices (specification (2)), and the average station-level retail margin computed with rack prices of 10 days' lead (specification (3)). For each specification, X_t includes the first- and second-order lags of the dependent variable, to account for potential trend and momentum in the dependent variable that are not state-dependent. We do allow for a state-dependent intercept and state-dependent volatility of ε_t.

The estimated switching probabilities (Table B.3) show that prices tend to stay in the current state, i.e., to continue in State 1 in period t + 1 if the process is in State 1 in period t. This is in contrast to the results reported in, e.g., Noel (2007a,b), where the probabilities of moving from the relenting phase (State 1) to the undercutting phase (State 2) are generally above 90%, while those of moving from the undercutting phase (State 2) to the relenting phase (State 1) are often less than 10%. In other words, their results are consistent with the asymmetric pattern of price undercutting and restoration that an Edgeworth cycle features (e.g., p11 much lower than p12), while ours are not. Together with Figures B.1 and B.2, we conclude that there is no strong evidence of the presence of an Edgeworth cycle during the sample period of this empirical study.

Table B.1: Within-Regime Estimates and Expected Duration in Days, September 2019

                               (1)                 (2)                 (3)
Main
L.price                        0.387* (0.177)
L2.price                       0.163 (0.156)
L.Margin                                           0.060 (0.172)
L2.Margin                                         -0.066 (0.164)
L.Margin (10 days lead)                                                0.386** (0.130)
L2.Margin (10 days lead)                                               0.209* (0.115)
State 1
Constant                       68.576* (22.304)    62.794** (14.296)   26.979** (5.517)
σ_s1                           0.881** (0.166)     0.351 (0.232)       1.494** (0.245)
State 2
Constant                       71.825** (23.260)   62.790** (14.271)   26.407** (5.331)
σ_s2                           0.089 (0.193)       1.177** (0.198)     0.094 (0.182)
Expected Duration State 1      29.522              16.586              2.834
Expected Duration State 2      25.229              18.585              5.664
Observations                   33                  33                  33
1 Standard errors in parentheses; * (p ≤ 0.05), ** (p ≤ 0.01).

Table B.2: Within-Regime Estimates and Expected Duration in Days, March 2020

                               (1)                 (2)                 (3)
Main
L.price                        0.146 (0.153)
L2.price                      -0.152 (0.136)
L.Margin                                           0.260 (0.169)
L2.Margin                                          0.082 (0.157)
L.Margin (10 days lead)                                                0.214 (0.166)
L2.Margin (10 days lead)                                               0.105 (0.164)
State 1
Constant                       152.266** (26.253)  39.122** (11.155)   41.114** (11.994)
σ_s1                           0.825** (0.163)     0.421 (0.232)       0.369 (0.221)
State 2
Constant                       157.127** (27.042)  40.654** (11.681)   43.465** (12.707)
σ_s2                           0.150 (0.199)       1.275** (0.183)     1.190** (0.186)
Expected Duration State 1      23.395              15.586              7.834
Expected Duration State 2      21.210              12.585              4.664
Observations                   31                  31                  31
1 Standard errors in parentheses; * (p ≤ 0.05), ** (p ≤ 0.01).

Table B.3: Switching Probabilities

                                           (1)              (2)              (3)
September 2019
p11 (relenting → relenting)                0.966** (0.042)  0.940** (0.096)  0.647** (0.176)
p12 (relenting → undercutting)             0.034 (0.042)    0.060 (0.096)    0.353* (0.176)
p21 (undercutting → relenting)             0.040 (0.040)    0.054 (0.089)    0.177 (0.111)
p22 (undercutting → undercutting)          0.960** (0.040)  0.946** (0.089)  0.823** (0.111)
March 2020
p11 (relenting → relenting)                0.969** (0.038)  0.956** (0.059)  0.953** (0.064)
p12 (relenting → undercutting)             0.031 (0.038)    0.044 (0.059)    0.047 (0.064)
p21 (undercutting → relenting)             0.040 (0.052)    0.037 (0.048)    0.036 (0.043)
p22 (undercutting → undercutting)          0.960** (0.052)  0.963** (0.048)  0.964** (0.043)
1 Standard errors in parentheses; * (p ≤ 0.05), ** (p ≤ 0.01).

B.3 Test for IV Power

In this section, we present the heteroskedasticity-robust F-statistics used for the empirical application. Parts of the presentation follow closely that of Andrews et al. (2018). For generality of the discussion, we consider the following linear instrumental variables (IV) model with a single outcome variable Y_n:

    Y_n = X_{1,n} β + X_{2,n} γ_1 + ε_n,                                          (B.3)
    X_{1,n} = Z_{1,n} Π + X_{2,n} γ_2 + V_n,                                      (B.4)

where X_{1,n} is an n × K1 matrix of (potentially) endogenous regressors, X_{2,n} is an n × K2 matrix of exogenous regressors, and Z_{1,n} is the n × H matrix of instruments. In light of the above construction, we maintain the following assumptions: E(Z_{1,n}' ε_n) = 0, E(Z_{1,n}' V_{k,n}) = 0 for all k, E(X_{2,n}' ε_n) = 0, and E(X_{2,n}' V_{k,n}) = 0. We are interested in estimating β consistently, but X_{1,n} is potentially endogenous in the sense that we may have E(ε_n' V_{k,n}) ≠ 0. Substituting for X_{1,n} in (B.3), we obtain the equation

    Y_n = Z_{1,n} δ + X_{2,n} γ_3 + U_n,                                          (B.5)

with δ = Πβ. Following common terminology, we refer to (B.3) as the structural form, (B.4) as the first stage, and (B.5) as the reduced form. Let M_{X2,n} = I - X_{2,n}(X_{2,n}' X_{2,n})^{-1} X_{2,n}' and Q̂_{ZZ|X2} = (1/n) Z_{1,n}' M_{X2,n} Z_{1,n}. The 2SLS estimator can be written as

    β̂_{2SLS} = (Π̂' Q̂_{ZZ|X2} Π̂)^{-1} Π̂' Q̂_{ZZ|X2} δ̂,

where Π̂ is the first-stage OLS estimator and δ̂ the OLS estimator of the reduced-form parameter δ. For a single endogenous regressor, write π̂ = Π̂ for the H × 1 vector of first-stage coefficients and let

    Ω̂_ΠΠ = Q̂_{ZZ|X2}^{-1} ( n^{-2} Z_{1,n}' M_{X2,n} diag(V̂_n V̂_n') M_{X2,n} Z_{1,n} ) Q̂_{ZZ|X2}^{-1},

where V̂_n denotes the first-stage residuals. The heteroskedasticity-robust F-statistic is then given by

    HFR = π̂' Ω̂_ΠΠ^{-1} π̂,

which is asymptotically χ²_H-distributed under the null hypothesis π = 0. In the following tables, we report the effective first-stage F-statistic proposed by Olea and Pflueger (2013), which relies on an estimate of the variance that is robust to non-homoskedasticity. Explicitly, the effective F-statistic is computed as

    F_eff = π̂' Q̂_{ZZ|X2} π̂ / tr(Ω̂_ΠΠ Q̂_{ZZ|X2}).

Table B.4a: First-Stage OLS Regression and Tests for IV Power

W1                             (1)               (2)               (3)
No. Nb                        -0.041 (0.039)    -0.221** (0.038)
Avg. Nb Size                   0.030 (0.043)    -0.010 (0.050)
No. Indirect Nb               -0.142** (0.017)                    -0.151** (0.015)
Avg. Indirect Nb Size          0.104 (0.075)                       0.104 (0.082)
Nb. Car Wash                  -0.066** (0.022)  -0.068** (0.024)
Nb. Service                    0.024* (0.009)    0.022* (0.010)
Suncor X Rack                  0.163** (0.052)   0.164* (0.062)    0.172** (0.056)
Shell X Rack                   0.055* (0.024)    0.052 (0.029)     0.049 (0.026)
Dist Refinery                  0.012** (0.002)   0.013** (0.002)   0.012** (0.002)
Weak IV (F-stat)               36.486            30.290            40.804
χ² crit-val (5%)               20.530            18.370            19.860

W4                             (1)               (2)               (3)
No. Nb
Avg. Nb Size                   0.065** (0.020)   0.081** (0.022)
No. Indirect Nb
Avg. Indirect Nb Size          0.056* (0.020)                      0.044* (0.021)
Nb. Car Wash                   0.034** (0.008)   0.040** (0.008)
Nb. Service                    0.026** (0.003)   0.026** (0.003)
Suncor X Rack                  0.092** (0.027)   0.089* (0.032)    0.101** (0.028)
Shell X Rack                   0.157** (0.033)   0.113** (0.039)   0.170** (0.035)
Dist Refinery                  0.145** (0.018)   0.141** (0.021)   0.147** (0.019)
Weak IV (F-stat)               46.538            30.658            48.847
χ² crit-val (5%)               19.860            16.850            19.280

W2                             (1)               (2)               (3)
No. Nb                        -0.081** (0.020)  -0.129** (0.018)
Avg. Nb Size                  -0.034** (0.005)  -0.037** (0.006)
No. Indirect Nb               -0.050** (0.010)                    -0.069** (0.008)
Avg. Indirect Nb Size         -0.197** (0.054)                    -0.076 (0.056)
Nb. Car Wash                  -0.104** (0.012)  -0.119** (0.012)
Nb. Service                    0.035** (0.005)   0.037** (0.005)
Suncor X Rack                  0.137** (0.037)   0.167** (0.044)   0.080* (0.038)
Shell X Rack                   0.102** (0.026)   0.113** (0.031)   0.074* (0.028)
Dist Refinery                  0.044 (0.200)     0.269 (0.235)     0.215 (0.217)
Weak IV (F-stat)               59.806            36.789            55.517
χ² crit-val (5%)               20.530            18.370            19.860

W5                             (1)               (2)               (3)
No. Nb                        -0.041 (0.037)     0.006 (0.038)
Avg. Nb Size                  -0.013 (0.027)    -0.068* (0.030)
No. Indirect Nb                0.010 (0.023)                      -0.003 (0.022)
Avg. Indirect Nb Size          0.151** (0.024)                     0.148** (0.026)
Nb. Car Wash                  -0.016 (0.011)    -0.019 (0.011)
Nb. Service                   -0.203** (0.033)  -0.200** (0.035)
Suncor X Rack                  0.189** (0.030)   0.188** (0.035)   0.176** (0.030)
Shell X Rack                  -0.028 (0.023)    -0.042 (0.028)    -0.036 (0.025)
Dist Refinery                  0.141** (0.017)   0.151** (0.020)   0.142** (0.018)
Weak IV (F-stat)               27.797            24.030            31.931
χ² crit-val (5%)               20.530            18.370            19.860

Table B.4b: First-Stage OLS Regression and Tests for IV Power (continued)

W3                             (1)               (2)               (3)
No. Nb                        -0.020** (0.004)  -0.009** (0.003)
Avg. Nb Size                   0.012 (0.011)    -0.084** (0.009)
No. Indirect Nb                0.008* (0.004)                     -0.006* (0.003)
Avg. Indirect Nb Size         -0.144 (0.099)                      -0.050 (0.105)
Nb. Car Wash                  -0.033** (0.004)  -0.033** (0.004)
Nb. Service                   -0.052** (0.011)  -0.050** (0.011)
Suncor X Rack                  0.034* (0.013)    0.027 (0.014)     0.039** (0.012)
Shell X Rack                   0.104 (0.062)    -0.025 (0.070)     0.072 (0.065)
Dist Refinery                 -0.315 (0.206)     0.153 (0.235)    -0.161 (0.221)
Weak IV (F-stat)               65.243            50.940            67.045
χ² crit-val (5%)               20.530            18.370            19.860

W6                             (1)               (2)               (3)
No. Nb                        -0.041 (0.037)     0.006 (0.038)
Avg. Nb Size                  -0.013 (0.027)    -0.068* (0.030)
No. Indirect Nb                0.010 (0.023)                      -0.003 (0.022)
Avg. Indirect Nb Size          0.151** (0.024)                     0.148** (0.026)
Nb. Car Wash                  -0.016 (0.011)    -0.019 (0.011)
Nb. Service                   -0.203** (0.033)  -0.200** (0.035)
Suncor X Rack                  0.189** (0.030)   0.188** (0.035)   0.176** (0.030)
Shell X Rack                  -0.028 (0.023)    -0.042 (0.028)    -0.036 (0.025)
Dist Refinery                  0.141** (0.017)   0.151** (0.020)   0.142** (0.018)
Weak IV (F-stat)               27.797            24.030            31.931
χ² crit-val (5%)               20.530            18.370            19.860
1 Panels (1), (2), (3) correspond to different sets of IVs; robust SEs in parentheses; * (p < 0.05), ** (p < 0.01).
2 We report effective first-stage F-statistics based on Olea and Pflueger (2013) and Stock-Yogo weak ID test critical values.

B.4 Additional Empirical Results

Table B.6: Estimation Results with W2,n (2-mile radius)

                   OLS                (1) LQ-GS2SLS      (1) LQ-GSLIVE      (2) LQ-GS2SLS      (2) LQ-GSLIVE      (3) LQ-GS2SLS      (3) LQ-GSLIVE
Demand
ln(price)         -3.996** (1.233)   -12.217** (3.751)  -12.250** (3.068)  -9.980* (3.870)    -9.989** (3.282)   -9.274* (3.720)    -9.327* (3.228)
W*ln(price)       -7.301 (4.162)      1.821 (1.336)      1.928 (1.923)      1.145 (1.707)      1.147 (1.097)      0.138 (1.543)      0.258 (1.332)
Car wash           0.042 (0.113)     -0.261 (0.126)     -0.006 (0.132)     -0.200 (0.121)     -0.005 (0.119)     -0.082 (0.120)     -0.001 (0.118)
Service Station    0.047 (0.036)      0.038 (0.040)     -0.001 (0.042)      0.038 (0.038)      0.035 (0.038)      0.045 (0.038)      0.039 (0.038)
C-store Size      -0.047 (0.035)     -0.036 (0.038)     -0.094* (0.040)    -0.038 (0.036)     -0.042 (0.036)     -0.046 (0.036)     -0.041 (0.036)
No. Drivers        0.053* (0.023)     0.053* (0.026)     0.052 (0.028)      0.041 (0.025)      0.058* (0.024)     0.051* (0.024)     0.047* (0.025)
Med Income         0.020 (0.239)     -0.159 (0.266)     -0.123 (0.284)     -0.156 (0.254)     -0.029 (0.252)     -0.142 (0.251)     -0.023 (0.249)
Commute Dist.     -1.814* (0.912)    -1.472 (1.027)     -1.533 (1.148)     -1.541 (0.976)     -1.581 (0.977)     -1.609 (0.965)     -1.607 (1.002)
Travel Mode       -0.015* (0.006)    -0.012 (0.007)     -0.010 (0.008)     -0.010 (0.006)     -0.016* (0.006)    -0.009 (0.006)     -0.013* (0.006)
λ1                 0.274 (2.540)    -15.533** (0.460)  -15.522** (0.694)   -8.751** (0.584)   -8.743** (1.232)   -7.084** (0.029)   -7.008** (1.367)
Period Dummy       Yes (all IV specifications)
Sample size        302
R-square (OLS)     0.15
1 Panels (1), (2), (3) correspond to different sets of supply-side IVs; SEs in parentheses are robust to heteroskedasticity; * (p < 0.05), ** (p < 0.01). w_ij = 0.0588 if i and j are neighbors.

Table B.7: Estimation Results with W3,n (common boundary and reciprocal of the travel distance)

                   OLS                (1) LQ-GS2SLS      (1) LQ-GSLIVE      (2) LQ-GS2SLS      (2) LQ-GSLIVE      (3) LQ-GS2SLS      (3) LQ-GSLIVE
Demand
ln(price)         -3.345* (1.234)    -6.944* (2.806)    -6.815** (2.048)   -5.730* (2.055)    -5.712* (2.035)    -7.419** (2.856)   -7.320** (2.077)
W*ln(price)        1.633** (0.565)    1.317* (0.628)     1.418* (0.558)     0.749 (0.627)      1.438* (0.539)     1.247* (0.625)     1.326* (0.475)
Car wash           0.022 (0.110)     -0.172 (0.113)      0.083 (0.111)     -0.012 (0.111)      0.053 (0.110)     -0.008 (0.112)      0.086 (0.111)
Service Station    0.048 (0.036)      0.049 (0.036)      0.032 (0.035)      0.050 (0.035)      0.031 (0.035)      0.050 (0.036)      0.032 (0.036)
C-store Size      -0.042 (0.034)     -0.050 (0.034)     -0.031 (0.033)     -0.046 (0.033)     -0.030 (0.033)     -0.046 (0.034)     -0.031 (0.034)
No. Drivers        0.046 (0.022)      0.037 (0.026)      0.037 (0.028)      0.035 (0.025)      0.037 (0.028)      0.038 (0.026)      0.034 (0.027)
Med Income        -0.041 (0.231)     -0.436 (0.250)      0.137 (0.257)      0.167 (0.242)      0.132 (0.254)     -0.052 (0.247)      0.192 (0.259)
Commute Dist.     -1.585 (0.897)     -1.472 (1.060)     -1.441 (1.151)     -1.289 (1.032)     -1.407 (1.146)     -1.452 (1.039)     -1.267 (1.132)
Travel Mode       -0.012* (0.006)    -0.004 (0.007)     -0.017* (0.008)    -0.013 (0.007)     -0.018* (0.008)    -0.009 (0.007)     -0.017* (0.008)
λ1                 2.430 (1.476)      2.572** (0.806)    4.227** (1.040)    2.739** (0.054)    4.373** (1.338)    2.327* (0.959)     4.167** (0.648)
Period Dummy       Yes (all IV specifications)
Sample size        302
R-square (OLS)     0.17
1 Panels (1), (2), (3) correspond to different sets of supply-side IVs; SEs in parentheses are robust to heteroskedasticity; * (p < 0.05), ** (p < 0.01).

Table B.8: Estimation Results with W4,n (nearest neighbor)

                   OLS                (1) LQ-GS2SLS      (1) LQ-GSLIVE      (2) LQ-GS2SLS      (2) LQ-GSLIVE      (3) LQ-GS2SLS      (3) LQ-GSLIVE
Demand
ln(price)          3.688 (2.553)     -6.814* (2.964)    -6.833* (2.938)    -4.393 (2.792)     -4.176 (2.359)    -11.579** (3.140)  -11.358** (3.074)
W*ln(price)        0.630 (1.506)      6.479* (2.252)     6.403* (2.301)     4.374* (1.907)     4.369* (1.776)    10.545** (2.962)    9.444** (3.023)
Car wash           0.037 (0.112)      0.111 (0.118)      0.110 (0.147)      0.093 (0.118)     -0.102 (0.143)      0.229* (0.113)    -0.273* (0.122)
Service Station    0.048* (0.023)     0.069* (0.027)     0.060* (0.026)     0.064* (0.027)     0.058* (0.028)     0.072* (0.033)     0.082 (0.040)
C-store Size      -0.047 (0.035)     -0.045 (0.035)     -0.059 (0.040)     -0.047 (0.035)      0.011 (0.042)     -0.049 (0.036)     -0.002 (0.062)
No. Drivers        0.056* (0.022)     0.070** (0.023)    0.065* (0.023)     0.068* (0.023)     0.082** (0.024)    0.078** (0.023)    0.100** (0.022)
Med Income        -0.008 (0.214)      0.042 (0.334)     -0.020 (0.389)      0.036 (0.407)     -0.349 (0.439)      0.013 (0.436)      0.025 (0.491)
Commute Dist.     -1.878* (0.903)    -2.063* (0.923)    -2.095* (0.923)    -2.069* (0.938)    -1.988* (0.831)    -2.052* (0.927)    -1.994* (0.707)
Travel Mode       -0.016 (0.006)     -0.016* (0.006)    -0.047* (0.016)    -0.017* (0.006)    -0.029 (0.016)     -0.013* (0.006)    -0.080** (0.025)
λ1                 0.124* (0.061)     0.005 (1.813)      0.335 (2.599)      0.032 (1.116)      0.337 (1.557)     -0.040 (1.076)      0.336 (1.590)
Period Dummy       Yes (all IV specifications)
Sample size        302
R-square (OLS)     0.08
1 Panels (1), (2), (3) correspond to different sets of supply-side IVs; SEs in parentheses are robust to heteroskedasticity; * (p < 0.05), ** (p < 0.01).

Table B.9: Estimation Results with W5,n (common street)

                   OLS                (1) LQ-GS2SLS      (1) LQ-GSLIVE      (2) LQ-GS2SLS      (2) LQ-GSLIVE      (3) LQ-GS2SLS      (3) LQ-GSLIVE
Demand
ln(price)         -3.945** (1.225)   -7.213* (2.646)    -7.078* (2.369)    -6.786* (2.697)    -6.679** (2.159)   -7.008* (3.094)    -6.980* (2.864)
W*ln(price)        0.654 (0.294)      1.967** (0.512)    1.349* (0.529)     1.359* (0.509)     1.331* (0.540)     1.201* (0.513)     1.405* (0.519)
Car wash           0.042 (0.111)     -0.034 (0.114)     -0.102 (0.122)     -0.026 (0.113)     -0.101 (0.121)     -0.080 (0.114)     -0.103 (0.131)
Service Station    0.047 (0.036)      0.043 (0.037)      0.066 (0.036)      0.046 (0.037)      0.065 (0.036)      0.039 (0.037)      0.066 (0.036)
C-store Size      -0.046 (0.035)     -0.043 (0.035)     -0.064 (0.034)     -0.040 (0.035)     -0.063 (0.034)     -0.047 (0.035)     -0.064 (0.034)
No. Drivers        0.051* (0.023)     0.048* (0.024)     0.056* (0.024)     0.051* (0.024)     0.058* (0.024)     0.050* (0.024)     0.056* (0.023)
Med Income        -0.057 (0.236)     -0.066 (0.245)     -0.147 (0.248)     -0.058 (0.245)     -0.136 (0.254)     -0.074 (0.245)     -0.151 (0.250)
Commute Dist.     -1.746 (0.907)     -1.599 (0.981)     -1.961* (0.971)    -1.625 (0.984)     -2.048* (0.960)    -1.635 (0.982)     -1.971* (0.948)
Travel Mode       -0.014* (0.006)    -0.018* (0.006)    -0.015* (0.006)    -0.017* (0.006)    -0.015* (0.006)    -0.014* (0.006)    -0.015* (0.006)
λ1                 0.272** (0.055)    0.109 (0.410)      0.098 (0.322)      0.118 (0.388)      0.100 (0.321)      0.110 (0.435)      0.098 (0.320)
Period Dummy       Yes (all IV specifications)
Sample size        302
R-square (OLS)     0.15
1 Panels (1), (2), (3) correspond to different sets of supply-side IVs; SEs in parentheses are robust to heteroskedasticity; * (p < 0.05), ** (p < 0.01).

Table B.10: Estimation Results with W6,n (hybrid measure of common street and travel distance)

                   OLS                (1) LQ-GS2SLS      (1) LQ-GSLIVE      (2) LQ-GS2SLS      (2) LQ-GSLIVE      (3) LQ-GS2SLS      (3) LQ-GSLIVE
ln(price)         -3.946** (1.225)   -8.875** (2.526)   -9.112** (2.303)   -6.286** (2.608)   -6.428** (2.043)   -7.692* (3.008)    -7.698* (2.746)
W*ln(price)        0.650* (0.299)     0.129 (0.329)      0.135 (0.332)      0.648* (0.321)     0.614* (0.331)     0.609 (0.327)      0.625 (0.335)
Car wash           0.042 (0.111)     -0.147 (0.118)     -0.047 (0.120)     -0.195 (0.116)     -0.279* (0.118)    -0.688** (0.124)   -0.317* (0.127)
Service Station    0.047 (0.036)      0.041 (0.038)      0.040 (0.038)      0.048 (0.038)      0.045 (0.037)      0.055 (0.040)      0.035 (0.038)
C-store Size      -0.046 (0.035)     -0.043 (0.036)     -0.047 (0.036)     -0.048 (0.036)     -0.029 (0.036)     -0.082* (0.038)    -0.019 (0.036)
No. Drivers        0.051* (0.023)     0.040 (0.024)      0.048* (0.024)     0.055* (0.024)     0.091** (0.024)    0.051* (0.025)     0.072** (0.023)
Med Income        -0.057 (0.236)     -0.130 (0.250)     -0.071 (0.253)     -0.130 (0.246)      0.119 (0.251)     -0.479 (0.261)      0.071 (0.269)
Commute Dist.     -1.747 (0.907)     -1.504 (0.980)     -1.574 (0.967)     -1.624 (0.968)     -3.185** (0.954)   -1.683 (1.011)     -2.474** (0.974)
Travel Mode       -0.014* (0.006)    -0.020** (0.006)   -0.006 (0.006)     -0.022** (0.006)   -0.017* (0.006)    -0.009 (0.006)     -0.014* (0.006)
λ1                 0.157** (0.040)    0.046 (0.408)      0.088 (0.354)      0.052 (0.367)      0.062 (0.343)      0.012 (0.412)      0.049 (0.395)
Period Dummy       Yes (all IV specifications)
Sample size        302
R-square (OLS)     0.15
1 Panels (1), (2), (3) correspond to different sets of supply-side IVs; SEs in parentheses are robust to heteroskedasticity; * (p < 0.05), ** (p < 0.01).
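The "Weak IV (F-stat)" rows above report the effective first-stage F-statistic of Section B.3, F_eff = π̂'Q̂π̂ / tr(Ω̂Q̂). A minimal sketch of that computation for a single endogenous regressor is below; the simulated data and all names are illustrative, not the dissertation's data or code.

```python
import numpy as np

def effective_F(X1, X2, Z):
    """Effective first-stage F (in the spirit of Olea and Pflueger, 2013) for a
    single endogenous regressor X1, exogenous controls X2, and instruments Z."""
    n = len(X1)
    # Partial the exogenous controls out of the instruments and the regressor.
    M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
    Zt = M2 @ Z
    Q = Zt.T @ Zt / n                              # Q_hat_{ZZ|X2}
    pi = np.linalg.solve(Zt.T @ Zt, Zt.T @ X1)     # first-stage OLS coefficients
    v = M2 @ X1 - Zt @ pi                          # first-stage residuals
    S = Zt.T @ (Zt * (v ** 2)[:, None]) / n ** 2   # heteroskedasticity-robust "meat"
    Qinv = np.linalg.inv(Q)
    Omega = Qinv @ S @ Qinv                        # robust VC estimate for pi_hat
    return float(pi @ Q @ pi / np.trace(Omega @ Q))

# Simulated illustration with strong instruments.
rng = np.random.default_rng(1)
n = 500
Z = rng.standard_normal((n, 3))
X2 = np.column_stack([np.ones(n), rng.standard_normal(n)])
X1 = Z @ np.array([1.0, 0.5, 0.5]) + rng.standard_normal(n)
F = effective_F(X1, X2, Z)
```

Under homoskedasticity this statistic reduces to (a scaled version of) the usual first-stage Wald F, which is why large values signal strong instruments here as well.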
Table B.11: Definition of Regression Variables

ln(volume)              log of monthly sales volume in liters
ln(price)               log of monthly average price in Canadian cents
Car wash                1 if the station provides a car wash, 0 otherwise
Service Station         1 if the station has a service station, 0 otherwise
No. Drivers             number of car drivers living in the census tract in which the station is located (1 unit = 100 people)
Med Income              median income of the residents of the census tract in which the station is located (unit = 10,000 Canadian dollars)
Commute Dist.           index of commute distance (between 0 and 1)
Travel Mode             index of transportation mode (between 0 and 1)
No. Nb                  number of neighboring stations
Avg. Nb Size            average size of neighboring stations, measured by the number of pumps
C-store Size/No. Pump   size of the C-store (square meters)/number of pumps
Suncor X Rack           dummy for Suncor's presence in the neighborhood, interacted with the monthly average rack price
Shell X Rack            dummy for Shell's presence in the neighborhood, interacted with the monthly average rack price
Dist Refinery           distance to the closest refinery, by brand (in kilometers)

B.5 Impact Measures

Table B.12: Impact Measures

W3                      LQ-GS2SLS             LQ-GSLIVE
Demand Side Var.        Direct     Indirect   Direct     Indirect
Car wash               -0.0433    -0.0463    -0.0178    -0.0207
Service Station        -0.0266    -0.0302    -0.0282    -0.0318
C-store Size           -0.0598    -0.0608    -0.0399    -0.0410
No. Drivers             0.0371     0.0371     0.0373     0.0373
Med Income             -0.4364    -0.4364     0.1374     0.1374

W4                      LQ-GS2SLS             LQ-GSLIVE
Demand Side Var.        Direct     Indirect   Direct     Indirect
Car wash               -0.0127    -0.0347    -0.0128    -0.0789
Service Station         0.0388     0.0112     0.0306    -0.0523
C-store Size           -0.0428    -0.0404    -0.0559    -0.0486
No. Drivers             0.0701     0.0701     0.0646     0.0646
Med Income              0.0423     0.0423    -0.0195    -0.0195

W5                      LQ-GS2SLS             LQ-GSLIVE
Demand Side Var.        Direct     Indirect   Direct     Indirect
Car wash               -0.0339    -0.0613    -0.0403    -0.0675
Service Station         0.0190    -0.0022     0.0423     0.0214
C-store Size           -0.0452    -0.0470    -0.0663    -0.0681
No. Drivers             0.0477     0.0477     0.0556     0.0556
Med Income             -0.0657    -0.0657    -0.1470    -0.1470

W6                      LQ-GS2SLS             LQ-GSLIVE
Demand Side Var.        Direct     Indirect   Direct     Indirect
Car wash               -0.0508    -0.0832    -0.0407    -0.0731
Service Station         0.0128    -0.0122     0.0120    -0.0130
C-store Size           -0.0458    -0.0480    -0.0494    -0.0516
No. Drivers             0.0405     0.0405     0.0479     0.0479
Med Income             -0.1295    -0.1295    -0.0707    -0.0707
1 Panels correspond to estimates obtained with W3-W6, respectively.
2 Estimates are based on regression specification (1).
3 "Direct" refers to the "Average Total Direct Impact"; "Indirect" refers to the "Average Total Impact from an Observation".

B.6 Test for Network Dependence

Table B.13: Test for Network Dependence

                 W1        W2       W3       W4       W5       W6
I²_y             165.28    59.31    400.10   53.52    46.72    19.47
Critical Value   14.07     21.03    19.68    14.07    19.68    19.68
1 Critical values are based on the 0.01 significance level.

The following test statistic is derived based on Liu and Prucha (2018) and is a special case of that presented in their Theorem 2. Consider the linear model

    y = Zδ + u,

where Z contains the exogenous and endogenous regressors. The null hypothesis is given by

    H₀ʸ: Ey = Xβ and cov(y) is diagonal.

Note that X denotes only the exogenous variables, so that under H₀ʸ there is no network-generated dependence. Denote the IV/GMM estimator of the parameters δ of the above model by δ̂; in our implementation, we used the LQ-GS2SLS estimates. Denote Ẑ = X(X'X)⁻¹X'Z, Σ̂ = diag(û_i²), Σ̂_k = diag(û_i ξ̂_ik), and Σ̂_kl = diag(ξ̂_ik ξ̂_il), where û_i is the i-th element of û = y - Zδ̂ and ξ̂_ik is the (i,k)-th element of Ξ̂ = Z - Ẑ. Let W̄ = (W + W')/2,

    V̂ = [ V̂_Y ]          Φ̂ = [ Φ̂_YY    Φ̂_YZ    Φ̂_YU  ]
        [ V̂_Z ]              [ Φ̂_YZ'   Φ̂_ZZ    Φ̂_ZU  ]
        [ V̂_U ],             [ Φ̂_YU'   Φ̂_ZU'   Φ̂_UU  ],

where V̂_Y = û'W̄y, V̂_Z = (û'W̄Z)', V̂_U = û'W̄û,

    Φ̂_YY = 2tr(W̄Σ̂W̄Σ̂) + 2 Σ_{k=1}^K δ̂_k tr(W̄Σ̂W̄Σ̂_k' + W̄Σ̂W̄Σ̂_k) + Σ_{k=1}^K Σ_{l=1}^K δ̂_k δ̂_l tr(W̄Σ̂_kW̄Σ̂_l + W̄Σ̂_klW̄Σ̂) + δ̂'Ẑ'W̄'M_Ẑ Σ̂ M_Ẑ W̄Ẑδ̂,

    Φ̂_YZ = [ 2tr(W̄Σ̂W̄Σ̂_l) + Σ_{k=1}^K δ̂_k tr(W̄Σ̂_kW̄Σ̂_l + W̄Σ̂_klW̄Σ̂') ]_{l=1,...,K} + δ̂'Ẑ'W̄'M_Ẑ Σ̂ M_Ẑ W̄Ẑ,

    Φ̂_YU = 2tr(W̄Σ̂W̄Σ̂) + 2 Σ_{k=1}^K δ̂_k tr(W̄Σ̂_kW̄Σ̂),

    Φ̂_ZZ = [ tr(W̄Σ̂_kW̄Σ̂_l + W̄Σ̂_klW̄Σ̂) ]_{k,l=1,...,K} + Ẑ'W̄'M_Ẑ Σ̂ M_Ẑ W̄Ẑ,

    Φ̂_ZU = [ 2tr(W̄Σ̂_kW̄Σ̂) ]_{k=1,...,K},

    Φ̂_UU = 2tr(W̄Σ̂W̄Σ̂),

K denotes the number of columns of Z, and M_Ẑ = I_n - Ẑ(Ẑ'Ẑ)⁻¹Ẑ'. In their Theorem 2, Liu and Prucha (2018) showed that (1/n)Φ̂ is a consistent estimator of the variance-covariance matrix of n^{-1/2}V̂, and the generalized test statistic I²_y is given by

    I²_y = (LV̂)'(LΦ̂L')⁻¹(LV̂),                                                    (B.6)

where L is a selector matrix such that LΦ̂L' is nonsingular. Their Theorem 2 implies I²_y →_d χ²(rank(L)); note that the degrees of freedom of the χ² distribution are given by rank(L).

Bibliography

[1] Daron Acemoglu, Suresh Naidu, Pascual Restrepo, and James A Robinson. Democracy does cause growth. Journal of Political Economy, 127(1):47-100, 2019.
[2] David R Agrawal. The tax gradient: Spatial aspects of fiscal competition. American Economic Journal: Economic Policy, 7(2):1-29, 2015.
[3] Isaiah Andrews, James Stock, and Liyang Sun. Weak instruments in IV regression: Theory and practice. 2018.
[4] Luc Anselin. Spatial Econometrics: Methods and Models, volume 4. Springer Science & Business Media, 1988.
[5] Luc Anselin. Thirty years of spatial econometrics. Papers in Regional Science, 89(1):3-25, 2010.
[6] Luc Anselin and Raymond JGM Florax. New directions in spatial econometrics: Introduction. In New Directions in Spatial Econometrics, pages 3-18. Springer, 1995.
[7] Luc Anselin et al. Spatial econometrics. A Companion to Theoretical Econometrics, 310-330, 2001.
[8] Irani Arraiz, David M Drukker, Harry H Kelejian, and Ingmar R Prucha. A spatial Cliff-Ord-type model with heteroskedastic innovations: Small and large sample results. Journal of Regional Science, 50(2):592-614, 2010.
[9] Andrea Ascani, Alessandra Faggian, and Sandro Montresor. The geography of COVID-19 and the structure of local economies: The case of Italy.
Journal of Regional Science, 61(2):407-441, 2021.
[10] Benjamin Atkinson. On retail gasoline pricing websites: Potential sample selection biases and their implications for empirical research. Review of Industrial Organization, 33(2):161-175, 2008.
[11] Benjamin Atkinson. Retail gasoline price cycles: Evidence from Guelph, Ontario using bi-hourly, station-specific retail price data. The Energy Journal, 30(1), 2009.
[12] Benjamin Atkinson, Andrew Eckert, and Douglas S West. Daily price cycles and constant margins: Recent events in Canadian gasoline retailing. The Energy Journal, 35(3), 2014.
[13] Coralio Ballester, Antoni Calvó-Armengol, and Yves Zenou. Who's who in networks. Wanted: The key player. Econometrica, 74(5):1403-1417, 2006.
[14] Badi H Baltagi and Georges Bresson. Maximum likelihood estimation and Lagrange multiplier tests for panel seemingly unrelated regressions with spatial lag and spatial errors: An application to hedonic housing prices in Paris. Journal of Urban Economics, 69(1):24-42, 2011.
[15] Badi H Baltagi and Ying Deng. EC3SLS estimator for a simultaneous system of spatial autoregressive equations with random effects. Econometric Reviews, 34(6-10):659-694, 2015.
[16] Badi H Baltagi, Peter Egger, and Michael Pfaffermayr. Estimating models of complex FDI: Are there third-country effects? Journal of Econometrics, 140(1):260-281, 2007.
[17] Badi H Baltagi, Peter Egger, and Michael Pfaffermayr. Estimating regional trade agreement effects on FDI in an interdependent world. Journal of Econometrics, 145(1-2):194-208, 2008.
[18] Melissa Bantle, Matthias Muijs, and Ralf Dewenter. A new price test in geographic market definition: An application to the German retail gasoline market. Technical report, Diskussionspapier, 2018.
[19] John M Barron, Beck A Taylor, and John R Umbeck. Number of sellers, average prices, and price dispersion. International Journal of Industrial Organization, 22(8-9):1041-1066, 2004.
[20] John M Barron, John R Umbeck, and Glen R Waddell. Consumer and competitor reactions: Evidence from a field experiment. International Journal of Industrial Organization, 26(2):517-531, 2008.
[21] Kristian Behrens, Cem Ertur, and Wilfried Koch. "Dual" gravity: Using spatial econometrics to control for multilateral resistance. Journal of Applied Econometrics, 27(5):773-794, 2012.
[22] Steven Berry, James Levinsohn, and Ariel Pakes. Automobile prices in market equilibrium. Econometrica, pages 841-890, 1995.
[23] Bruce A Blonigen, Ronald B Davies, Glen R Waddell, and Helen T Naughton. FDI in space: Spatial autoregressive relationships in foreign direct investment. European Economic Review, 51(5):1303-1325, 2007.
[24] Lawrence E Blume, William A Brock, Steven N Durlauf, and Yannis M Ioannides. Identification of social interactions. In Handbook of Social Economics, volume 1, pages 853-964. Elsevier, 2011.
[25] Phillip Bonacich. Power and centrality: A family of measures. American Journal of Sociology, 92(5):1170-1182, 1987.
[26] James M Brundy and Dale W Jorgenson. Efficient estimation of simultaneous equations by instrumental variables. The Review of Economics and Statistics, pages 207-224, 1971.
[27] Antoni Calvó-Armengol, Eleonora Patacchini, and Yves Zenou. Peer effects and social networks in education. The Review of Economic Studies, 76(4):1239-1267, 2009.
[28] G. Clemenz and K. Gugler. Locational choice and price competition: Some empirical results for the Austrian retail gasoline market. Empirical Economics, 31:291-312, 2006.
[29] Andrew Cliff and J.K. Ord. Spatial Autocorrelation. London: Pion, 1973.
[30] Andrew Cliff and J.K. Ord. Spatial Processes, Models and Applications. London: Pion, 1981.
[31] Ethan Cohen-Cole, Xiaodong Liu, and Yves Zenou. Multivariate choices and identification of social interactions. Journal of Applied Econometrics, 33(2):165-178, 2018.
[32] Timothy G Conley and Ethan Ligon. Economic distance and cross-country spillovers. Journal of Economic Growth, 7(2):157-187, 2002.
[33] Phoebus J Dhrymes and John Guerard. Introductory Econometrics, volume 4. Springer, 1978.
[34] Joseph Doyle, Erich Muehlegger, and Krislert Samphantharak. Edgeworth cycles revisited. Energy Economics, 32(3):651-660, 2010.
[35] David M Drukker, Peter H Egger, and Ingmar R Prucha. Simultaneous equations models with higher-order spatial or social network interactions. Econometric Theory, pages 1-48, 2022.
[36] Andrew Eckert. Retail price cycles and the presence of small firms. International Journal of Industrial Organization, 21(2):151-170, 2003.
[37] Andrew Eckert. Empirical studies of gasoline retailing: A guide to the literature. Journal of Economic Surveys, 27(1):140-166, 2013.
[38] Andrew Eckert and Douglas S West. Retail gasoline price cycles across spatially dispersed gasoline stations. The Journal of Law and Economics, 47(1):245-273, 2004.
[39] Andrew Eckert and Douglas S West. Price uniformity and competition in a retail gasoline market. Journal of Economic Behavior & Organization, 56(2):219-237, 2005.
[40] Cem Ertur and Wilfried Koch. Growth, technological interdependence and spatial externalities: Theory and evidence. Journal of Applied Econometrics, 22(6):1033-1062, 2007.
[41] Cem Ertur and Antonio Musolesi. Weak and strong cross-sectional dependence: A panel data analysis of international technology diffusion. Journal of Applied Econometrics, 32(3):477-503, 2017.
[42] Ying Fan. Ownership consolidation and product characteristics: A study of the US daily newspaper market. American Economic Review, 103(5):1598-1628, 2013.
[43] Bernard Fingleton and Nikodem Szumilo. Simulating the impact of transport infrastructure investment on wages: A dynamic spatial panel model approach. Regional Science and Urban Economics, 75:148-164, 2019.
[44] Roberto Gallardo, Brian Whitacre, Indraneel Kumar, and Sreedhar Upendram. Broadband metrics and job productivity: A look at county-level data. The Annals of Regional Science, 66(1):161-184, 2021.
[45] Heather D Gibson, Stephen G Hall, Pavlos Petroulas, Vassilis Spiliotopoulos, and George S Tavlas. The effect of emergency liquidity assistance (ELA) on bank lending during the euro area crisis. Journal of International Money and Finance, 108:102154, 2020.
[46] Georg Götz and Klaus Gugler. Market concentration and product variety under spatial competition: Evidence from retail gasoline. Journal of Industry, Competition and Trade, 6(3-4):225-234, 2006.
[47] Sebastian Hauptmeier, Ferdinand Mittermaier, and Johannes Rincke. Fiscal competition over taxes and public inputs. Regional Science and Urban Economics, 42(3):407-419, 2012.
[48] Jerry A Hausman. An instrumental variable approach to full information estimators for linear and certain nonlinear econometric models. Econometrica, pages 727-738, 1975.
[49] David F Hendry. The structure of simultaneous equations estimators. Journal of Econometrics, 4(1):51-88, 1976.
[50] Roger A Horn and Charles R Johnson. Matrix Analysis. Cambridge University Press, 1985.
[51] Daniel S Hosken, Robert S McMillan, and Christopher T Taylor. Retail gasoline pricing: What do we know? International Journal of Industrial Organization, 26(6):1425-1436, 2008.
[52] Jean-François Houde. Spatial differentiation and vertical mergers in retail markets for gasoline. American Economic Review, 102(5):2147-82, 2012.
[53] Jonathan Hughes, Christopher R Knittel, and Daniel Sperling. Evidence of a shift in the short-run price elasticity of gasoline demand. The Energy Journal, 29(1), 2008.
[54] P Wilner Jeanty, Mark Partridge, and Elena Irwin. Estimation of a spatial simultaneous equation model of population migration and housing price dynamics. Regional Science and Urban Economics, 40(5):343-352, 2010.
[55] Harry H Kelejian and Ingmar R Prucha.
A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. The Journal of Real Estate Finance and Economics, 17(1):99?121, 1998. [56] Harry H Kelejian and Ingmar R Prucha. A generalized moments estimator for the au- toregressive parameter in a spatial model. International economic review, 40(2):509?533, 1999. 194 [57] Harry H Kelejian and Ingmar R Prucha. Estimation of simultaneous systems of spatially interrelated cross sectional equations. Journal of econometrics, 118(1-2):27?50, 2004. [58] Harry H Kelejian and Ingmar R Prucha. Specification and estimation of spatial autoregres- sive models with autoregressive and heteroskedastic disturbances. Journal of economet- rics, 157(1):53?67, 2010. [59] Harry H Kelejian, Ingmar R Prucha, and Yevgeny Yuzefovich. Instrumental variable esti- mation of a spatial autoregressive model with autoregressive disturbances: Large and small sample results. In Spatial and spatiotemporal econometrics. Emerald Group Publishing Limited, 2004. [60] Frank Kleibergen and Richard Paap. Generalized reduced rank tests using the singular value decomposition. Journal of econometrics, 133(1):97?126, 2006. [61] Guido M Kuersteiner and Ingmar R Prucha. Dynamic spatial panel models: Networks, common shocks, and sequential exogeneity. Econometrica, 88(5):2109?2146, 2020. [62] Jim Lee and Yuxia Huang. Covid-19 impact on us housing markets: evidence from spatial regression models. Spatial Economic Analysis, pages 1?21, 2022. [63] Lung-fei Lee. Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econometric Reviews, 22(4):307?335, 2003. [64] Lung-fei Lee. Gmm and 2sls estimation of mixed regressive, spatial autoregressive models. Journal of Econometrics, 137(2):489?514, 2007. [65] Lung-Fei Lee. Identification and estimation of econometric models with group interac- tions, contextual factors and fixed effects. 
Journal of Econometrics, 140(2):333?374, 2007. [66] James LeSage and Robert Kelley Pace. Introduction to spatial econometrics. Chapman and Hall/CRC, 2009. [67] Laurence Levin, Matthew S Lewis, and Frank A Wolak. High frequency evidence on the demand for gasoline. American Economic Journal: Economic Policy, 9(3):314?47, 2017. [68] Xu Lin and Lung-fei Lee. Gmm estimation of spatial autoregressive models with unknown heteroskedasticity. Journal of Econometrics, 157(1):34?52, 2010. [69] Xiaodong Liu. Identification and efficient estimation of simultaneous equations network models. Journal of Business & Economic Statistics, 32(4):516?536, 2014. [70] Xiaodong Liu. Simultaneous equations with binary outcomes and social interactions. Econometric Reviews, 38(8):921?937, 2019. [71] Xiaodong Liu. Gmm identification and estimation of peer effects in a system of simulta- neous equations. Journal of Spatial Econometrics, 1(1):1?27, 2020. 195 [72] Xiaodong Liu and Ingmar R Prucha. A robust test for network generated dependence. Journal of econometrics, 207(1):92?113, 2018. [73] Xiaodong Liu and Paulo Saraiva. Gmm estimation of spatial autoregressive models in a system of simultaneous equations with heteroskedasticity. Econometric Reviews, 38(4): 359?385, 2019. [74] Alexander MacKay and Nathan Miller. Estimating models of supply and demand: Instru- ments and covariance restrictions. Available at SSRN 3025845, 2021. [75] Jan R Magnus and Heinz Neudecker. Matrix differential calculus with applications in statistics and econometrics. John Wiley & Sons, 2019. [76] Mark D Manuszak. Predicting the impact of upstream mergers on downstream markets with an application to the retail gasoline industry. International Journal of Industrial Organization, 28(1):99?111, 2010. [77] Shohei Nakamura and Paolo Avner. Spatial distributions of job accessibility, housing rents, and poverty: The case of nairobi. Journal of Housing Economics, 51:101743, 2021. [78] Aviv Nevo. 
Measuring market power in the ready-to-eat cereal industry. Econometrica, 69(2):307?342, 2001. [79] Xiaoming Ning and Robert Haining. Spatial pricing in interdependent markets: a case study of petrol retailing in sheffield. Environment and Planning A, 35(12):2131?2159, 2003. [80] Michael D Noel. Edgeworth price cycles, cost-based pricing, and sticky pricing in retail gasoline markets. The Review of Economics and Statistics, 89(2):324?334, 2007. [81] Michael D Noel. Edgeworth price cycles: Evidence from the toronto retail gasoline market. The Journal of Industrial Economics, 55(1):69?92, 2007. [82] Michael D Noel. Edgeworth price cycles and focal prices: Computational dynamic markov equilibria. Journal of Economics & Management Strategy, 17(2):345?377, 2008. [83] Jose? Luis Montiel Olea and Carolin Pflueger. A robust test for weak instruments. Journal of Business & Economic Statistics, 31(3):358?369, 2013. [84] Sung Y Park and Guochang Zhao. An estimation of us gasoline demand: A smooth time- varying cointegration approach. Energy Economics, 32(1):110?120, 2010. [85] Dieter Pennerstorfer. Spatial price competition in retail gasoline markets: evidence from austria. The Annals of Regional Science, 43(1):133?158, 2009. [86] Dieter Pennerstorfer and Christoph Weiss. Spatial clustering and market power: Evidence from the retail gasoline market. Regional Science and Urban Economics, 43(4):661?675, 2013. 196 [87] Jordi Perdiguero and Joan-Ramon Borrell. Driving competition in local gasoline markets. Document de Treball No. XREAP2012-04, 2012. [88] Joris Pinkse and Margaret E Slade. Contracting in space: An application of spatial statistics to discrete-choice models. Journal of Econometrics, 85(1):125?154, 1998. [89] Joris Pinkse and Margaret E Slade. Mergers, brand competition, and the price of a pint. European Economic Review, 48(3):617?643, 2004. [90] Joris Pinkse, Margaret E Slade, and Craig Brett. Spatial price competition: a semipara- metric approach. 
Econometrica, 70(3):1111?1153, 2002. [91] Ingmar R Prucha and Harry H Kelejian. The structure of simultaneous equation estima- tors: A generalization towards nonnormal disturbances. Econometrica: Journal of the Econometric Society, pages 721?736, 1984. [92] Anindya Sen. Higher prices at canadian gas pumps: international crude oil prices or local market concentration? an empirical investigation. Energy Economics, 25(3):269?288, 2003. [93] Anindya Sen. Does increasing the market share of smaller firms result in lower prices? empirical evidence from the canadian retail gasoline industry. Review of Industrial Orga- nization, 26(4):371?389, 2005. [94] Margaret E Slade. Vancouver?s gasoline-price wars: An empirical exercise in uncovering supergame strategies. The Review of Economic Studies, 59(2):257?276, 1992. [95] Margaret E Slade. Strategic motives for vertical separation: evidence from retail gasoline markets. Journal of Law, Economics, & Organization, pages 84?113, 1998. [96] Shawn W Ulrick, Seth B Sacher, Paul R Zimmerman, and John M Yun. Defining geo- graphic markets with willingness-to-travel circles. Supreme Court Economic Review, 28 (1):241?284, 2020. [97] Wim Van Meerbeeck. Competition and local market conditions on the belgian retail gaso- line market. De Economist, 151(4):369?388, 2003. [98] Luya Wang, Kunpeng Li, and Zhengwei Wang. Quasi maximum likelihood estimation for simultaneous spatial autoregressive models. 2014. [99] Peter Whittle. On stationary processes in the plane. Biometrika, pages 434?449, 1954. [100] Kai Yang and Lung-fei Lee. Identification and qml estimation of multivariate and simulta- neous equations spatial autoregressive models. Journal of Econometrics, 196(1):196?214, 2017. [101] Kai Yang and Lung-fei Lee. Identification and estimation of spatial dynamic panel simul- taneous equations models. Regional Science and Urban Economics, 76:32?46, 2019. 197