ABSTRACT
Title of dissertation: ESSAYS IN EMPIRICAL
INDUSTRIAL ORGANIZATION
Matthew Chesnes, Doctor of Philosophy, 2009
Dissertation directed by: Professor John Rust
Professor Ginger Jin
Department of Economics
Chapter 1: Capacity and Utilization Choice in the US Oil Refining In-
dustry
This paper presents a new dynamic model of the operating and investment
decisions of US oil refiners. The model enables me to predict how shocks to crude
oil prices and refinery shutdowns (e.g., in response to hurricanes) affect the price of
gasoline, refinery profits, and overall welfare. There have been no new refineries built
in the last 32 years, and although existing refineries have expanded their capacity by
almost 13% since 1995, the demand for refinery products has grown even faster. As a
result, capacity utilization rates are now near their maximum sustainable levels, and
when combined with record high crude oil prices, this creates a volatile environment
for energy markets. Shocks to the price of crude oil and even minor disruptions
to refining capacity can have a large effect on the downstream prices of refined
products. Due to the extraordinary dependence by other industries on petroleum
products, this can have a large effect on the US economy as a whole.
I use the generalized method of moments to estimate a dynamic model of ca-
pacity and utilization choice by oil refiners. Plants make short-run utilization rate
choices to maximize their expected discounted profits and may make costly long-
term investments in capacity to meet the growing demand and reduce the potential
for breaking down. I show that the model fits the data well, in both in-sample and
out-of-sample predictive tests, and I use the model to conduct a number of counter-
factual experiments. My model predicts that a 20% increase in the price of crude oil
is only partially passed on to consumers, resulting in higher gasoline prices, lower
profits for the refinery, and a 45% decrease in total welfare. A disruption to refining
capacity, such as the one caused by Hurricane Katrina in 2005, raises gasoline prices
by almost 16% and has a small negative effect on overall welfare: the higher profits
of refineries partially offset the large reduction in consumer surplus. As the theory
predicts, these shocks have a smaller effect on downstream prices when consumer
demand is more elastic, resulting in a larger share of total welfare going to the con-
sumer.
Chapter 2: Consumer Search for Online Drug Information
Consumers are increasingly turning to the internet and using search engines
to find information on medicinal drugs. Between 2001 and 2007, the number of
adults using the internet as an alternative source of health information doubled.
At the same time, online and offline advertising spending by drug companies is
growing rapidly. I seek to understand how consumers use search engines to find drug
information and how this activity is influenced by direct-to-consumer advertising.
I utilize a database of user click-through data from America Online to ana-
lyze the search behavior of consumers seeking drug information online. Compared
with other searches, users submitting drug-related queries are more likely to click on
more than one result in a search session, and when they do, they click more rapidly
through the results and tend to migrate away from dot-com sites and toward those
ending in dot-org and dot-net. Offline advertising on a drug serves to increase the
frequency and intensity of these searches.
Chapter 3: Drug Information via Online Search Engines
This paper utilizes a database of organic and sponsored search results from four
large search engines to analyze the supply of drug-related information available on
the internet. I show that the information varies significantly across search engines,
domain extensions, and between organic and sponsored results. Regression results
reveal that websites with relatively more promotional content are pushed down in
the search results while informational sites (including those ending in dot-gov and
dot-org) are more likely to appear on page one of the results.
ESSAYS IN EMPIRICAL
INDUSTRIAL ORGANIZATION
by
Matthew William Chesnes
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2009
Advisory Committee:
Professor John Rust, Co-Chair
Professor Ginger Jin, Co-Chair
Professor Peter Cramton
Professor Pablo D'Erasmo
Professor Erik Lichtenberg
© Copyright by
Matthew William Chesnes
2009
Acknowledgments
I thank my advisors, John Rust, Ginger Jin and Peter Cramton, for their
invaluable guidance, as well as Pablo D'Erasmo and Erik Lichtenberg for participating
in my defense. I am grateful to the Department of Economics at the University
of Maryland and the Economic Club of Washington for their financial support. I
also want to thank John Shea, Mark Duggan, Adam Copeland, David Givens, Ariel
BenYishay, and seminar participants at the University of Maryland, the Federal Re-
serve Board of Governors, the International Industrial Organization Conference, and
La Pietra-Mondragone Workshop in Economics for their suggestions and comments.
All remaining errors are my own.
Table of Contents
List of Tables v
List of Figures vii
List of Abbreviations ix
1 Capacity and Utilization Choice in the US Oil Refining Industry 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The US Oil Refining Industry . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Competition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.2 Capacity and Utilization . . . . . . . . . . . . . . . . . . . . . 10
1.2.3 Refinery Maintenance and Outages . . . . . . . . . . . . . . . 14
1.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.1 A Firm's Problem . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4.2 Per-Period Profit . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.4.3 Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.4.4 Probability of Breakdown . . . . . . . . . . . . . . . . . . . . 26
1.4.5 Production and Investment Costs . . . . . . . . . . . . . . . . 26
1.5 Empirical Estimation Strategy . . . . . . . . . . . . . . . . . . . . . . 28
1.5.1 Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.5.2 Breakdown Probability . . . . . . . . . . . . . . . . . . . . . . 30
1.5.3 Production Cost Parameters . . . . . . . . . . . . . . . . . . . 31
1.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.6.1 Model Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.6.2 First Stage Estimates: Demand and Breakdown . . . . . . . . 37
1.6.3 Second Stage Estimates: Costs . . . . . . . . . . . . . . . . . 38
1.6.4 Policy Function . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.7 Counterfactuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.7.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1.7.2 Results of Experiments . . . . . . . . . . . . . . . . . . . . . . 45
1.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2 Consumer Search for Online Drug Information 52
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.3 Descriptive Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.4 Regression Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.4.1 Frequency Regressions . . . . . . . . . . . . . . . . . . . . . . 67
2.4.2 Depth Regressions . . . . . . . . . . . . . . . . . . . . . . . . 71
2.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3 Drug Information via Online Search Engines 76
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.3 Descriptive Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.3.1 Supply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.3.2 Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.3.3 Rank and Content Comparisons . . . . . . . . . . . . . . . . . 89
3.3.4 Kernel Density Plots of Content . . . . . . . . . . . . . . . . . 90
3.3.5 Probit Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
A Chapter 1 Supplement 98
A.1 The Distillation Process . . . . . . . . . . . . . . . . . . . . . . . . . 98
A.2 Crude Oil Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
A.3 Estimation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
A.4 Additional Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
B Chapter 2 Supplement 106
C Chapter 3 Supplement 108
Bibliography 112
List of Tables
1.1 Refinery Downtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.2 Industry Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3 Demand Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.4 Breakdown Probability Estimates . . . . . . . . . . . . . . . . . . . . 39
1.5 The Effect of a 20% Increase in the Crude Oil Price . . . . . . . . . . 46
1.6 The Effect of a 25% Loss in Capacity . . . . . . . . . . . . . . . . . . 47
2.1 Basic Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.2 Search Activity by Drug Class . . . . . . . . . . . . . . . . . . . . . . 61
2.3 Search Activity by Drug Age . . . . . . . . . . . . . . . . . . . . . . . 62
2.4 Search Activity by Drug Type . . . . . . . . . . . . . . . . . . . . . . 63
2.5 Transitions between extensions . . . . . . . . . . . . . . . . . . . . . 64
2.6 Transitions between ranks . . . . . . . . . . . . . . . . . . . . . . . . 65
2.7 Regression Results - Frequency of Search . . . . . . . . . . . . . . . . 68
2.8 Regression Results - Depreciation Analysis . . . . . . . . . . . . . . . 70
2.9 Regression Results - Depth of Search . . . . . . . . . . . . . . . . . . 72
3.1 Basic Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.2 Regression Results: Probit of Pr(Page 1) . . . . . . . . . . . . . . . . 94
A.1 Crude Qualities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
A.2 Industry Concentration . . . . . . . . . . . . . . . . . . . . . . . . . . 104
A.3 Cost Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
B.1 20 Most Actively Searched Drugs . . . . . . . . . . . . . . . . . . . . 106
B.2 20 Most Advertised Drugs . . . . . . . . . . . . . . . . . . . . . . . . 107
B.3 Variables Used in Regressions. . . . . . . . . . . . . . . . . . . . . . . 107
C.1 List of Search Queries . . . . . . . . . . . . . . . . . . . . . . . . . . 109
C.2 Keywords Used in Classification Algorithm . . . . . . . . . . . . . . . 110
C.3 Variable Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
List of Figures
1.1 Production Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Average Yields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Refinery Locations (Scaled by Capacity) . . . . . . . . . . . . . . . . 8
1.4 Major Refined Product Pipelines . . . . . . . . . . . . . . . . . . . . 9
1.5 Capacity and Number of Refineries . . . . . . . . . . . . . . . . . . . 11
1.6 Non-Zero Changes in Capacity, All Plants, 1986-2007 . . . . . . . . . 12
1.7 Capacity Utilization Rate and Crack Spread . . . . . . . . . . . . . . 13
1.8 District Breakdowns . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.9 Model Fit (In Sample) . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.10 Model Fit (Out of Sample) . . . . . . . . . . . . . . . . . . . . . . . . 36
1.11 Estimated Production and Investment Cost Functions . . . . . . . . . 40
1.12 Optimal Utilization Rate Versus Month . . . . . . . . . . . . . . . . . 41
1.13 Optimal Utilization Rate Versus Month and Crude Price . . . . . . . 42
1.14 Price Elasticity of Demand . . . . . . . . . . . . . . . . . . . . . . . . 43
1.15 Crude Oil Price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.16 Loss in Capacity: Hurricane Katrina . . . . . . . . . . . . . . . . . . 45
1.17 Crude Oil Counterfactual: Simulation . . . . . . . . . . . . . . . . . . 46
1.18 Capacity Counterfactual: Simulation . . . . . . . . . . . . . . . . . . 48
2.1 Total DTCA Spending on all Prescription Drugs . . . . . . . . . . . . 57
2.2 DTCA Breakdown by Media Type . . . . . . . . . . . . . . . . . . . 57
2.3 Extension Popularity in the First 10 Clicks . . . . . . . . . . . . . . . 63
2.4 Rank Popularity in the First 10 Clicks . . . . . . . . . . . . . . . . . 64
2.5 Drill Down Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.1 Distribution of Organic and Sponsored Results . . . . . . . . . . . . . 83
3.2 Extension Popularity - Organic Results . . . . . . . . . . . . . . . . . 84
3.3 Extension Popularity - Sponsored Results . . . . . . . . . . . . . . . . 85
3.4 Average Rank by Extension - Organic Results . . . . . . . . . . . . . 85
3.5 Content of Summary Field - Organic Results . . . . . . . . . . . . . . 87
3.6 Content of Summary Field - Sponsored Results . . . . . . . . . . . . 87
3.7 Content of Summary Field - Organic Results - By Extension . . . . . 88
3.8 Content of Summary Field - Sponsored Results - By Extension . . . . 88
3.9 Rank Comparison - Organic Results - Google vs Yahoo . . . . . . . . 90
3.10 Summary Content Comparison - Organic Results - Google vs Yahoo . 91
3.11 Organic Rank Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.12 Kernel Density of Summary Content . . . . . . . . . . . . . . . . . . 92
A.1 Refinery Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
A.2 Average Crude Oil Quality: Heavier and More Sour . . . . . . . . . . 101
C.1 Kernel Density of Summary Content, Organic Results, By Extension 108
C.2 Kernel Density of Summary Content, Sponsored Results, By Extension 111
List of Abbreviations
API American Petroleum Institute
DTCA Direct-To-Consumer Advertising
EIA Energy Information Administration
FCC Fluid Catalytic Cracking
FDA Food and Drug Administration
GMM Generalized Method of Moments
HHI Herfindahl-Hirschman Index
HTML HyperText Markup Language
IANA Internet Assigned Numbers Authority
MTBE Methyl Tertiary Butyl Ether
NAMCS National Ambulatory Medical Care Survey
NDC National Drug Code
OPEC Organization of the Petroleum Exporting Countries
OTC Over-The-Counter
PADD Petroleum Administration for Defense Districts
RLD Reference Listed Drug
URL Uniform Resource Locator
Chapter 1
Capacity and Utilization Choice in the US Oil Refining Industry
1.1 Introduction
The United States is the largest consumer of crude oil in the world and this
resource accounts for 40% of the country's total energy needs.1 Although a majority
of this oil comes from foreign sources, almost all is refined domestically. Refineries
distill crude oil into a large number of products such as gasoline, distillate (heating
oil), and jet fuel. While much attention has been paid to the upstream crude oil pro-
duction industry (see Hamilton (1983) and Hubbard (1986)), and the downstream
retail sector (see Borenstein (1991 & 1997)), very little research has focused on the
role of the refining industry. Two important dynamic decisions faced by refiners are
their investment in capacity and the utilization rate at which they run their plant.
These choices are defined over different time horizons.2 The optimal choice of ca-
pacity accumulation, i.e., the increased ability to distill crude oil into higher valued
products, is a long-term decision. Capacity is expensive to build and may take time
to come online so forecasts of future market conditions are crucial. A shorter-term
problem involves a refiner's choice of capacity utilization. This rate measures the
intensity with which a firm uses its capital, which for a refinery may include the use
1Source: 2007 Annual Energy Review, Energy Information Administration (EIA).
2In addition, they must solve a complicated linear programming problem because their relative
output prices are constantly changing and they have the choice of utilizing different types of crude
oil, some of which are better adapted to producing certain products.
of boilers, distillation columns, and downstream cracking units.3
The refiner's problem is further complicated by changing market conditions,
geopolitical tensions, and unexpected events, such as hurricanes. The largest component
of refiners' output is gasoline. New alternative technologies, such as hybrid
cars, and changing perceptions of the environmental impact of gas-powered vehicles
have affected the sensitivity of consumer demand to the price of gasoline.4 This affects
the ability of refiners to pass through shocks to the price of crude oil resulting
from, for example, reduced production from OPEC countries or a war in the Middle
East. With about one-half of US refining capacity located along the Gulf of Mexico,
the potential for hurricanes can also dramatically affect the ability of the industry
to supply a consistent flow of gasoline and other products to the rest of the country.
This paper develops and estimates a new dynamic model of the operating and
investment decisions of US oil refiners. These refiners face the possibility of breaking
down if they run their plant too intensively, so they make costly investments in
capacity to reduce this potential and to meet the growing demand for their products.
My model assumes that firms are Cournot competitors in the refined product market.
With many small firms, each is approximately a price-taker in the market, so the
model of Kreps and Scheinkman (1983), with quantity pre-commitment (capacity
choice) and Bertrand price competition, is similar to my approach. The model
enables me to predict how shocks to crude oil prices and refinery shutdowns (e.g.,
in response to hurricanes) affect the price of gasoline, refinery profits, and overall
3More details on the refining process can be found in section 2 and in appendix A.
4Knittle et al. (2008) and Espey (1996) both study the recent changes in consumers' price
elasticity of demand for gasoline.
welfare.5 I also estimate how a change in the price sensitivity of consumers may
affect the results of these shocks, particularly with regard to the division of welfare
between the refiner and the consumer.
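The claim that many small Cournot competitors behave approximately as price-takers can be illustrated with a minimal sketch. It assumes linear inverse demand P = a - bQ, a constant marginal cost c, and identical firms; these functional forms, the function names, and all numbers are illustrative assumptions rather than the specification estimated in this paper.

```python
def cournot_price(a, b, c, n):
    """Equilibrium price with n identical Cournot firms (assumed linear demand).

    Each firm's first-order condition, a - b*Q - b*q - c = 0, combined with
    symmetry (Q = n*q), gives q = (a - c) / (b * (n + 1)).
    """
    q = (a - c) / (b * (n + 1))   # equilibrium output per firm
    return a - b * n * q          # inverse demand evaluated at total output


def markup(a, b, c, n):
    """Price minus marginal cost; it shrinks toward zero as n grows."""
    return cournot_price(a, b, c, n) - c
```

Calling `markup(100.0, 1.0, 20.0, n)` for increasing values of `n` shows the margin over marginal cost falling toward zero, which is the sense in which each small firm is close to a price-taker.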
I estimate a fully dynamic model of the oil refining industry incorporating key
decisions made by plants which affect both contemporaneous and future profitabil-
ity. The refining industry is inherently forward-looking and decisions made today
rely heavily on forecasts of future market conditions. A static model would not, for
example, account for the increased breakdown potential of a plant from high utiliza-
tion rates or the appropriate long-term investments of a refiner facing rising crude
oil costs and uncertain demand. My estimation algorithm involves classic policy
function iteration nested inside a GMM optimization, which allows me to compute
the equilibrium value and policy functions.6 This approach allows me to run various
counterfactual experiments and determine the optimal policy and future discounted
profits of each firm. Several recent papers, including Bajari et al. (2007) and Ryan
(forthcoming), estimate dynamic models of firm behavior using a 2-step method that
reduces the computational complexity of finding the structural parameters, but does
not allow one to compute the equilibrium under counterfactual environments.
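The nesting of policy function iteration inside a moment-based criterion can be sketched in a few lines. The example is a deliberately tiny stand-in, not the model estimated here: one plant, two states (operating or broken), a single cost parameter `theta`, assumed functional forms for profit and breakdown probability, and a single moment matched by grid search in place of a full GMM criterion with a weighting matrix. Every name and number below is hypothetical.

```python
# Toy nested fixed point: policy iteration inside a moment-matching loop.
# All functional forms and numbers are assumptions made for illustration.

BETA = 0.95      # discount factor (assumed)
MARGIN = 10.0    # revenue per unit of utilization (assumed)
U_GRID = [0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00]  # utilization choices


def breakdown_prob(u):
    """Assumed breakdown probability, increasing and convex in utilization."""
    return 0.5 * u ** 6


def profit(u, theta):
    """Assumed per-period profit: linear revenue minus quadratic cost."""
    return MARGIN * u - theta * u ** 2


def evaluate(u, theta):
    """Policy evaluation: solve the two linear Bellman equations exactly.

    A broken plant earns nothing for one period and then operates again:
        v_op = profit + BETA * ((1 - p) * v_op + p * v_br)
        v_br = BETA * v_op
    """
    p = breakdown_prob(u)
    v_op = profit(u, theta) / (1.0 - BETA * (1.0 - p) - BETA ** 2 * p)
    return v_op, BETA * v_op


def solve_policy(theta, max_iter=100):
    """Policy function iteration: evaluate, improve, repeat until stable."""
    u = U_GRID[0]
    for _ in range(max_iter):
        v_op, v_br = evaluate(u, theta)

        def q(a):
            # Value of choosing a today and following the current policy after.
            p = breakdown_prob(a)
            return profit(a, theta) + BETA * ((1 - p) * v_op + p * v_br)

        u_new = max(U_GRID, key=q)   # policy improvement step
        if u_new == u:
            break
        u = u_new
    return u


def gmm_objective(theta, observed_mean_u):
    """One moment, identity weight: squared gap between model and data."""
    return (solve_policy(theta) - observed_mean_u) ** 2


def estimate(observed_mean_u, theta_grid):
    """Outer loop: choose the cost parameter that minimizes the objective."""
    return min(theta_grid, key=lambda t: gmm_objective(t, observed_mean_u))
```

For instance, `estimate(solve_policy(3.0), [0.5 * k for k in range(1, 17)])` returns a cost parameter whose implied policy matches the simulated observation. Because the utilization grid is coarse, several values of `theta` can imply the same policy, so the toy parameter is only set-identified; the point of the sketch is the structure (an inner dynamic programming solve for every trial parameter), which is also what makes counterfactual equilibria computable once the parameters are fixed.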
My model predicts that a 20% increase in the price of crude oil is only partially
passed on to consumers, resulting in a 13% increase in gasoline prices, lower profits
for the refinery, and a 45% decrease in total welfare. The pass-through result is
fairly close to the historic rate of about 50%.7 Consumer surplus falls following the
5I define total welfare to be the sum of consumer surplus and refiner profit.
6See Rust (2008).
7See Borenstein and Shepard (1996) and Goldberg and Hellerstein (2008) for related literature
on price pass-through.
shock, but the change in the overall distribution of welfare depends on the sensitivity
of consumer demand to the prices of refined products. More sensitive consumers
sacrifice less and receive a larger share of the (smaller) surplus. I also show that
a disruption to refining capacity, such as the one caused by Hurricane Katrina in
2005, raises gasoline prices by almost 16% and has a small negative effect on overall
welfare: the higher profits of operating refineries partially offset the large reduction
in consumer surplus. When Hurricane Katrina hit the Gulf Coast in August 2005,
the actual wholesale gasoline price rose by 14% the following month.
Much of the literature on retail gasoline markets has focused on the asymmetric
response of gasoline prices to crude oil shocks, the so-called rockets and feathers
phenomenon (for example, see Borenstein (1997), Bacon (1991), and Noel (2007)).8
Recent research on the wholesale gasoline market includes Hastings et al. (2008),
which analyzes wholesale prices and the effects of new environmental regulations,
and studies by The Government Accountability Office (2006), the Federal Trade
Commission (2006), and the Energy Information Administration (2007).
To my knowledge, this is the first dynamic model of the US oil refining industry.
Refiners play an important role as an intermediary between upstream crude suppliers
and downstream retail markets. A complete analysis of the oil industry must account
for the important effects of the refiners' dynamic decisions. I show that the model
fits the data well and can be used to generate insights into the pass-through of
crude oil shocks and the impacts of refinery shutdowns on consumers. The model's
8The market power gained by the refining industry due to a tight capacity environment is one
potential explanation. Others include search costs in the retail market, inventory management by
consumers who may fill their tank more frequently as prices rise, but are less eager to "top-off"
when prices are falling, and adjustment costs at the refinery.
main features include a dynamic decision process, long-term investment choices,
and the possibility of plant breakdown. The framework could be applied to other
energy markets as well as industries, such as shipping, that make large investments
in capacity based on expectations of future market conditions.
The remainder of this paper is organized as follows. In section 2, I provide an
overview of the oil refining industry to better understand the complicated problem
facing the refiner. I describe my data in section 3 and lay out a dynamic model of
the industry in section 4. Section 5 provides the details of my empirical strategy
and I summarize the fit and results of the model in section 6. Finally, in section 7,
I use my estimated parameters to run several counterfactual experiments involving
shocks to the price of crude oil, refining capacity, and consumers' price elasticity of
demand. Section 8 concludes and provides a discussion of potential extensions.
1.2 The US Oil Refining Industry
The oil industry broadly comprises several vertically oriented segments.
They include crude oil exploration and extraction, refineries which distill crude
oil into other products, pipeline distribution networks, terminals which store the
finished product near major cities, and tanker trucks which transport products to
retail outlets.9 The largest refined product, gasoline, accounts for about 50% of
total production, while distillate makes up another quarter. A full 68% of output
from the oil refining industry is used in the transportation industry. Figures 1.1
9Of the terminals in the US, 75% are owned by companies not involved in the upstream exploration
and refining.
and 1.2 provide a description of the production process and average product yields.
The main distillation process produces some final products like gasoline, but it is
complemented by other units that extract more of the highest valued products.
Technical details of the refining process and background on the types of crude oil
available can be found in the appendix.
Figure 1.1: Production Process
The market for refined oil products is large and growing, with the US consuming
388 million gallons of gasoline each day and one quarter of the world's crude
oil.10 Aside from refining crude oil into gasoline, refineries produce many products
that are important inputs into other industries. Retail gasoline prices have recently
experienced increased variability in the US and in summer 2008 hit an all-time high
of $4.11 per gallon. Wholesale prices peaked around $3.40 a gallon in the same period.11
Many justify the high prices as a result of the growing demand for gasoline
10Annual world consumption of crude oil totals 30 billion barrels, of which the US accounts
for 7.5 billion barrels. About 60% of crude oil used by refineries is imported and US consumption of
refined gasoline represents 40% of world consumption.
11US regular gasoline, source: EIA.
[Pie chart, October 2007: Gasoline 46%; Distillate 27%; Kerosene Jet Fuel 9%; Other 9%; Petroleum Coke 5%; Residual Fuel Oil 4%]
Figure 1.2: Average Yields
and supply limitations, including the scarcity of crude oil, Middle East uncertainty,
hurricanes, and the OPEC cartel. Others claim the high prices result from coordi-
nated anticompetitive behavior by big oil companies. It may be that the strategic
capacity investment and utilization choices by oil refineries play a significant role in
affecting downstream prices, profits, and consumer welfare.
1.2.1 Competition
Concentration
The refining industry is fairly competitive, with 144 refineries owned by 54 refining
companies in January 2006. About one-half of US production occurs near the Gulf
of Mexico in Texas and Louisiana, though there are significant operations in the
Northeast, the Midwest, and California. During World War II, the country was
divided into Petroleum Administration for Defense Districts (PADDs) to aid in the
allocation of petroleum products. Figure 1.3 displays a map of refinery locations
along with delineations of PADDs and PADD districts. PADDs are often used by
regulators such as antitrust authorities when assessing market concentration. See
table A.2 in appendix A for concentration ratios and Herfindahl-Hirschman Indices
(HHIs) for various PADDs and regions at the refiner level. The degree of market
concentration is clearly dependent upon how one defines the relevant geographic
market.12
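As a minimal sketch of the two concentration measures mentioned above, the following computes a k-firm concentration ratio and an HHI from market shares expressed in percent. The classification thresholds coded below are those of the 1992 Horizontal Merger Guidelines (below 1,000, between 1,000 and 1,800, above 1,800), which is assumed to be the version the text refers to. The function names and the example shares are hypothetical.

```python
def concentration_ratio(shares_pct, k=4):
    """Combined market share (in percent) of the k largest firms."""
    return sum(sorted(shares_pct, reverse=True)[:k])


def hhi(shares_pct):
    """Herfindahl-Hirschman Index: sum of squared percentage shares."""
    return sum(s ** 2 for s in shares_pct)


def classify_1992(hhi_value):
    """Concentration category under the 1992 guideline thresholds (assumed)."""
    if hhi_value < 1000:
        return "unconcentrated"
    if hhi_value <= 1800:
        return "moderately concentrated"
    return "highly concentrated"
```

With hypothetical shares of 30, 25, 20, 15, and 10 percent, the four-firm ratio is 90 and the HHI is 2,250. Because both measures depend on the shares, they move with the geographic market definition: widening the market adds firms and lowers each share.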
[Map legend (District: Area): 1 East Coast; 2 Midwest; 3 Upper Midwest; 4 Central Plains; 5 Louisiana; 6 Texas; 7 New Mexico; 8 Rockies; 9 West Coast. PADDs are labeled I through V.]
Figure 1.3: Refinery Locations (Scaled by Capacity)
Market Definition
While retail markets for gasoline tend to be very small, markets for wholesale gasoline
are relatively large due to the extensive pipeline network used to transport most
refined products. While a PADD may have roughly approximated a market in 1945,
these delineations were made before the pipeline network had been fully developed,
12At the national level, the top four refiners (who each own multiple refineries) controlled 44.1%
of the market in 2007. The HHI for refiners on the Gulf Coast was about 1,100, which would be
classified as moderately concentrated according to the Horizontal Merger Guidelines.
so they are now just a convenient way to report statistics on the industry.13 A map
of major crude oil and product pipelines is shown in figure 1.4. With important
pipelines connecting the Gulf Coast production center to the population centers
in the Northeast and the Midwest, I combine PADDs 1, 2, and 3 into one large
market for wholesale gasoline. I denote the Rocky Mountain region, PADD 4, as
another market, because it is isolated from the rest of the country and imports only
limited refined product from other regions. Finally, my third market is the West
Coast, PADD 5, which includes California, a state that, due to strict environmental
regulations, is limited in its ability to use products that are refined in other states.
Figure 1.4: Major Refined Product Pipelines
Aside from the domestic refining industry, US refiners face limited competition
from abroad. While the US is very dependent on foreign oil, domestic production
accounts for about 90% of US gasoline consumption, though the import share has
13For instance, the Colonial pipeline, which runs from the Gulf Coast up to the Northeast, was
built in 1968. Pipelines now carry 70% of all refined products shipped between PADDs.
grown since the mid-1990s. These imports come primarily into the Northeast, which
receives 45% of its supply from sources such as the US Virgin Islands, the United
Kingdom, the Netherlands, and Canada. Recent US regulations limiting certain
types of fuel additives, combined with increased European dependence on diesel fuel,
have limited the ability of US markets to rely on foreign imports.
1.2.2 Capacity and Utilization
Capacity utilization rates at US refineries have been steadily rising and are
now at their maximum sustainable levels. From 2000 to 2008, the average utilization
rate in US manufacturing industries was 77%, while in the refining industry it was
91%.14 At the same time, no new refineries have been built in the US since 1976.
In fact, many plants have closed and the number of refineries has fallen from 223 in
1985 to just 144 today. However, most of these closures were small and inefficient
plants, and those that remain have expanded, so total operable capacity has grown
from 15.6 million barrels per day (bbl/day) in 1985 to almost 17 million bbl/day
today. This figure is nevertheless lower than in 1981, when capacity was 18.6 million
bbl/day. The number of refineries and their total production capacity are
displayed in figure 1.5. The average plant size has increased from 74,000 bbl/day in
1985 to almost 124,000 bbl/day in 2007.
Building a new refinery is very expensive, and environmental requirements
and permits create significant hurdles.15 Evidence from a 2002 US Senate hearing
14See http://www.federalreserve.gov/releases/G17/caputl.htm.
15One of the few new plants in development is in Yuma, Arizona. The builder of the 150,000
bbl/day refinery has spent 30 million dollars over 6 years to acquire all the permits. If not blocked,
[Chart, 1982-2007. Horizontal axis: Year. Vertical axes: Capacity (M BBL / Day) and No. of Refineries. Series: Total Refining Capacity; Number of Refineries.]
Figure 1.5: Capacity and Number of Refineries
estimated the cost of building a 250,000 bbl/day refinery at around 2.5 billion dollars,
with a completion time of 5-7 years (Senate (2002)). This assumes the various
environmental hurdles and community objections are satisfied. No one wants a
dirty refinery operating near them.16 In May 2007, the chief economist at Tesoro,
Bruce Smith, was quoted as saying that the investment costs in building a new
refinery are so high that "you'd need 10 to 15 years of today's margins [at the time,
around 20%] to pay it back."17 Even without new refineries, existing refineries have
invested to expand capacity. The distribution of historical investment rates is shown
in figure 1.6. While the mean investment has been 1.3% per year, the median is zero
construction on the new refinery will begin in 2009.
16Commonly referred to as "NIMBY," an acronym for Not In My Back Yard.
17The National Petrochemical & Refiners Association estimates that the average return on
investment in the refining industry between 1993 and 2002 was 5.5%. The S&P 500 averaged over 12%
for the same period. See "Lack of Capacity Fuels Oil Refining Profits," available online at
http://www.npr.org/templates/story/story.php?storyId=10554471 (downloaded: 09/13/2008).
as plants tend to make very infrequent investments. Even restricting the sample to
non-zero changes as shown in the graph, investments tend to be small, with almost
85% of the non-zero changes less than 10%.
[Figure: histogram of Investment (Percent), -100 to 100, showing Frequency and Density]
Figure 1.6: Non-Zero Changes in Capacity, All Plants, 1986-2007
Although oil refining has historically been an industry plagued by thin profit
margins, oil producers are now starting to make higher profits from their refining
business. One simple measure of the profit margin at a refinery is the "crack spread."
For every barrel of crude oil the refinery uses, technological constraints require that
about half of it goes into gasoline production and about a quarter into distillate. So
the crack spread, expressed in dollars per barrel, is calculated as:
$$\text{Crack} = \frac{1 \cdot \text{Price(distillate)} + 2 \cdot \text{Price(gasoline)} - 3 \cdot \text{Price(crude oil)}}{3}. \tag{1.1}$$
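As a quick illustration of equation 1.1, the following Python sketch computes the crack spread from three prices. The function name and the prices are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the data.

```python
def crack_spread(price_gasoline, price_distillate, price_crude):
    """Crack spread in dollars per barrel of crude, following equation 1.1."""
    return (1 * price_distillate + 2 * price_gasoline - 3 * price_crude) / 3


# Hypothetical prices in $/bbl, for illustration only.
spread = crack_spread(price_gasoline=90.0, price_distillate=84.0, price_crude=70.0)
print(f"Crack spread: ${spread:.2f}/bbl")  # (84 + 180 - 210) / 3 = 18.00
```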
The crack spread along with the utilization rates of refineries are shown in figure
1.7. The crack spread hit a record high of nearly $30 per barrel in July 2006. Some
argue that based on this measure of profitability, it is surprising that more refiners
have not overcome the setup costs and entered this industry. The increase in the
crack spread after 2000 occurred after the utilization rate had already been at a
very high level. This may imply that a refiner's ability to pass through their crude
oil cost has changed since 2000, perhaps due to the scarcity of crude oil, an increase
in industry concentration, or an increase in the demand for gasoline.
[Figure: Utilization Rate and Crack Spread ($/Bbl) by Year, 1983-2007. Crack Spread = 1*Heating Oil+2*Gas-3*Crude, July of each year, 2006 Dollars]
Figure 1.7: Capacity Utilization Rate and Crack Spread
While total refining capacity has risen in the past 10 years, it has not kept
up with demand growth. Capacity of oil refiners has increased by 10% in the past
10 years, while demand for gasoline has increased about 17%. The gap has been
filled by higher utilization rates and, to a lesser degree, growing imports. New
regulations requiring the shift from MTBE18 oxygenates to ethanol pose a problem
for this segment of supply because foreign refiners have not invested in the facilities
to produce ethanol blended gasoline. With capacity tight and supply alternatives
limited, even a minor supply disruption (or a major one like Hurricane Katrina) can
have a large price impact.19
1.2.3 Refinery Maintenance and Outages
An oil refinery is a complex operation that requires frequent maintenance,
ranging from small repairs to major overhauls.20 The regular maintenance episodes
tend to be short and have minimal impact on production as they are strategically
scheduled for low demand periods. Unplanned major outages, by definition, can
take place at any time and can have a major impact on production capability. The
EIA divides refinery outages into four classes, summarized in table 1.1.
Table 1.1: Refinery Downtime
Type                   Typical Length of Outage   Frequency
Planned Shutdowns      1-2 Weeks                  Every year
Unplanned Shutdowns    2-4 Weeks                  -
Planned Turnarounds    3-9 Weeks                  Every 3-5 years
Emergency Shutdowns    Varies                     -
Source: EIA.
18Methyl Tertiary Butyl Ether.
19Following Hurricane Katrina on 9/23/05, capacity fell by 5 MBbl/Day. This represented a full
one third of US refining capacity. Inventories are also limited as there is only about 20-25 days
worth of gasoline in storage at any time.
20Refinery maintenance is crucial not only for production sustainability, but also for the safety
of the plant. A 2005 fire at BP's Texas City refinery killed 15 workers and injured over 100 more.
Planned turnarounds are major refinery overhauls, while planned shutdowns
bridge the gap between turnarounds. Unplanned shutdowns involve unexpected
issues that may allow for some strategic planning of the downtime, but often may
force a refinery to reduce production sub-optimally. Finally, emergency shutdowns
are those that cause an immediate plant breakdown like a refinery fire.
Organization for planned turnarounds typically starts years in advance, and the turnarounds
cost millions of dollars to implement, in addition to the revenue lost from suspending
production. Due to the hiring of outside personnel, major refineries often have to
plan these turnarounds at different times because of the shortage of skilled labor to
implement them. Given the typical seasonal variation in product demand, the ideal
periods for maintenance are the first and third quarters of the year, though in some
northern refineries, cold winter weather forces shifts in planned downtimes.
Even though refineries consist of several components, such as distillation columns,
reformers and cracking units, these components are dependent on one another so a
breakdown of any one component can affect the production capability of the entire
refinery. Downstream units include hydrocrackers, reformers, fluid catalytic cracking
(FCC) units, alkylation units, and coking units. They are responsible for breaking
down hydrocarbons into more valuable products and removing impurities such as
sulfur. For example, in a typical refinery, only 5% of gasoline is produced from the
primary distillation process; the rest comes from hydrocrackers (5%), reformers (30%),
FCC and alkylation units (50%), and coking units (10%). Not all refineries have all
of these components, so such refineries are even more affected when one component
goes down (EIA (2007)).
At the PADD level, EIA reports that in the 1999-2005 period, refineries expe-
rienced reductions in monthly gasoline and distillate production of up to 35% due
to outages. At the monthly frequency, there is little effect of outages on product
prices. This is primarily because most (planned) outages occur during the low-
demand months when markets are not tight; most outages last less than a month;
and the availability of imports, increased production from other refineries, and in-
ventories provide a cushion to supply. However, major outages, like those caused
by a hurricane, still affect the downstream prices and profitability of all refineries.
Overall, the oil refining industry features several economic puzzles, some of
which I explore in this paper. While the industry is relatively competitive, refiners
have recently been earning significant profits, as measured by the growing crack-
spread. However, entrants have yet to overcome the regulations and costs of setting
up a new plant and existing firms have been cautious in their expansion. As a
result, plants run at high rates of utilization, which leads to instability in the face
of unexpected capacity disruptions.
1.3 Data
The EIA publishes data on the oil refining industry at various frequencies
and levels of aggregation.21 I observe monthly district level data, which is publicly
21Although monthly plant level data is collected from individual refineries on EIA form 810, this
data remains proprietary and unavailable to academic researchers. A new program, joint with the
National Institute for Statistical Sciences (NISS), called the NISS-EIA Energy Micro Data Research
Program, may allow access to this data (http://www.niss.org/eia/niss-eia-microdata.html).
The dataset includes monthly observations for all refineries in the US on production, capacity,
utilization, and inputs into production. The program is currently on hold.
available on EIA's website.22 For every month in the years from 1995 to 2006, and
for each of the 9 refining districts, I have the following data:
• Wholesale gasoline production, sales, and prices.
• Wholesale distillate production, sales, and prices.
• Crude oil first purchase price and inputs into refineries.
• The capacity utilization rate.
This provides 1,296 observations. I also have annual firm level data for the same
years on the capacity to distill crude oil. The reported capacity, called the atmo-
spheric crude oil distillation capacity, measures the number of barrels of crude oil
that a refinery can process through the initial distillation process. This measure is
calculated on a stream-day basis.23
There are 246 unique plants in the dataset, with 179 active in 1995 and 144
active in 2006. Overall, I observe a total of 1,959 plant-year observations. Table
1.2 summarizes the data by district and indicates the market definitions I use in my
estimation. The number of plants and aggregate capacity are for January 2006.
Proceeding with the district level data on production and utilization combined
with capacity at the firm level requires some discussion. Implicitly, I must make the
22See http://tonto.eia.doe.gov/dnav/pet/pet_pnp_top.asp. There are 9 refining districts,
including the East Coast, the Midwest, the upper Midwest, the Central Plains, Louisiana, Texas,
New Mexico, the Rockies, and the West Coast.
23Capacity reported in barrels per stream-day equals the maximum number of barrels of oil that
a refinery can process on a given day under optimal operating conditions. Calendar-day capacities
assume usual rather than optimal operating conditions, though these two numbers are frequently
reported as identical.
Table 1.2: Industry Summary
Market  District  States                        No. Plants  Ref. Cap. (Mbbl)
1       1         CT, DE, DC, FL, GA, ME,       14          659
                  MD, MA, NH, NJ, NY, NC,
                  PA, RI, SC, VT, VA, WV
1       2         IL, IN, KY, MI, OH, TN        14          913
1       3         MN, ND, SD, WI                4           171
1       4         IA, KS, MO, NE, OK            8           306
1       5         TX                            23          1,812
1       6         AL, AR, LA, MS                27          1,353
2       7         NM                            3           42
2       8         CO, ID, MT, UT, WY            16          232
3       9         AK, AZ, CA, HI, NV, OR, WA    35          1,220
Total                                           144         6,709
strong assumption that all firms within a district are identical and respond the
same way to shocks. When aggregating to the district, one firm that increases
production may be cancelled out by another that breaks down. Thus, results from
this approach will be meaningful only in terms of assessing the "average" behavior of
a firm within a district. However, there is significant variation in district production
levels as well as in the breakdown episodes described below. Also, aggregating to the
district level when I estimate my model avoids having to account for the complicated
linear programming problem faced by an individual refinery. These idiosyncratic
differences should be smoothed out in the higher level data.
1.4 Model
Firms make annual investments to increase or decrease their available capacity.
I assume these investments increase or decrease capacity immediately and that firms
then choose their utilization rates each month. While empirically, some plants make
major investments in capacity that take years to complete, the average investment
is small and can be completed quickly.24 Though plants require a certain minimum
level of maintenance each year (usually carried out just before the summer driving
season), running a plant at a high utilization rate in one month increases the proba-
bility of a plant breakdown or an extended maintenance episode in the next month.
Thus, faced with relatively high product prices or low crude oil input prices (a high
refining margin or crack spread), firms may want to run their plants at a high rate of
utilization to maximize profits. However, this intensive use of capital may increase
the possibility of a breakdown next month when prices may be even higher.
I model the competitive environment by assuming that plants are price-takers
in the market for crude oil but are Cournot competitors with some (small) market
power in the downstream refined products market. Since I do not observe plant level
production choices, the model is best described as a representative-agent Cournot
model. In each period, a firm optimally chooses its utilization rate in response to
its estimate of the aggregate production of its competitors.
With the development of a network of pipelines across the US after World
War II, markets tend to be large and feature many firms producing a homogeneous
24These small investments, known as capacity creep, include both additional infrastructure and
improved through-put of existing capital.
product. Firms are differentiated not only by their capacity to turn crude oil into
gasoline and other products, but also by their technical capabilities to utilize varying
types of crude oil in their production. I focus on the capacity differentiation and
average firm behavior to smooth over the technical production heterogeneity.
1.4.1 A Firm's Problem
Consider the problem of firm i in month m.25 I will focus only on gasoline and
distillate production by refineries, since these account for about three-quarters of
the production of an average refinery. Denote production of gasoline and distillate
as $q^g_{im}$ and $q^d_{im}$, and the capacity of the refinery as $q_{iy}$, where $y$ indexes the current
year. Given the investment behavior of firms, I assume that investments in capacity
are made only once per year and the resulting capacity is fixed for the entire year.
Let $r_{iy}$ denote the investment of the firm, expressed as the proportional increase or
decrease in capacity.
A firm's problem can be written as:
$$\max_{\{r_{iy}\}_{y=0}^{\infty}} E\left[ \sum_{y=0}^{\infty} \delta^{y}\, \Pi_{iy}(r_{iy}; x_{iy}) \right], \tag{1.2}$$
$$\Pi_{iy} = \max_{\{u_{im}\}_{m=1}^{12}} E\left[ \sum_{m=1}^{12} \beta^{m-1}\, \pi_{im}(u_{im}; x_{im}, q_{iy}) \right]. \tag{1.3}$$
I assume capacity evolves according to:
$$q_{iy} = q_{i,y-1}(1 + r_{iy}), \tag{1.4}$$
25I assume that firms are individual plants and use the two terms interchangeably.
where $r_{iy}$ is net of any depreciation of existing capital. The utilization rate can be
expressed as:
$$u_{im} = \frac{q_{im}}{q_{iy}}, \tag{1.5}$$
where $q_{im} = q^g_{im} + q^d_{im}$. While this is not a classic utilization rate, in that it does
not assess the proportion of available inputs that are actively being used, technical
constraints on the proportion of total capacity that can be used to produce gasoline
and distillate make this ratio approximately a scaled-down version of the actual rate.
$\pi_{im}(\cdot)$ is the per-period profit function, $x_{im}$ and $x_{iy}$ are vectors of state variables,
and $\delta$ and $\beta$ are the discount rates, with $\delta = \beta^{12}$. Note that $q_{iy}$ appears as a state
variable in equation 1.3 and equals last year's capacity plus or minus the investment
made at the beginning of the current year. Throughout a given year, state variables
observable to the firm include the following:
$P^c_{jm}$     The price of crude oil
$B_{im}$       An indicator equal to 1 if the firm is in a breakdown episode
$Q_{-i,m}$     The estimated aggregate competing production by other firms in the market
$q_{iy}$       A firm's capacity
Time           Month & year
I explicitly include a district j index on the crude oil price because, while I assume
this price is exogenous, there are differences in the quality and price of oil in different
districts. The competing production state is needed to calculate the price of a firm's
output. With the large number of firms in the industry, each firm has only a small
impact on the prices of gasoline and distillate.26 Firms form a statistical forecast of
competing production as follows:
$$E[Q_{-i,m}] = Q_{-i,m-1}(1 + g_m), \tag{1.6}$$
where $g_m$ is the historical growth rate of production in the market between months
$m-1$ and $m$. The month of the year is included to capture the obvious and
important seasonal effects. For example, a refinery operator may forgo preventative
maintenance measures during the summer high-demand period to capitalize on the
high prices and profit margins. The expectation operator is taken over the future
profile of the state variables, some of which are deterministic (month and year),
others of which evolve according to the firm's choices (capacity and breakdown),
and still others are stochastic, for which firms base their expectations on historical
values (the crude price and competing production).
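The forecast in equation 1.6 can be sketched in a few lines of Python. The sketch assumes, for illustration only, that the growth rate for a given month is the simple average across years of the observed growth in market production between months m-1 and m; the function names and numbers are hypothetical.

```python
def historical_growth_rate(prev_month_totals, this_month_totals):
    """Average growth between months m-1 and m across past years (an assumption)."""
    rates = [(now - prev) / prev
             for prev, now in zip(prev_month_totals, this_month_totals)]
    return sum(rates) / len(rates)


def forecast_competing_production(q_other_prev, growth_rate):
    """Equation 1.6: last month's competing production scaled by (1 + g_m)."""
    return q_other_prev * (1.0 + growth_rate)


# Hypothetical market production in two past years, months m-1 and m.
g_m = historical_growth_rate([100.0, 200.0], [110.0, 210.0])
print(f"g_m = {g_m:.3f}")
print(f"forecast = {forecast_competing_production(400.0, g_m):.1f}")
```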
Due to breakdowns, only a portion of $q_{iy}$ will be available in a given month. I
denote the available capacity as $\tilde{q}_{iy}$. Because the numerator in equation 1.5 is the
volume of downstream products and the denominator is the number of barrels of
crude oil that a refinery can distill, the utilization rate may be greater than 1 in
some cases. This occurs because chemicals called blending components are added
in the distillation process (such as oxygenates like MTBE and ethanol).
26With plant-level production data, I could explicitly solve for the (asymmetric) Cournot equi-
librium in each period. I plan to adopt this approach in future research.
Note that the firm's objective function can be written recursively. Denote
$V(\cdot)$ to be the present discounted value of the stream of refiner's profits with optimal
choices. Then, after dropping subscripts and discretizing the state space, the
Bellman equation can be written:
$$V(x) = \max_{r} \left\{ \Pi(r; x) + \delta \sum_{x'} V(x')\, P(x' \mid x, r) \right\}. \tag{1.7}$$
Here $P(\cdot)$ is the annual probability transition matrix and it reflects the transition
between average annual values of the state variables. To solve for $\Pi(r; x)$, I apply
backward induction from December back to January. For example, the expected
value of a refiner's aggregate discounted profit from June onward is:
$$W_6 = \max_{u_6} \left\{ \pi_6(u_6; x_6, q) + \beta \sum_{x_7} W_7(x_7)\, \tilde{P}(x_7 \mid u_6, x_6, q) \right\}. \tag{1.8}$$
Here, $\tilde{P}(\cdot)$ is conditional on $u$ and $q$ because plants that do not invest in new
capacity and choose to operate more intensively increase their probability of breaking
down.
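The within-year backward induction can be illustrated with a deliberately small Python sketch. Everything in it is a simplifying assumption made for illustration: the only stochastic state is the breakdown indicator, the probability of being broken down next month depends only on this month's utilization, the share of capacity available during a breakdown is fixed, and the profit function and all coefficients are invented rather than estimated.

```python
import math

BETA = 0.95 ** (1 / 12)                   # monthly discount factor
U_GRID = [0.80, 0.85, 0.90, 0.95, 1.00]   # hypothetical utilization choices
AVAILABLE_SHARE = 0.9                     # capacity share left in a breakdown


def breakdown_prob(u, g0=-12.0, g1=10.0):
    """Toy logit probability of being broken down next month."""
    return 1.0 / (1.0 + math.exp(-(g0 + g1 * u)))


def monthly_profit(u, broken, capacity, margin=10.0, curvature=0.04):
    """Toy profit: constant margin per barrel less a quadratic cost."""
    q = u * capacity * (AVAILABLE_SHARE if broken else 1.0)
    return margin * q - curvature * q ** 2


def solve_year(capacity, months=12):
    """Backward induction from December to January.

    W[m][b] is the value from month m onward in breakdown state b and
    policy[m][b] is the utilization rate that attains it.
    """
    W = {months + 1: {0: 0.0, 1: 0.0}}    # nothing accrues after December
    policy = {}
    for m in range(months, 0, -1):
        W[m], policy[m] = {}, {}
        for b in (0, 1):
            best_u, best_val = None, -math.inf
            for u in U_GRID:
                p = breakdown_prob(u)
                continuation = p * W[m + 1][1] + (1 - p) * W[m + 1][0]
                value = monthly_profit(u, b, capacity) + BETA * continuation
                if value > best_val:
                    best_u, best_val = u, value
            W[m][b], policy[m][b] = best_val, best_u
    return W, policy


W, policy = solve_year(capacity=100.0)
print("December policy:", policy[12])
print("Value of the year from January, not broken down:", round(W[1][0], 2))
```

The value W[1][0] plays the role of the annual payoff that enters the investment problem; a fuller version would also carry the crude price and competing production as states.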
1.4.2 Per-Period Profit
Prices are determined at the market level, which I index by k. Per-period profit
is defined as gasoline and distillate revenue less production costs and investment
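costs. Thus, in month m, profits of firm i are:
$$\begin{aligned} \pi_{im}(u_{im}; P^c_{jm}, B_{im}, Q_{-i,m}, q_{iy}, m, y) = {} & u_{im}\tilde{q}_{iy}\left[ (\text{yield}_g)\, P^g_{km}(Q^g_{km}; m, y) + (1 - \text{yield}_g)\, P^d_{km}(Q^d_{km}; m, y) \right] \\ & - C_{im}(u_{im}; P^c_{jm}, \tilde{q}_{iy}) - \tfrac{1}{12}\, C^r_{iy}(r_{iy}), \end{aligned} \tag{1.9}$$
where,
$$\tilde{q}_{iy} = \begin{cases} q_{iy} & \text{if } B_{im} = 0 \\ \lambda q_{iy} & \text{if } B_{im} = 1. \end{cases} \tag{1.10}$$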
The term $\text{yield}_g$ represents the proportion of available capacity that can be
distilled into gasoline. It is fixed over time and across firms. Functional forms for
the demand and cost functions will be specified below. The last term in the profit
function is the investment cost, which is spread equally across the 12 months of a
year. Note that $\lambda \in [0,1)$ is the fraction of capacity that remains available to a refinery
during a breakdown. While I allow this term to vary stochastically, the
data suggest this value averages around 0.9 and can fall as low as 0.7. In other
words, district level breakdowns occur that result in a 30% reduction in capacity
relative to normal levels. It should be noted that a 25% capacity reduction in a
given month could result from one week of complete breakdown and three weeks of
optimal operation.
1.4.3 Demand
The prices of gasoline and distillate are determined at the "market" level.
The three markets defined earlier are: the East Coast, Midwest and Gulf Coast; the
Rocky Mountain region; and the West Coast. The first is by far the largest, with
several large pipelines connecting the major production area near the Gulf of Mexico
with the population centers on the East Coast and in the Midwest. I estimate the
demand for wholesale gasoline (and similarly for distillate) according to:
$$\log Q^g_{km}(P^g_{km}) = \alpha^g_0 + \alpha^g_1(\text{Month}) + \alpha^g_2(\text{Year}) + \alpha^g_3(\log P^g_{km} \times \text{Year}) + \epsilon^g_{km}. \tag{1.11}$$
$P^g$ and $Q^g$ are the price and sales of wholesale gasoline. Here I specify a log-linear
demand equation with month and year fixed effects to account for the strong seasonal
variation and the growth in demand over time. I allow the price elasticity of demand
to vary by year to account for the changes in the sensitivity of consumers to prices.
Note that the East Coast receives a significant amount of its refined product
from abroad (mostly from Europe and the Caribbean). Imports increase in periods
of high demand or tight supply, as the price must be high enough to justify the
transportation costs. Thus the demand for refined products from US refineries may
be affected by the availability of imports, though robustness checks reveal that the
effect is small relative to the size of the East Coast's overall market (which includes
the Midwest and Gulf Coast).
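To see how a log-linear demand curve of the form in equation 1.11 maps quantities into prices, the following Python sketch inverts it for a single month and year. The month and year effects are collapsed into two numbers, the error term is set to zero, and all parameter values are hypothetical rather than estimates.

```python
import math


def log_quantity(log_price, a0, month_effect, year_effect, elasticity):
    """Log-linear demand for one month and year (error term set to zero)."""
    return a0 + month_effect + year_effect + elasticity * log_price


def inverse_demand_price(quantity, a0, month_effect, year_effect, elasticity):
    """Price at which the market demands the given quantity."""
    log_p = (math.log(quantity) - a0 - month_effect - year_effect) / elasticity
    return math.exp(log_p)


# Hypothetical parameters; the elasticity is the year-specific price coefficient.
params = dict(a0=10.0, month_effect=0.05, year_effect=0.10, elasticity=-0.3)
q = math.exp(log_quantity(math.log(60.0), **params))
print(f"quantity demanded at $60/bbl: {q:.1f}")
print(f"implied price at that quantity: {inverse_demand_price(q, **params):.2f}")
```

Because the elasticity is small in absolute value, a modest fall in quantity translates into a large rise in price, which is the mechanism behind the price effects of capacity disruptions discussed above.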
1.4.4 Probability of Breakdown
Consider the following specification for the likelihood of a plant breakdown or
extended period of maintenance beyond the regular minimum level:
$$\Pr(\text{breakdown in month } m) = F(\gamma u_{i,m-1}) = \frac{\exp(\gamma_0 + \gamma_1 u_{i,m-1})}{1 + \exp(\gamma_0 + \gamma_1 u_{i,m-1})}, \tag{1.12}$$
which assumes the probability follows the logistic distribution. The same specifica-
tion is used to model the likelihood that a plant recovers from a breakdown next
period, conditional on being broken down this period. With more detailed firm-
level data, an ordered probit may be the ideal specification, as it would account for
both the magnitude and length of the breakdown episode. Modeling the breakdown
dynamics based solely upon last month's utilization rate, and not, say, the average
rate over the last six months, is primarily a computational simplification. The
results using only last month's utilization rate are robust to other specifications.27 See
below for how I define a breakdown using district-level production data.
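A minimal Python sketch of the logit in equation 1.12, together with the log-likelihood that a maximum likelihood routine would maximize, is given below. The coefficient values and the small data set are hypothetical and serve only to show the mechanics.

```python
import math


def breakdown_probability(u_prev, gamma0, gamma1):
    """Logistic probability of a breakdown given last month's utilization."""
    z = gamma0 + gamma1 * u_prev
    return 1.0 / (1.0 + math.exp(-z))   # equals exp(z) / (1 + exp(z))


def log_likelihood(gamma0, gamma1, u_prev_list, broke_list):
    """Sum of log-probabilities of the observed breakdown indicators."""
    total = 0.0
    for u_prev, broke in zip(u_prev_list, broke_list):
        p = breakdown_probability(u_prev, gamma0, gamma1)
        total += math.log(p) if broke else math.log(1.0 - p)
    return total


# Hypothetical observations: lagged utilization and a breakdown indicator.
u_lagged = [0.85, 0.90, 0.95, 0.98, 0.99]
broke = [0, 0, 0, 1, 1]
print(breakdown_probability(0.95, gamma0=-20.0, gamma1=21.0))
print(log_likelihood(-20.0, 21.0, u_lagged, broke))
```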
1.4.5 Production and Investment Costs
I assume the following production cost specification:
$$C_{im}(u_{im}; P^c_{jm}, \tilde{q}_{iy}) = \theta_0 \cdot q_{im} + \theta_1 \cdot q_{im}^2 + \theta_2 \cdot q_{im} \cdot P^c_{jm}, \tag{1.13}$$
27Specifications involving the prior 3-month average rate or last month's deviation from historical
rates yielded similar results. With firm-level data on production, one could also include the age of
the refinery and perhaps the length of time since the last significant maintenance period.
where $q_{im} = u_{im}\tilde{q}_{iy}$, the firm's actual production of gasoline and distillate in the
current month.
I assume firms face increasing costs as they near their capacity constraint.
To model this, I suppose firms have a quadratic production cost function and also
include a term, $\theta_2$, reflecting the major input of the refiner, crude oil. Refiners take
this crude oil price as exogenous since the price is determined on the world market.
As firms produce near their capacity, they may face increasing costs due to less
time for maintenance, excess wear on their capital, and other effects that raise their
marginal costs.
Investments in capacity are available immediately, and capacity is fixed within
the year. This is a strong assumption since firms likely make investment decisions
far in advance and spread the costs over a long time period. In future work, I will
relax this assumption, allowing for a one-year "time-to-build." Investments come at
a cost:
$$C^r_{iy}(r_{iy}) = \theta_3 (q_{i,y-1} r_{iy}) + \theta_4 (q_{i,y-1} r_{iy})^2. \tag{1.14}$$
The parameters, $\theta_3$ and $\theta_4$, reflect the cost of capacity expansion. They embody
both the cost of physical expansion and any regulatory costs faced by the plant.
Unfortunately, I will not be able to differentiate these two components with cur-
rently available data. Note that the investment cost parameters reflect the cost of
a change in the number of barrels of capacity that is created or destroyed. Large
plants may benefit from economies of scale in capacity expansion as compared with
smaller plants, but since I am estimating my model for an average capacity firm,
this consideration is not necessary.
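The pieces of the per-period profit function can be put together in a short Python sketch. It follows the structure of equations 1.4, 1.9, 1.10, 1.13, and 1.14, but takes product prices as given rather than deriving them from the demand system, and every parameter value is hypothetical.

```python
def production_cost(q, crude_price, theta0, theta1, theta2):
    """Equation 1.13: quadratic in output plus a crude oil input term."""
    return theta0 * q + theta1 * q ** 2 + theta2 * q * crude_price


def investment_cost(prev_capacity, r, theta3, theta4):
    """Equation 1.14: quadratic in the change in capacity, in barrels."""
    delta_q = prev_capacity * r
    return theta3 * delta_q + theta4 * delta_q ** 2


def monthly_profit(u, prev_capacity, r, broken, avail_share, yield_g,
                   price_gas, price_dist, crude_price, theta):
    """Equation 1.9 with prices taken as given (a simplification)."""
    capacity = prev_capacity * (1.0 + r)                          # equation 1.4
    available = avail_share * capacity if broken else capacity    # equation 1.10
    q = u * available
    revenue = q * (yield_g * price_gas + (1.0 - yield_g) * price_dist)
    return (revenue
            - production_cost(q, crude_price, *theta[:3])
            - investment_cost(prev_capacity, r, *theta[3:]) / 12.0)


theta = (2.0, 0.01, 1.0, 5.0, 0.5)   # hypothetical cost parameters
args = dict(u=0.9, prev_capacity=100.0, r=0.1, avail_share=0.9, yield_g=0.6,
            price_gas=90.0, price_dist=80.0, crude_price=70.0, theta=theta)
print(round(monthly_profit(broken=False, **args), 2))
print(round(monthly_profit(broken=True, **args), 2))
```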
1.5 Empirical Estimation Strategy
In general, I split the estimation into two stages. I first estimate the demand
parameters, $(\alpha^g_0, \alpha^g_1, \alpha^g_2, \alpha^g_3, \alpha^d_0, \alpha^d_1, \alpha^d_2, \alpha^d_3)$, via GMM. This is a static relationship
between the market price and quantity. I also estimate the logit parameters governing
the probability of breakdown, $(\gamma_0, \gamma_1)$, via maximum likelihood.
In the second stage, I take the demand and breakdown coefficients as given
and solve the firms' dynamic utilization and investment choice problem using a
nested fixed-point GMM algorithm to recover the cost parameters $(\theta_0, \theta_1, \theta_2, \theta_3, \theta_4)$
for each market. I allow for the cost parameters to vary each year to reflect changes
in technology over time. I assume an annual discount rate of $\delta = 0.95$, implying a
monthly rate of $\beta = 0.996$. When a firm enters a breakdown episode, I assume its
capacity is scaled by a random fraction, $\lambda$, which follows a beta distribution with
mean 0.9.28
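Both numerical assumptions in this paragraph are easy to verify. The Python sketch below converts the annual discount rate into its monthly equivalent and checks the mean of a Beta(9, 1) distribution, analytically and by simulation; the seed and number of draws are arbitrary choices.

```python
import random

annual_discount = 0.95
monthly_discount = annual_discount ** (1 / 12)   # twelfth root of the annual rate
print(f"monthly discount rate: {monthly_discount:.3f}")

a, b = 9, 1
beta_mean = a / (a + b)                          # mean of a Beta(a, b) variable
print(f"analytical mean of Beta(9, 1): {beta_mean}")

rng = random.Random(0)                           # arbitrary seed
draws = [rng.betavariate(a, b) for _ in range(10000)]
simulated_mean = sum(draws) / len(draws)
print(f"simulated mean: {simulated_mean:.3f}")
```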
The firms' dynamic problem can be thought of as a finite-horizon monthly utilization
choice problem nested inside an infinite-horizon annual investment choice
problem. The annual investments in capacity can raise or lower the optimal utiliza-
tion rate throughout the year, (e.g., a larger investment allows for the same level of
28Formally, $\lambda \sim \text{Beta}(9, 1)$.
output with a lower level of utilization). Recall that the problem can be written:
$$\max_{\{r_{iy}\}_{y=0}^{\infty}} E\left[ \sum_{y=0}^{\infty} \delta^{y}\, \Pi_{iy}(r_{iy}; x_{iy}) \right], \tag{1.15}$$
$$\Pi_{iy} = \max_{\{u_{im}\}_{m=1}^{12}} E\left[ \sum_{m=1}^{12} \beta^{m-1}\, \pi_{im}(u_{im}; x_{im}, q_{iy}) \right]. \tag{1.16}$$
The aggregate discounted profits of the firm over the course of the year become the
per-period (annual) payoff of the investment choice problem. Given the frequency
with which refiners adjust their capacity and their utilization rate, this modeling
strategy is not only realistic, but it is computationally appealing. Solving the finite
horizon problem in equation 1.16 is simply a matter of backward induction.
The state variables available to the firm are the same in both sub-problems,
aside from the month of the year, which is only relevant in the utilization choice
problem. For the annual investment choice, the firm considers the average values of
last year's crude oil price and market production, the proportion of time the refinery
was broken down in the last 12 months, and the current level of capacity.
1.5.1 Demand
The demand parameters, the $\alpha$'s, are estimated in the first stage using 2-stage
least squares with appropriate instruments. Given the endogeneity of $P$ and $Q$, I
need to find instruments, $Z_{km}$, that are correlated with the price, $\text{Cov}(P_{km}, Z_{km}) \neq 0$,
and unrelated to the error term, $\text{Cov}(\epsilon_{km}, Z_{km}) = 0$.29 An obvious cost shifter in
29Essentially, I need cost shifters that move around the supply curve to trace out a demand
curve.
the oil refining industry is the price of crude oil, which should be exogenous as it is
determined in the world market. However, it is likely that the market for crude oil
and the market for refined products are both subject to the same demand shocks,
which invalidates the contemporaneous crude oil price as a good instrument.
Therefore, I instrument for the price of wholesale products with the lagged
crude oil price, indicators of supply disruptions (such as those caused by hurricanes
and pipeline outages), and the inventories of gasoline, distillate, and crude oil. These
are industry-wide inventories, not just at the refinery. These should all be related to
the price of a refiner's products though unrelated to the downstream demand. I can
use the $R^2$ from the first stage to test for the correlation between my instruments
and the endogenous price. Since I have instrumented for price in the first stage, in
the second stage I regress the log of $Q_{km}$ on the fitted log price, along with year and
month fixed effects.
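The logic of instrumenting for price can be shown with a small simulation in Python. This is a sketch under strong simplifications: one instrument, no fixed effects, and simulated data in which the true elasticity is set to -0.3. With a single instrument, 2-stage least squares reduces to the ratio of covariances used below.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Slope from regressing y on x, ignoring endogeneity."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Just-identified IV estimate: Cov(z, y) / Cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


# Simulated data (all values hypothetical). The demand shock moves both
# price and quantity, so the OLS slope is biased; the instrument is not
# related to the shock.
rng = random.Random(0)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]        # cost shifter
shock = [rng.gauss(0, 1) for _ in range(n)]    # unobserved demand shock
log_p = [0.5 * zi + 0.5 * ei + rng.gauss(0, 0.1) for zi, ei in zip(z, shock)]
log_q = [2.0 - 0.3 * p + e for p, e in zip(log_p, shock)]

print(f"OLS slope: {ols_slope(log_p, log_q):.3f}")
print(f"IV slope:  {iv_slope(z, log_p, log_q):.3f}")
```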
1.5.2 Breakdown Probability
The parameters of the breakdown logit, $\gamma_0$ and $\gamma_1$, are estimated by maximum
likelihood. This is done separately for estimating the likelihood that a breakdown
occurs and for the likelihood that a plant recovers once broken down. I define
a "breakdown" in district $j$ as a month when the observed utilization rate $u_{jm}$
(published by EIA, reflecting gross inputs of crude oil divided by the capacity to
distill crude oil) drops below $\underline{u}_{jm}$, defined as:
$$\underline{u}_{jm} = \min\left\{ \frac{1}{9} \sum_{i=1}^{9} u_{im},\; \frac{1}{4} \sum_{i=1}^{4} u_{j,m-12i} \right\}.$$
The threshold is thus the smaller of the contemporaneous average across all districts
and the average of the selected district's production in the same month for the last
4 years. A breakdown is therefore triggered only when 1) a district is producing relatively
less than all other districts in the current month, and 2) the district is producing
relatively less than it has historically in the same month. Figure 1.8 displays the
breakdown dynamics for districts that experience a breakdown. The plots show that
districts that run their plants more intensively in one month are more likely to break
down the following month.
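The breakdown rule can be written as a short Python function. The sketch takes the current utilization rates of all nine districts and the selected district's rates in the same month of the previous four years; the function names and the numbers are hypothetical.

```python
def breakdown_threshold(u_all_districts, u_same_month_history):
    """Smaller of the cross-district average and the district's own history."""
    cross_section_avg = sum(u_all_districts) / len(u_all_districts)
    historical_avg = sum(u_same_month_history) / len(u_same_month_history)
    return min(cross_section_avg, historical_avg)


def is_breakdown(u_district, u_all_districts, u_same_month_history):
    """True when the district's utilization falls below the threshold."""
    return u_district < breakdown_threshold(u_all_districts, u_same_month_history)


# Hypothetical utilization rates for the nine districts in one month.
u_now = [0.95, 0.93, 0.92, 0.94, 0.96, 0.85, 0.93, 0.92, 0.96]
# Hypothetical same-month rates for the sixth district in the last four years.
history_d6 = [0.94, 0.95, 0.93, 0.94]

print(round(breakdown_threshold(u_now, history_d6), 4))
print(is_breakdown(u_now[5], u_now, history_d6))
```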
Once a breakdown episode is started, a district may stay below the threshold
for a period of months. The data show that median episode length is 1 month, the
mean is 2.3 months, and the maximum is 15 months.30
1.5.3 Production Cost Parameters
The cost parameters, $(\theta_0, \theta_1, \theta_2, \theta_3, \theta_4)$, are estimated by GMM in the second
stage dynamic optimization. In order to solve for the production and investment
cost parameters, I need to solve a dynamic optimization problem. To achieve this,
30The 15 month episode occurred in district 9 (the West Coast) from February 1999 - May 2000.
It resulted from two California refinery fires at the Tosco Refinery in Avon on 02/23/99 and at
the Chevron Refinery in Richmond on 03/25/99. The fall in gasoline production from these two
fires was only 7% but due to California's strict environmental standards for gasoline, shipments
from other (less regulated) districts were impossible so prices rose by about 25%. This implies a
demand elasticity for retail gasoline of -0.28.
[Figure: scatter of Utilization Rate in Month t+1 against Utilization Rate in Month t, with a 45-degree line; series: All Months, Breakdown Months]
Figure 1.8: District Breakdowns
I first discretize the state space, which includes deterministic time states. The
transition probability for the crude price is found using the empirical distribution of
its historical series. The transition probabilities between breakdown states depend
on the choice variable in the previous period according to the logit estimation done
in the first stage. In a given year, the transition matrix for months reflects moving
from one month to the next with certainty. Therefore, I can simplify the analysis by
taking advantage of the cyclic nature of the month state. This dramatically reduces
the computational time; see Rust (forthcoming). Further details of the estimation
algorithm can be found in appendix C.
For a candidate parameter vector, I iterate on the policy function until con-
vergence. I then interpolate the policy function on the actual states in my data
and estimate the utilization rate for each district-month observation. Since the
optimization is performed at the firm level, I aggregate to the market level and form
the following moments:
$$M_1 = J^{-1} \sum_{j} (u_{mj} - \hat{u}_{mj})$$
$$M_2 = N_j^{-1} \sum_{i} (r_{ijy} - \hat{r}_{ijy})$$
where $\hat{u}_{mj}$ is the estimated average utilization rate in district $j$ and month $m$ and $\hat{r}_{ijy}$ is
the estimated investment rate by firm $i$ located in district $j$ in year $y$. I average
the utilization rate moments over districts and the investment rate moments over
firms and then stack them to form a moment vector: $M(\theta) = (M_1, M_2)'$. I then
numerically solve the following problem:
$$\min_{\theta} \left\{ M(\theta)'\, \Omega^{-1} M(\theta) \right\}, \tag{1.17}$$
where $\Omega$ is the variance-covariance matrix of the moment vector. With estimated
parameters in hand, I estimate the standard errors of the cost estimates using Hansen's
GMM estimator of the VC matrix. Given the matrix $G$ of numerical derivatives,
where (for parameter $k$ and moment $l$)31,
$$G_{lk} = \frac{M_l(\theta_k^{+}) - M_l(\theta_k^{-})}{\theta_k \times 1\%}, \tag{1.18}$$
I can then compute:
$$VC(\theta) = \frac{1}{N} (G'\, \Omega^{-1} G)^{-1}. \tag{1.19}$$
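The two steps in equations 1.18 and 1.19 can be sketched in plain Python for a toy case with two moments and two parameters. The moment function is an invented linear example, the weighting matrix is the identity, and the helper names are hypothetical; the central difference perturbs each parameter by 0.5% above and below its value, and therefore requires non-zero parameters.

```python
def jacobian(moment_fn, theta, window=0.01):
    """G[l][k]: central difference of moment l in parameter k (equation 1.18)."""
    n_moments = len(moment_fn(theta))
    G = [[0.0] * len(theta) for _ in range(n_moments)]
    for k, value in enumerate(theta):
        up, down = list(theta), list(theta)
        up[k] = value * (1 + window / 2)
        down[k] = value * (1 - window / 2)
        m_up, m_down = moment_fn(up), moment_fn(down)
        for l in range(n_moments):
            G[l][k] = (m_up[l] - m_down[l]) / (value * window)
    return G


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def gmm_vc(G, omega, n_obs):
    """Equation 1.19 for the 2x2 case: (1/N) (G' Omega^-1 G)^-1."""
    inner = matmul(matmul(transpose(G), inverse_2x2(omega)), G)
    return [[v / n_obs for v in row] for row in inverse_2x2(inner)]


def toy_moments(theta):
    """Invented linear moments, used only to exercise the formulas."""
    return [2 * theta[0] + theta[1] - 3, theta[0] - theta[1]]


G = jacobian(toy_moments, [1.0, 1.0])
vc = gmm_vc(G, omega=[[1.0, 0.0], [0.0, 1.0]], n_obs=100)
standard_errors = [vc[k][k] ** 0.5 for k in range(2)]
print(G)
print(standard_errors)
```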
1.6 Results
1.6.1 Model Fit
I first assess the fit of the dynamic model by plotting actual and estimated
values of key variables in figure 1.9. This is an in-sample analysis and shows that,
on average, the estimated values match the data fairly well. Prices are estimated
very precisely due to the flexibility gained by including monthly fixed effects. The
estimated utilization rate is more variable than the actual rate, though the
month-to-month fluctuations are approximated well. The model does not do as well at
predicting the level of investment because firms tend to make lumpy investments
every few years instead of updating their plant continuously. This means the median
investment in any given year is zero and the reduced variation makes identification
more difficult.
Figure 1.9: Model Fit (In Sample)
(Six panels by year, 1996 to 2006, comparing actual and estimated series: Gas Price
($/Bbl), Distillate Price ($/Bbl), Utilization Rate, Firm Capacity Investment (MBbl),
Aggregate Production (MBbl), and Crack Spread ($/Bbl).)
31 For a 1% window, I perturb the parameter by 0.5% above and below the estimate.
Figure 1.10: Model Fit (Out of Sample)
(Six panels comparing actual and simulated series: Gas Price ($/Bbl), Distillate Price
($/Bbl), Utilization Rate, Aggregate Production (MBbl), and Crack Spread ($/Bbl) over
2006 to 2008, and Firm Capacity Investment (MBbl) by refining district for 2006 and 2007.)
Finally, though the model tracks the movements in the crack spread very well,
it tends to predict a value that is below the actual spread. This occurs because the
estimated prices of gasoline and distillate are also biased down, because I do not
account for inventories in my model. Since a small portion of refinery production is
stored, my estimates of downstream demand are biased up, which pushes down the
estimated prices.
In figure 1.10, I do an out-of-sample test of the model, where I use the pa-
rameter estimates based on data through 2006 and simulate the investment and
utilization policy of firms in 2007. The predicted prices of gasoline and distillate are
close to the data for the beginning of 2007 but then begin to deviate. This pattern,
also shown in the crack spread plot, is partially a result of unprecedented levels of
the price of crude oil in 2007. The model predicts that refineries should optimally
respond to these high input prices by cutting their utilization rate to drive up their
product prices and maintain their profit margin.
1.6.2 First Stage Estimates: Demand and Breakdown
Tables 1.3 and 1.4 present the results of the first stage demand and breakdown
estimations. Most of the demand coefficients are significant at the 1% level and
have the expected signs. The monthly fixed effects estimates show the peak in
gasoline demand during the summer months and distillate toward the fall. The
elasticity estimates show a growing sensitivity to wholesale gasoline prices over the
years. These estimates are higher than those reported for retail gasoline in other
studies (see Knittel (2008)). However, unlike the branded retail product, wholesale
gasoline is very homogeneous and downstream buyers can more easily substitute to
a competing supplier. Also, the ability to store gasoline at terminals would imply
the wholesale elasticity should be higher than the retail estimate. The $R^2$ from the
first stage regression of price on the instruments is 0.87.
The logit estimation of breakdown reveals an increasing probability of break-
down as a refiner runs the plant more intensively. Estimating the probability of
breakdown next period conditional on being broken down this period reveals that
refiners with more severe breakdowns are less likely to recover in the next period.
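As a rough illustration of how the logit estimates translate into probabilities, the sketch below evaluates the standard logistic formula at a few lagged utilization rates. It assumes that the first coefficient pair in table 1.4 (-2.40, 0.74) belongs to the "no breakdown" state and the second pair (0.91, -4.03) to the "breakdown" state; that assignment, the function name, and the utilization values are assumptions made for this example.

```python
import math


def breakdown_probability(utilization_prev, constant, slope):
    """Logit probability of a breakdown next period given last period's utilization."""
    index = constant + slope * utilization_prev
    return 1.0 / (1.0 + math.exp(-index))


# Point estimates copied from table 1.4; which state each pair belongs to is assumed.
NO_BREAKDOWN = (-2.40, 0.74)   # plant was running last period
BREAKDOWN = (0.91, -4.03)      # plant was broken down last period

if __name__ == "__main__":
    for u in (0.3, 0.5, 0.7, 0.9):
        p_run = breakdown_probability(u, *NO_BREAKDOWN)
        p_down = breakdown_probability(u, *BREAKDOWN)
        print(f"utilization {u:.1f}: P(break | running) = {p_run:.3f}, "
              f"P(break | broken) = {p_down:.3f}")
```

Under this reading, a positive slope in the running state means harder use raises breakdown risk, while the negative slope in the broken state means a plant operating at a lower rate (a more severe breakdown) is more likely to stay broken down.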
1.6.3 Second Stage Estimates: Costs
The cost coefficients are generally significant and reflect a production cost
function that is increasing and convex. I display the cost functions at the average
values of the estimates in figure 1.11 and report all estimates in appendix D, table
A.3. The cost functions show that firms in market 2, the isolated Rocky Mountain
region, are the most sensitive to production changes and have the highest overall
production costs. Market 1 enjoys relatively easy access to crude supplies in the
Gulf region and has the lowest production costs. The curvature of the production
cost functions shows that refiners face increasing marginal costs as they approach
the limitations of their capacity. I use a constant crude oil price of $50/bbl in my
estimated production cost function.
The estimates of investment cost functions reflect an almost linear relationship,
with the quadratic term often insignificant. While the figure shows the average
investment costs over time, table A.3 displays the increase in expansion costs that
refiners have faced in recent years. The Senate's (2002) estimated cost of building
a new 2,700 barrel/day refinery was about $27 million. I estimate the cost of the
same size expansion at around $10 million, further evidence that expanding existing
sites is more cost-effective than building a new plant.

Table 1.3: Demand Estimates
                         Gasoline                    Distillate
Parameter            Coefficient  Std. Err.     Coefficient  Std. Err.
Constant               -0.27        0.44          5.20***      1.73
Year '95                2.51***     0.60         -0.99         2.43
Year '96                2.64***     0.64          0.93         2.70
Year '97                3.49***     0.62          0.90         2.55
Year '98                3.08***     0.56         -1.38         2.28
Year '99                3.44***     0.58         -1.82         2.35
Year '00                3.18***     0.67         -0.19         2.81
Year '01                3.19***     0.63          0.55         2.62
Year '02                3.59***     0.61         -0.09         2.47
Year '03                3.72***     0.65          1.58         2.75
Year '04                3.65***     0.65          0.52         2.84
Year '05                3.55***     0.65          1.33         2.96
Year '06                2.84***     0.61          1.24         2.86
Month 2                 0.05***     0.01          0.04         0.04
Month 3                 0.11***     0.01          0.13***      0.04
Month 4                 0.17***     0.01          0.18***      0.04
Month 5                 0.22***     0.01          0.14***      0.04
Month 6                 0.25***     0.01          0.14***      0.04
Month 7                 0.25***     0.01          0.11***      0.04
Month 8                 0.28***     0.01          0.27***      0.04
Month 9                 0.21***     0.01          0.33***      0.04
Month 10                0.19***     0.01          0.39***      0.05
Month 11                0.13***     0.01          0.26***      0.04
Month 12                0.11***     0.01          0.13***      0.04
Log(P)*Year '95        -0.81***     0.13         -1.79***      0.57
Log(P)*Year '96        -0.81***     0.14         -2.25***      0.64
Log(P)*Year '97        -1.07***     0.13         -2.27***      0.59
Log(P)*Year '98        -1.03***     0.12         -1.73***      0.52
Log(P)*Year '99        -1.08***     0.12         -1.50***      0.53
Log(P)*Year '00        -0.91***     0.14         -1.76***      0.63
Log(P)*Year '01        -0.93***     0.13         -2.02***      0.58
Log(P)*Year '02        -1.06***     0.12         -1.90***      0.54
Log(P)*Year '03        -1.04***     0.13         -2.25***      0.61
Log(P)*Year '04        -0.96***     0.12         -1.80***      0.59
Log(P)*Year '05        -0.88***     0.12         -1.82***      0.58
Log(P)*Year '06        -0.69***     0.10         -1.74***      0.53
***, **, * Significant at the 1%, 5%, and 10% level respectively. Dependent variables: log of
gasoline and distillate sales. First stage regression of price on hurricane and pipeline
disruptions, lagged crude oil price, and stocks of crude oil, gasoline and distillate.

Table 1.4: Breakdown Probability Estimates
                     Conditional on No Breakdown   Conditional on Breakdown
Parameter            Coefficient  Std. Err.        Coefficient  Std. Err.
Constant               -2.40***     0.44             0.91**       0.45
Utilization (t-1)       0.74        0.62            -4.03***      0.67
Maximum likelihood estimates. ***, **, * Significant at the 1%, 5%, and 10% level
respectively. Dependent variable = breakdown indicator.

Figure 1.11: Estimated Production and Investment Cost Functions
(Two panels for Markets 1, 2, and 3: Cost (Millions of Dollars) against Production
(MBbl), and Cost (Millions of Dollars) against Capacity Investment (MBbl).)
1.6.4 Policy Function
In figure 1.12, I plot the optimal policy function over the course of a year
at the average values of the other state variables. The optimal utilization rate
increases during the late winter and early spring but then falls off around April
and May, before rising again to a peak in August. A likely explanation is that
refiners, anticipating the high demand summer driving season in July and August,
scale back operations in the late spring to prevent the possibility of a breakdown
occurring during the peak. This pattern is replicated in most markets and years.
Figure 1.13 displays the optimal policy function in 3-dimensional space, varying by
both the month of the year and the crude oil price. It shows that refiners cut back
production when the oil price rises, a competitive response to a rising input price.
The pattern across months is replicated at each crude oil price.
Figure 1.12: Optimal Utilization Rate Versus Month
(Optimal utilization rate by month, 1 to 12, shown separately for Breakdown = No and
Breakdown = Yes.)
Figure 1.13: Optimal Utilization Rate Versus Month and Crude Price
(Surface plot of the utilization policy against month and crude price.)
1.7 Counterfactuals
With a fully estimated dynamic model of the US oil refining industry, I can
now use the model to determine the effects of various shocks that may occur. There
are many interesting questions that could be examined with my model given the
importance of oil refining in US and global energy markets. I focus on three stylized
facts that I believe to be particularly important in the following analysis: crude
oil prices are rising to unprecedented levels; there is little to no excess capacity in
the oil refining industry; and end-use consumers of refined products are becoming
increasingly sensitive to the prices they face (see Knittel et al. (2008)). Elasticities
may be changing due to the availability of other fuels or because of changing
perceptions of the environmental impact of oil usage (see figure 1.14). As a result, I
will consider two experiments:
1. What are the effects of an increase in the crude oil price and how do the results
change when the demand for refined products is more elastic?
2. What are the effects of a fall in available capacity and how do the results
change when the demand for refined products is more elastic?
Figure 1.14: Price Elasticity of Demand
(Absolute elasticity by year, 1986 to 2006, for gasoline and distillate.)
1.7.1 Methodology
Both counterfactuals are based on the coefficients and policy functions from
2006, the most recent year in my data. I shock the crude oil price in May to
determine the effects throughout the peak demand summer months. The shock is
permanent and I compute the average effects throughout the remainder of the year.
I shock capacity in August to approximate the effects of a late summer hurricane
hitting the Gulf of Mexico. I compute impacts assuming both the actual estimated
elasticity in 2006 and an elasticity that is higher by 2.5% (in absolute terms) for
both gasoline and distillate. Even this small increase in the sensitivity of consumers
is enough to induce a dramatic response.
In my sample, the maximum observed real crude oil price is around $70/bbl.
However, as shown in figure 1.15, crude oil prices have been driven to record levels
more recently, exceeding $115/bbl (in real 2006 dollars). Thus, I simulate the effects
of a 20% increase in the price of crude oil to determine the impact on prices of
gasoline and distillate and the resulting crack spread. Since the price elasticity of
demand is one of the parameters estimated in the first stage and it influences the
per-period payoff of the firm, I must solve my model at each new elasticity estimate.
The optimal policy functions change as a result. Since the crude oil price is a state
variable, I extrapolate my policy functions to the new crude prices.
Figure 1.15: Crude Oil Price
(Real crude price ($/Bbl) by year, 2005 to 2008.)
About one-half of the US refining capacity is located on the Gulf of Mexico.
Major hurricanes like Katrina and Rita in 2005, and more recently, Gustav and Ike
in 2008, reduced US oil refining capacity by 25% to 35% and had a major impact
on downstream prices and refiners' profit margins (see figure 1.16). Therefore, in
my second counterfactual experiment, I simulate the effects of a 25% reduction
in capacity on downstream prices, the crack spread, refiner profits, and consumer
welfare.
Figure 1.16: Loss in Capacity: Hurricane Katrina
(Monthly utilization rate and crack spread ($/Bbl), overall and for the Gulf, with
Katrina marked.)
1.7.2 Results of Experiments
The effect of a 20% increase in the price of crude oil (from 2006 prices) is shown
in figure 1.17 and summarized in table 1.5. Note, the price and crack spread changes
in the table are the average changes relative to the baseline prediction following the
shock for the remainder of the year. The changes in surplus, profit and welfare are
based on totals for the remainder of the year following the shock. The graphs in
figure 1.17 show the future path of product prices, the utilization rate, and the crack
spread through the remainder of the year.
Figure 1.17: Crude Oil Counterfactual: Simulation
(Monthly paths of the crude price, gas price, utilization rate, and crack spread, actual
versus counterfactual, under the actual elasticity and under more elastic demand.)

Table 1.5: The Effect of a 20% Increase in the Crude Oil Price
Percent Change        Actual Elasticity   More Elastic
Gasoline Price              12.7              10.2
Distillate Price             8.1               6.7
Crack Spread               -10.8             -30.1
Consumer Surplus           -58.3             -34.1
Refiner Profit             -37.1             -70.8
Total Welfare              -45.2             -49.7
The first column of graphs corresponds to the actual estimated elasticity (in
2006) and the second column of graphs assumes more sensitive demand estimates.
The price of gasoline and distillate both rise following the crude oil price shock,
though the price increases do not cover the entire cost increase as refiner profits
fall after the shock. The amount of the increase that can be "passed through" to
consumers appears to vary over the year. The crack spread graph reflects this, as
it shows that although refiners are immediately hurt by the crude oil shock, they
recover during the summer months by reducing their utilization rates before the
spread falls again in September with weaker product demand.
Table 1.6: The Effect of a 25% Loss in Capacity
Percent Change        Actual Elasticity   More Elastic
Gasoline Price              15.9               3.0
Distillate Price             9.8               2.0
Crack Spread                47.9              11.9
Consumer Surplus           -69.0             -17.6
Refiner Profit              15.4              -4.8
Total Welfare              -11.1             -11.3
Comparing the two levels of demand sensitivity, we see that refiners are less
able to pass on the crude price increase to more sensitive consumers, and thus their
crack spread is dramatically reduced immediately following the shock. In addition
to analyzing the effects on prices and profit margins, it is interesting to calculate the
distribution of welfare between consumers and refiners. Total welfare declines by
45% in the months following the shock. According to table 1.5, overall welfare falls
for both the actual and more sensitive elasticity estimates, although more sensitive
consumers end up with a larger share of the surplus following the shock.
Figure 1.18: Capacity Counterfactual: Simulation
(Monthly paths of capacity, gas price, utilization rate, and crack spread, actual versus
counterfactual, under the actual elasticity and under more elastic demand.)
Figure 1.18 and table 1.6 display the results of my second counterfactual ex-
periment, in which I reduce the size of the average refinery by 25%. Again, the
table shows the average response to the shocks and figure 1.18 shows the longer-
term effects for different levels of demand sensitivity. My counterfactual assumes
that all refiners are hit equally hard by the shock, though in reality, some plants
close completely while others operate even more intensively following events like
Katrina.
The impact of the shock on the crack spread depends strongly on the demand
elasticity. With the crude oil price the same in both cases and the percentage in-
creases in the prices of gasoline and distillate about five times higher at the actual
elasticity, the refiners facing more sensitive consumers benefit immediately follow-
ing the shock, though the longer-term crack spread is higher for the less sensitive
consumer group. Utilization rates change only slightly following the shock and the
real cost is borne by consumers in the form of gasoline prices, which rise by almost
16%, reducing consumer surplus by 69%.
In terms of the distribution of welfare, the overall pie decreases by about the
same amount in both cases, but at the actual elasticity, the increase in profits at
operating refineries partially offsets the loss in consumer surplus. However, the more
sensitive consumers retain a larger proportion of welfare following the shock. It is
important to note that my measure of total welfare puts equal weight on consumer
surplus and refiner profit and makes no consideration for the variability of prices
faced by consumers. Given the economy's extraordinary reliance on gasoline, an
extra dollar per gallon paid at the pump may hurt consumers more than it helps
refiners.
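The statements about how welfare is split can be checked directly from tables 1.5 and 1.6. The sketch below assumes total welfare is simply consumer surplus plus refiner profit (the equal weighting described above) and that all three percentage changes in a column are measured against the same baseline; the function names are hypothetical. Under those assumptions the change in total welfare is a share-weighted average of the two component changes, so the baseline consumer share can be backed out and then updated.

```python
def consumer_share_before(d_cs, d_profit, d_total):
    """Baseline consumer share s that solves d_total = s * d_cs + (1 - s) * d_profit."""
    return (d_total - d_profit) / (d_cs - d_profit)


def consumer_share_after(share, d_cs, d_total):
    """Consumer share of total welfare after the shock (inputs are percent changes)."""
    return share * (100 + d_cs) / (100 + d_total)


# (consumer surplus, refiner profit, total welfare) percent changes from tables 1.5 and 1.6.
SCENARIOS = {
    "crude +20%, actual elasticity": (-58.3, -37.1, -45.2),
    "crude +20%, more elastic": (-34.1, -70.8, -49.7),
    "capacity -25%, actual elasticity": (-69.0, 15.4, -11.1),
    "capacity -25%, more elastic": (-17.6, -4.8, -11.3),
}


def share_table(scenarios=SCENARIOS):
    """Map each scenario to (consumer share before, consumer share after)."""
    out = {}
    for name, (d_cs, d_profit, d_total) in scenarios.items():
        before = consumer_share_before(d_cs, d_profit, d_total)
        out[name] = (before, consumer_share_after(before, d_cs, d_total))
    return out


if __name__ == "__main__":
    for name, (before, after) in share_table().items():
        print(f"{name}: consumer share {before:.2f} -> {after:.2f}")
```

Because the tabulated changes are rounded to one decimal place, the implied shares are approximate. In both experiments the implied post-shock consumer share is higher under more elastic demand, in line with the discussion above.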
1.8 Conclusion
In this paper, I have developed and estimated a new dynamic model of the
US oil refining industry. Energy markets, and in particular, the production and
distribution of gasoline, are a hot topic in both academic research and the popular
media. While the focus has tended to be on the upstream supply of crude oil (from
both foreign and domestic sources) and the downstream retail stations, relatively
little attention has been given to the role that oil refiners play in the industry. My
analysis helps clarify and quantify the crucial role of the refiners in the transmission
of crude oil and capacity shocks into downstream product prices, refiner profits, and
consumer surplus.
The model matches the historical data and provides reasonably good out-of-
sample predictions of key variables. I show that refiners are only partially able to
pass through crude oil shocks to consumers and this ability varies across months of
the year. As consumers have become more sensitive to changes in the price of gaso-
line, refiners face an even tougher competitive environment. Capacity disruptions,
such as those caused by hurricanes, increase industry profits because the resulting
price increase outweighs the loss in profits caused by reduced production. The effect
on overall welfare is negative, though fairly small because the large loss in consumer
surplus is partially offset by a rise in refiner profits.
My analysis not only models the behavior of refiners and the role they play
in an important energy market, it also may have policy implications regarding optimal
environmental regulations. In conversations with refiners, I found that current
regulatory policies regarding both the building of new plants and the expansion of
existing sites are the main hurdle that managers face when making their investment
decisions. Regulatory policies have, at the very least, contributed to the current sit-
uation where capacity is tight and small shocks can have large effects. Realizing the
importance of production flexibility in the refining industry means that new policies
must balance responsible environmental concerns with incentives for capacity
investment to meet the growing demand for refined products.
There are many extensions to this work that could provide further insights
into the industry, though some require access to plant-level data which the EIA is
considering making available. While this paper only addresses the production and
investment decisions of active firms, including the possibility of exit may improve the
model. Firms would likely follow a cut-off rule, exiting if the expected discounted
stream of future profits fell below some critical level. Another potentially important
determinant of firm behavior in this industry is a refiner's relationship with upstream
crude oil producers. Currently, 60% of refiners are part of an integrated oil company,
and although they benefit from a consistent supply of their major input, they are also
constrained by having to exhaust their partner's stream of crude oil before seeking
other, potentially more cost-effective sources. Independent refiners tend to invest
in technologies that allow them to utilize different types of crude oil more flexibly,
though they may suffer relatively more when there is a supply disruption. Modeling the
decisions of each type of refiner and the interaction between the two could help
clarify the role of these vertical relationships. I leave these extensions for future
work.
Chapter 2
Consumer Search for Online Drug Information
2.1 Introduction
There is a growing availability of medicinal drug information on the internet. A
consumer seeking this complicated information faces the additional hurdle that the
providers, e.g., drug companies, government regulators, and informational websites,
all may have different incentives for providing accurate and unbiased information.
While consumers formerly relied on their doctor as the primary source of information
about the drugs they were taking, now they increasingly turn to the internet.1
Use of the internet worldwide doubled between 2004 and 2008.2 When con-
sumers go online, they are more likely to start with a search engine as the number of
internet users accessing a search engine grew 69% between 2002 and 2008.3 Thus, it
is clear that search engines like Google and Yahoo are an important gateway to the
internet. Also between 2002 and 2007, spending on Direct To Consumer Advertising
(DTCA) for prescription drugs by pharmaceutical companies doubled, with a
small but growing portion spent online via banner ads and paid search
advertising.4 The pharmaceutical company GlaxoSmithKline spent $2.5 billion
on advertising in 2007, of which $29 million (1.1%) was online spending.5 A
policy initiated by the Food and Drug Administration (FDA) in 1997 allowed
detailed drug information to move to the internet with only essential side-effects and
information provided in a television advertisement. The following year, DTCA on
television more than tripled.6
1 "In 2007, 56% of American adults -- more than 122 million people -- sought information about
a personal health concern from a source other than their doctor, up from 38%, or 72 million people,
in 2001." (HSC August 2008). According to another survey, "approximately 40% of respondents
with internet access reported using the internet to look for advice or information about health or
health care in 2001." (JAMA 2003).
2 http://www.allaboutmarketresearch.com/internet.htm. 0.757 billion in May 2004
compared to 1.463 billion in June 2008.
3 http://pewinternet.org/pdfs/PIP_Search_Aug08.pdf. Pew Internet and the American
Life Project (2008)
My goal is to determine how consumers search for information and what
characteristics of their query may determine how they navigate through the engine's
results. This analysis focuses on the click behavior of consumers using AOL's
internet search engine. I look at searches for brand name prescription drugs and those
for consumer electronics as a comparison group. There are many reasons that con-
sumers search and, like drug queries, a search for an electronics product may be
motivated by a desire for product information which may lead to a purchase de-
cision. Restricting to a specific group of products also allows me to define search
sessions, discussed below, which are more difficult to determine in the entire universe
of search queries.
Within drug queries, I analyze the effects of DTCA, drug age, and other drug
characteristics (such as drug class) on consumer search. Given that consumers
search, I also analyze how they do it: how in-depth (length of a search session,
number of clicks, session time) and which types of links they click (extensions,
ranks). I also analyze how drill-down behavior differs between drug and electronics
searches. Drill-down is the frequent practice by users of submitting a query, processing
the results, which may include clicking on one or more links, and then revising their
initial search query.
4 Source: TNS Media Intelligence.
5 Source: www.Adage.com. Note this does not include paid search advertising.
6 Television DTCA increased from $168 million in 1997 to $613 million in 1998.
I focus on drug-related search because typical consumers have limited infor-
mation about the drugs that they are taking or are thinking about taking. The
information also has many dimensions such as efficacy, side-effects, and interactions
with other medications. Consumers face a wide variety of information sources both
online and offline. My study complements the analysis in Day (2006), which
investigates how consumers process and understand drug information via offline DTCA,
though I only consider their initial search. As a result, consumers' understanding
of the information they find is only relevant for this study in how it affects the way
they search. For example, if consumers are frequently unsuccessful in finding the
information they seek on dot-com sites, they may be more likely to click on other
extensions in future search sessions.
The remainder of this paper is organized as follows. In section 2, I provide a
description of the data which includes click-through data from AOL, drug informa-
tion from the FDA, and advertising data from TNS Media Intelligence. Sections 3
and 4 include a descriptive and regression analysis on which types of search results
are popular with consumers and how DTCA affects online search behavior, both in
terms of the frequency and the intensity of search. Section 5 concludes and provides
some directions for future work.
2.2 Data
AOL Click-Through Data
I focus on search and click-through data from AOL, which span the period from
March to May 2006. The data come from AOL Research, who posted the data on
the web for research purposes on August 4, 2006. Due to privacy concerns, AOL
later removed their own link to the data, but it is still available for download on
many other websites.7 The data have been used to study several topics including
the determinants of search and how social networks could improve search engine
performance.8 To ensure privacy protection, I do not use any information specific
to individual users and only report aggregate statistics in this paper. The data are
a representative sample of over 650,000 AOL users and include an anonymous user
id, a date/time-stamp, a search query, and if the user clicked on a result, the domain
portion of the click-through URL and its rank.9 An overview of this data can be
found in Chowdhury et al. (2006).
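To make the "domain portion" concrete, the following sketch reduces a full click-through URL to the part that the dataset reports. The AOL data already arrive in this truncated form, so this is only an illustration of the idea; the function name `click_domain` and the rule of prepending a scheme when one is missing are assumptions for the example.

```python
from urllib.parse import urlsplit


def click_domain(url):
    """Return only the host portion of a URL, dropping the path and query string."""
    # urlsplit only recognizes the host when a scheme is present, so add one if needed.
    if "://" not in url:
        url = "http://" + url
    return urlsplit(url).netloc.lower()


if __name__ == "__main__":
    print(click_domain("www.fda.gov/drug/warnings.html"))
```

One consequence of this truncation is that two clicks on different pages of the same site are indistinguishable in the data except by rank and time stamp.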
In this analysis, I use the term query for a search event, which may or may
not be followed by a click-through on a subsequent search result. If a user submits a
query, clicks on a result, and then returns to the same search page (e.g., by clicking
back on her internet browser) and clicks again, two observations are reported in the
dataset with the same time stamp. If a user clicks on a result on page one of the
search results (ranks 1-10), and then moves to page two, two observations are
reported, but with different time stamps. Only organic results and not sponsored/paid
results are included in the AOL database.
7 See http://www.gregsadetsky.com/aol-data/.
8 See http://www.cond.org/applications/paper3.pdf and
http://www.stanford.edu/~koutrika/res/Publications/2008_wsdm.pdf.
9 If a user clicked on the link www.fda.gov/drug/warnings.html, only www.fda.gov is reported.
Drug Information
To create the database of queries, I use the FDA's Orange Book, which includes all
drugs that have been approved by the FDA and attributes of each. These include
the drug's age (years since FDA approval), drug class (16 broad classes), drug type
(prescription, over-the-counter (OTC), or discontinued), and an indicator for whether the
drug is the Reference Listed Drug (RLD).10 I select queries appearing in the AOL database
that contain an FDA brand name somewhere in the query (i.e., it may appear among
other terms). Of the 23,390 drug brand names appearing in the FDA Orange Book,
514 appear in the AOL database and account for 65,038 queries.
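The selection rule (keep a query if a brand name appears anywhere in it) can be sketched as a whole-word match. This is a minimal illustration rather than the procedure actually used: the brand names `alphadrug` and `betamed` are made-up placeholders, and the choices to ignore case and to require word boundaries are assumptions.

```python
import re


def build_matcher(brand_names):
    """Compile one case-insensitive pattern that matches any brand name as a whole word."""
    # Longer names first so a multi-word name wins over a shorter name it contains.
    ordered = sorted(brand_names, key=len, reverse=True)
    pattern = "|".join(re.escape(name.lower()) for name in ordered)
    return re.compile(r"\b(?:" + pattern + r")\b")


def matched_brand(query, matcher):
    """Return the first brand name found in the query, or None if there is none."""
    found = matcher.search(query.lower())
    return found.group(0) if found else None


if __name__ == "__main__":
    matcher = build_matcher(["alphadrug", "betamed"])   # hypothetical names
    for q in ("alphadrug side effects", "is BetaMed safe", "cheap flights"):
        print(q, "->", matched_brand(q, matcher))
```

Requiring word boundaries avoids counting a query in which a short brand name happens to be embedded in a longer unrelated word.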
Advertising Data
I also gather data on DTCA for each drug in the sample. I have monthly data from
1994 through 2008 from TNS Media Intelligence. In 2008, the data include 327
drugs and the advertising expenditure is broken down by media type. Figures 2.1
and 2.2 display the growth of DTCA over time and the distribution of 2008 spending
across media types. The growth of total DTCA is clearly evident and although TV
and magazine advertising account for about 95% of total expenditures, spending on
the internet is a new and growing outlet. TNS only reports internet ad spending on
display or banner ads which appear, for example, across the top of many websites
and some search engines. It does not include spending on sponsored/paid search
results, which is reported to be twice the size of display ad spending.11
10 A drug is an RLD if it is used as the chemical standard when generic versions of the drug are
developed. New drugs have to be "bio-equivalent" to the RLD to gain approval by the FDA.
Figure 2.1: Total DTCA Spending on all Prescription Drugs
(Total annual direct-to-consumer advertising for all Rx drugs, in millions of dollars, by
year from 1994 to 2008, broken down into TV, Magazines, Internet, Newspapers, and Radio.)
Figure 2.2: DTCA Breakdown by Media Type
(2008 DTCA by media: TV 62.6%, Magazines 32.0%, Internet 2.9%, Newspapers 2.1%,
Radio 0.4%, Outdoor 0.1%.)
Electronics
For electronics queries, I combined lists from consumer reports on popular electronics
product and brand names with a list of manufacturers from tigerdirect.com, a major
seller of consumer electronics. This resulted in 804 potential consumer electronics
queries, of which 126 appear in the AOL database and account for 509,833 queries.
11 "Gap Widens in Online Advertising," The Wall Street Journal, September 4, 2008.
Generating Search Sessions
One challenge with analyzing search behavior on the internet is to group a sequence
of potentially changing queries and click-throughs together to form a search session.
Grouping identical queries together is frequently insufficient because users often
revise their queries throughout a session. Therefore, I consider the following three
approaches for defining a search session:
1. A sequence of queries with or without click-through such that the query is
identical and the time between queries is less than one hour. The query needs
to contain one of the drug brand names or electronics product words, but it
may also include other words. However, the overall query may not change
within a session which means the list of search results that the user sees is not
changing. I use this definition for determining the popularity and transitions
between website extensions and ranks.
2. A sequence of queries with or without click-through such that two adjacent
queries are in the same session if any of the words appearing in the first query
also appear in the second query. The time between queries is less than one
hour.12 I use this definition for the "All Queries" column of table 2.1.
3. A sequence of "query-topics" with or without click-through where a keyword
(such as a drug brand name) appears in all queries though other words may
appear and change throughout the session. Again the time between adjacent
queries must be less than one hour. This definition captures the drill-down
behavior that users often exhibit when performing a search. I use this session
definition for all other tables and regressions in the analysis.
12 There is a potential weakness in this definition. The three queries "flights to Europe",
"discount flights to London", and "hotels in London" would all be classified in the same
session, though it is likely that the intent of the search changed in the third query.
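The third session definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the keyword set, the function names, and the rule that a query containing no keyword closes the open session are all hypothetical choices made for the example. Only the one-hour cutoff and the requirement that the keyword appear in every query of a session come from the definition itself.

```python
from datetime import datetime, timedelta

# Hypothetical keyword list; the actual drug and electronics lists are not reproduced here.
KEYWORDS = {"lipitor", "zoloft"}
MAX_GAP = timedelta(hours=1)  # adjacent queries must be less than one hour apart


def find_topic(query, keywords=KEYWORDS):
    """Return the first keyword that appears as a whole word in the query, else None."""
    for word in query.lower().split():
        if word in keywords:
            return word
    return None


def build_sessions(events, keywords=KEYWORDS, max_gap=MAX_GAP):
    """Group one user's (timestamp, query) events into query-topic sessions.

    Adjacent queries share a session when they contain the same keyword and
    are less than max_gap apart. Assumption: a query without any keyword
    ends the open session.
    """
    sessions = []
    current = None
    for timestamp, query in sorted(events):
        topic = find_topic(query, keywords)
        if topic is None:
            current = None
            continue
        same_topic = current is not None and current["topic"] == topic
        if same_topic and timestamp - current["events"][-1][0] < max_gap:
            current["events"].append((timestamp, query))
        else:
            current = {"topic": topic, "events": [(timestamp, query)]}
            sessions.append(current)
    return sessions


example_events = [
    (datetime(2006, 3, 1, 10, 0), "lipitor"),
    (datetime(2006, 3, 1, 10, 5), "lipitor side effects"),
    (datetime(2006, 3, 1, 12, 0), "lipitor dosage"),
    (datetime(2006, 3, 1, 12, 10), "cheap flights"),
    (datetime(2006, 3, 1, 12, 20), "zoloft"),
]
for session in build_sessions(example_events):
    print(session["topic"], len(session["events"]))
```

In the example, the first two queries form one session because the keyword persists while the other words change, and the third query starts a new session because more than an hour has passed.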
2.3 Descriptive Analysis
Table 2.1 displays basic descriptive statistics of the AOL database including all
queries and breakdowns for electronics and drug-related queries. Note that for the
first column, I define sessions using method two, while for the drug and electronics
sessions, I use method three. The method used for the all queries column is the
most liberal in grouping adjacent queries together in a session, which should increase
the number of multiple-query sessions. However, there are also many single-query
sessions in the overall sample, which actually results in relatively fewer multiple-
query sessions compared with drugs and electronics.
Compared with electronics sessions, drug sessions are more likely to feature
multiple clicks and are shorter in duration, yet contain more clicks per
session. As a result, the turnover, or average time between clicks, is shorter for
drug sessions than for electronics sessions. This may be because a user in a drug-related
session is seeking a specific piece of information, while an electronics session may involve
a user attempting to get general information about a product. Electronics queries
are also dominated by several very popular terms: on average, 875 users
search each query-topic, compared with only 34 for drug-related searches.

Basic Statistics: AOL User Data
                                    All          Electronics   Drug
                                    Queries      Queries       Queries
Observations (Queries)              35,383,114   509,833       65,038
Click-Throughs                      19,133,334   281,557       44,885
Num. Sessions                       16,548,366   245,988       28,679
Users                               651,559      110,261       17,459
Unique Query-Topics                              126           514
Mean Users per Query-Topic                       875.09        33.97
Mean Queries per Session            2.14         2.07          2.27
Mean Session Length (Minutes)       2.71         3.55          3.00
Mean Time Between Queries           2.38         3.32          2.36
Multiple-Query Sessions
  Num. Sessions                     6,232,843    95,298        12,442
  Proportion Mult.-Query Sessions   38%          39%           43%
  Mean Queries per Session          4.02         3.77          3.92
  Mean Session Length (Minutes)     7.19         9.16          6.90
  Mean Time Between Queries         2.38         3.31          2.36
Table 2.1: Basic Statistics
Table 2.2 is a breakdown of search activity and advertising by drug class.13 The
two largest classes account for over 42% of the search sessions but only 26% of the
advertising spending. The lack of strong correlation between advertising and search
is surprising if television ads (the largest component of DTCA) direct consumers to
seek more information on the web. I will further investigate this relationship in the
regression section below.
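The two shares quoted above follow directly from the figures reported in table 2.2. The short calculation below reproduces them; the variable and function names are illustrative only, and "largest" is taken to mean largest by number of search sessions.

```python
# Figures copied from table 2.2: (search sessions, 2005 ad spending in $ millions).
TOP_TWO_CLASSES = {
    "central nervous system agents": (7_065, 666.70),
    "psychotherapeutic agents": (5_080, 220.17),
}
TOTAL_SESSIONS = 28_679
TOTAL_AD_SPENDING = 3_464.98  # $ millions


def share(values, total):
    """Return the fraction of the total accounted for by the given values."""
    return sum(values) / total


session_share = share([s for s, _ in TOP_TWO_CLASSES.values()], TOTAL_SESSIONS)
ad_share = share([a for _, a in TOP_TWO_CLASSES.values()], TOTAL_AD_SPENDING)
print(f"sessions: {session_share:.1%}, ad spending: {ad_share:.1%}")
# prints "sessions: 42.3%, ad spending: 25.6%"
```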
Two further slices of the data appear in tables 2.3 and 2.4, which show search
and advertising activity by drug age and drug type, respectively. Interestingly, though
there is fairly high spending for younger drugs (1-3 years old), there is also relatively
large DTCA on older drugs, with the most on drugs that are 8 years old. Search
activity is fairly evenly spread among drugs of different ages, though most activity
is, again, on the 8-year-old subset. The breakdown by drug type reveals far more
activity on prescription drugs, and even on those classified as discontinued, than
on OTC drugs.14 Non-innovator drugs, or drugs that are not designated as the
RLD, receive a larger share of search activity and slightly more advertising
spending.
13 The 20 most actively searched and advertised drugs in the sample are listed in
tables B.1 and B.2 in the appendix.

Search Activity by Drug Classes
                                Class   Num. of   Num. of    Num. of   Mean Queries   Ad Spending
Drug Class                      Num.    Drugs     Sessions   Queries   per Session    (Millions)
central nervous system agents   1       90        7,065      17,040    2.41           $666.70
psychotherapeutic agents        2       33        5,080      11,670    2.30           $220.17
metabolic agents                7       37        2,752      5,869     2.13           $474.76
anti-infectives                 5       74        2,560      5,437     2.12           $92.69
cardiovascular agents           8       50        2,192      4,378     2.00           $32.24
miscellaneous agents            3       21        1,825      5,189     2.84           $353.92
hormones                        6       48        1,709      3,996     2.34           $344.22
antineoplastics                 9       36        1,299      3,113     2.40           $61.10
respiratory agents              10      21        1,283      2,514     1.96           $238.51
gastrointestinal agents         11      23        1,213      2,083     1.72           $417.57
topical agents                  4       49        1,014      2,215     2.18           $452.17
coagulation modifiers           12      7         460        1,040     2.26           $110.21
not applicable                  16      18        95         280       2.95           $0.10
nutritional products            14      5         91         151       1.66           $0.45
immunological agents            13      2         41         65        1.59           $0.17
biologicals                     15      -         -          -                        $0.00
Total                                   514       28,679     65,040    2.27           $3,464.98
Ad spending is total expenditure on all forms of DTCA in 2005.
Table 2.2: Search Activity by Drug Class
Next, I analyze the search activity in the sample by looking at the popularity
and transitions between various website extensions and ranks. Figures 2.3 and 2.4
display the percentage of clicks on each extension class of website and the percentage
of clicks (within the first 10 clicks of a session) on each search result rank. I see
that users in drug sessions click on relatively fewer dot-com results compared with
electronics related searches. As expected, there is more attention paid to dot-gov
and dot-org/net/info sites and this continues to grow in longer sessions (not shown).
14The advertising data only includes spending on prescription drugs (hence the $0 for OTC
advertising spending), though in some cases I see spending on an OTC drug that has the same
trade name as a prescription drug.
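The extension classes used in figure 2.3 can be illustrated with a short sketch. The six class labels are taken from the figure; the function names, the example URLs, and the rule of classifying a site by the last label of its host name are assumptions made for illustration rather than the procedure actually applied to the data.

```python
from collections import Counter
from urllib.parse import urlparse

# Class labels follow figure 2.3; anything not listed falls into "other".
EXTENSION_CLASSES = {
    "com": "com", "gov": "gov", "edu": "edu",
    "org": "org/net/info", "net": "org/net/info", "info": "org/net/info",
    "us": "us/uk/ca", "uk": "us/uk/ca", "ca": "us/uk/ca",
}


def extension_class(url):
    """Map a clicked URL to its extension class using the last label of the host name."""
    # Prefix scheme-less URLs with "//" so urlparse treats the first part as the host.
    host = urlparse(url if "//" in url else "//" + url).hostname or ""
    top_level = host.rsplit(".", 1)[-1].lower()
    return EXTENSION_CLASSES.get(top_level, "other")


def extension_shares(urls):
    """Return the share of clicks falling into each extension class."""
    counts = Counter(extension_class(url) for url in urls)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical clicks, for illustration only.
clicks = ["http://www.pfizer.com", "http://www.fda.gov/drugs", "webmd.com", "http://example.de"]
print(extension_shares(clicks))
```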
Search Activity by Drug Age
        Num. of   Num. of    Num. of   Ad Spending
Age     Drugs     Sessions   Queries   (Millions)
<1      35        609        1,368     $1.02
1       26        1,636      3,434     $442.72
2       30        1,171      2,668     $393.88
3       31        2,062      4,437     $287.30
4       25        1,064      2,481     $110.56
5       36        1,438      2,954     $263.13
6       25        1,555      3,503     $35.76
7       30        1,693      4,364     $8.73
8       31        2,771      6,706     $589.13
9       38        2,352      4,724     $317.41
10      23        1,285      3,262     $249.17
11      18        708        1,588     $115.27
12      17        1,110      2,344     $8.70
13      17        1,349      2,938     $323.53
14      16        1,424      2,976     $95.25
15      9         271        484       $5.30
16      6         280        561       $0.77
17      8         131        310       $0.22
18      8         570        1,273     $0.52
19      8         164        398       $0.32
20      9         566        1,389     $108.24
21      8         164        314       $0.61
22      1         384        1,127     $0.00
23      3         123        258       $0.00
24      3         316        867       $0.20
>24     53        3,483      8,312     $107.23
Total   514       28,679     65,040    $3,464.98
Ad spending is total expenditure on all forms of DTCA in 2005. Age is
equal to years since FDA approval to March 2006.
Table 2.3: Search Activity by Drug Age
The rank popularity figure shows that attention by rank in drug sessions is less
skewed toward the number one ranked sites. Users are more likely to click further
down in the search results. The spike in the rank one popularity for electronics
sessions is mostly driven by navigational searches (e.g., a search for apple.com and
an immediate click on the first search result, www.apple.com).

Search Activity by Drug Type
                    Num. of   Num. of    Num. of   Ad Spending
Type                Drugs     Sessions   Queries   (Millions)
Prescription        411       23,467     52,796    $3,163
Over-the-Counter    10        584        1,289     N/A
Discontinued        93        4,628      10,955    $302
Non-Innovator/RLD   267       18,023     41,214    $1,915
Innovator/RLD       247       10,656     23,826    $1,550
Ad spending is total expenditure on all forms of DTCA in 2005. An innovator is the
original developer (pioneer) of the drug.
Table 2.4: Search Activity by Drug Type

[Bar chart: percent of clicks (0% to 90%) by extension (com, gov, edu, org/net/info, us/uk/ca, other) for drug and electronics sessions.]
Figure 2.3: Extension Popularity in the First 10 Clicks
I also investigated how users make the transition between extensions and ranks
in multiple click sessions. For tables 2.5 and 2.6, I generate sessions using method
one to guarantee that the set of search results seen by the user on each click is
identical. Table 2.5 shows that transitions within extensions are less likely in drug
search sessions compared with electronics sessions, though they both feature
approximately the same exit rates. Transitions from other extensions to dot-gov and
dot-org/net/info sites are more likely in drug searches.

[Bar chart: percent of clicks (0% to 45%) by search result rank (1 through 10 and above) for drug and electronics sessions.]
Figure 2.4: Rank Popularity in the First 10 Clicks

Drug Queries
                                     EXTENSION (t+1)
EXTENSION (t)   com     gov     edu     org/net/info   us/uk/ca   other   exit    Total
com             52.1%   2.9%    1.2%    9.6%           1.6%       3.0%    29.6%   100%
gov             41.2%   14.2%   1.2%    11.5%          1.5%       1.8%    28.6%   100%
edu             36.6%   3.1%    13.9%   12.3%          2.3%       4.4%    27.3%   100%
org/net/info    40.6%   3.6%    2.2%    20.7%          2.0%       2.9%    28.1%   100%
us/uk/ca        37.1%   3.3%    1.8%    11.7%          11.1%      4.8%    30.3%   100%
other           40.9%   1.6%    1.8%    9.0%           2.5%       17.1%   27.1%   100%

Electronics Queries
                                     EXTENSION (t+1)
EXTENSION (t)   com     gov     edu     org/net/info   us/uk/ca   other   exit    Total
com             61.0%   0.1%    0.5%    3.9%           2.0%       1.9%    30.6%   100%
gov             31.9%   20.8%   3.1%    10.2%          2.9%       1.5%    29.5%   100%
edu             36.5%   1.1%    22.8%   6.3%           3.6%       3.5%    26.3%   100%
org/net/info    42.8%   0.4%    1.1%    21.6%          2.4%       3.2%    28.5%   100%
us/uk/ca        49.0%   0.2%    1.2%    5.2%           14.8%      3.0%    26.7%   100%
other           39.7%   0.1%    0.7%    5.6%           2.6%       20.7%   30.6%   100%
These tables include the probability of transitioning from one extension to another during a search session. All
search sessions are included as long as the user clicked on at least one link. A session is defined as a sequence
of clicks following the identical query where the time between clicks is less than 60 minutes.
Table 2.5: Transitions between extensions
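The row-normalized transition probabilities reported in table 2.5 can be computed from click sequences along the following lines. This is a minimal sketch under assumptions: each session is represented as a list of extension classes in click order, the end of a session is treated as a transition to an "exit" state, and the function name and sample sessions are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order transition probabilities, with session end coded as 'exit'."""
    counts = defaultdict(Counter)
    for clicks in sessions:
        clicks = list(clicks)
        # Pair each click with the next one; the last click is paired with "exit".
        for current, following in zip(clicks, clicks[1:] + ["exit"]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Hypothetical sessions, for illustration only.
sample_sessions = [["com", "gov"], ["com", "com"], ["com"], ["gov"]]
print(transition_probabilities(sample_sessions))
```

Each row of the resulting dictionary sums to one, matching the 100% row totals in the table. The same function applies to rank transitions if the sessions hold rank labels instead of extension classes.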
The table on rank transitions reveals that users in electronics sessions are more
likely to find what they are looking for on the rank one result, as they are more
likely to revisit it immediately (rank 1 to rank 1). They are also more likely to exit
following a click on rank one. Given that sessions here are defined as unique queries,
it may also be that electronics users are more likely to reformulate/refine their query
after clicking on the rank one result, which is classified as an exit. This turns out
to be the case, as shown in the analysis of the potential for query reformulation, or
"drill-down," in figure 2.5.

Drugs
                           RANK (t+1)
RANK (t)   1       2       3       4       5+      exit    Total
1          8.5%    26.6%   14.8%   9.4%    23.5%   17.1%   100%
2          11.2%   5.0%    19.6%   11.5%   26.4%   26.3%   100%
3          8.7%    6.1%    4.2%    18.2%   34.4%   28.4%   100%
4          6.8%    4.6%    4.5%    4.6%    49.0%   30.6%   100%
5+         3.5%    2.2%    1.7%    1.9%    56.3%   34.4%   100%

Electronics
                           RANK (t+1)
RANK (t)   1       2       3       4       5+      exit    Total
1          25.0%   17.4%   9.9%    5.9%    14.9%   27.0%   100%
2          12.5%   11.8%   17.6%   9.1%    21.0%   28.1%   100%
3          8.9%    6.3%    9.6%    16.6%   30.8%   27.7%   100%
4          7.0%    4.3%    4.9%    8.0%    46.0%   29.7%   100%
5+         4.3%    2.2%    2.1%    2.0%    55.9%   33.5%   100%
These tables include the probability of transitioning from one rank to another during a
search session. All search sessions are included as long as the user clicked on at least one
link. A session is defined as a sequence of clicks following the identical query where the
time between clicks is less than 60 minutes.
Table 2.6: Transitions between ranks
Several key features can be seen in the figure depicting drill-down behavior.
Drug sessions are more likely than electronics sessions to involve a query followed by
a click versus a query without a resulting click. Following a query with or without
a click, users in drug sessions are more likely to issue the same query, less likely to
revise their query, and approximately equally likely to exit as electronics users. If a
user submits a query and clicks on a result, they are more likely to maintain the
same query on the next click, less likely to revise and less likely to exit the search. It
appears that query revisions are an important part of search behavior, but relatively
more popular in electronics sessions compared with drug-related sessions.

[Flow diagram with separate percentages for drug queries and electronics queries. A query without click (31%/45%) or a query with click (69%/55%) is followed by the same query, a revised query, or an exit. Percentages represent the likelihood of each event conditional on a query with (or without) a click on the last observation.]
Figure 2.5: Drill Down Behavior
Next, I turn to a more detailed analysis of consumers' search behavior where
I investigate the determinants of both the frequency and intensity of drug related
search. Without data on product attributes or advertising for consumer electronics
products, the next section will focus only on drug-related search.
2.4 Regression Analysis
In this section, I report regression results explaining the determinants of
consumers' search patterns. I look at drug-level regressions to determine how drug
attributes affect search. Then in the session-level probit regressions, I determine
how the intensity of search is affected by drug attributes and DTCA. A description
of all variables included in these regressions is shown in table B.3 in the appendix.
2.4.1 Frequency Regressions
In the following set of results, I assess how DTCA and drug characteristics
affect the frequency of search using drug-level data. I include the effects of both
overall DTCA and also each individual media category that is available. This break-
down by media is important because DTCA via different media channels may have
different effects on consumer search patterns. Television advertising, especially since
the new FDA regulations in 1997 lessening the requirements on what needs to be
conveyed during the ad, tends to only highlight the main benefits and potential side
effects of a drug. Magazine ads usually include two pages: one with the highlights
of the drug in full color and dramatic fonts, and the other with the details in fine
print. The internet ads captured in the data are so-called banner ads and would
likely have a similar effect to television DTCA with only the highlights presented.
The same could be said for radio and outdoor ads, while DTCA in newspapers likely
presents similar information to magazine ads. Therefore television, internet, radio,
and outdoor ads, given their lack of detailed information, may have a stronger
positive effect on search compared to magazines and newspapers.
Since I only observe DTCA on prescription drugs, I restrict the analysis by
excluding OTC drugs. I investigate the effects of a drug's age on search as well as if
the drug is designated as the RLD. Table 2.7 displays the results of a regression where
the dependent variable is "total sessions," or the total number of search sessions I
observe in the dataset over the three-month period for a given drug. Sessions are
defined using method three, so they allow for keywords to change throughout the
session as long as the drug name appears in each query.
Dependent Variable: Total Sessions
               DTCA - Stock          Components - Stock
Parameter      Estimate    SE        Estimate     SE
Intercept      59.19***    23.10     155.30***    27.82
age            0.26        0.57      1.20**       0.56
dtca           5.17***     0.75      -            -
alltv          -           -         2.71***      0.75
allmags        -           -         0.85         0.71
allnewsp       -           -         -0.26        0.77
allradio       -           -         2.15**       0.98
outdoor        -           -         1.51         1.42
internet       -           -         3.74***      0.80
rld            -11.70*     8.33      -16.06**     7.86
observations   510                   510
R2             0.21                  0.31
Notes: ***, **, * Significant at the 1%, 5%, and 10% level respectively. Drug class fixed effects included but not shown. All advertising variables are in logs.
Table 2.7: Regression Results - Frequency of Search
The two columns of the table each contain a different breakdown of DTCA.
Specification one includes the cumulative stock of advertising on a drug from January
1994 through February 2006, just prior to the time-frame of the AOL click-through
data. The second specification includes a breakdown of spending by media type. I
attempt to limit the endogeneity that may exist between DTCA and search activity
by including only DTCA spending prior to the period I observe the search sessions.
In addition, most DTCA is offline with only 3% of total DTCA in the form of online
spending.15
Overall DTCA is positive and significant meaning that increased ad spending
leads to an increase in the number of search sessions performed on a drug. Focusing
on the breakdown by media category reveals that television, internet, and radio have
positive and significant effects, consistent with the notion that these ads provide
relatively less detailed information and may leave a consumer wanting to seek out
additional sources. Spending on outdoor advertisements is insignificant, though the
result may be misleading due to this category being the smallest of the types. Also
as expected, newspaper and magazine spending is largely insignificant which may
imply that consumers are able to find all the information they need in these ads.
The coefficient on the drug's age since original FDA approval is positive in both
regressions (and significant in the second), which is somewhat surprising, but it may
be driven by a few older but very popular drugs. Finally, being the innovator or
pioneering version of a medicine is associated with fewer search sessions. Drug
class fixed effects (not shown in the table) are largely insignificant, though central
nervous system drugs and psychotherapeutic agents are searched upon relatively
more frequently.
Jin and Iizuka (2005) find that the effect of a drug's DTCA on the
propensity of consumers to visit their doctor regarding that drug depreciates by only
about 4% per month. However, Jin and Iizuka (2007) find that the effect
of DTCA on the likelihood that a doctor prescribes a drug is small and depreciates
almost immediately. In table 2.8 I present the results of two additional specifications
which assess the rate at which DTCA spending depreciates in terms of its influence
on search activity.
15 Online spending on paid-search advertising is larger, though not included in the data.
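To put a depreciation rate of about 4% per month in perspective, the sketch below computes how much of an advertising effect would remain after several months. Geometric (compound) depreciation is an assumption made here purely for illustration; it is not necessarily the functional form estimated in the cited work, and the function name is hypothetical.

```python
def remaining_effect(monthly_rate, months):
    """Share of an initial effect left after the given number of months,
    assuming the effect shrinks by a constant fraction each month."""
    return (1.0 - monthly_rate) ** months


for months in (3, 6, 12):
    print(months, round(remaining_effect(0.04, months), 3))
# prints 3 0.885, then 6 0.783, then 12 0.613
```

Under this assumption, roughly 78% of the effect would remain after two quarters and about 61% after a year, which gives a benchmark for the quarterly estimates in table 2.8.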
Dependent Variable: Total Sessions. Depreciation Analysis
                   DTCA - Quarters       Components - Quarters
Parameter          Estimate     SE       Estimate     SE
Intercept          115.30***    23.26    431.57***    118.21
age                1.77***      0.55     1.74***      0.52
dtca_1_qtrb4       3.15***      1.16     -            -
dtca_2_qtrb4       3.29***      1.39     -            -
dtca_3_qtrb4       0.76         1.35     -            -
dtca_4_qtrb4       0.65         1.09     -            -
alltv_1_qtrb4      -            -        8.04***      2.22
alltv_2_qtrb4      -            -        -6.54**      3.01
alltv_3_qtrb4      -            -        -1.37        3.10
alltv_4_qtrb4      -            -        2.64         2.30
allmags_1_qtrb4    -            -        1.25         1.24
allmags_2_qtrb4    -            -        2.02*        1.47
allmags_3_qtrb4    -            -        -0.38        1.41
allmags_4_qtrb4    -            -        -1.43        1.17
allnewsp_1_qtrb4   -            -        8.57***      2.63
allnewsp_2_qtrb4   -            -        0.60         2.18
allnewsp_3_qtrb4   -            -        -3.77*       2.34
allnewsp_4_qtrb4   -            -        2.61         2.61
allradio_1_qtrb4   -            -        -1.34        3.87
allradio_2_qtrb4   -            -        -4.48        3.69
allradio_3_qtrb4   -            -        10.24**      4.47
allradio_4_qtrb4   -            -        -2.53        3.97
outdoor_1_qtrb4    -            -        15.84***     4.40
outdoor_2_qtrb4    -            -        -8.74        8.10
outdoor_3_qtrb4    -            -        -9.17**      4.58
outdoor_4_qtrb4    -            -        12.42**      5.63
internet_1_qtrb4   -            -        3.30**       1.73
internet_2_qtrb4   -            -        0.04         2.07
internet_3_qtrb4   -            -        0.13         2.32
internet_4_qtrb4   -            -        2.63*        1.99
rld                -22.50***    7.94     -18.15***    7.54
observations       510                   510
R2                 0.30                  0.43
Notes: ***, **, * Significant at the 1%, 5%, and 10% level respectively. Drug class fixed effects included but not shown. All advertising variables are in logs.
Table 2.8: Regression Results - Depreciation Analysis
The first regression specification includes overall DTCA separately for the
last four quarters and the second displays the results of a similar regression on the
components of DTCA. The results show that after two quarters (or six months), the
positive and significant effect of DTCA disappears. However, in the regression on
the advertising components, I see that although television and internet advertising
are very effective in the most recent quarter, the effect is zero or even negative for
quarters two through four. While magazine spending remains insignificant, spending
on newspapers in the most recent quarter is strongly positive and significant and
then fades for less recent spending.
2.4.2 Depth Regressions
In addition to investigating the influence of DTCA on search frequency, I
also present evidence on how advertising affects the intensity of search. Consumers
who are exposed to a drug advertisement on television may go to a search en-
gine for additional details about the drug or information relating to price and pur-
chase availability. This information may come from multiple sources such as the
pharmaceutical companies (e.g., pfizer.com), government sites (e.g., FDA.gov), and
advertising-driven medical information sites (e.g., webmd.com). Different forms of
offline advertising may affect how intensively a consumer searches these sites.
I analyzed several measures of search intensity including the number of clicks
in a search session, the length of a session in minutes, and the number of query
revisions (drill downs) performed. One important measure, which is reported here,
is the likelihood that a search session goes beyond the first page of results. Since
drug information tends to be complicated and has many dimensions, consumers
may be more prone to search deep into the results to find accurate and unbiased
information about a drug, especially following an advertisement that provides very
little detailed information. Table 2.9 displays the results of a probit regression
modeling the probability that a user clicks on a result beyond the first page in a
search session.
Dependent Variable = 1 if user clicked beyond page 1 of the search results
                                DTCA - Prev. Quarter    Components - Prev. Quarter
Parameter            Mean       dY/dX        z-stat     dY/dX         z-stat
age                  10.000     0.0006***    2.9482     0.0008***     4.2705
dtca                 -10.754    0.0005***    2.5683     -             -
alltv                -12.711    -            -          -0.0008***    -3.1519
allmags              -11.397    -            -          -0.0002       -0.7384
allnewsp             -13.419    -            -          0.0018***     5.6919
allradio             -13.561    -            -          0.0022***     5.2635
outdoor              -13.777    -            -          -0.0294       -0.0765
internet             -12.278    -            -          0.0006**      1.7678
rld                  0.481      0.0016       0.5704     0.0016        0.6100
observations                    28,679                  28,679
percent concordant              52.1                    55.3
Notes: ***, **, * Significant at the 1%, 5%, and 10% level respectively. Drug class fixed effects included but not shown. All advertising variables are in logs.
Table 2.9: Regression Results - Depth of Search
Similar to the regressions in the last section, I report two specifications
including the effects of overall DTCA and its components, focusing on the quarter
immediately prior to the search sessions. I report the means of the variables
as well as their marginal effects (i.e., the predicted change in the probability for a
one unit change in the independent variable at the mean). The z-statistics are also
reported.16
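The percent concordant measure reported at the bottom of table 2.9 can be illustrated with a short sketch. It considers every pair of observations with different outcomes and counts the pair as concordant when the observation with the higher outcome also has the higher predicted probability. Counting tied predictions as not concordant is an assumption made here, and the function name and sample values are hypothetical.

```python
def percent_concordant(outcomes, predicted):
    """Percent of (success, failure) pairs in which the success has the higher prediction.

    Assumption: pairs with tied predicted probabilities count as not concordant.
    """
    successes = [p for y, p in zip(outcomes, predicted) if y == 1]
    failures = [p for y, p in zip(outcomes, predicted) if y == 0]
    pairs = len(successes) * len(failures)
    if pairs == 0:
        raise ValueError("need at least one success and one failure")
    concordant = sum(1 for ps in successes for pf in failures if ps > pf)
    return 100.0 * concordant / pairs


# Hypothetical outcomes and predicted probabilities, for illustration only.
print(percent_concordant([1, 0, 0, 1], [0.30, 0.10, 0.40, 0.05]))  # prints 25.0
```

Because the measure compares predictions within pairs rather than against a fixed 0.5 cutoff, it remains informative when successes are rare and predicted probabilities are all close to zero.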
16 Note that to measure the fit of the model, I report the percent concordant, which is the percent
of observation pairs such that the observation with the higher ordered response corresponds to the
higher predicted response. In my sample, only about 5% of sessions involve clicks beyond page
one, so the dependent variable in my regression is unbalanced and the predicted probabilities are
skewed towards zero. Calculating a pseudo-R2 by defining a correct prediction of success as a
predicted probability above 0.5, as Greene and others suggest, would result in a very high measure
of fit, but only because most of the observed and predicted outcomes are zero.
Overall DTCA has a positive and significant effect on the likelihood of a more
intense search session. Considering the breakdown by media category in the second
specification, positive and significant effects are found for newspaper, radio and
internet ads, which, for radio and internet ads, matches the result in the regressions
on search frequency.
The negative effect of television ads is consistent with the notion that television
ads refer a consumer to the drug's website for more information and this site often
appears high in the ranks. Therefore a consumer simply using the search engine
as a navigational tool to reach a predetermined page (e.g. lipitor.com) instead of
typing in the URL directly, will have a less intense search session.
2.5 Conclusion
The analysis has shown that consumers seek diverse information about
prescription drugs online and their behavior is influenced by the online and offline
advertising to which they are exposed. Offline advertising not only increases the
likelihood that a user searches for a drug, but also increases the depth of search
within a search session. Consumers searching for drug information also behave
differently than those seeking information about consumer products like electronics.
Overall, drug sessions tend to feature more clicks on different search results and
these clicks come faster than in electronics sessions. It may be that consumers are
seeking specific information about a drug and can quickly determine if a search
result is going to provide it.
Among the drug searches, activity is evenly spread among younger and older
drugs. Advertising spending on those drugs is slightly skewed toward younger drugs
though there is still significant spending on drugs that are 8-10 years old. Click
patterns within a search session reveal, as expected, more clicks on dot-gov and
dot-org/net/info results and the popularity of these sites grows in longer sessions.
Consumers may be immediately clicking on the first or second result (usually a dot-
com) but then will make a transition away to results with other extensions, perhaps
in an effort to seek unbiased information. The distribution of clicks by search result
rank also reveals that consumers are more likely to click on lower ranked results
further down the results page.
In the regression analysis, I analyzed the effects of DTCA on search frequency
and depth. Overall DTCA increases both the frequency and depth of search, though
the various types of DTCA (via different media) each affect search differently.
DTCA that provides only a major statement regarding a drug and few additional
details, such as television and internet banner ads, increases the frequency of search.
This may be the result of the FDA regulation stating that these ads must direct
consumers to seek additional details at the drug company's website. If consumers are
simply using the search engine to find this site, their search session will likely be very
short, and the evidence is consistent with this effect: the coefficient on television
DTCA in the depth regression is negative and significant.
Finally, the analysis of depreciation shows that the effects of DTCA spending
disappear after about six months. Television and internet advertising have a strong
positive effect on search in the near term, though the effect fades quickly, after just
three months.
Moving forward, I plan to focus on the effects of television ads, the largest
class of DTCA, on search activity using a detailed dataset from TNS which includes
the exact time and placement of an ad during a broadcast. With the growing
accessibility of laptop computers including netbooks, consumers are likely reacting
quickly to television advertisements and immediately seeking further information on
the internet. Combining this with either the AOL dataset analyzed here, or a new
dataset from comScore which also tracks household internet use, I can determine
the effects of television DTCA, including how varying demographics influence the
effects of an ad on consumer search behavior.
Chapter 3
Drug Information via Online Search Engines
3.1 Introduction
Search engines are the gateway to the internet as 94% of internet users access
engines to find information on the web.1 According to Nielsen Rankings, over 9.5
billion searches were executed on the top 10 search engines in the US in March
of 2009, 16.7% higher than the year before. The five largest engines by number
of searches are Google (64.2%), Yahoo (15.8%), MSN (10.3%), AOL (3.7%), and
Ask (2.1%), with Google driving most of the growth in search.2 The availability
of health care and drug information on the internet is arguably one of the more
important areas in need of study given the important public health consequences.
In this paper, I document the supply and content of this type of information on four
large search engines and across time.
Given the vast amount of information on the internet, one could study the
supply of search results related to many different industries, though I focus on
the prescription drug market. A 2008 Nielsen study found that health websites
are consumers' second most important source of medical information behind their
doctor. About 50% of the US internet population visited a health-related website
in July of 2008. In the Nielsen study, 82.6% of subjects reported having visited a
website for health information at some time in the past, and a third of those used
a search engine to find the information they were seeking. Overall, drug queries
involve the potential for users seeking a wide variety of complicated information,
so the summary text and the source (domain and extension) of a search result will
likely be important determinants of a user's attention and click behavior.
1 See Ghose and Yang (2008).
2 See www.nielsen-online.com.
A complication that one faces when studying the supply and demand of in-
formation via a search engine is that it is a very dynamic market that is constantly
evolving. The supply (search results) influence the demand (consumer search be-
havior) and vice versa by way of the engine?s ranking algorithm, and this creates
an endogeneity problem for the analysis. Since the algorithms are proprietary, it
is impossible to know how much, for example, the rank of a search result is purely
a function of its relevance to the search query versus a function of the attention
garnered from being of a certain rank in the past. One way to mitigate this problem
is to average certain metrics across time, which I do frequently in the analysis.
There are two types of search results that appear on a search engine when a
user submits a query. Organic results are those generated by the engine's algorithm
as being the most relevant to the user's query. Relevance is determined differently by
each engine and may include determinants such as past click traffic and the number
of inbound links to a site from other relevant websites. The title and summary text
appearing on the search engine are determined endogenously by the engine itself.
Sponsored or paid results are those that appear (at times) above, below, and
to the right of the organic results. See Athey and Ellison (working paper) and Var-
ian (2007) for details on the auction mechanism and optimal bidding strategies for
sponsored results.3 Their placement is driven both by relevance and by the amount
that the advertiser has paid to be listed. The title and summary text are chosen by
the advertiser. It is often difficult to distinguish between organic and sponsored
results, undoubtedly because the search engine generates revenue only
when a user clicks on a sponsored result.4 I will analyze the different content and
domain extensions between the two types of results, though it is clear that sponsored
results tend to be more promotionally driven and, for drug searches, dominated by
online pharmacies. Ghose and Yang (working paper) analyze the substitution pat-
tern between organic and sponsored links for a specific website address, or Uniform
Resource Locator (URL), and generally find that there are positive and asymmetric
spillovers from one type of link to the other.5
I consider four large search engines: Ask, Google, MSN, and Yahoo. I do
not include AOL's search engine, which has a market share similar to Ask's, because
through a partnership with Google, AOL uses Google's algorithm to generate both
its organic and sponsored links.6 Ask also partners with Google to display Google's
sponsored links in addition to Ask's self-generated links.
In the analysis, I show the popularity of different website extensions, also called
top-level domains, such as dot-com and dot-gov. Consumers may choose to click
relatively more frequently on, for example, a dot-gov site in order to find accurate
and unbiased information, knowing that only the US government can register a
3See also: Edelman et al. (2007) and Ghose and Yang (2008).
4Sponsored results often appear with a slightly different background than the organic results
and in my experience, it is increasingly difficult to tell them apart.
5In future work, I hope to extend this type of substitution analysis to drug-queries.
6See http://www.nytimes.com/2007/04/09/technology/09iht-aol.1.5197096.html?_r=1.
website with a dot-gov extension.7 These extensions are maintained by the Internet
Assigned Numbers Authority (IANA), which regulates which sites can have an address
ending in each extension.8
The remainder of this paper is organized as follows. Section 3.2 provides a
description of the data including the list of drugs I use and a method for classifying
the content of each search result. A descriptive and regression analysis of the data is
developed in section 3.3, including the differences in supply and content across engines
and the dynamics of a URL's rank over time. Section 3.4 concludes and provides some
directions for future work.
3.2 Data
Drug Selection
To select the list of queries, I started with the 2004 National Ambulatory
Medical Care Survey (NAMCS), and determined the 20 most popular drug classes
based on "drug visits," which is the number of visits to a doctor in which a given drug
is prescribed.9 Of these, I decided to focus on the top 95% of drugs in three National
Drug Code (NDC) classes: antidepressants, cholesterol, and diabetes, due to their
relatively high advertising intensity online and offline. Since NAMCS only contained
drugs approved through 2004, I supplemented the list with recently approved drugs
7See Huh and Cude (2004), which analyzes medical-related websites to calculate a measure of
bias based on the type of information appearing on each page.
8A complete list of top-level domains and their requirements can be found at:
http://www.iana.org/domains/root/db/.
9See http://www.cdc.gov/nchs/about/major/ahcd/ahcd1.htm.
in each of the three classes from the FDA's Orange Book.10 Starting in 2006, NAMCS
started using a different coding system for all drugs. Each drug can belong to up to
four categories which sometimes span the classes from the old NDC system. I use
the old class codes in this paper as they are broadly in line with the new system.
This yielded 99 unique brand names that formed the basic search list. I
supplemented these queries in several ways. First, I paired the top five drugs in each
class (based on total search results) with each other to assess queries where a
consumer was seeking comparison information about two similar drugs. I also added
keywords to the top five drugs in each class, where the keywords were determined
using Google's Adwords tool.11 These include risk-related keywords like
"interactions" and "side effects" as well as sales promotion keywords like "discount" and
"price." Finally, for brand name comparisons and brand names paired with
keywords, I included searches with and without quotes. In all, this yielded 458 search
queries.12
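The construction of the query list can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the placeholder brand names other than zocor, and the choice to quote the whole multi-word query are hypothetical, and the sketch does not attempt to reproduce the exact count of 458.

```python
from itertools import combinations

def build_queries(brands, top_five, keywords):
    """Build a query list in the spirit of the text (hypothetical helper).

    brands   -- every brand name, searched on its own
    top_five -- the most popular drugs of one class
    keywords -- informational and promotional keywords
    """
    queries = list(brands)                    # brand name only
    multiword = []
    for a, b in combinations(top_five, 2):    # drug + drug comparisons
        multiword.append(f"{a} {b}")
    for drug in top_five:                     # drug + keyword
        for kw in keywords:
            multiword.append(f"{drug} {kw}")
    for q in multiword:                       # with and without quotes
        queries.append(q)
        queries.append(f'"{q}"')              # assumption: quote the full phrase
    return queries

# Placeholder inputs; "brand_b" and "brand_c" stand in for real brand names.
demo = build_queries(
    brands=["zocor", "brand_b", "brand_c"],
    top_five=["zocor", "brand_b"],
    keywords=["side effects", "discount"],
)
```

Keeping the single-name queries unquoted mirrors the statement that quotes were only varied for comparisons and keyword pairings.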
Crawler Data
With the help of two excellent research assistants,13 we designed a web crawler
that submitted the list of 458 search queries to four large search engines (Ask,
Google, MSN, and Yahoo) every day at 12:00pm during the period from February
through September 2007. The crawler saved the top 100 organic search results, which
appeared on the first 10 pages. These first 10 pages also contain sponsored search
10See http://www.fda.gov/cder/orange/obreadme.htm.
11See https://adwords.google.com/select/KeywordToolExternal.
12See the appendix for the complete list.
13Chien (Daniel) Yin and Chris Wasko.
results, the number of which varies depending on the query.14
Since the crawler program returned the raw HTML files containing the search
results for each engine-day-query, we then wrote a parsing program to separate out
the following fields/variables for each result: rank, title text, summary text, URL
(displayed and actual)15, result type (organic or sponsored), and result position (for
sponsored results).
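Because the later analysis groups results by website extension, it may help to see how the parsed URL field could be mapped to the extension groups used in the figures (com, gov, edu, org/net/info, us/uk/ca, other). The following is a minimal sketch, not the parser actually used; the function and dictionary names are hypothetical.

```python
from urllib.parse import urlsplit

# Extension groups as they appear on the figure axes in this chapter.
EXTENSION_GROUPS = {
    "com": "com", "gov": "gov", "edu": "edu",
    "org": "org/net/info", "net": "org/net/info", "info": "org/net/info",
    "us": "us/uk/ca", "uk": "us/uk/ca", "ca": "us/uk/ca",
}

def extension_group(url):
    """Return the extension group of a URL (hypothetical helper)."""
    if "://" not in url:
        url = "http://" + url          # displayed URLs often omit the scheme
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return EXTENSION_GROUPS.get(tld, "other")

group = extension_group(
    "nlm.nih.gov/medlineplus/druginfo/medmaster/a692030.html"
)
```

Only the final label of the host name is inspected, so an address such as example.co.uk falls in the us/uk/ca group.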
Basic Statistics
                                      Ask       Google          MSN        Yahoo
Organic Results                 4,200,829   10,604,000   10,724,339   10,725,091
Sponsored Results               3,384,565    2,667,023    3,949,351    6,364,854

Date Range                      Feb - Sep 30, 2007*
Unique Queries                  458

Query Types
  Drug Name Only                 99
  Drug + Informational Keyword  195
  Drug + Promotional Keyword     96
  Drug + Drug                    68

Drug Classes
  Depression                    161
  Cholesterol                   133
  Diabetes                      164

*Data from the Ask search engine is only available through May and does not
include the organic links ranked 91-100.
Table 3.1: Basic Statistics
Table 3.1 displays the basic statistics of the data collected by the crawler. Due
to a parsing error, only a limited sample was gathered from the Ask search engine.
14We faced several challenges in collecting the data including adapting to formatting changes on
each engine that occurred during the time period and adding a random time increment between
queries to avoid the search engine (correctly) flagging us as a crawler. We assume our own search
activity has minimal impact on the supply of search results.
15These are frequently different especially for sponsored results which are routed through the
search engine first (so the engine can charge the advertiser) before taking the user to their desti-
nation page.
Classification Algorithm
In order to determine the type of search results that were appearing following each
query, I devised an algorithm to classify each search result as being either informa-
tional, promotional, or neutral. This was accomplished with the following steps:
1. For all 4 engines and for one week, first collect all words appearing in the top
100 organic and all sponsored search results following two types of queries:
• drug name + "buy" or drug name + "cheap" (likely promotional sites)
• drug name + "information" or drug name + "side effects" (likely
informational sites)
where drug name was one of the 99 brand names in the sample. I do this
separately for titles and summaries and for organic and sponsored results,
which provides 8 lists of words.
2. Create a frequency table of all of the words appearing in each list and save
the top 200 most popular words in each list.
3. Eliminate any words that appear in both categories (informational and
promotional) and save the top 50 unique words in each category.16
4. Analyze every search result in the database and calculate the proportion of
words in each text field that also appear in the corresponding top 50 list. E.g.,
an organic summary text field may have 25% promotional words and 10%
informational words.
16The uniqueness requirement also eliminates common words that frequently appear in text
fields, but are unhelpful in classifying content.
With these proportions in hand, I can form a metric called the "average content"
of a search result which is simply the difference between the proportion of words
that are promotional and the proportion that are informational. I can also create
a binary indicator of content and, for example, classify a result as promotional if
it contains a relatively higher proportion of promotional keywords. The keywords
used in the classification are shown in table C.2 in the appendix. Note that some
of the words are actually numbers, which are very common in promotional results,
and therefore helpful in their classification.
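The classification steps and the resulting metric can be sketched as below. This is an illustrative reading of the algorithm rather than the code actually used: splitting on whitespace, the function names, and the treatment of ties as neutral are assumptions, and the toy texts are made up.

```python
from collections import Counter

def top_words(texts, n):
    """Step 2: the n most frequent words across a list of text fields."""
    counts = Counter(w for text in texts for w in text.lower().split())
    return [w for w, _ in counts.most_common(n)]

def unique_keywords(promo_texts, info_texts, n_top=200, n_keep=50):
    """Step 3: drop words common to both categories, keep the top n_keep."""
    promo_top = top_words(promo_texts, n_top)
    info_top = top_words(info_texts, n_top)
    shared = set(promo_top) & set(info_top)
    promo = [w for w in promo_top if w not in shared][:n_keep]
    info = [w for w in info_top if w not in shared][:n_keep]
    return set(promo), set(info)

def classify(text, promo_words, info_words):
    """Step 4 plus the average content metric (PropPromo - PropInfo)."""
    words = text.lower().split()
    if not words:
        return 0.0, "neutral"
    prop_promo = sum(w in promo_words for w in words) / len(words)
    prop_info = sum(w in info_words for w in words) / len(words)
    average_content = prop_promo - prop_info
    if average_content > 0:
        label = "promotional"
    elif average_content < 0:
        label = "informational"
    else:
        label = "neutral"              # assumption: ties count as neutral
    return average_content, label

# Toy training texts (invented for illustration only).
promo_words, info_words = unique_keywords(
    ["buy cheap zocor online", "cheap zocor discount buy now"],
    ["zocor side effects information", "zocor drug information and side effects"],
)
score, label = classify("buy zocor online cheap", promo_words, info_words)
```

In the toy example the drug name itself appears in both categories, so the uniqueness requirement removes it from both keyword lists.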
3.3 Descriptive Analysis
3.3.1 Supply
[Plot omitted. Title: Total Supply; series: Organic, Sponsored; x-axis: Search Engine (Ask, Google, MSN, Yahoo); y-axis: Percent of Results.]
Figure 3.1: Distribution of Organic and Sponsored Results
Figure 3.1 shows the overall supply of results on each engine. Note that even
with the limited sample from the Ask engine, it has relatively more sponsored results
than the other engines given that it displays its own results along with those from
Google. Of the other 3 engines, while they all have about the same number of
organic results, Google has the largest proportion. There are usually 100 organic
results collected per query-day, but for some queries there are fewer.17
[Plot omitted. Title: Extension Supply - Organic Links; series: Ask, Google, MSN, Yahoo; x-axis: Extension (com, gov, edu, org/net/info, us/uk/ca, other); y-axis: Percent of URLs.]
Figure 3.2: Extension Popularity - Organic Results
Organic and sponsored result popularity by extension are shown in figures
3.2 and 3.3 respectively. Google's organic results feature the fewest dot-com and
the most dot-gov, dot-edu and dot-org/net/info results.18 MSN has the largest
percentage of dot-com results and fewest dot-govs. Among sponsored results, most
have dot-com extensions, except for MSN, which has relatively more dot-org/net/info
17In theory, with 235 days and 458 search queries, I could observe a maximum of 235*458*100 =
10,763,000 observations per engine. For Google, MSN, and Yahoo, I observe 99% of this theoretical
maximum.
18The differences between the engines are largely statistically significant. For example, Google's
percentage of dot-gov results is statistically higher than each of the other three engines.
[Plot omitted. Title: Extension Supply - Sponsored Links; series: Ask, Google, MSN, Yahoo; x-axis: Extension (com, gov, edu, org/net/info, us/uk/ca, other); y-axis: Percent of URLs.]
Figure 3.3: Extension Popularity - Sponsored Results
extensions among their sponsored results.
[Plot omitted. Title: Average Rank; series: Google, MSN, Yahoo; x-axis: Extension (com, gov, edu, org/net/info, us/uk/ca, other); y-axis: Average Rank. Note: results from Ask are omitted in this graph because there were often missing data from page 10 (results 91-100).]
Figure 3.4: Average Rank by Extension - Organic Results
I finally break down the average rank of organic results by extension in figure
3.4. I omit Ask because of the parser problem. If the results were spread evenly,
they should have a mean of 50, but here dot-gov sites tend to be pushed toward the
top of the page (lower numbered ranks). Of the three engines, the dot-gov sites on
Yahoo are most likely to appear high in the search results.
3.3.2 Content
Using the rank popularity from the AOL click-through database (among all
queries), I calculate an attention index for each organic rank because links appearing
toward the top of the results are more likely to receive a click than those lower in
the results. The index is simply the proportion of clicks on each organic rank, from
1 to 100.19 Then I calculate the percent of organic results, weighted by the attention
index, whose summaries are classified as promotional, informational, or
neutral. E.g., a result is promotional if it contains a higher proportion of promotional
keywords than informational keywords.
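A minimal sketch of the attention index and the attention-weighted content shares follows. The click counts are invented for illustration (they are not the AOL figures), and the function names are hypothetical.

```python
def attention_index(clicks_by_rank):
    """Proportion of all clicks received by each organic rank."""
    total = sum(clicks_by_rank.values())
    return {rank: clicks / total for rank, clicks in clicks_by_rank.items()}

def attention_weighted_shares(results, weights):
    """Share of each content class, weighting every result by its rank's attention.

    results -- iterable of (rank, label) pairs
    weights -- attention index, mapping rank -> weight
    """
    totals = {}
    weight_sum = 0.0
    for rank, label in results:
        w = weights.get(rank, 0.0)
        totals[label] = totals.get(label, 0.0) + w
        weight_sum += w
    if weight_sum == 0:
        return {}
    return {label: t / weight_sum for label, t in totals.items()}

# Made-up click counts for three ranks, purely for illustration.
weights = attention_index({1: 60, 2: 30, 3: 10})
shares = attention_weighted_shares(
    [(1, "informational"), (2, "promotional"), (3, "neutral")], weights
)
```

Under this weighting, a promotional summary in the first position moves the engine's promotional share far more than one near the bottom of the results.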
Figure 3.5 displays the attention weighted content of each engine. MSN's
results tend to be more promotional than other engines and Google's are more
informational. Classifications reflecting the actual proportions are reported in the
kernel density figures. Figure 3.6 is the same breakdown for sponsored results. Here,
Google and Yahoo tend to be relatively more promotional and Ask and MSN are
more informational.
19For example, because users click on the first result much more often than other results, the
first rank receives a weight of 0.423 while the fifth rank has a weight of 0.049.
[Plot omitted. Title: Attention Weighted Organic Summary Content; series: Ask, Google, MSN, Yahoo; x-axis: Class (Promotional, Informational, Other); y-axis: Percent of URLs. Note: attention determined by popularity of ranks from AOL database.]
Figure 3.5: Content of Summary Field - Organic Results
[Plot omitted. Title: Attention Weighted Sponsored Summary Content; series: Ask, Google, MSN, Yahoo; x-axis: Class (Promotional, Informational, Other); y-axis: Percent of URLs. Note: attention determined by popularity of ranks from AOL database.]
Figure 3.6: Content of Summary Field - Sponsored Results
Figures 3.7 and 3.8 display the organic and sponsored summary content broken
down by extension. For organic results, dot-com and dot-gov sites are more
informational, while surprisingly, dot-edus are relatively more promotional for all
engines. Further investigation revealed that, e.g., the engines are picking up on
comments left on university bulletin boards by online pharmacies trying to sell their
[Plot omitted. Four panels: Google's, Ask's, MSN's, and Yahoo's Organic Links - Summary Content by Extension; series: Promotional, Informational; x-axis: Extension (com, gov, edu, org/net/info, us/uk/ca, other); y-axis: Proportion of Links.]
Figure 3.7: Content of Summary Field - Organic Results - By Extension
[Plot omitted. Four panels: Google's, Ask's, MSN's, and Yahoo's Sponsored Links - Summary Content by Extension; series: Promotional, Informational; x-axis: Extension (com, gov, edu, org/net/info, us/uk/ca, other); y-axis: Proportion of Links.]
Figure 3.8: Content of Summary Field - Sponsored Results - By Extension
drugs. As for sponsored results, Ask, Google and Yahoo feature mostly dot-coms
and these tend to be relatively promotional as expected. Sponsored results ending
in dot-org/net/info tend to be informational in content. MSN is unique in that
all of its sponsored results tend to be more informational, in line with the large
proportion of its results ending in dot-org/net/info.
3.3.3 Rank and Content Comparisons
An additional approach to comparing the results from a query across search
engines is to analyze the rank and contents of a set of organic results. In figures
3.9 and 3.10, I display a comparison of Google and Yahoo. The first scatter shows
the ranks of identical URLs (following the same query on the same day). The
differences in the algorithms are clear from the figure and the weak correlation of 0.37.
However, when comparing the proportion of promotional keywords on the same two
engines, a stronger correlation (0.44) is revealed. Thus, though the algorithms differ
in how they rank the results, the process for selecting which words and phrases to
include in the summary text is similar. Repeating this for other engine comparisons
reveals roughly the same pattern though the correlations are less strong.
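The cross-engine comparison amounts to matching identical URLs for the same query and day and then correlating the paired values. The sketch below assumes a Pearson correlation, which the text does not specify, and uses invented ranks and hypothetical names.

```python
from math import sqrt

def matched_values(engine_a, engine_b):
    """Pair up values for keys (query, day, url) present on both engines."""
    common = sorted(set(engine_a) & set(engine_b))
    return [engine_a[k] for k in common], [engine_b[k] for k in common]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented ranks keyed by (query, day, url); only shared keys are compared.
google = {("zocor", "20070301", "a.com"): 1, ("zocor", "20070301", "b.gov"): 2,
          ("zocor", "20070301", "c.org"): 3, ("zocor", "20070301", "d.com"): 4}
yahoo = {("zocor", "20070301", "a.com"): 2, ("zocor", "20070301", "b.gov"): 1,
         ("zocor", "20070301", "c.org"): 3, ("zocor", "20070301", "e.com"): 5}
xs, ys = matched_values(google, yahoo)
r = pearson(xs, ys)
```

The same two functions apply to the content comparison by storing the proportion of promotional keywords instead of the rank.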
Though the rank of a search result may be very different on a given day, there
may be some relationship between the changes in the rank over time. In figure 3.11,
I track the rank of the same URL (following the same query) across time for the
three engines for which I have complete data. Here I see that the ranks on MSN and
Google are fairly stable though there are frequent spikes in Yahoo's rank. These
may be due to algorithm testing by the engine throughout the year.20
20In the future, I will analyze how exogenous shocks, such as an FDA news story about a drug,
affect the rank dynamics of specific URLs or extension classes across search engines.
[Plot omitted. Title: Rank Comparison; x-axis: Google Rank; y-axis: Yahoo Rank.]
Figure 3.9: Rank Comparison - Organic Results - Google vs Yahoo
3.3.4 Kernel Density Plots of Content
As a final analysis of the content differences between search engines and across
different extensions and result types, I estimate Gaussian kernel density distribu-
tions using the difference between the proportions of promotional and informational
keywords in each result. I first drop the search results that have no promotional
or informational keywords (i.e., those that would be classified as neutral/other).
The variable plotted (PropPromo - PropInfo) ranges from -1 to +1 with -1 corre-
sponding to a result that is completely informational and +1 meaning the result
was completely promotional. A value of zero means that a result contained an equal
(non-zero) number of informational and promotional keywords.
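A Gaussian kernel density estimate of this variable can be sketched in a few lines. The bandwidth below is an arbitrary assumption (the text does not report one), the input proportions are invented, and the function names are hypothetical.

```python
from math import exp, pi, sqrt

def content_differences(results):
    """PropPromo - PropInfo, dropping results with no keywords of either type."""
    return [p - i for p, i in results if p > 0 or i > 0]

def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated density using a Gaussian kernel."""
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2.0 * pi))
    def density(x):
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return density

# Invented (prop_promo, prop_info) pairs; the second has no keywords and is dropped.
diffs = content_differences([(0.25, 0.10), (0.0, 0.0), (0.0, 0.2), (0.1, 0.1)])
density = gaussian_kde(diffs, bandwidth=0.1)   # bandwidth chosen arbitrarily
value_at_zero = density(0.0)
```

A smaller bandwidth would sharpen the spike near zero, while a larger one would smooth away the tails, so any comparison across engines should hold the bandwidth fixed.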
[Plot omitted. Title: Promotional Proportion Comparison; x-axis: Google Proportion (0% to 20%); y-axis: Yahoo Proportion (0% to 20%).]
Figure 3.10: Summary Content Comparison - Organic Results - Google vs Yahoo
[Plot omitted. Title: URL Dynamics, Query = zocor; URL: nlm.nih.gov/medlineplus/druginfo/medmaster/a692030.html; series: Google, MSN, Yahoo!; x-axis: Date (20070209 to 20070921); y-axis: Rank.]
Figure 3.11: Organic Rank Dynamics
[Plot omitted. Panels: Organic Summary, Sponsored Summary; series: Ask, Google, MSN, Yahoo!; x-axis: -0.5 (Informational) to 0.5 (Promotional); y-axis: Density.]
Figure 3.12: Kernel Density of Summary Content
Figure 3.12 shows that organic results on all engines tend to be more
informational with the spike around -0.05, though MSN has the highest density of more
promotional sites. Sponsored results tend to be either very informational or very
promotional, as revealed by the heavy tails in each distribution. MSN tends to have
the most informational sponsored results (consistent with the large
percentage of dot-org/net/info sites in its sponsored results).
In the appendix, I display a breakdown of summary content of organic and
sponsored results by extension (see figures C.1 and C.2). For organic results, dot-
com and dot-gov results are again shown to be more informational for all engines.
Dot-edu sites display about as many promotional as informational results, with
the heaviest tail for promotional content on Google. For dot-org/net/info, most
of the results are informational as expected. Among the sponsored results, dot-
com sites again account for most of the sponsored results and tend to be either
very informational or very promotional. There are very few dot-gov and dot-edu
results among the sponsored results and the dot-org/net/info sites tend to be very
informational.
3.3.5 Probit Analysis
Finally, I report the results of a simple probit regression analyzing the
determinants of rank in a search engine's results. While I could consider the likelihood that
a URL achieves a given rank using an ordered probit approach, since most users do
not venture beyond the first page of search results, I only consider the probability
that a result appears on the first page. In this regression, I consider organic results
from March 2007 on all 4 engines in the sample. Since characteristics of individual
drugs (like drug age and advertising intensity) do not vary across the ranks in the
search results, I cannot analyze the influence of these variables on where a URL
appears. However, I can interact them with the extension of a search result's URL,
since the extension does vary by rank. I can then determine, for example, how the age of a
drug may affect the likelihood of a dot-gov URL appearing on page one of the search
results.
Table 3.2 displays the result of a probit estimation on the probability that a
result appears on page one as a function of website extensions, extension/age and
extension/advertising interactions, and result summary contents. Definitions for
Dependent Variable: Pr(Page1)

                    Ask               Google            MSN               Yahoo
Parameter           Estimate   SE     Estimate   SE     Estimate   SE     Estimate   SE
Intercept           -1.711***  0.023  -2.220***  0.039  -1.704***  0.032  -1.730***  0.023
dotcom               0.492***  0.025   0.670***  0.040   0.312***  0.033   0.220***  0.025
dotgov               1.049***  0.049   0.810***  0.051   0.732***  0.053   0.804***  0.041
dotedu               0.357***  0.110  -0.481***  0.107   0.127     0.158  -0.141**   0.069
dotorgnetinfo        0.465***  0.033   0.553***  0.046   0.222***  0.040   0.247***  0.032
dotintl              0.289***  0.030   0.069*    0.049  -0.033     0.046  -0.121***  0.035
dotcom_age           0.001     0.001   0.002**   0.001   0.007***  0.001   0.003***  0.001
dotgov_age          -0.016***  0.003   0.006***  0.002  -0.010***  0.003   0.001     0.003
dotedu_age           0.021***  0.006   0.067***  0.005  -0.034***  0.011   0.035***  0.004
dotorg_age          -0.007***  0.002   0.000     0.002   0.010***  0.002   0.010***  0.002
dotcom_dtc          -0.222***  0.030  -0.220***  0.030   0.098***  0.025  -0.060**   0.029
dotgov_dtc           0.530***  0.104   0.379***  0.095  -1.284***  0.211  -0.181**   0.099
dotedu_dtc          -2.036***  0.500  -0.219     0.173   0.870***  0.331  -1.625***  0.249
dotorg_dtc           0.215***  0.064   0.675***  0.058  -0.663***  0.074   0.120**   0.061
prop_promo_summary  -0.730***  0.135  -0.945***  0.155  -2.393***  0.145  -0.163     0.143
prop_info_summary    1.468***  0.086   5.894***  0.075   4.500***  0.105   4.642***  0.083
prop_promo_title     1.893***  0.086  -2.113***  0.153  -2.220***  0.133   0.091     0.086
prop_info_title      0.771***  0.039   1.467***  0.031   1.524***  0.046   0.921***  0.034

observations        148,718           176,700           176,431           176,680
percent concordant  61.6              72.5              65.9              65.7

Notes: ***, **, * Significant at the 1%, 5%, and 10% level respectively. Omitted categories: extension = other, summary content = other, title content = other, drug class = diabetes. Organic results from March 2007.
Table 3.2: Regression Results: Probit of Pr(Page 1)
all variables used in the regression are summarized in table C.3 in the appendix.
Promotional sites are uniformly pushed down and informational sites are more likely
to appear on page one. Dot-gov sites are the most likely of all extensions to appear
on page one with the greatest effect for Google's engine. In all but Ask's engine,
dot-edu and international sites tend to get pushed off of page one. The interaction
terms reveal that, for most engines, older drugs are more likely to have dot-com sites
appearing on page one. The reason may be that younger drugs have few promotional
dot-com sites appearing high in the ranks.21
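To make the probit coefficients concrete, the sketch below converts an index into a page-one probability using the standard normal CDF. It reads the intercept, dotcom, and dotgov estimates from the Google column of Table 3.2 and, as a simplifying assumption, sets every other regressor to zero. The percent concordant function uses a common pairwise definition, which is also an assumption since the definition in chapter 2 is not reproduced here.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def page_one_probability(coefs, x):
    """Pr(Page 1) = Phi(intercept + sum of coefficient * regressor)."""
    index = coefs["Intercept"] + sum(coefs[name] * v for name, v in x.items())
    return normal_cdf(index)

def percent_concordant(probs, outcomes):
    """Share of (outcome 1, outcome 0) pairs where the 1 has the higher
    predicted probability. Assumed definition, not taken from chapter 2."""
    ones = [p for p, y in zip(probs, outcomes) if y == 1]
    zeros = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = sum(p1 > p0 for p1 in ones for p0 in zeros)
    return 100.0 * concordant / (len(ones) * len(zeros))

# Selected Google estimates from Table 3.2; all other regressors held at zero.
GOOGLE = {"Intercept": -2.220, "dotcom": 0.670, "dotgov": 0.810}
p_other = page_one_probability(GOOGLE, {})
p_com = page_one_probability(GOOGLE, {"dotcom": 1})
p_gov = page_one_probability(GOOGLE, {"dotgov": 1})

# Tiny invented example of the fit statistic.
fit = percent_concordant([0.8, 0.3, 0.6, 0.1], [1, 0, 0, 1])
```

Because the probit is nonlinear, the gap between these probabilities changes once the age, advertising, and content regressors take nonzero values.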
I am unable to perform a similar analysis predicting the probability that a
sponsored link appears on page one because the dataset does not include the page
21Note that to assess the fit of the model, I report the percent concordant, as explained in chapter
2.
on which a sponsored link appears and there is often a different number of
sponsored links on each page. However, since I do observe the rank, an ordered probit
predicting sponsored links' overall rank revealed that, as expected, dot-com sites
are more likely to appear high in the ranks and advertising intensity does not have
a consistent effect on rank (via its interaction with the website extensions).
3.4 Conclusion
In addition to many offline sources, there is a large and diverse quantity of
prescription drug information accessible online. Consumers are likely filtering this
information and making their decisions about which sites to visit based on a search
engine's results page, which includes the result's rank, title and summary text,
classification as organic or sponsored, and the extension of the URL. I have shown
that the information varies significantly across engines, over time, and between
different website extensions.
The descriptive analysis shows that Ask has relatively more sponsored links
compared with other engines, perhaps because of its agreement to deliver
sponsored links from Google along with those generated from its own algorithm.
Google's organic results feature relatively more dot-gov and dot-edu links and MSN's
engine returns the most dot-com results. On all engines, dot-gov sites appear higher
in the ranks compared with other extensions, because the engines' algorithms rank
them higher for their relevance and/or because users frequently click these results.
I also analyze the content of the summary text in order to classify individual
results as informational, promotional or neutral. Overall, Google's results are
relatively more informational and MSN's the most promotional, in line with the popular
extensions on each engine. However, classifying websites solely on their extension
may be misleading as I found that dot-edu sites actually tend to be more
promotional. Among sponsored links, dot-com results are by far the most popular and,
as expected, they tend to be relatively more promotional. Kernel density estimates
confirm these results and also show that sponsored links tend to be either very
informational or very promotional, as revealed by heavy tails in the distribution. Since
the website owners are paying for each click on a sponsored link, they are likely
trying to provide a very clear summary of what information the user will find if
they click on the result.
Finally, the probit analysis revealed that informational sites are more likely to
appear on page one of the results. Dot-gov sites are also relatively more likely to
appear high in the results and the effect is largest for Google's engine. By including
interaction terms of the drugs' ages and website extensions, I also show that younger
drugs are less likely to have dot-com results high in the ranks.
In future work, I hope to track the dynamics of specific URLs (e.g., an fda.gov
site) following a major news story about a drug being issued by the FDA. I expect
to see a displacement of more promotional dot-com sites by the informational sites.
While some analysis can be accomplished with the current dataset, other research
will be possible once I have a complete picture of both the supply and demand for
drug information from the same time period. I will soon have access to data from
comScore's Media Metrix product which includes individual click-through behavior
for a set of consumers in the same time period as the crawler data. Matching
these two data sources will allow me to investigate both the probability of a click
as a function of result characteristics (e.g., rank, content, and extension) as well
as determine the substitution/complementary effects of organic and sponsored links
appearing in the same set of search results.
Appendix A
Chapter 1 Supplement
A.1 The Distillation Process
Since the various components of crude oil have different boiling points, a
refinery's essential task is to boil the crude oil and separate it into the more valuable
components. Figure A.1 displays a simplified diagram of a typical refinery's
operations. The first and most important step in the refining process is called fractional
distillation. The steps of fractional distillation are as follows:
1. Heat the crude oil with high pressure steam to 1,112 degrees Fahrenheit.
2. As the mixture boils, vapor forms which rises through the fractional distillation
column passing through trays which have holes that allow the vapor to pass
through.
3. As the vapor rises, it cools and eventually reaches its boiling point at which
time it condenses on one of the trays.
4. The substances with the lowest boiling point (such as gasoline) will condense
near the top of the distillation column.
While some gasoline is produced from pure distillation, refineries normally
employ several downstream processes to increase the yield of high valued products
by removing impurities such as sulfur. Cracking is the process of breaking down
large hydrocarbons into smaller molecules through heating and/or adding a catalyst.
Cracking was first used in 1913 and thus changed the problem of the refiner from
Figure A.1: Refinery Operations
choosing how much crude oil to distill into choosing an appropriate mix of products
(within some range). Refineries practice two main types of cracking:
• Catalytic cracking: a medium conversion process which increases the gasoline
yield to 45% (and the total yield to 104%).
• Coking/residual construction: a high conversion process which increases the
gasoline yield to 55% (and the total yield to 108%).
The challenge of choosing the right input and output mix given the available tech-
nology creates a massive linear programming problem.
A.2 Crude Oil Quality
Crude oil is a flammable black liquid comprised primarily of hydrocarbons
and other organic compounds. The three largest oil producing countries are Saudi
Arabia, Russia and the United States.1 Crude oil is the most important input into
refineries and this raw material can vary in its ability to produce refined products
like gasoline. The two main characteristics of crude that determine its quality are
American Petroleum Institute (API) gravity and sulfur content. The former is a
measure (on an arbitrary scale) of the density of a petroleum liquid relative to
water.2 Table A.1 summarizes these characteristics and includes some common
crude types and their gasoline yield from the initial distillation process.
Table A.1: Crude Qualities

API Gravity    Sulfur Content < 0.7%       Sulfur Content > 0.7%
< 22°          Heavy Sweet                 Heavy Sour - 14% yield
                                           (Maya, Western Canadian)
22° - 38°      Medium Sweet                Medium Sour - 21% yield
                                           (Mars, Arab light)
> 38°          Light Sweet - 30% yield     Light Sour
               (WTI, Brent)
Source: EIA.
Worldwide, light/sweet crude is the most expensive and accounts for 35% of
consumption. Medium/sour is less expensive and accounts for 50% of consumption
while heavy/sour is the least costly and accounts for 15%. Figure A.2 shows how
the average crude oil used by US refiners is becoming heavier and more sour over
time. This means that the production costs of a gallon of gasoline are changing
as refineries must invest in more sophisticated technology in order to process lower
quality crude oil.
¹ Production in this sense refers to the quantity extracted from a country's endowment.
² Technically, API gravity = (141.5 / specific gravity of crude at 60°F) - 131.5. Water has an
API gravity of 10°.
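To make the API gravity formula in footnote 2 and the cutoffs in Table A.1 concrete, the following is a minimal sketch. The function names and the two example crudes are hypothetical, and the treatment of values that fall exactly on a cutoff (22°, 38°, or 0.7% sulfur) is an assumption, since the table does not specify it.

```python
def api_gravity(specific_gravity):
    """API gravity from specific gravity at 60 degrees F (footnote 2 formula)."""
    return 141.5 / specific_gravity - 131.5


def classify_crude(api, sulfur_pct):
    """Assign a Table A.1 cell from API gravity and sulfur content (%).

    Assumption: boundary values are placed in the Medium and Sour cells.
    """
    if api < 22:
        weight = "Heavy"
    elif api <= 38:
        weight = "Medium"
    else:
        weight = "Light"
    sweetness = "Sweet" if sulfur_pct < 0.7 else "Sour"
    return f"{weight} {sweetness}"


# Water has specific gravity 1.0, so its API gravity is 10 degrees.
water_api = api_gravity(1.0)

# Two hypothetical crudes: (specific gravity, sulfur content in %).
light_crude = classify_crude(api_gravity(0.827), 0.3)
heavy_crude = classify_crude(api_gravity(0.93), 3.0)
```

Because API gravity falls as density rises, a heavier crude (higher specific gravity) lands in a lower API band.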
Since crude oil by itself has very little value to any industry, the price of a
barrel of oil reflects the net value of the downstream products that can be created
from it. The two major sources of movements in the crude oil price are upstream
supply shocks (due to OPEC's quotas and hurricanes affecting oil rigs in the Gulf
of Mexico) and downstream demand shocks (due to consumers' demand for refined
products). Another source often cited by industry experts is refinery inventories
of crude oil. Maintaining stocks of crude oil allows the refinery to respond quickly
to downstream shocks, such as an unexpectedly cold winter increasing the demand for
heating oil.
[Figure A.2: time-series plot by Year, 1985-2006. Axes: Sulfur Content (%), 0.8 to 1.5; API Gravity (Degrees), 30 to 33. Series: Sulfur Content, API Gravity.]
Figure A.2: Average Crude Oil Quality: Heavier and More Sour
Across the various types of crude oil, prices respond differently to shocks.
The "light/heavy" differential is one measure that indicates
the benefit a refiner can achieve by investing in sophisticated equipment to process
heavier crude oil into highly-valued refined products. The differential has varied
significantly over the last 10 years, from 3 dollars per barrel to almost 20 dollars per
barrel. An oil refinery faces a unique decision when making its production choice,
one that provides for both flexibility and complexity. On one hand, consumers
do not care about the type of crude oil, oxygenates, or distillation process used to
make, for example, the gasoline they put in their cars. They just want their car
to run well. While this would appear to make a refiner's problem easier, choosing
among heterogeneous inputs, such as crude oil, and satisfying federal, state, and city
environmental regulations, all while maximizing profits, makes for an enormously
complex optimization.
A.3 Estimation Algorithm
My estimation strategy involves matching utilization and investment moments.
This requires that I solve for a policy function for each of these decisions and in-
terpolate the functions to the realizations of the state variables in the data. The
monthly utilization choice problem is a simple finite horizon dynamic program that
I am able to solve by backward induction. For a given level of investment, which
determines the plant's capacity, I can write the problem as:
\[
\Pi_{iy} = \max_{\{u_{im}\}_{m=1}^{12}} E\left[ \sum_{m=1}^{12} \beta^{m-1} \pi_{im}(u_{im}; x_{im}, q_{iy}) \right]. \tag{A.1}
\]
Then, $\Pi_{iy}$, the aggregate discounted annual profit of the plant, becomes the payoff
function for the infinite horizon problem. The Bellman equation for that problem
is:
\[
V(x) = \max_{r} \left\{ \Pi_{iy}(r; x) + \beta V(x') P(x' | x, r) \right\}. \tag{A.2}
\]
To solve this equation, I could have used several different methods, including
successive approximations or collocation, but I chose policy function iteration, also known
as the Howard Policy Improvement Algorithm. The first step is to guess a candidate
policy function, which I call $\rho_t(x)$, where $t$ indexes the iteration. Since this
policy governs investment, which affects optimal utilization, which in turn affects the
probability of breakdown, I have to calculate the transition matrix given the policy:
$P(x' | x, \rho_t(x))$. Then comes the "policy evaluation step," which is to solve (A.2)
for the value of following that policy, i.e.:
\[
V_t(x) = \left[ I - \beta P(x' | x, \rho_t(x)) \right]^{-1} \Pi_{iy}(\rho_t(x); x). \tag{A.3}
\]
For a state space of size $K$, this involves inverting a $K \times K$ matrix, which makes it
difficult to estimate the model with too fine a discretization. With the value function in
hand, I move to the "policy improvement step," which updates the policy function:
\[
\rho_{t+1}(x) = \arg\max_{r} \left\{ \Pi_{iy}(r; x) + \beta V_t(x') P(x' | x, r) \right\}. \tag{A.4}
\]
Finally, I compare $\rho_{t+1}(x)$ to $\rho_t(x)$ and repeat the process until convergence.
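As an illustration of the policy evaluation step (A.3) and the policy improvement step (A.4), the following is a minimal sketch on a toy discretized problem. The payoff table, transition probabilities, discount factor, and all function names are hypothetical placeholders rather than the estimated model, and the linear system is solved by Gaussian elimination instead of forming an explicit matrix inverse.

```python
def solve_linear(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    v = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * v[j] for j in range(i + 1, n))
        v[i] = (M[i][n] - tail) / M[i][i]
    return v


def action_value(r, x, payoff, trans, beta, V):
    """Payoff of action r in state x plus discounted expected continuation value."""
    K = len(V)
    return payoff[r][x] + beta * sum(trans[r][x][y] * V[y] for y in range(K))


def policy_iteration(payoff, trans, beta, max_iter=100):
    """payoff[r][x]: annual payoff of action r in state x (hypothetical).
    trans[r][x][y]: probability of moving from state x to y under action r."""
    n_actions, K = len(payoff), len(payoff[0])
    policy = [0] * K  # initial guess: action 0 in every state
    V = [0.0] * K
    for _ in range(max_iter):
        # Policy evaluation: solve (I - beta * P_policy) V = payoff_policy.
        A = [[(1.0 if x == y else 0.0) - beta * trans[policy[x]][x][y]
              for y in range(K)] for x in range(K)]
        b = [payoff[policy[x]][x] for x in range(K)]
        V = solve_linear(A, b)
        # Policy improvement: pick the best action in each state given V.
        new_policy = [
            max(range(n_actions),
                key=lambda r: action_value(r, x, payoff, trans, beta, V))
            for x in range(K)
        ]
        if new_policy == policy:  # converged
            break
        policy = new_policy
    return policy, V


# Toy example: states 0 (low capacity) and 1 (high capacity);
# actions 0 (do not invest) and 1 (invest, at a cost of 0.5).
payoff = [[1.0, 2.0], [0.5, 1.5]]
trans = [
    [[1.0, 0.0], [0.0, 1.0]],  # no investment: capacity stays put
    [[0.0, 1.0], [0.0, 1.0]],  # investment: move to (or stay in) high capacity
]
policy, V = policy_iteration(payoff, trans, beta=0.9)
```

In this toy setup the plant invests only in the low-capacity state. The evaluation step solves a K-by-K linear system at every iteration, which is the cost that limits how fine the discretization can be.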
A.4 Additional Tables
Table A.2: Industry Concentration
1970 1980 1991 2001 2004 2005 2006 2007 2008
4-Firm (%) 31.4       40.2     44.4     43.0     45.8     44.1      41.2       
8-Firm (%) 52.2       61.6     69.4     68.4     72.0     69.5      63.7       
HHI 437.0     611.0   728.0   727.0   776.4   730.3   644.2     
PADD 1
4-Firm (%) 59.2       80.7     76.7     85.8     87.3     87.3      87.0       
8-Firm (%) 88.7       99.0     97.9     99.4     99.4     99.4      99.4       
HHI 1,225.0  2,158.0 1,943.0 2,505.0 2,537.5 2,540.2 2,524.7  
PADD 2
4-Firm (%) 38.3       37.4       39.3       50.9     57.1     57.1     59.6     55.5      50.5       
8-Firm (%) 59.7       60.0       65.0       75.6     82.6     82.6     85.0     80.9      75.9       
HHI 675.0     961.0   1,063.0 1,059.0 1,114.0 1,031.3 950.8     
PADD 3
4-Firm (%) 44.0       36.2       36.3       48.4     56.3     56.0     57.8     56.0      50.9       
8-Firm (%) 64.8       54.5       58.5       66.5     78.8     78.2     81.2     77.6      73.2       
HHI 578.0     851.0   1,018.0 1,005.0 1,052.2 976.7   909.2     
PADD 4
4-Firm (%) 53.5       48.0       55.8       58.1     46.1     45.7     50.9     50.7      58.7       
8-Firm (%) 81.7       75.3       83.6       86.9     81.2     80.4     85.5     85.2      84.3       
HHI 1,080.0  1,179.0 944.0   935.0   1,047.7 1,031.5 1,405.5  
PADD 5
4-Firm (%) 66.5       54.4       53.8       60.2     62.4     62.4     59.1     59.2      61.8       
8-Firm (%) 95.2       76.5       74.2       86.9     92.7     92.8     89.5     89.6      89.4       
HHI 965.0     1,148.0 1,246.0 1,247.0 1,162.2 1,168.7 1,195.7  
California
4-Firm (%) 58.9       68.7     66.2     66.5     62.3     62.5      63.0       
8-Firm (%) 82.5       95.1     96.3     96.3     92.1     93.2      93.2       
HHI 1,184.0  1,481.0 1,475.0 1,475.0 1,354.9 1,367.2 1,368.8  
Gulf Coast
4-Firm (%) 59.1     60.1      53.7       
8-Firm (%) 83.5     83.1      76.7       
HHI 1,107.9 1,110.5 995.0     
PADDs 1 & 3
4-Firm (%) 40.9       35.0       36.7       44.6     54.6     52.5     55.4     54.0      50.2       
8-Firm (%) 62.3       55.0       57.2       65.3     76.1     75.5     79.5     76.6      72.8       
HHI 561.0     741.0   919.0   890.0   967.9   991.1   861.2     
PADDs 2 & 3
4-Firm (%) 30.7       42.5     46.2     45.9     50.0     47.5      44.4       
8-Firm (%) 56.5       64.9     75.6     75.2     79.9     76.2      70.3       
HHI 455.0     681.0   826.0   818.0   894.6   822.7   742.9     
PADDs 1, 2, & 3
4-Firm (%) 35.2       30.7       30.2       39.4     45.9     44.5     49.2     47.1      43.9       
8-Firm (%) 58.0       49.2       53.6       63.5     73.1     72.6     78.3     75.1      69.6       
HHI 460.0     638.0   789.0   783.0   872.7   807.9   731.4     
US
Source: EIA. Concentration is based on operating crude oil distillation capacity, measured per calendar day on
January 1st of the given year. The FTC generated the table through 2004 and I extended it through 2008.
Upper Midwest: Illinois, Indiana, Kentucky, Michigan, and Ohio. The increase in HHIs from 2004 to 2005 in PADDs
1 and 3 is primarily due to the merger between Valero and Premcor. Capacities used in this table are at the
corporate level (multiple refineries owned by the same corporation are aggregated).
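The concentration measures reported in Table A.2 can be sketched as follows, assuming the usual definitions: the n-firm ratio is the combined capacity share of the n largest corporations, and the HHI is the sum of squared percentage shares (so it ranges up to 10,000). The owners, capacities, and function names below are hypothetical and are not taken from the EIA data.

```python
from collections import defaultdict


def corporate_capacities(refineries):
    """Aggregate refinery-level capacity to the corporate level.

    refineries: iterable of (owner, capacity) pairs.
    """
    totals = defaultdict(float)
    for owner, capacity in refineries:
        totals[owner] += capacity
    return sorted(totals.values(), reverse=True)


def concentration_ratio(capacities, n):
    """Combined share (in percent) of the n largest firms."""
    total = sum(capacities)
    top = sorted(capacities, reverse=True)[:n]
    return 100.0 * sum(top) / total


def hhi(capacities):
    """Herfindahl-Hirschman Index: sum of squared percentage shares."""
    total = sum(capacities)
    return sum((100.0 * c / total) ** 2 for c in capacities)


# Hypothetical region with five refineries owned by four corporations.
refineries = [("A", 300), ("A", 100), ("B", 300), ("C", 200), ("D", 100)]
caps = corporate_capacities(refineries)
two_firm = concentration_ratio(caps, 2)
index = hhi(caps)
```

Aggregating by owner first matters: treating corporation A's two refineries as separate firms would understate both measures.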
Table A.3: Cost Estimates
                          Market 1                Market 2                Market 3
Year  Parameter (index)   Coefficient  Std. Err.  Coefficient  Std. Err.  Coefficient  Std. Err.
1995  Q (0)               3.45***      0.01       0.36***      0.10       7.99***      0.75
      Q^2 (1)             2.70***      0.01       10.86        11.18      5.45***      0.21
      Q*P_c (2)           0.29***      0.00       0.06***      0.02       0.28***      0.04
      Investment (3)      4.41***      0.14       4.56         5.36       7.80         8.70
      Investment^2 (4)    -4.41***     0.07       -2.99***     0.74       -5.52        5.01
1996  Q (0)               3.48***      0.00       2.62***      0.38       0.05         2.09
      Q^2 (1)             6.19***      0.01       5.21***      0.31       6.02***      0.44
      Q*P_c (2)           0.03***      0.00       0.03*        0.02       1.00***      0.03
      Investment (3)      4.01***      0.15       5.58         51.23      3.84         11.82
      Investment^2 (4)    -1.27***     0.05       -0.97        8.91       -2.09**      1.03
1997  Q (0)               0.05*        0.03       0.92***      0.19       1.08         1.98
      Q^2 (1)             5.14***      0.05       7.85***      0.15       7.30***      0.04
      Q*P_c (2)           0.08***      0.00       0.05***      0.01       0.38***      0.00
      Investment (3)      4.25***      0.03       3.60**       1.64       8.88***      0.21
      Investment^2 (4)    -0.81***     0.01       1.03         1.88       -1.86***     0.04
1998  Q (0)               0.17***      0.03       0.05         26.36      1.16***      0.31
      Q^2 (1)             1.00***      0.04       3.68         8.20       3.40***      0.24
      Q*P_c (2)           1.00***      0.01       0.02         55.93      0.86***      0.08
      Investment (3)      -17.65       110.67     3.28         6.13       5.15         95.97
      Investment^2 (4)    25.35        33.80      -4.30        32.06      -1.91        1.75
1999  Q (0)               2.70***      0.04       0.44         51.57      6.94         35.07
      Q^2 (1)             5.79***      0.18       2.13         6.43       7.35***      0.05
      Q*P_c (2)           0.01***      0.00       0.27         3.96       0.12         19.64
      Investment (3)      4.65         14.90      5.90         11.03      9.53***      0.73
      Investment^2 (4)    -0.82        1.31       -6.05        58.11      -0.92***     0.13
2000  Q (0)               6.19***      0.57       0.04         0.19       10.29***     1.57
      Q^2 (1)             5.89***      0.11       11.36***     0.63       6.36***      0.41
      Q*P_c (2)           0.00         0.00       0.00         0.00       0.01         0.06
      Investment (3)      5.65*        4.16       4.08         4.40       11.85***     1.33
      Investment^2 (4)    -2.82***     0.44       -0.99        2.13       5.26         9.43
2001  Q (0)               0.32***      0.06       0.05***      0.01       0.03         2.92
      Q^2 (1)             5.75***      0.06       23.84***     1.07       2.63**       1.19
      Q*P_c (2)           0.02***      0.00       0.00***      0.00       1.00***      0.20
      Investment (3)      4.56***      0.53       3.91***      0.35       9.74         15.17
      Investment^2 (4)    1.12***      0.07       -4.79***     0.36       -5.05***     0.99
2002  Q (0)               2.24***      0.74       0.12***      0.03       0.58         0.52
      Q^2 (1)             4.51***      0.10       3.70***      0.74       6.90***      0.58
      Q*P_c (2)           0.16***      0.03       0.98***      0.08       0.28***      0.05
      Investment (3)      17.48**      9.18       5.49**       2.74       6.75         1,402.90
      Investment^2 (4)    3.49         14.69      -1.09        0.86       -0.87        6.75
2003  Q (0)               0.88***      0.18       13.42        394.71     0.03         0.22
      Q^2 (1)             5.87***      0.11       0.56         27.99      4.50***      0.24
      Q*P_c (2)           0.08***      0.01       0.32         3.94       0.79***      0.04
      Investment (3)      4.32***      0.70       5.43**       3.15       4.73***      1.64
      Investment^2 (4)    2.75***      0.89       -1.02        1.88       -3.08*       2.14
2004  Q (0)               3.18***      0.22       0.17***      0.07       0.15         0.45
      Q^2 (1)             8.04***      0.13       28.65***     8.47       11.49***     0.68
      Q*P_c (2)           0.00***      0.00       0.01***      0.00       0.00         0.02
      Investment (3)      7.48***      0.70       5.35***      1.19       7.07         7.23
      Investment^2 (4)    2.09***      0.10       -5.14***     0.57       -2.84        4.96
2005  Q (0)               0.34***      0.02       0.90***      0.11       0.04         34.02
      Q^2 (1)             2.85***      0.05       8.52***      0.07       1.39         13.95
      Q*P_c (2)           1.00***      0.01       1.00***      0.01       1.00***      0.05
      Investment (3)      10.42        24.93      11.69***     1.40       10.74***     4.42
      Investment^2 (4)    2.05         1.60       -2.97***     0.71       -1.15        2.36
2006  Q (0)               2.92***      0.06       0.01         0.19       1.01***      0.35
      Q^2 (1)             1.39***      0.02       4.67***      0.93       4.79***      0.34
      Q*P_c (2)           1.00***      0.00       1.00***      0.03       1.00***      0.03
      Investment (3)      9.44         443.89     8.42         7.81       7.43         493.22
      Investment^2 (4)    2.85         134.06     0.01         130.15     0.15         157.43
***, **, * Significant at the 1%, 5%, and 10% level respectively.
Appendix B
Chapter 2 Supplement
Drug Name    Num. of Sessions    Num. of Queries    Mean Queries per Session    Ad Spending ($ Millions)
viagra 778             2,544             3.27                      $80.56
lexapro 728             1,734             2.38                      $1.18
depo 661             1,437             2.17                      $0.00
xanax 583             1,497             2.57                      $0.00
zoloft 566             1,305             2.31                      $46.73
wellbutrin 489             1,193             2.44                      $108.14
ambien 484             1,012             2.09                      $130.20
cymbalta 477             1,060             2.22                      $6.33
lyrica 430             886                2.06                      $0.58
effexor 405             897                2.21                      $4.05
insulin 384             1,127             2.93                      $0.00
lipitor 384             754                1.96                      $93.54
paxil 358             873                2.44                      $0.11
prozac 330             757                2.29                      $0.52
celebrex 290             744                2.57                      $3.59
cialis 284             830                2.92                      $110.94
seroquel 267             521                1.95                      $2.16
lithium 265             767                2.89                      $0.05
oxycontin 258             1,006             3.90                      $0.00
toprol 253             493                1.95                      $0.00
Total 8,674          21,437           2.48                     $588.66
Top 20 Most Actively Searched Drugs
Ad spending is total expenditure on all forms of DTCA in 2005.  These 20 drugs 
account for 30% of all search sessions, 33% of clicks, and 17% of DTCA spending.
Table B.1: 20 Most Actively Searched Drugs
Drug Name    Num. of Sessions    Num. of Queries    Mean Queries per Session    Ad Spending ($ Millions)
nexium 250                 439                   1.76                        $226.34
lunesta 185                 383                   2.07                        $215.14
vytorin 181                 330                   1.82                        $155.26
crestor 226                 441                   1.95                        $141.82
ambien 484                 1,012                2.09                        $130.20
nasonex 79                   143                   1.81                        $124.16
flonase 65                   113                   1.74                        $112.82
cialis 284                 830                   2.92                        $110.94
lamisil 117                 276                   2.36                        $110.51
plavix 199                 371                   1.86                        $110.16
wellbutrin 489                 1,193                2.44                        $108.14
singulair 141                 323                   2.29                        $105.05
lipitor 384                 754                   1.96                        $93.54
imitrex 40                   106                   2.65                        $82.21
viagra 778                 2,544                3.27                        $80.56
valtrex 161                 307                   1.91                        $72.11
prevacid 154                 242                   1.57                        $71.88
allegra 184                 379                   2.06                        $71.04
boniva 87                   178                   2.05                        $66.45
zelnorm 103                 150                   1.46                        $62.45
Total 4,591              10,514              2.10                       $2,250.77
Top 20 Most Actively Advertised Drugs
Ad spending is total expenditure on all forms of DTCA in 2005.  These 20 drugs account for 
16% of all search sessions and clicks, and 65% of DTCA spending.
Table B.2: 20 Most Advertised Drugs
Dependent Variables    Description
Total Sessions total search sessions for a drug
Beyond Page 1 0/1 indicator; 1 if a user clicks on a link on page 2 or higher
Independent Variables
age years since FDA approval
dtca total DTCA spending, available 1994 - February 2006, logs
alltv total DTCA spending on TV, available 1994 - February 2006, logs
allmags total DTCA spending in magazines, available 1994 - February 2006, logs
allnewsp total DTCA spending in newspapers, available 1994 - February 2006, logs
allradio total DTCA spending on radio, available 1994 - February 2006, logs
outdoor total DTCA spending on outdoor media, available 1994 - February 2006, logs
internet total DTCA spending on the internet, available 1994 - February 2006, logs
X_Y_qtrb4 total spending on X in the Y quarter prior to search, logs
rld 0/1 indicator; 1 if producer of drug is the innovator/pioneer
Variable Definition for OLS and Probit Models
Note: Stock regressions include the total spending for a drug for all months between January 1994 and February 2006.  Regressions 
involving previous quarter data include spending from December 2005 - February 2006.  The depreciation analysis also includes 
spending from three previous quarters in 2005.
Table B.3: Variables Used in Regressions.
Appendix C
Chapter 3 Supplement
[Figure C.1: four kernel density panels: Organic Summary / Dot-Com, Dot-Gov, Dot-Edu, and Dot-OrgNetInfo. Horizontal axis: -0.5 to 0.5, from Informational (left) to Promotional (right). Vertical axis: Density. Series in each panel: Ask, Google, MSN, Yahoo!.]
Figure C.1: Kernel Density of Summary Content, Organic Results, By Extension
1 actos   59 actos price   117 crestor information   175 insulin   233 lipitor buy   291 metformin generic   349 pravachol cost   407 vivactil
2 actos "actos plus"   60 actos risks   118 crestor interactions   176 insulin "actos plus"   234 lipitor cheap   292 metformin glucophage   350 pravachol crestor   408 vytorin
3 actos "blood glucose"   61 actos sale   119 crestor lipitor   177 insulin "blood glucose"   235 lipitor cholesterol   293 metformin information   351 pravachol discount   409 welchol
4 actos "side effects"   62 actos side effects   120 crestor lovastatin   178 insulin "side effects"   236 lipitor cost   294 metformin insulin   352 pravachol generic   410 wellbutrin
5 actos "weight gain"   63 actos weight gain   121 crestor pravachol   179 insulin "weight gain"   237 lipitor crestor   295 metformin interactions   353 pravachol information   411 zetia
6 actos "weight loss"   64 actos weight loss   122 crestor price   180 insulin "weight loss"   238 lipitor discount   296 metformin price   354 pravachol interactions   412 zocor
7 actos actos plus   65 advicor   123 crestor risks   181 insulin actos   239 lipitor generic   297 metformin risks   355 pravachol lipitor   413 zocor "adverse effects"
8 actos blood glucose   66 altocor   124 crestor sale   182 insulin actos plus   240 lipitor information   298 metformin sale   356 pravachol lovastatin   414 zocor "blood pressure"
9 actos buy   67 amaryl   125 crestor side effects   183 insulin blood glucose   241 lipitor interactions   299 metformin side effects   357 pravachol price   415 zocor "side effects"
10 actos cheap   68 amitriptyline   126 crestor statins   184 insulin buy   242 lipitor lovastatin   300 metformin weight gain   358 pravachol risks   416 zocor adverse effects
11 actos cost   69 anafranil   127 crestor zocor   185 insulin cheap   243 lipitor pravachol   301 metformin weight loss   359 pravachol sale   417 zocor blood pressure
12 actos diabetes   70 avandamet   128 cymbalta   186 insulin cost   244 lipitor price   302 mevacor   360 pravachol side effects   418 zocor buy
13 actos discount   71 avandaryl   129 desyrel   187 insulin diabetes   245 lipitor risks   303 mirtazapine   361 pravachol statins   419 zocor cheap
14 actos generic   72 avandia   130 doxepin   188 insulin discount   246 lipitor sale   304 nardil   362 pravachol zocor   420 zocor cholesterol
15 actos glucophage   73 bupropion   131 duetact   189 insulin generic   247 lipitor side effects   305 niaspan   363 pravastatin   421 zocor cost
16 actos information   74 buspar   132 edronax   190 insulin glucophage   248 lipitor statins   306 norpramin   364 precose   422 zocor crestor
17 actos insulin   75 byetta   133 effexor   191 insulin information   249 lipitor zocor   307 nortriptyline   365 prozac   423 zocor discount
18 actos interactions   76 celexa   134 elavil   192 insulin interactions   250 lopid   308 novolin   366 prozac "long term"   424 zocor generic
19 actos metformin   77 celexa "long term"   135 endep   193 insulin metformin   251 lovastatin   309 novolog   367 prozac "side effects"   425 zocor information
20 actos plus   78 celexa "side effects"   136 escitalopram   194 insulin price   252 lovastatin "adverse effects"   310 nph insulin   368 prozac "weight gain"   426 zocor interactions
21 actos plus "blood glucose"   79 celexa "weight gain"   137 fenofibrate   195 insulin risks   253 lovastatin "blood pressure"   311 pamelor   369 prozac "weight loss"   427 zocor lipitor
22 actos plus "side effects"   80 celexa "weight loss"   138 fluoxetine   196 insulin sale   254 lovastatin "side effects"   312 parnate   370 prozac buy   428 zocor lovastatin
23 actos plus "weight gain"   81 celexa buy   139 fluvoxamine   197 insulin side effects   255 lovastatin adverse effects   313 paroxetine   371 prozac celexa   429 zocor pravachol
24 actos plus "weight loss"   82 celexa cheap   140 galvus   198 insulin weight gain   256 lovastatin blood pressure   314 paxil   372 prozac cheap   430 zocor price
25 actos plus actos   83 celexa cost   141 gemfibrozil   199 insulin weight loss   257 lovastatin buy   315 paxil "long term"   373 prozac cost   431 zocor risks
26 actos plus actos   84 celexa depression   142 glimepiride   200 januvia   258 lovastatin cheap   316 paxil "side effects"   374 prozac depression   432 zocor sale
27 actos plus blood glucose   85 celexa discount   143 glipizide   201 lantus   259 lovastatin cholesterol   317 paxil "weight gain"   375 prozac discount   433 zocor side effects
28 actos plus buy   86 celexa generic   144 glucophage   202 lescol   260 lovastatin cost   318 paxil "weight loss"   376 prozac generic   434 zocor statins
29 actos plus buy   87 celexa information   145 glucophage "actos plus"   203 lexapro   261 lovastatin crestor   319 paxil buy   377 prozac information   435 zoloft
30 actos plus cheap   88 celexa interactions   146 glucophage "blood glucose"   204 lexapro "long term"   262 lovastatin discount   320 paxil celexa   378 prozac interactions   436 zoloft "long term"
31 actos plus cheap   89 celexa lexapro   147 glucophage "side effects"   205 lexapro "side effects"   263 lovastatin generic   321 paxil cheap   379 prozac lexapro   437 zoloft "side effects"
32 actos plus cost   90 celexa long term   148 glucophage "weight gain"   206 lexapro "weight gain"   264 lovastatin information   322 paxil cost   380 prozac long term   438 zoloft "weight gain"
33 actos plus cost   91 celexa paxil   149 glucophage "weight loss"   207 lexapro "weight loss"   265 lovastatin interactions   323 paxil depression   381 prozac paxil   439 zoloft "weight loss"
34 actos plus diabetes   92 celexa price   150 glucophage actos   208 lexapro buy   266 lovastatin lipitor   324 paxil discount   382 prozac price   440 zoloft buy
35 actos plus diabetes   93 celexa prozac   151 glucophage actos plus   209 lexapro celexa   267 lovastatin pravachol   325 paxil generic   383 prozac risks   441 zoloft celexa
36 actos plus discount   94 celexa risks   152 glucophage blood glucose   210 lexapro cheap   268 lovastatin price   326 paxil information   384 prozac sale   442 zoloft cheap
37 actos plus discount   95 celexa sale   153 glucophage buy   211 lexapro cost   269 lovastatin risks   327 paxil interactions   385 prozac side effects   443 zoloft cost
38 actos plus generic   96 celexa side effects   154 glucophage cheap   212 lexapro depression   270 lovastatin sale   328 paxil lexapro   386 prozac weight gain   444 zoloft depression
39 actos plus generic   97 celexa weight gain   155 glucophage cost   213 lexapro discount   271 lovastatin side effects   329 paxil long term   387 prozac weight loss   445 zoloft discount
40 actos plus glucophage   98 celexa weight loss   156 glucophage diabetes   214 lexapro generic   272 lovastatin statins   330 paxil price   388 prozac zoloft   446 zoloft generic
41 actos plus glucophage   99 celexa zoloft   157 glucophage discount   215 lexapro information   273 lovastatin zocor   331 paxil prozac   389 questran   447 zoloft information
42 actos plus information   100 cholestyramine   158 glucophage generic   216 lexapro interactions   274 ludiomil   332 paxil risks   390 remeron   448 zoloft interactions
43 actos plus information   101 citalopram   159 glucophage information   217 lexapro long term   275 luvox   333 paxil sale   391 rezulin   449 zoloft lexapro
44 actos plus insulin   102 clozapine   160 glucophage insulin   218 lexapro paxil   276 metaglip   334 paxil side effects   392 sarafem   450 zoloft long term
45 actos plus insulin   103 colestid   161 glucophage interactions   219 lexapro price   277 metformin   335 paxil weight gain   393 sertraline   451 zoloft paxil
46 actos plus interactions   104 colestipol   162 glucophage metformin   220 lexapro prozac   278 metformin "actos plus"   336 paxil weight loss   394 serzone   452 zoloft price
47 actos plus interactions   105 crestor   163 glucophage price   221 lexapro risks   279 metformin "blood glucose"   337 paxil zoloft   395 simvastatin   453 zoloft prozac
48 actos plus metformin   106 crestor "adverse effects"   164 glucophage risks   222 lexapro sale   280 metformin "side effects"   338 pertofrane   396 sinequan   454 zoloft risks
49 actos plus metformin   107 crestor "blood pressure"   165 glucophage sale   223 lexapro side effects   281 metformin "weight gain"   339 prandin   397 starlix   455 zoloft sale
50 actos plus price   108 crestor "side effects"   166 glucophage side effects   224 lexapro weight gain   282 metformin "weight loss"   340 pravachol   398 strattera   456 zoloft side effects
51 actos plus price   109 crestor adverse effects   167 glucophage weight gain   225 lexapro weight loss   283 metformin actos   341 pravachol "adverse effects"   399 surmontil   457 zoloft weight gain
52 actos plus risks   110 crestor blood pressure   168 glucophage weight loss   226 lexapro zoloft   284 metformin actos plus   342 pravachol "blood pressure"   400 tofranil   458 zoloft weight loss
53 actos plus risks   111 crestor buy   169 glucotrol   227 lipitor   285 metformin blood glucose   343 pravachol "side effects"   401 tranylcypromine
54 actos plus sale   112 crestor cheap   170 glucovance   228 lipitor "adverse effects"   286 metformin buy   344 pravachol adverse effects   402 trazodone
55 actos plus sale   113 crestor cholesterol   171 glyburide   229 lipitor "blood pressure"   287 metformin cheap   345 pravachol blood pressure   403 tricor
56 actos plus side effects   114 crestor cost   172 glyset   230 lipitor "side effects"   288 metformin cost   346 pravachol buy   404 trimipramine
57 actos plus weight gain   115 crestor discount   173 humalog   231 lipitor adverse effects   289 metformin diabetes   347 pravachol cheap   405 venlafaxine
58 actos plus weight loss   116 crestor generic   174 humulin   232 lipitor blood pressure   290 metformin discount   348 pravachol cholesterol   406 vestra
Complete List of Search Queries
Table C.1: List of Search Queries
Obs Promotional Informational Promotional Informational Promotional Informational Promotional Informational
1 phentermine effect purchase patient canada treatment cheap legal
2 pills interactions phentermine oral orders natural 20mg limited
3 shipping including shipping lawsuit cheap tips cost promo
4 price possible save statin risk options sold samples
5 viagra serious hosting lawyer hidden out 500mg cure
6 now common offer hydrochloride fees anti guaranteed right
7 save statin cialis withdrawal fedex depressant canadapharmacy injury
8 offers oral genuine treatments beat expert prescription inhaled
9 lowest cause viagra webmd wholesale breaking better avoid
10 cialis patient TRUE lawyers overnight anxiety 30mg possible
11 net important #3634 antidepressant x30 birth 10mg lawyer
12 fast occur #3619 heart accredited defects pills reviewed
13 cheapest include #3585 answers ranked use safe exubera
14 delivery includes delivery handout 10mg calcium assistance linked
15 tramadol lowering tramadol information: 844 member huge damage
16 purchase pdf truly safety 891 support only kidney
17 compare provides brand rhabdomyolysis brand doctor capsules birth
18 easy see trusted encyclopedia onlineover linked 100mg take
19 blog safety sale medlineplus off code medicine discovery
20 levitra nausea shop class x90 zip rosuvastatin attorney
21 pill symptoms home niacin discount check canadian lower
22 soma such catalog type competitors there amp dna
23 worldwide warnings fast lowering satisfaction choose savings test
24 meds risk cost suicide customer lawyer canadadrugs fatigue
25 link learn online: attorney 1800 which download natural
26 pharmacies muscle xanax lawsuits pharmaciesfind membersupport comparison hair
27 xanax consumer store warnings x60 proven program lamictal
28 homepages over easy product guaranteeaccredited smarter trusted aid
29 snewman following top precautions 500mg questions deals performance
30 anti know guaranteed antidepressants beaten really iipitor sexual
31 stmartin experience #3610 articles fastdelivery works celexxa lawsuit
32 discussionboard levels #1072 pcos medisave taking 45mg drugs?
33 store problems compare resistance please nutrition meds defect
34 aciphex prescribed pharmacies learn give ebay off hypertension
35 great help link syndrome processing join pioglitazone infant
36 prescriptions usage pump koop meds know direct pulmonary
37 quality hydrochloride india library tablets breakthrough clearance news
38 cost prescribing #3629 injury fee exciting starting defects
39 licensed heart topic anxiety available one today contraceptive
40 day precautions usa indications x180 recommend china missed
41 products potential #3633 defective convenient most directly oral
42 valium well diabetic mayoclinic lowest productsfind incredibly safe?
43 shop sexual aciphex wikipedia fda solution great detailed
44 sale diarrhea name blood minutes damage rxdrugcard prevachol
45 search adverse #3637 revolution quality loss medication resistance
46 make clinical bravenet litigation medication statins name reverse
47 overnight out levitra hydrobromide sale medicine selling locating
48 2006 many #1086 ssri ringtones research sale need
49 posted exercise #3591 statins 30mg contact tickets products
50 ultram additional xanga oxalate home work nordisk review
Popular Keyword Lists. Column pairs (Promotional, Informational), left to right: Organic Summary, Organic Title, Sponsored Summary, Sponsored Title.
Table C.2: Keywords Used in Classification Algorithm
[Figure C.2: four kernel density panels: Sponsored Summary / Dot-Com, Dot-Gov, Dot-Edu, and Dot-OrgNetInfo. Horizontal axis: -0.5 to 0.5, from Informational (left) to Promotional (right). Vertical axis: Density. Series in each panel: Ask, Google, MSN, Yahoo!.]
Figure C.2: Kernel Density of Summary Content, Sponsored Results, By Extension
Dependent Variable Description
Page1 0/1 indicator; 1 if the result appears on page 1 of the results
Independent Variables
dtca_stock total DTCA spending, 1994 - February 2007, billions
dotcom 0/1 indicator; 1 if dot-com
dotgov 0/1 indicator; 1 if dot-gov
dotedu 0/1 indicator; 1 if dot-edu
dotorgnetinfo 0/1 indicator; 1 if dot-org, net, or info
dotintl 0/1 indicator; 1 if dot-us, uk, or ca
dotcom_age interaction term: dotcom and age
dotgov_age interaction term: dotgov and age
dotedu_age interaction term: dotedu and age
dotorg_age interaction term: dotorg and age
dotcom_dtc interaction term: dotcom and dtca_stock
dotgov_dtc interaction term: dotgov and dtca_stock
dotedu_dtc interaction term: dotedu and dtca_stock
dotorg_dtc interaction term: dotorg and dtca_stock
prop_promo_summary Proportion of words in the summary of organic links that are promotional
prop_info_summary Proportion of words in the summary of organic links that are informational
prop_promo_title Proportion of words in the title of organic links that are promotional
prop_info_title Proportion of words in the title of organic links that are informational
Variable Summary for Probit Regression
Table C.3: Variable Definitions
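The prop_promo_* and prop_info_* variables defined in Table C.3 are proportions of words that match a keyword list. A minimal sketch of such a calculation follows; the keyword sets are a small illustrative subset of the organic summary columns of Table C.2, and the tokenization rule, function name, and example text are assumptions rather than the classification algorithm actually used.

```python
import re

# Illustrative subsets of the Table C.2 organic summary keyword lists.
PROMO_WORDS = {"shipping", "price", "save", "cheapest", "delivery"}
INFO_WORDS = {"effect", "interactions", "serious", "symptoms", "warnings"}


def keyword_proportions(text, promo=PROMO_WORDS, info=INFO_WORDS):
    """Return (share of promotional words, share of informational words).

    Assumption: words are lowercase runs of letters and digits.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0, 0.0
    n = len(words)
    promo_share = sum(w in promo for w in words) / n
    info_share = sum(w in info for w in words) / n
    return promo_share, info_share


# Hypothetical result summary.
summary = "Save on shipping: lowest price and fast delivery"
prop_promo, prop_info = keyword_proportions(summary)
```

Dividing by the total word count keeps the two measures comparable across summaries and titles of different lengths.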
References
[1] Aguirregabiria, V. P. Mira, (2006). ?Sequential estimation of dynamic discrete
games.? Econometrica, 75(1), 2006.
[2] Athey, Susan and Glenn Ellison (working paper). ?Position Auctions with
Consumer Search.?
[3] Attanasio, Orazio, (2000). ?Consumer Durables and Inertial Behavior: Esti-
mation and Aggregation of Ss Rules for Automobiles.? Review of Economic
Studies, October 2000.
[4] Bacon, Robert W., (1991). ?Rockets and Feathers: The Asymmetric Speed
of Adjustment of UK Retail Gasoline Prices to Cost Changes.? Energy Eco-
nomics, 13 July 1991.
[5] Bajari, Patrick, Lanier Benkard, and Jonathan Levin, (2007). ?Estimating
Dynamic Models of Imperfect Competition.? Econometrica, 75(5), 2007.
[6] Battelle, John. (2005). The Search: How Google and Its Rivals Rewrote the
Rules of Business and Transformed Our Culture. Penguin Books Ltd, 2005.
[7] Benkard, Lanier, (2004). ?A dynamic analysis of the market for wide-bodied
commercial aircraft.? Review of Economic Studies, 71(3), 2004.
[8] Besanko, David and Ulrich Doraszelski, (2004). ?Capacity Dynamics and En-
dogenous Asymmetries in Firm Size.? The RAND Journal of Economics, Vol.
35, No. 1. Spring 2004.
[9] Besanko, David A., Ulrich Doraszelski, Lauren Xiaoyuan Lu, and Mark A.
Satterthwaite, (2008). "Lumpy Capacity Investment and Disinvestment
Dynamics." Harvard Institute of Economic Research Discussion Paper No. 2154.
Available at SSRN: http://ssrn.com/abstract=1117991.
[10] Borenstein, S., C. A. Cameron and R. Gilbert, (1997). "Do Gasoline Prices
Respond Asymmetrically to Crude Oil Price Changes?" Quarterly Journal of
Economics, 112(1), 1997.
[11] Borenstein, S., (1991). "Selling Costs and Switching Costs: Explaining Retail
Gasoline Margins." The RAND Journal of Economics, 22(3), 1991.
[12] Borenstein, S. and Andrea Shepard, (1996). "Dynamic Pricing in Retail Gasoline
Markets." The RAND Journal of Economics, 27(3), 1996.
[13] Chowdhury, A., G. Pass, C. Torgeson. (2006). "A Picture of Search." The
First International Conference on Scalable Information Systems,
Hong Kong, June, 2006.
[14] Day, Ruth. (2006). "Comprehension of Prescription Drug Information:
Overview of A Research Program." Proceedings of the American Association
for Artificial Intelligence, Argumentation for Consumer Healthcare. 2006.
[15] Day, Ruth. (2003). "Understanding Rx drug Information: TV ads, internet,
hardcopy." U.S. Food and Drug Administration, Public Meeting on Effects of
Direct-to-Consumer Advertising. 2003.
[16] Edelman, Ben and Michael Ostrovsky. (2007). "Strategic Bidder Behavior in
Sponsored Search Auctions." Decision Support Systems, 2007.
[17] Energy Information Administration, US Department of Energy, (2007).
"Refinery Outages: Description and Potential Impact on Petroleum Product
Prices." March 2007.
[18] Energy Information Administration, US Department of Energy, (2008). "A
Primer on Gasoline Prices." Online:
http://www.eia.doe.gov/bookshelf/brochures/gasolinepricesprimer/index.html
[Downloaded: 09/11/2008], May 2008.
[19] Ericson, Richard, and Ariel Pakes, (1995). "Markov-Perfect Industry Dynamics:
A Framework for Empirical Work." Review of Economic Studies, 62:1,
53-83, 1995.
[20] Espey, Molly, (1996). "Explaining Variation in Elasticity of Gasoline Demand
in the United States: A Meta Analysis." The Energy Journal, 17, 1996.
[21] The Federal Trade Commission, (2006). "Investigation of Gasoline Price
Manipulation and Post-Katrina Gasoline Price Increases." Spring 2006.
[22] Ghose, A., and S. Yang. (2009). "An Empirical Analysis of Search Engine
Advertising: Sponsored Search and Cross-Selling in Electronic Markets."
Forthcoming in Management Science.
[23] Ghose, A., and S. Yang. (2008). "An Empirical Analysis of Sponsored Search
Performance in Search Engine Advertising." Proceedings of the ACM
International Conference on Web Search and Data-mining Conference
(WSDM 2008), Stanford, February 2008.
[24] Ghose, A., and S. Yang. (working paper). "Organic and Paid Search
Advertising: Complements, Substitutes or Neither?"
[25] Goldberg, Pinelopi K. and Rebecca Hellerstein, (2008). "A Structural
Approach to Explaining Incomplete Exchange-Rate Pass-Through and
Pricing-to-Market." The American Economic Review, 98(2), 2008.
[26] The Government Accountability Office, (2006). "Energy Markets: Factors
Contributing to Higher Gasoline Prices." GAO-06-412T. February 2006.
[27] Gron, Anne and Deborah Swenson, (2000). "Cost Pass-Through in the U.S.
Automobile Market." The Review of Economics and Statistics, 82(2), 2000.
[28] Hamilton, James D., (1983). "Oil and the Macroeconomy since World War
II." The Journal of Political Economy, 91(2), 1983.
[29] Hastings, Justine, Jennifer Brown, Erin Mansur, and Sofia Villas-Boas, (2008).
"Reformulating Competition? Gasoline Content Regulation and Wholesale
Gasoline Prices." Journal of Environmental Economics and Management,
January 2008.
[30] Hotz, V. J., and R. A. Miller, (1993). "Conditional Choice Probabilities and
the Estimation of Dynamic Models." Review of Economic Studies, 60:3,
497-529, 1993.
[31] Hubbard, Glenn, (1986). "Supply Shocks and Price Adjustment in the World
Oil Market." The Quarterly Journal of Economics, 101(1), 1986.
[32] Huh, Jisu and Brenda Cude (2004). "Is the Information Fair and Balanced
in Direct-to-Consumer Prescription Drug Websites?" Journal of Health
Communication, 2004.
[33] ICF Consulting, (2005). "The Emerging Oil Refinery Capacity Crunch: A
Global Clean Products Outlook." 2005.
[34] Jin, Ginger Zhe and Toshiaki Iizuka. (2005). "The Effects of Prescription Drug
Advertising on Doctor Visits." Journal of Economics & Management Strategy,
Fall 2005.
[35] Jin, Ginger Zhe and Toshiaki Iizuka. (2007). "Direct to Consumer Advertising
and Prescription Choice." Journal of Industrial Economics, 2007.
[36] Knittel, Christopher, Jonathan E. Hughes, and Daniel Sperling, (2008).
"Evidence of a Shift in the Short-Run Price Elasticity of Gasoline Demand." The
Energy Journal, 29(1), January 2008.
[37] Kreps, David M. and Jose A. Scheinkman, (1983). "Quantity Precommitment
and Bertrand Competition Yield Cournot Outcomes." The Bell Journal of
Economics, 14(2), Autumn 1983.
[38] Lidderdale, T.C.M. (United States Energy Information Administration),
(1999). "Environmental Regulations and Changes in Petroleum Refining
Operations." Online: http://www.eia.doe.gov/emeu/steo/pub/special/enviro.html
[Downloaded: 12/07/2007], 1999.
[39] Nielsen-online.com. (2009). "April 10, 2009 News Release." 2009.
[40] Nielsen-online.com. (2008). "The Second Opinion: How the Web Drives
Healthcare Decisions." 2009. Webinar presented by Melissa Davies, September
3, 2008.
[41] Noel, Michael D., (2007). "Edgeworth Price Cycles, Cost-Based Pricing, and
Sticky Pricing in Retail Gasoline Markets." Review of Economics and
Statistics, Vol. 89, 2007.
[42] Pakes, Ariel, (2000). "A framework for applied dynamic analysis in I.O."
Working paper no. 8024, NBER, Cambridge, 2000.
[43] Pakes, Ariel, Michael Ostrovsky, and Steven T. Berry, (2004). "Simple
estimators for the parameters of discrete dynamic games (with entry/exit examples)."
Harvard Institute of Economic Research Discussion Paper No. 2036, May 2004.
[44] Pakes, Ariel and P. McGuire, (1994). "Computing Markov-perfect Nash
equilibria: Numerical implications of a dynamic differentiated product model."
RAND Journal of Economics, 25(4), 1994.
[45] Peterson, D. J. and Sergej Mahnovski, (2003). "New Forces at Work in
Refining: Industry Views of Critical Business and Operations Trends." Santa
Monica, CA: RAND, 2003.
[46] Rust, John and Harry Paarsch, (forthcoming). "Valuing Programs with
Deterministic and Stochastic Cycles." Forthcoming in the Journal of Economic
Dynamics and Control.
[47] Rust, John, (2008). "Dynamic Programming." The New Palgrave Dictionary
of Economics. Second Edition. Eds. Steven N. Durlauf and Lawrence E. Blume.
Palgrave Macmillan, 2008.
[48] Rust, John, (1987). "Optimal Replacement of GMC Bus Engines: An
Empirical Model of Harold Zurcher." Econometrica, 55:5, 999-1033, 1987.
[49] Ryan, Stephen, (forthcoming). "The Costs of Environmental Regulation in a
Concentrated Industry." Forthcoming in Econometrica.
[50] Tirole, Jean, (1988). The Theory of Industrial Organization. Cambridge, MA:
M.I.T. Press. 1988.
[51] The United States Senate, (2002). "Gas Prices: How are they Really
Set?" Online: http://www.senate.gov/~gov_affairs/042902gasreport.htm
[Downloaded 10/01/2007], May 2002.
[52] Varian, H. (2007). "Position Auctions." International Journal of Industrial
Organization, 2007.