ABSTRACT 

 
Title of Dissertation: CYCLING AROUND THE CLOCK: 

MODELING BIKE SHARE TRIPS AS HIGH-

FREQUENCY SPATIAL INTERACTIONS 

  
 Zheng Liu, Doctor of Philosophy, 2023 

  
Dissertation directed by: Assistant Professor Taylor Oshan, 

Department of Geographical Sciences 

 
Spatial interactions provide insights into urban mobility that reflects urban livability. A range of 

traditional and modern urban mobility models have been developed to analyze and model spatial 

interaction. The study of bike-sharing systems has emerged as a new area of research, offering 

expanded opportunities to understand the dynamics of spatial interaction processes. This 

dissertation proposes new methods and frameworks to model and understand the high-frequency 

changes in the spatial interaction of a bike share system. Three challenges related to the spatial and 

temporal dynamics of spatial interaction within a bike share system are discussed via three studies: 

1) Predicting spatial interaction demand at new stations as part of system infrastructure expansion; 

2) Understanding the dynamics of determinants in the context of the COVID-19 pandemic; and 3) 

Detecting events that lead to changes in the spatial interaction process of bike share trips from a 

model-based proxy. The first study proposes a hybrid strategy to predict 'cold start' trips by 

comparing flow interpolation and spatial interaction methods. The study reveals 'cold start' stations 


with different classifications based on their locations have different best model choices as a hybrid 

strategy for the research question. The second study demonstrates a disaggregated comparative 

framework to capture the dynamics of determinants in bike share trip generation before, during, 

and after the COVID-19 lockdown and to identify long-term bike share usage behavioral changes. 

The third study investigates an event detection approach combining martingale test and spatial 

interaction model with specification evaluation from simulated data and explorative examination 

from bike share datasets in New York City, Washington, DC, and San Francisco. Results from the 

study recognize events from exogenous factors that induced changes in spatial interactions which 

are critical for model evaluation and improvement toward more flexible models to high-frequency 

changes. The dissertation elaborated and expanded the spatial interaction model to more 

effectively meet the research demands for the novel transportation mode of bike-share cycling in 

the context of a high-frequency urban environment. Taken as a whole, this dissertation contributes 

to the field of transportation geography and geographic information science and contributes 

methods toward the creation of improved transport systems for more livable cities. 

 
CYCLING AROUND THE CLOCK:  

MODELING BIKE SHARE TRIPS AS HIGH-FREQUENCY SPATIAL INTERACTIONS 

 
by 

 
          Zheng Liu 

 
Dissertation submitted to the Faculty of the Graduate School of the 

University of Maryland, College Park, in partial fulfillment 

of the requirements for the degree of 

Doctor of Philosophy 

2023 

 
Advisory Committee: 

Dr. Taylor Oshan, Chair 

Dr. Kathleen Stewart 

Dr. Yiqun Xie 

Dr. Giovanni Baiocchi 

Dr. Vanessa Frias-Martinez 


© Copyright by 

Zheng Liu 

2023 

 
 ii 

Dedication 

 
To my parents, 

 
Honglei Zheng (郑红雷) and Guangdou Liu (刘光斗), 

for their unconditional support and love through the journey. 


 iii 

Acknowledgements 

First and foremost, I would like to express my profound gratitude to my doctoral advisor, Dr. 

Taylor Oshan. Without his unwavering support and guidance, I would not have been able to 

successfully navigate the challenges of the PhD program and reach the finish line. Dr. Oshan 

provided me with invaluable feedback on my research helping me to refine my ideas and develop 

a clear focus for my dissertation. Beyond his professional expertise, Dr. Oshan was a constant 

source of encouragement and support throughout my time in the program, always taking the time 

to offer a listening ear or a word of wisdom when needed. 

 
In addition to my gratitude towards Dr. Oshan, I would like to express my sincere appreciation to 

my dissertation committee members, Dr. Kathleen Stewart, Dr. Yiqun Xie, Dr. Giovanni Baiocchi, 

and Dr. Vanessa Frias-Martinez. Their constructive comments and feedback on my dissertation 

have been invaluable in helping me to improve my work. Moreover, I am deeply grateful to each 

of them for the advice, encouragement, and opportunities they have provided throughout my PhD 

journey. Their mentorship and support have been instrumental in shaping my academic growth 

and development, and I am honored to have had them as my committee members. 

 
My thanks go to all friends in the Center for GIS, Zhiyue, Hai, Jeff, Peiqi, Xin, Yuehui, Weiye, 

Riccardo, Federico, and Xiaoguo for the wonderful work and life experiences. My thanks also go 

to all the colleagues in the Oshan group, Mengyu, Victor, and Ushashi, for discussing my research 

and providing me with substantial help in the development of my dissertation. 

 
 iv 

I would like to thank to Guimin, Yunting, Yimeng, Junchuan, Lei, Xiuxia, Xiaodong, Yu, Ruomin, 

Fengyu, Yuqian, Jeremiah, Tianyu, as well as faculty and cohorts from the MSGIS program who 

offered me unforgettable memories and made my PhD life not alone. Special thanks to all my 

friends from the Lakeside North community, Yao, Jiaming, Aolin, Yanjia, Cheng, and Yuhan, who 

left the most pieces of memories ever since my life in the US. Beyond the fun times, their support 

has been invaluable in helping me navigate the many challenges that life has thrown my way. 

 
A special thanks to my girlfriend, Diyang Cui. From the very beginning of our story, she has been 

there for me, providing the emotional support and motivation I needed to push through even the 

most difficult moments. I am eternally grateful to her for all that she has done for me, and my 

gratitude to her has no expiration date.   

 
Finally, I am indebted to my parents for offering solid backing for me. Their constant love and 

encouragement have been a source of strength for me during the six years that I lived on the other 

side of the world pursuing my degree. 

 
 v 

Table of Contents 

Dedication ....................................................................................................................................... ii 
Acknowledgements ........................................................................................................................ iii 
Table of Contents ............................................................................................................................ v 
List of Tables ................................................................................................................................ vii 
List of Figures .............................................................................................................................. viii 
List of Abbreviations ...................................................................................................................... x 
Chapter 1: Introduction ................................................................................................................... 1 

1.1 Background and motivation .................................................................................................. 1 
1.2 Research objectives ............................................................................................................. 10 
1.3 Dissertation outline ............................................................................................................. 13 

Chapter 2: Comparing spatial interaction models and flow interpolation techniques for predicting 

'cold start' bike share trip demand ................................................................................................. 15 
2.1 Abstract ............................................................................................................................... 15 
2.2 Introduction ......................................................................................................................... 15 
2.3 Background ......................................................................................................................... 18 
2.4 Methodology ....................................................................................................................... 23 

2.4.1 Problem statement ........................................................................................................ 23 
2.4.2 Station classification .................................................................................................... 24 
2.4.3 Modeling strategies ...................................................................................................... 27 

2.5 Data ..................................................................................................................................... 34 
2.5.1 Study area..................................................................................................................... 34 
2.5.2 Study design ................................................................................................................. 35 

2.6 Results ................................................................................................................................. 36 
2.6.1 Gravity-type spatial interaction model calibration ...................................................... 36 
2.6.2 Prediction results .......................................................................................................... 39 
2.6.3 Spatial trends ................................................................................................................ 40 

2.7 Discussion ........................................................................................................................... 42 
2.8 Conclusion .......................................................................................................................... 44 

Chapter 3: Modeling the temporal dynamics of bike share trip determinants and the impact of the 

COVID-19 pandemic .................................................................................................................... 46 
3.1 Abstract ............................................................................................................................... 46 
3.2 Introduction ......................................................................................................................... 47 
3.3 Data and methods ................................................................................................................ 49 

3.3.1 Independent variables .................................................................................................. 49 
3.3.2 COVID-19 pandemic semantics .................................................................................. 51 
3.3.3 Spatial interaction model ............................................................................................. 52 
3.3.4 Variable selection......................................................................................................... 52 

3.4 Results ................................................................................................................................. 53 
3.4.1 Bike trip overview........................................................................................................ 53 
3.4.2 Variable selection......................................................................................................... 54 
3.4.3 Yearly trends ................................................................................................................ 58 
3.4.4 Weekly trends .............................................................................................................. 60 

3.5 Discussion ........................................................................................................................... 68 
3.6 Conclusion .......................................................................................................................... 73 


 vi 

Chapter 4: Event detection in bike share trip behaviors with spatial interaction models and 

martingales .................................................................................................................................... 75 
4.1 Abstract ............................................................................................................................... 75 
4.2 Introduction ......................................................................................................................... 75 
4.3 Methods............................................................................................................................... 79 

4.3.1 Exchangeability and martingales ................................................................................. 80 
4.3.2 Martingale test ............................................................................................................. 81 
4.3.3 Adaptations .................................................................................................................. 83 
4.3.4 Simulation design......................................................................................................... 84 
4.3.5 Empirical design .......................................................................................................... 86 

4.4 Results ................................................................................................................................. 89 
4.4.1 Simulation .................................................................................................................... 89 
4.4.2 Empirical work in NYC ............................................................................................... 93 
4.4.3 Comparison with additional US cities - Washington, DC and San Francisco ............. 98 

4.5 Discussion ......................................................................................................................... 107 
4.6 Conclusion ........................................................................................................................ 112 

Chapter 5: Conclusion................................................................................................................. 114 
5.1 Review of dissertation....................................................................................................... 114 
5.2 Significant contributions ................................................................................................... 117 
5.3 Future work ....................................................................................................................... 119 
5.4 Concluding remarks .......................................................................................................... 121 

Appendices .................................................................................................................................. 122 
Appendix A. Supplementary materials for Chapter 3 ............................................................. 122 
Appendix B. Supplementary materials for Chapter 4 ............................................................. 122 

Bibliography ............................................................................................................................... 123 
 

 vii 

List of Tables 

Table 1. Bike share total trip count, mean trip duration, and mean trips per day in the NYC system 

from 2019 to 2021. .......................................................................................................................... 5 
Table 2. Numbers of stations added over time by month and year. .............................................. 25 
Table 3. Attributes used in the spatial interaction models in Chapter 2. ...................................... 29 
Table 4. A summary of the prediction results based on Pearson’s R for each method using data for 

each classification category and using all data. Standard deviations are in parentheses after the 

mean values. .................................................................................................................................. 39 
Table 5. Variables for model selection in Chapter 3. ................................................................... 51 
Table 6. Variance inflation factor values with different independent variables for the spatial 

interaction models. ........................................................................................................................ 58 
Table 7. Yearly spatial interaction model coefficients and 95% confidence intervals. ................ 59 
Table 8. Simulation results for each setting, and mean delay time for detecting the true change 

event. ............................................................................................................................................. 90 
Table 9. Highlighted markers in the sequence of martingale values from the NYC experiment and 

associated events, with occurrence time, cause of the event, and the delay compared to the 

occurrence time (if applicable). .................................................................................................... 95 
Table 10. Highlighted markers in the sequence of martingale values from the Washington, DC 

experiment and associated events, with occurrence time, cause of the event, and the delay 

compared to the occurrence time (if applicable)......................................................................... 102 
Table 11. Highlighted markers in the sequence of martingale values from the San Francisco 

experiment and associated events, with occurrence time, cause of the event, and the delay 

compared to the occurrence time (if applicable)......................................................................... 106 
Table B.1. Results of variable selection for three different cities using the method in section 3.3.4.

..................................................................................................................................................... 122 
 

 viii 

List of Figures 

Figure 1. Citi Bike trip arrival changes from 2019 to 2020 and 2020 to 2021 in NYC, with each 

hexagon categorized primarily as decreasing trips, increasing trips, and new stations installed. .. 3 
Figure 2. Conceptual framework of the dissertation. .................................................................... 11 
Figure 3. Illustration of areal natural neighbor interpolation and ordinary kriging interpolation with 

toy data. ......................................................................................................................................... 32 
Figure 4. Spatial distribution of bike stations color-coordinated by the three derived classes for 

newly added stations and the original stations. ............................................................................. 35 
Figure 5. Top: Feature weight from the SI model results averaged over the time periods. The X-

axis represents the absolute value of the parameter estimates and the color represents the sign of 

the average magnitude. The dark green strip shows the average range. Bottom: Time series of top 

11 largest feature weights. The X axis is the number of weeks elapsed since 01/01/2015. ......... 37 
Figure 6. Spatial distribution of the best method for flow prediction at each 'cold start' station.. 41 
Figure 7. Weekly trip numbers in the NYC bike share system from 2019 to 2021. ..................... 53 
Figure 8. Regularization paths for the LASSO results. A more important variable ends in y=0 with 

a larger lambda. ............................................................................................................................. 55 
Figure 9. Pairwise scatter plot for selected variables in the spatial interaction design matrix for 

2019 model.................................................................................................................................... 57 
Figure 10. Top: COVID cases by week. Bottom: Destination capacity coefficient estimates and 

95% confidence interval by week. ................................................................................................ 61 
Figure 11. Top: COVID cases by week. Bottom: Distance decay coefficient estimates and 95% 

confidence interval by week. ........................................................................................................ 63 
Figure 12. Top: COVID cases by week. Bottom: Destination recreation opportunities coefficient 

estimates and 95% confidence interval by week .......................................................................... 65 
Figure 13. Top: COVID cases by week: Bottom. Destination employment coefficient estimates 

and 95% confidence interval by week. ......................................................................................... 67 
Figure 14. Demonstration of trend breaks from April to June 2020 in model performance 

(Pearson’s R) from spatial interaction models trained with different periods of data. ................. 78 
Figure 15. Algorithm of event detection based on martingale test. .............................................. 83 
Figure 16. A demonstration of random distribution of 25 simulated sites. .................................. 86 
Figure 17. Relationship between OD pairs with non-zero flows and proportion (p) of time periods, 

e.g., weekdays, in the NYC system. ............................................................................................. 86 
Figure 18. Left: Station distribution in NYC Citi Bike. Right: Illustrations of consistent OD flows 

in NYC Citi Bike with 85% time periods having positive flows. ................................................. 88 
Figure 19. Left: Summary of 50 runs for four simulations. Right: A set of martingale test values 

by time in one sample experiment of “cost” change, with a red line marking the time of change.

....................................................................................................................................................... 90 
Figure 20. Delay times and recall rates from an experiment on the relationship of the delay time 

and parameter change. .................................................................................................................. 91 
Figure 21. Recall and false positive rate from experiments with magnitude of event using the cost 

factor. ............................................................................................................................................ 92 
Figure 22. Martingale values and events of interest (marker A-F) from weekday samples in the 

NYC system. ................................................................................................................................. 94 
Figure 23. Martingale test value for weekday samples and distance decay from daily spatial 

interaction models. Red lines mark the events from A to F except for C in Table 9.................... 96 


 ix 

Figure 24. Martingale values (log-scaled) (Top) compared with trip numbers (Middle) and change 

scores from the changefinder algorithm (Bottom) using the weekday samples. .......................... 98 
Figure 25. Bottom: Bike station distribution in Washington, DC Capital Bikeshare. Top: 

Illustrations of consistent OD flows in Washington, DC Capital Bikeshare with 70% time periods 

having positive flows .................................................................................................................. 101 
Figure 26. Martingale values and events of interest (marker A-D) from weekday samples in the 

Washington, DC system.............................................................................................................. 102 
Figure 27. Bike station distribution in San Francisco Baywheels (Bottom-Left) and illustrations of 

consistent OD flows in the areas of downtown (Top-Left), East Bay (Top-Right), and San Jose 

(Bottom-Right) with 60% time periods having positive flows. .................................................. 104 
Figure 28. Martingale values and events of interest (marker A-G) from weekday samples in the 

San Francisco system. Top: values calculated from mixture martingales. Bottom: values calculated 

from power martingales (ϵ=0.92)................................................................................................ 105 
Figure A.1. Cumulative daily number of individuals fully vaccinated against COVID-19 in New 

York City in 2021 ....................................................................................................................... 122 
 

 x 

List of Abbreviations 

COVID-19: Coronavirus disease caused by the SARS-CoV-2 virus 

DC: Washington, DC 

GBFS: General Bikeshare Feed Specification 

GPS: the Global Positioning System 

ICT: Information and Communications Technology 

IOT: Internet of Things 

LASSO: Least Absolute Shrinkage and Selection Operator 

LSTM: Long Short-Term Memory 

NYC: New York City 

OD: Origin-destination 

OK: Ordinary Kriging 

POI: Point of interest 

RK: Regression Kriging 

RNN: Recurrent Neural Networks 

SI: Spatial interaction 

US: the United States 

VIF: Variance Inflation Factor 


 1 

Chapter 1: Introduction 

1.1 Background and motivation 

The increasing trend of urbanization in the past century has resulted in over half of the world's 

population now living in cities (United Nations, Department of Economic and Social Affairs, 

Population Division, 2019). This rapid urbanization has created a pressing need to gain a deeper 

understanding of urban infrastructure, population distribution, and job opportunities, which are 

crucial indicators of urban livability (Bassolas et al., 2019). A key aspect of this livability is the 

rising human mobility within urban areas, which serves as an essential observation for urban 

planners. Thus, it has become increasingly important to investigate and anticipate human mobility 

both within and between cities. Spatial interactions or origin-destination flows, which refer to the 

movement of people and goods between locations, are vital indicators of urban socio-economic 

activities and provide insights into the interdependence between different places. Studying spatial 

interactions of human mobility can gain valuable insights into the factors that influence urban 

livability and create more sustainable and equitable cities for all. 

The study of spatial interaction and mobility has long been a central focus of transportation and 

geographical science, as understanding and predicting the flows of people, goods, and information 

across space is vital for numerous fields, including urban planning, transportation, and regional 

development. A range of methods have been developed to analyze and model spatial interaction, 

including both traditional approaches and new techniques that leverage recent advances in 

computing and artificial intelligence. Traditional methods such as spatial interaction models 

(Fotheringham & O’Kelly, 1989; Oshan, 2020; Wilson, 1971) are widely used to understand the 


 2 

spatial interaction generation process in telecommunication, commodity goods, and migration in 

the past decades. These models use statistical techniques to estimate the flows of people or goods 

between different locations, based on factors such as distance, accessibility, and infrastructure. In 

recent years, modern urban mobility methods such as deep learning have gained attention in the 

study of spatial interaction, due to the development of more powerful computing resources and the 

increased use of artificial intelligence. Among the different targets of urban mobility models, there 

are (1) trip-based origin-destination (OD) demand (Z. Cheng et al., 2021; Ke et al., 2019; L. Liu 

et al., 2019; Y. Wang et al., 2019; Xiong et al., 2020); (2) location-based demand (Ai et al., 2019; 

P.-C. Chen et al., 2020; Geng et al., 2019; S. Guo et al., 2019; Li & Axhausen, 2020; Lin et al., 

2019; Z. Pan et al., 2020; Ren, Chen, et al., 2019; Sun et al., 2020; Y. Yang et al., 2020); and (3) 

volumetric demand (L. Cai et al., 2020; Cui et al., 2019; Do et al., 2019; Ma et al., 2019; Qu et al., 

2019; Ren, Cheng, et al., 2019; Shleifer et al., 2019; B. Yang et al., 2019; Y. Zhang et al., 2020; 

Zhang & Xin, 2020). Trip-based OD demand, also referred to spatial interaction in geography and 

other disciplines, has focused on enhancing predictions with the recent advances in statistical 

learning and deep learning at various temporal scales utilizing big geospatial data. 

The availability of geo-located traces from Information and Communications Technologies (ICTs) 

has revolutionized the study of urban mobility, providing much greater spatiotemporal resolution 

than was previously possible. However, the use of individual-level traces from mobile phone 

records can be limited by concerns around privacy and consistency of data availability. As a result, 

anonymous flow records from transportation systems have become a popular choice for mobility 

modeling studies. Historically, public transportation systems such as buses, rail, and taxicabs have 

formed the backbone of cities, by facilitating interaction and providing an affordable and reliable 

alternative to private transportation. More recently, bike share systems have emerged as a new 


 3 

form of transportation that can be more easily changed or expanded due to their lightweight and 

affordable infrastructure. Docked bike share systems, in particular, have become popular in many 

cities, offering daily trips with detailed time information that create new opportunities to study the 

dynamics of spatial interaction processes. Many articles have been dedicated to build the 

relationship between bike-share trips and its determinants. Factors related to socio-demographics, 

public transit, system infrastructure, street features, land use or places of interests, and weather are 

frequently included in the regression model with bike-share trip demand (Hu et al., 2022; A. Li et 

al., 2020; Maas et al., 2021; Noland et al., 2016; Ross-Perez et al., 2022; Schimohr & Scheiner, 

2021; X. Wang et al., 2021; Younes et al., 2020; Zacharias & Meng, 2021). 

 
Figure 1. Citi Bike trip arrival changes from 2019 to 2020 and 2020 to 2021 in NYC, with each 

hexagon categorized primarily as decreasing trips, increasing trips, and new stations installed. 

This dissertation aims to address unsolved challenges that arise for this new form of transportation 

by proposing new methods and frameworks to model and understand the high-frequency changes 


 4 

in the spatial interaction from a bike share system. Taking the New York City (NYC) system 

demonstrated in Figure 1 above as an example, there are several types of changes from 2019 to 

2021. The blue hexagons contained new stations installed within the year, the red hexagons 

experienced a decrease in trip arrivals compared to the previous year, and the green hexagons 

marked an increase in the total trip arrivals. This exploratory data analysis from the NYC system 

has shown at least two challenges to predict the associated spatial interaction flows. The first 

challenge is the changing layout of the stations, which does not allow the direct application of 

models like Recurrent Neural Networks (RNN) or Long Short-Term Memory (LSTM) networks 

that take historical flows as input. A new approach is needed to deal with trips at those new stations. 

The second challenge stems from the changes associated with the COVID-19 pandemic as the total 

trips in 2020 dropped slightly compared to 2019 but total trips in 2021 exceeded those from 2019. 

In Table 1 below, changes in the total trips and trip duration across three years suggest the spatial 

interaction model calibrated before the pandemic may no longer be valid for the post-pandemic 

era (Werner et al., 2022). Such changes also imply that the health benefits of bike share are 

recognized by people and lead to a much stronger recovery after the COVID-19 pandemic. This 

short-term and long-term motivation for adopting the bike share brings a huge potential for the 

expansion of bike share systems in urban areas in the near future. Therefore, there is a critical need 

for developing new tools to understand the dynamics of spatial interactions from this new form of 

transportation. To achieve this goal, three studies are conducted that build on the theoretical basis 

of spatial interaction models. Specifically, the unconstrained gravity model (Wilson, 1971) is used 

as the core spatial interaction model across the three studies. Methodological frameworks that 

build from and incorporate spatial interaction models are introduced in each of the three main 


 5 

chapters to accomplish the research objectives whose motivation and background are summarized 

below. 

 
Table 1. Bike share total trip count, mean trip duration, and mean trips per day in the NYC 

system from 2019 to 2021. 

Year 2019 2020 2021 

Total trips 20,551,517 19,506,857 27,551,166 

Mean trip duration 16:18 21:50 16:30 

Mean trips per day 56,306 53,297 75,483 

 
The first study in the dissertation is motivated by the aforementioned challenges from new stations 

as the system evolves. Among different transportation modes, bike share can be analyzed with 

models proposed for other transportation modes but it has unique features that require special 

treatments and make recent cutting-edge trip prediction methods not applicable: individual bike-

sharing programs are often piloted for a limited number of zones within an urban area that are 

known to have relatively high overall activity and transport demand and then systems are expanded 

and updated according to the evolution of demand over time and across space. Previous flows 

across stations are essential input in almost all cutting-edge spatial interaction prediction methods. 

However, the dynamic development of bike-sharing systems brings in a large amount of new bike 


 6 

stations into the system without any historical flows. This feature prevents a straightforward 

deployment of the majority of cutting-edge models for flow prediction. A new methodology is 

hence needed to avoid the issue of missing previous flows when using cutting-edge prediction 

methods. Therefore, study 1 investigates a methodology for predicting OD flows of bike share 

systems given that the system evolves over time and demand is constantly changing. More 

specifically, this work focuses on predicting demand for a new station where there may be no 

knowledge of previous flows, also known as a ‘cold start’ station, which occurs when the system 

is progressively updated in order to maintain efficiency and grow the ridership of bike share 

systems.  

Flow prediction using spatial interaction models and flow interpolation methods are two directions 

that can be used to compute flows at a new station. On one hand, spatial interaction models are 

calibrated from the system-wide flow generation processes, so predictions are made based on the 

general relationships between the independent variables and all spatial interactions between all 

origins and destinations.  On the other hand, interpolation methods may incorporate data borrowed 

from nearby locations and incorporate local information that is not captured in a spatial interaction 

model. However, flow interpolation techniques are under-discussed with little literature on either 

their implementation or application. For example, only an areal natural neighbor method has been 

introduced for flow interpolation so far (Jang & Yao, 2011; Šimbera & Aasa, 2019). In study 1, a 

new Kriging-based interpolation method is proposed by adopting Ordinary Kriging and Regression 

Kriging in the flow domain. Along with the areal-based kriging method proposed in previous work, 

these interpolation methods are compared to spatial interaction models. Of note is that Regression 

Kriging also incorporates a spatial interaction model as the regression step before "kriging" the 

residual of the regression output. This array of potential methods is tested within the NYC system, 


 7 

which is the longest and largest bike share system in the US, to get an initial result for the 

effectiveness of the newly proposed flow interpolation method against each alternative. As a result, 

a hybrid strategy is suggested that uses a different method for different station classifications based 

on the location of a new station in comparison to existing stations. 

Human mobility in major US cities was heavily impacted during the COVID-19 pandemic and the 

consequential lockdown period, resulting in significant changes in general mobility metrics (Kang 

et al., 2020; Y. Pan et al., 2020; Sevtsuk et al., 2021), metro ridership (H. Wang & Noland, 2021), 

bike share systems (Padmanabhan et al., 2021), and micro-transit (Y. Zhou et al., 2021). As will 

be seen in this research, changes in the coefficients in the spatial interaction models indicate that 

the COVID-19 pandemic affects the trip generation processes of cycling. Recent studies on bike 

share usage also revealed changes in the bike share trip duration during the COVID-19 pandemic. 

For example, in NYC there was a nearly 40% increase in the average trip duration in March 2020 

compared to the month before COVID (Teixeira & Lopes, 2020). Similar increases were also 

observed in other US cities like Boston and Chicago (Padmanabhan et al., 2021) as well as in some 

European cities such as London, UK (Heydari et al., 2021), Zurich, Switzerland (Li et al., 2021) 

and Kosice, Slovakia (Kubaľák et al., 2021). However, beyond changes in ridership volume, there 

is a lack of evidence regarding the processes driving changes in human behavior, such as the effects 

of distance and attraction factors at different locations. The motivation of the second study is 

therefore to use spatial interaction models calibrated over different periods of time to investigate 

how behavior related to bike share spatial interaction processes evolve before, during, and after 

the COVID-19 lockdown.  

As a result, the second study presents a disaggregated comparative modeling framework using two 

different time scales, yearly and weekly, to capture the dynamics of the determinants in the bike 


 8 

share usage. A similar framework was proposed to split taxi trips by monthly and hourly subsets 

(Oshan, 2020), but it has never been applied to a bike-sharing dataset. NYC Citi Bike is selected 

again as it has the largest usage of bike share systems in the US and is the most frequently discussed 

in the related literature (Chai et al., 2018; P.-C. Chen et al., 2020; X. Chen & Jiang, 2022; Faghih-

Imani, Anowar, et al., 2017; Kong et al., 2020; Liu et al., 2016; Teixeira & Lopes, 2020; Wang & 

Noland, 2021). In order to get a parsimonious interpretation of determinants from the model 

calibration procedure, model selection is first performed to obtain a subset of the most important 

variables. Model selection with the LASSO (Least Absolute Shrinkage and Selection Operator) 

regression assigns penalties to the covariates by shrinking the associated coefficients until they 

converge to zero. The path of coefficients toward zero reflects the relative strength of each variable 

in explaining the observations of the dataset. Yearly modeling and weekly modeling of spatial 

interaction from the bike share provide a mechanism to observe the dynamics of determinants as 

a time series. An intellectual significance of the study is that further contextualization of the 

dynamics was carried out by relating trends from the coefficient time series to features of the 

COVID-19 pandemic timeline to help interpret the results. 

The third study in the dissertation builds on the findings of study 2, which showed that there were 

behavioral changes in bike share usage during the COVID-19 pandemic, particularly during the 

lockdown. Behavioral changes associated with trip volume have already been discussed by (X. 

Chen & Jiang, 2022). This study aims to go beyond this by investigating what other events can 

cause changes in the spatial interaction processes generating bike share trips and how these events 

can be detected. Detecting such events that may alter the spatial interaction process is important 

for the application of study 1 as it highlights the importance of the temporal dependency towards 

the prediction and understanding of spatial interaction. Event detection is another area where the 


 9 

temporal subset comparison framework proposed in study 2 can be adopted for events that are less 

obvious than the COVID-19 pandemic and may have a smaller impact. Unlike the ubiquitous 

impacts of COVID-19, other events, such as weather, that are critical to the decision to cycle (Bean 

et al., 2021), can significantly influence bike-sharing over much shorter time horizons. However, 

these events may be more difficult to isolate, and their ephemeral nature means that the behavioral 

changes may occur more quickly without lasting very long. As a result, incorporating a finer time 

scale may enhance event detection. 

The problem of event or anomaly detection has a large literature due to the availability of time 

series data in a wide range of domains, including security and surveillance (Karbalaie et al., 2022), 

traffic and crowd monitoring (Djenouri et al., 2019), business intelligence (Ranco et al., 2015), 

social media analysis (T. Cheng & Wicks, 2014; Vioulès et al., 2018), environmental studies 

(Meyer et al., 2019), medical diagnosis (Ukil et al., 2016), and manufacturing (Pittino et al., 2020). 

Various methods of event detection have been implemented for analyzing data streams from 

transportation systems, such as likelihood ratio tests (Pang et al., 2011) and k-nearest neighbor 

(Dang et al., 2015). However, many of these methods focus on processing events from individual 

locations (i.e., origins or destinations), while origin-destination events related to spatial interaction 

have received less attention. As a result, a novel method is explored to detect events in spatial 

interaction processes by adapting the martingale framework for data streams (Cherubin et al., 2018; 

Ho, 2005; Ho & Wechsler, 2010) in combination with spatial interaction models. The martingale 

framework has several advantages, including compatibility with regression-based data-generating 

processes, potential for “on-line” streaming deployment, and ease of tuning. These benefits 

increase the significance of introducing such a framework into spatial interaction models that are 

typically time agnostic. To evaluate this new method, simulated spatial interaction flows are first 


 10 

used to validate the ability of the method to detect known parameter changes that represent data 

generating processes (i.e., conditional associations). Then, the framework is tested on empirical 

data from three US cities, including NYC, which has one of the highest bike share usage rates in 

the country, as well as Washington, DC and San Francisco as two other major urban areas that are 

not as densely populated. The results demonstrate the framework's ability to extract different types 

of events from the data. 

 
1.2 Research objectives 

Overall, this dissertation aims to develop a comprehensive framework for understanding and 

predicting dynamically evolving spatial interaction demand in a bike share system, as shown in 

Figure 2. By doing so it addresses several research questions that were not adequately explored in 

previous studies. The framework proposed in this study addresses three critical challenges related 

to spatiotemporal dynamics in a bike share system. The first challenge is related to the expansion 

of system infrastructure causing limitations at newly added stations for models like LSTM that 

require historical flows. The second challenge comes from a gap in the knowledge about the 

changes in the processes generating bike trips. Although bike share trips after the lockdown due 

to the initial outbreak of the COVID-19 pandemic recovered to the pre-pandemic level quickly, 

the mean trip duration significantly increased in 2020 compared to 2019 suggesting behavioral 

changes from the public in making the decision to cycle during and after the lockdown. Such 

behavioral changes associated with COVID-19 are not fully addressed in the previous studies by 

analyzing changes in trip numbers. The last challenge comes from anomalies or events that cause 

smaller impacts on the trip generation process and are too subtle to be explained by weekly spatial 


 11 

interaction models. To address this research gap, the third study employs an event detection 

framework that combines the martingale test and spatial interaction model to detect the events 

associated with bike share behavior (i.e., changes in coefficients over time). Study 2 directly uses 

coefficients from spatial interaction models while the regression kriging method in study 1 and the 

martingale test in study 3 both use the residual of the spatial interaction model as the input, making 

the spatial interaction model a common thread that connects the three studies and updates this 

classic spatial analysis framework to better accommodate spatiotemporal data in the era of big data. 

 
Figure 2. Conceptual framework of the dissertation.  

 
The objectives of each research topic and associated research questions are as follows: 

Research objective of study 1. Infrastructure expansion in a bike share system will present a 

challenge for demand prediction at the station level as most prevailing modeling methods (i.e., 

LSTM models) rely on historical flows at each station. The absence of historical data for newly 


 12 

added stations makes such models unavailable and thus requires a novel solution for predicting 

‘cold start’ trips at the new stations, either from direct data borrowing interpolation or from indirect 

model-based interpolation. The first study compares methods and proposes a hybrid strategy taking 

advantage of both aspects by answering the following research questions. 

Research question 1a. How do interpolation methods, including areal natural neighbor, 

Ordinary Kriging, and Regression Kriging, compare to spatial interaction models for 

predicting the 'cold start' bike trips? 

Research question 1b. What is the best strategy for 'cold start' bike trip prediction at 

different locations? 

Research objective of study 2. Bike share usage during the COVID-19 pandemic changed on a 

much larger scale than the pre-pandemic routine, implying changes in the driving factors of bike 

trips. Dynamics of determinants of bike share trips during routine activities as well as during and 

after the COVID-19 lockdown were analyzed through a disaggregate comparative framework and 

by addressing the related questions below. 

Research question 2a. From a long-term perspective, what are the changes in the generating 

factors of bike trips before COVID-19 in 2019 and across different phases of COVID-19 

in 2020 and 2021? 

Research question 2b. From a short-term view, how did the weekly determinants of bike 

trips change with the evolution of COVID-19? 

Research objective of study 3. Short-term changes in the spatial interaction that are not explained 

using the approach from study 2 are considered potential events associated with trend breaks in 


 13 

the spatial interaction processes over a long time period. The third study explores a method to 

detect the events that can lead to behavioral changes in the spatial interaction processes related to 

bike share trips. The proposed method is developed using a martingale framework, which is 

evaluated to answer the research questions below. 

Research question 3a. To what extent can events be detected using model-based proxies 

for changes in human behavior? 

Research question 3b. How to effectively combine the spatial interaction model and the 

martingale framework to detect events? 

Research question 3c. What types of events can be reliably detected within different US 

cities from a multiyear observation from 2018 to 2021? 

 
1.3 Dissertation outline 

The dissertation is organized into five chapters. The overall goal of the dissertation is to develop 

methods to help understand and predict high-frequency spatial interactions produced by bike share 

systems. Chapter 1 describes the essential background and motivations for three research 

objectives. Chapter 2 presents a hybrid approach to predict ‘cold start’ bike trips at newly added 

stations in a docked bike share system. Chapter 3 introduces a disaggregated comparative 

framework for capturing and understanding changes over time in the determinants of spatial 

interactions associated with the COVID-19 pandemic. Chapter 4 continues the focus on changes 

in spatial interaction processes by adapting and evaluating an event detection framework based on 

the martingale test. The utility of the event detection method is tested using simulated data and by 


 14 

applying it to three different US cities. Chapter 5 concludes the dissertation by reviewing the key 

findings from the three studies, summarizing significant contributions from this research, and 

discussing the limitations and potential areas of future work. 

  
 15 

Chapter 2: Comparing spatial interaction models and flow interpolation 

techniques for predicting 'cold start' bike share trip demand 

2.1 Abstract 

Bike-sharing systems are expanding rapidly in metropolitan areas all over the world and individual 

systems are updated frequently over space and time to dynamically meet demand. Usage trends 

are important for understanding bike demand, but an overlooked issue is that of ‘cold starts’ or the 

prediction of demand at a new station with no previous usage history. This chapter explores a 

methodology for predicting bike trips from and to a 'cold start' station in the NYC Citi Bike system. 

Specifically, gravity-type spatial interaction models and spatial interpolation models, including 

natural neighbor interpolation and kriging, are employed. The overall results from experiments of 

a real-world bike-sharing system in NYC indicate that the regression kriging model outperforms 

the other models by taking advantage of the robustness and interpretability of gravity-type spatial 

interaction regression models and the capability of ordinary kriging to capture spatial dependence. 

 
2.2 Introduction 

The past century has witnessed a significant increase in urbanization with more than half of the 

world's population currently living in urban areas (United Nations, Department of Economic and 

Social Affairs, Population Division, 2019). As a result, it has become increasingly important to 

understand and anticipate human mobility within and across cities. In particular, public 

transportation systems, historically composed of buses, trains, and taxicabs, form the backbone of 

cities, facilitating interaction and providing an affordable and reliable alternative to carry out 

routine activities compared to private transportation. The development and optimization of public 


 16 

transportation infrastructure have therefore remained an important aspect of urban planning and 

community development. 

 
A more recent trend is the emergence of bike-sharing as an alternative means of transportation and 

the establishment of bike-sharing systems in cities around the world. Compared to other modes of 

mobility, cycling provides health and environmental benefits in addition to offering a more 

efficient means of navigating the urban environment (Oja et al., 2011; Otero et al., 2018; M. Wang 

& Zhou, 2017; Y. Zhang & Mi, 2018). For example, the New York City (NYC) Mobility Report 

indicates that trips made using the NYC Citi bike share system are over a minute faster than taxi 

trips across all distance categories within the Midtown area of Manhattan, and cost less than 25% 

for taxi trips for all trip length categories except those less than half a mile (NYC Department of 

Transportation, 2019). Such advantages are further highlighted during rush hour (Faghih-Imani, 

Hampshire, et al., 2017). This has led to the proliferation of bike-sharing systems, with nearly 2000 

bike-sharing systems now in operation around the world1. 

 
Individual bike-sharing programs are often piloted for a limited number of zones within an urban 

area that are known to have relatively high overall activity and transport demand. Systems are then 

expanded and updated according to the evolution of demand over time and across space. This 

dynamic development of bike-sharing systems requires a strong understanding of mobility 

behavior and flexible methods for predicting spatial-temporal demand. In particular, travel demand 

 
1  According to bikesharingworldmap.com and including both docked and dockless systems (last accessed on 

07/30/2021). 


 17 

models may focus on predicting overall system demand, the total demand at each location or 

station within a system, or demand between individual locations. It is this latter scenario that is the 

primary focus of this research as it typically receives less attention (e.g., Li & Shuai, 2020; Y. 

Zhou et al., 2019). This is perhaps because it is more challenging to model individual origin-

destination (OD) flows due to a lack of detailed data and because they usually contain more noise 

and higher levels of variability compared to the overall system usage or the total inflows or 

outflows for a set of locations. However, models of OD flows, often referred to as spatial 

interaction models, are becoming increasingly possible as detailed data are generated by GPS-

enabled sensors and the internet of things (Shaw & Sui, 2018). For example, Calafiore et al. (2021) 

leveraged user OD flows based on social media check-ins to study the characteristics of cities’ 

neighborhoods. Furthermore, recent machine learning methods for incorporating various 

dimensions of spatial and temporal dependence hold promise for building models with enhanced 

predictive capabilities (Chu et al., 2020; Ke et al., 2019; L. Liu et al., 2019; Y. Wang et al., 2019). 

Consequently, this work investigates a methodology for predicting OD flows of bike share systems 

given that the system evolves over time and demand is constantly changing. More specifically, 

this work focuses on predicting demand for a new station where there may be no knowledge of 

previous flows, also known as a ‘cold start’, which is necessary when the goal is to progressively 

update transport infrastructure in order to maintain efficiency and grow the ridership of bike share 

systems. 

 
Previous work has not examined in detail the task of predicting spatial interaction demand for new 

stations, likely because more traditional public transportation infrastructure evolves much more 

slowly compared to the relatively inexpensive and flexible bike share infrastructure. That is, much 


 18 

of the related state-of-the-art research assumes that the infrastructure is persistent across time and 

there is prior knowledge of station activity, which is not always the case for bike share systems. 

Therefore, section 2.3 of this chapter provides some additional background and formalizes the 

challenges that need to be overcome. Then, section 2.4 presents a novel methodological approach 

for predicting the spatial interaction demand associated with new bike share stations, and section 

2.5 briefs the data used in the case study and experiment settings. Section 2.6 describes the 

experimental results used to benchmark the proposed approaches. Overall, the results indicate that 

regression kriging models are slightly more performant than the other techniques considered. 

Finally, section 2.7 concludes with a discussion of the contributions of this research, the limitations 

of the proposed approach, and suggestions for future work in this area. 

 
2.3 Background 

The widespread adoption of information and communication technology has produced massive 

amounts of transportation and mobility data, sparking a wave of research into how to best model 

human movement. Meanwhile, the popularity of bike-sharing systems has inspired much recent 

research. Si et al. (2019) provide a review of bike-sharing papers published from 2010 to 2018 and 

group them based on the following categories: demand factors, rider behavior, system optimization, 

and impact on other modes. The factors associated with variation in bike-sharing ridership demand 

can largely be summarized as being related to the weather, built environment, sociodemographic, 

or temporal dimensions (Barbour et al., 2019; El-Assi et al., 2017; Eren & Uz, 2020; Guo et al., 

2017; H. Yang et al., 2020). Knowledge of the distribution of trips is useful for developing 

strategies to optimize systems through bike rebalancing or station repositioning, allowing the 

system to satisfy higher demand (P.-C. Chen et al., 2020; Faghih-Imani, Hampshire, et al., 2017; 


 19 

Pan et al., 2019; R. Zhu et al., 2020). The introduction of a bike share system may impact other 

transport modes, leading researchers to compare bike share usage to taxi and bus ridership 

(Campbell & Brakewood, 2017; X. Zhou, Wang, & Li, 2019). Another trend focuses on bike share 

trip prediction using travel demand models and OD flow models, the latter of which is the focus 

of this study and will be used to consider a range of factors influencing bike share trip demand to 

predict OD flows. 

 
Trip demand at 'cold start' stations, however, catches less attention and is only explicitly discussed 

in Noland et al. (2016) and Y. Zhang et al. (2017). And both papers used linear regression models 

on station-level demand to examine the generalization of the models. Research by X. Wang et al. 

(2021) discussed the use of a regression model for making predictions at new stations but only 

performed out-of-sample predictions by randomly selecting stations as a test set, which is not an 

exhaustive validation for trip demand at potential new stations. Furthermore, there is a lack of 

effort dedicated to OD flow predictions for new stations. 

 
Spatial interaction (SI) describes aggregate movements of individuals, commodities, capital, or 

information over geographic space, resulting from some requisite decision-making process 

(Farmer & Oshan, 2017). A typical quantitative representation of SI is the origin-destination (OD) 

matrix: 


 20 

Equation 2.1 

Where 𝑇 is the OD flows matrix from a set of origins (𝑂1, 𝑂2, . . . , 𝑂𝑛) to a set of destinations 

(𝐷1, 𝐷2, . . . , 𝐷𝑚), and 𝑇𝑖𝑗  is the magnitude of flows from origin 𝑖  to destination 𝑗. Inspired by 

Newton's law of gravity, early models of spatial interaction theorized OD flows to be proportional 

to the product of potential at origins and destinations and inversely proportional to the squared 

distance between them, yielding the following formulation, 

 Equation 2.2  

where 𝑇𝑖𝑗 still denotes the flows between origin 𝑖 and destination 𝑗,  and  represent the potential at 

𝑖  and 𝑗 , 𝑑𝑖𝑗  describes the distance between 𝑖  and 𝑗 , and 𝑘  is a scaling factor that ensures the 

number of flows predicted by the model matches the number of observed flows. In this context, 

the potential is often defined as the location population or the number of job opportunities (Gao et 

al., 2013; Krings et al., 2009; Lenormand et al., 2016), but can be expanded to consider a series of 

origin and destination attributes that may contribute towards the flow generation, each with their 

own parameter (Fotheringham & O’Kelly, 1989; Oshan, 2020). According to Kar et al. (2021), 

two groups of variables can be identified besides population or jobs: socio-economic attributes and 

built environment attributes. The former group usually explains trip generation, such as labor force 


 21 

(Pourebrahim et al., 2018; Signorino et al., 2011); GDP (L. Zhang et al., 2019); human activity 

density (Marrocu & Paci, 2013). The latter group includes common destination determinants of 

travel demand, such as amenities (e.g., schools, hospitals, markets) (Botella et al., 2021; Kar et al., 

2021), tourism attractions (e.g., hotel rooms) (Khadaroo & Seetanah, 2008), and land use types (X. 

Liu et al., 2016). 

 
Recent work seeks to exploit temporal dependence within SI flows to increase predictive 

performance. The simplest instance involves taking the average of historical observations, which 

can yield accurate predictions for future flows when there is limited variation in the historical 

observations. Typical methods using temporal dependence include historical averaging; 

autoregressive integrated moving average (ARIMA); Deep neural network structures including 

recurrent neural networks (RNN) and long short-term memory (LSTM) architectures (Cheng, 

Trepanier, & Sun, 2021; Chu et al., 2020; Ke et al., 2019). The common factor amongst all these 

methods is that they require sufficient previous OD flows and become invalid to predict future 

flows when there are no historical observations to draw upon. 

 
Meanwhile, bike share systems are also spatially dynamic in that stations are frequently updated 

or perhaps more importantly that new stations are added to extend the system, though this is often 

overlooked in related work. One exception is Lu et al. (2018) who explored the extension of system 

infrastructure by deploying agent-based methods to simulate the result of adding new stations 

towards the usage of different transportation modes. In contrast, most previous research focuses 

on predicting bike-sharing trip flows at existing locations (i.e., in-sample spatial prediction), often 

using historical data as an important input feature. This is problematic when the focus is instead 


 22 

on predicting flows associated with a new station that is being added to the system (i.e., out-of-

sample spatial prediction) because there is no historical flow data that can be used to learn temporal 

dependencies. A similar issue often occurs in recommender systems where new users or new items 

will have no historical record to be used for preference analysis (Su & Khoshgoftaar, 2009; 

Volkovs et al., 2017) and is often called the ‘cold start’ issue. This term is also used in the literature 

associated with detecting stops and trips from GPS trajectories (Schuessler & Axhausen, 2009; 

Stopher et al., 2005). Overcoming this limitation is the primary contribution of this research in 

order to develop a robust methodology for predicting historical OD flows at a new bike share 

station. 

 
It is reasonable to leverage spatial dependence to predict values at unobserved locations based on 

values from nearby observed locations. The First Law of Geography states that things that are 

closer together are typically more similar (Tobler, 1970) and is the basis for popular spatial 

statistics, such as Moran’s I measure of spatial autocorrelation and spatial interpolation methods, 

including natural neighbor interpolation, inverse distance weighting, Kriging, and more (Mitas & 

Mitasova, 1999). In particular, kriging has become a core spatial interpolation tool and is now used 

in many topic areas such as air quality analysis (Bayraktar & Turalioglu, 2005), natural resource 

analysis (Emery, 2005), water studies (Zimmerman et al., 1998) and traffic (Eom et al., 2006; X. 

Wang & Kockelman, 2009). Recently, spatial interpolation tasks have been adapted into a lattice 

or graph structure and integrated into generative adversarial networks (D. Zhu et al., 2020) or graph 

neural networks (Wu et al., 2020), but these extensions do not yet apply to the case of OD flows. 

For the interpolation of OD flows, Jang and Yao (2011) proposed an areal weighting method, 

which was further employed by Šimbera and Aasa (2019). As far as the authors are aware there 


 23 

are no studies extending interpolation methods to SI flows. However, the spatial dependence 

between SI flows has previously been leveraged for community detection (Gao et al., 2013; Yao 

et al., 2018) and land use identification (X. Liu et al., 2016), indicating that kriging, which also 

relies on the presence of spatial dependence, may be promising for flow interpolation. Therefore, 

this research explores flow-based kriging and compares it to natural neighbor interpolation and 

gravity-type spatial interaction models. 

 
2.4 Methodology 

2.4.1 Problem statement 

For a station-based bike share system, 𝑆𝑛, there are 𝑛 docking stations serving as both origins 𝑆𝑖 

and destinations 𝑆𝑗 and information is available for each trip in the system regarding its origin 

station, destination station, start time, and end time. Trips are also sorted into discrete temporal 

subsets, 𝑡, based on their starting time (e.g., hour, day, week, etc.). Therefore, each trip in the 

system can be denoted using a 3-tuple (𝑡, 𝑠𝑖 , 𝑠𝑗)and the corresponding OD flow matrix 𝑇  is 

comprised of entries denoting aggregated trips between stations at the time 𝑡 (i.e., 𝑇𝑡,𝑖,𝑗 ). The 

diagonal elements of 𝑇 (i.e., 𝑖 = 𝑗) are filtered out and set to zero to remove their undue influence 

on any subsequent modeling procedures. A 'cold start' station refers to a scenario where there is no 

information available about previous flows for a newly added station 𝑆𝑥 and the goal is to predict 

future outflows 𝑇𝑡+1,𝑥,𝑗  and/or future inflows 𝑇𝑡+1,𝑖,𝑥 . The primary issue that arises is that there are 

no previous flows to use in any of the methods that leverage historical data. A methodology is 

proposed below to overcome this limitation by classifying stations and using a combination of 

regression modeling and interpolation techniques depending on the station class. 


 24 

 
2.4.2 Station classification 

Predicting the flows associated with the addition of new stations is the primary task of this research. 

However, depending on the location of the new station and its proximity to currently existing 

stations, different techniques and information are available to carry out the predictions. At least 

three different scenarios can be distinguished. The first is referred to here as interpolation, which 

entails borrowing information directly from the existing stations. Interpolation typically only 

applies in the situation where the newly added station is sufficiently proximal to existing stations 

(i.e., within current system coverage). The second scenario is referred to here as margin 

interpolation and is the case where a new station is added on the margin of the current system. As 

such, there are some nearby stations with previous flow information that can be borrowed. The 

final scenario is referred to here as extrapolation and is concerned with the addition of stations that 

are essentially outside the coverage of the current system. Predictions for this scenario are 

hypothesized to be only possible through the extrapolation of modeled relationships, since there 

are no existing nearby stations to borrow information from. A method for identifying empirical 

instances of system expansion and classifying new candidate stations across the three scenarios is 

proposed below and then subsequently deployed and evaluated. 

 
Newly added stations are identified by examining the time series of trip data for the bike share 

system. For each station present in the system before July 2020, an array of station inflows (or 

station outflows) for each temporal subset (weekly in this case study) is used to track its operation 

status. Intuitively, the first non-zero entry of the array, 𝑡𝑖𝑛 > 0, is recorded as a preliminary 

inferred date that a station was initially put into operation. However, these preliminary dates are 


 25 

then refined by manually checking for consistent station service as in practice there sometimes are 

test trips prior to the official launch of a new station. In addition, the system record indicates that 

all new stations added after the initial system rollout in 2013 did not occur until 2015. Therefore, 

the original stations existing before 2015 (Table 2) are not included in the set of classified stations 

nor are they used for prediction validation. 

 
Table 2. Numbers of stations added over time by month and year.  

Month and Year 2013 2015 2016 2017 2018 2019 2020 

January - - 2 2 9 6 8 

February - - 2 8 2 3 5 

March - - 2 4 6 6 1 

April - - 2 4 3 17 2 

May - - 3 5 7 11 36 

June 336 - 5 4 5 5 35 

July 1          1 8 4 4 3 - 

August - 85 98 4 5 4 - 

September - 39 43 64 1 15 - 

October - 16 2 74 3 38 - 

November - 5 2 6 3 34 - 

December - 2 2 4 4 15 - 

  
Next, these new stations, 𝑆𝑥, are classified between interpolation, margin, and extrapolation. This 

is done based on their relationship to the existing system coverage, which is represented here by 

computing the convex hull of all the stations in operation at the time when 𝑆𝑥 is introduced. The 

convex hull of all the station points provides a simplified representation of the system service area 


 26 

in continuous Euclidean space, though the urban environment does not exist separately from the 

natural environment (i.e., rivers). To ensure these restricted areas are not included in the 

representation of system coverage, a few extra steps were taken to prepare a modified convex hull 

for the bike share system. First, stations on Governor’s Island and a single isolated station in the 

southern part of Brooklyn were removed from the analysis because these stations are essentially 

disconnected from the larger system. Then, multiple convex hulls were created separately for 

stations in Manhattan and stations in Brooklyn/Queens, ensuring that neither of the convex hulls 

cross into major waterways. 

 
For every temporal subset 𝑡, only the system coverage associated with stations running at time 𝑡 −

3 is used to classify the new station 𝑆𝑥 initialized at time 𝑡 where each temporal subset pertains to 

a week within the period from January 2015 through July 2020. This time lag of up to three weeks 

ensures that stations rolled out consecutively over a relatively short time period are not considered 

as previously existing to each other and is necessary because there is usually a short period of time 

before a new station becomes fully integrated within the system. If 𝑆𝑥 falls outside this current 

system coverage, it will be regarded as an extrapolation case. In contrast, if 𝑆𝑥 is within this current 

system coverage, it will be regarded as an interpolation case. It is then necessary to further 

distinguish the partial interpolation cases, which correspond to stations added to the margin of the 

current system coverage. The three cases are formally classified by the percentage of overlap 

between their Voronoi tessellation cell and the current system coverage. Stations overlapping < 5% 

are classified as an extrapolation station while those overlapping > 95% are classified as an 

interpolation station. The remaining stations are classified as margin stations. 


 27 

Applying the above classification to all newly added stations identified in the previous step. Using 

these three different categories, it is possible to apply and evaluate unique mechanisms to borrow 

flow information based on data and/or modeled relationships for each category. The hypothesis 

employed here is that different mechanisms will be more efficient for each category because 

different types and levels of information are available. In particular, it is anticipated that better 

predictive performance will be attained for stations classified as extrapolation cases using a SI 

modeling approach because borrowing information for data from relatively far stations could 

introduce a disproportionately large amount of misinformation. In contrast, the interpolation and 

margin cases are anticipated to achieve higher predictive performance using methods that 

incorporate data-borrowing through interpolation techniques. That is, flows for stations classified 

as interpolation and margin are expected to be best predicted using interpolation methods whereas 

those that are not must rely on extrapolation through modeled relationships (i.e., SI model). 

However, it is less clear whether or not the same method will be the most effective for both 

interpolation and margin stations, since different levels of information are available. To investigate 

these issues, several interpolation methods are explored in the following sections and then 

subsequently compared to each other and to predictions from SI models. 

2.4.3 Modeling strategies 

Gravity-type spatial interaction model 

The unconstrained gravity-type spatial interaction model (Gravity SI) described in section 2.3 is 

perhaps the most widely used model for diverse types of aggregate transport flows (for several 

recent examples see Kar et al., 2021; Lenormand et al., 2016; Oshan, 2020; T. Zhou et al., 2020). 

It is calibrated here using a log-linear Poisson regression with a power distance-decay function 

and a set of origin/destination attributes using the spint module of the Python Spatial Analysis 


 28 

Library (PySAL) (Oshan, 2016). Following Eren and Uz (2020), these attributes include points of 

interest (POI) for urban amenities and services, transportation infrastructure, and socio-

demographic factors. For transportation infrastructure, accessibility to other bike stations is 

computed based on a distance-weighted sum (i.e., spatial lag) of nearby bike station capacity2. POI 

data was obtained from SafeGraph (2020) and contains a hierarchical schema. Eleven categories 

were formed to aggregate POIs from this schema and include care, education, finance, food, 

housing, recreation, religion, shopping, travel, professional, and other services. Further 

descriptions of data sources and variables are summarized in Table 3. The selection process of 

distance parameters for station accessibility and POIs is based on trial and error, however, 

sensitivity results show there are no significant differences in gravity SI results when using other 

distance bands (e.g., 500 meters or 1000 meters) to compute these variables.   

 
2 Since capacity values from the most recent station information are representative of all previous system states, 

capacity values were based on system snapshots of the station information from Sep 2018, May 2019, and Feb 2020. 

Of the 1103 stations that existed from 2015-2020 this provided valid capacity information for all but 135 stations, 

which either were not running during the snapshots or had zero capacity values. These 135 stations were not included 

in the computation of the accessibility metric used here. 
 

 29 

 
Table 3. Attributes used in the spatial interaction models in Chapter 2. 

Attributes Source Features description 

Population Cenpy 

Census.gov  

Population (2017 ACS5) of each 

census tract 

Employment Longitudinal Employer-

Household Dynamics 

(LEHD)3 

Workplace Area Characteristics by 

census block 

Station accessibility Station Info4 Weighted sum of nearby station 

capacity with inverse distance 

weights 

Access to subway NYC Open Data5 

 
Euclidean distance to nearest subway 

station. 

Points of Interest SafeGraph6  Place count weighted by spatial lag 

within 1500 meters of station 

   
Natural neighbor interpolation 

Spatial interpolation methods predict unknown values for an unsampled point in space based on 

its relationship to a set of sampled points. Given values of a random field 𝑍(⋅) measured at 

locations 𝑠1, . . . , 𝑠𝑛 yielding 𝑍(𝑠𝑖) for any station 𝑠𝑖, the objective is to estimate the value 𝑍(𝑠𝑥) 

for one or more unmeasured locations 𝑠𝑥. One well-established spatial interpolation method is the 

natural neighbor technique that finds the closest subset of input samples to a query point and 

weights them proportionally based on areal overlap to estimate a value (Sibson, 1981). Based on 

 
3 https://lehd.ces.census.gov/data/ 
4 https://gbfs.citibikenyc.com/gbfs/en/station_information.json 
5 https://data.cityofnewyork.us/Transportation/Subway-Stations/arq3-7z49 
6 https://www.safegraph.com/covid-19-data-consortium 


 30 

the assumption that nearby stations typically experience similar levels of demand, it is possible to 

borrow data for a new station from neighboring stations using this method. The natural neighbor 

interpolation method for points/polygons was extended to spatial interaction flows by applying a 

modified areal-based weighting scheme (Jang & Yao, 2011). Equations 3.1-3.2 present the natural 

neighbor formula for calculating the flow 𝑇𝑥𝑗 between a new station 𝑆𝑥 and a destination station 

𝑆𝑗. First, Voronoi tessellations are computed for 𝑺𝒏 with and without 𝑆𝑥, denoted as 𝑉𝑛+𝑥 and 𝑉𝑛, 

respectively, with each individual Voronoi polygon approximating the service area for each station. 

In equation 3.1, 𝑚  is the number of stations overlapping with 𝑆𝑥 , 𝑝𝑘  is the proportion of the 

overlapping area between Voronoi shape of 𝑆𝑥 in 𝑉𝑛+𝑥 and 𝑆𝑘 in 𝑉𝑛 in relation to the total area of 

the Voronoi shape of 𝑆𝑥 in 𝑉𝑛+𝑥 (Equation 3). The resulting  [𝑇𝑥1, 𝑇𝑥2, . . . , 𝑇𝑥𝑛]⊤ are the ‘borrowed’ 

outflows of 𝑆𝑥. Thus, natural neighbor interpolation provides an areal-based method that can be 

used on flow data and will be one of the interpolation methods applied here. 

𝑇𝑥𝑗 = ∑ [𝑝𝑘 × 𝑇𝑘𝑗]𝑚
𝑘=1                Equation 2.3.1 

𝑝𝑘 =
𝐴𝑉𝑛+𝑥∩𝑉𝑛 𝑘𝑥

𝐴𝑉𝑛+𝑥𝑥
                        Equation 2.3.2 

Kriging 

While natural neighbor interpolation derives weights based only on location, kriging techniques 

derive weights based on both the location 𝑆𝑥  and value of each sample point 𝑍(𝑠𝑖) with 𝑖 ∈

(1,2, . . . , 𝑛). It is an optimal linear estimator of the form 

𝑍(𝑠𝑥) = ∑ 𝛼𝑖𝑍(𝑠𝑖)
𝑛
𝑖=1       Equation 2.4 

where the weights 𝛼𝑖 are chosen to make the estimator unbiased and of minimal prediction error. 

Kriging is a two-step process. First, the spatial covariance structure of the sampled points is 


 31 

determined by fitting a variogram (a spherical model is used here) that captures the variability 

between all data points as a function of distance. Then, weights derived from this covariance 

structure are used to interpolate values for unsampled observations across the spatial field. 

Ordinary kriging assumes the random field 𝑍(⋅)is intrinsically stationary; that is for any location 

𝑠:  

𝐸[𝑍(𝑠)]  =  𝜇                                     Equation 2.5.1 

𝑉𝑎𝑟[𝑍(𝑠)  −  𝑍(𝑠 + ℎ)]  =  2𝛾(|ℎ|)   Equation 2.5.2 

 
where 𝛾(|ℎ|) is the semi-variogram and a function of distance ℎ that separates two locations. 

Further mathematical details are available in (Cressie, 1993). 

 
Ordinary kriging typically uses a k-dimension (usuallyk <= 3) point as an input location and a 1-

dimension target value 𝑍. However, a spatial interaction flow between stations is a spatial process 

involving two points in a 2-dimension Euclidean space (k = 4). To accommodate this higher 

dimensionality, a flow-based extension inspired by (Y. Zhou et al., 2019) is proposed that 

aggregates all the outflows to the m possible destination stations or inflows from m stations to the 

single origin station 𝑠𝑖 as 𝑍(𝑠𝑖)  and the definition of 𝑇𝑖𝑗 remains the same as in Equation 2.1. The 

coordinates of the origin station remain in 2-D space, but the target value 𝑍𝑖 is an m-length vector 

representing the flows to each destination. According to (X. Liu et al., 2016), this station-centric 

view, which aggregates the OD flows into vectors describing flows coming in and going out from 

one single station, is also referred to as the OD flow signature of a station. In this station-centric 

view, the adoption of the vectorized 𝑍 is needed for handling higher-dimensional 𝑍 within kriging 


 32 

techniques. The inflow predictions 𝑍𝑖𝑛and outflow predictions 𝑍𝑜𝑢𝑡 are carried out separately and 

then concatenated as the overall OD predictions for a target station.  

 
Alternatively, regression kriging leverages information from exogenous variables in addition to 

the spatial dependence in a sample by first fitting a regression model and then kriging the residuals 

of the regression model. Therefore, a Gravity SI model is first fit and then a flow-based ordinary 

kriging (OK) model as described above is used to interpolate the residuals of the gravity-type SI 

model. Specifically, when using the station-centric view in the regression kriging model (RK), the 

kriging step uses 2-dim input with k-dim targets, while the regression step uses location data in 

OD flow form with 4-dim features. 

 
Figure 3. Illustration of areal natural neighbor interpolation and ordinary kriging interpolation 

with toy data. 


 33 

 
To better highlight the difference between the kriging interpolation and natural neighbor 

interpolation, an illustration with pseudo data is attached to demonstrate the two interpolation 

processes step by step in Figure 3. Both methods take identical data structure as input data, e.g., 

existing locations 𝑆𝑛, a 'cold start' location 𝑆𝑥, and flow matrix among existing stations 𝑇. The 

bottom-left section shows the steps of areal-based natural neighbor interpolation. First, the service 

area of each station defined by the Voronoi shapes is calculated before and after a 'cold start' station 

is added. Then the overlapped area over the 'cold start' station is calculated at each existing station, 

consisting of the portion (𝑝) of new stations which is the final proportional weight to calculate the 

interpolated outflows and inflows at the new station shown in the last row. In contrast to the natural 

neighbor that exploits locations only, ordinary kriging interpolation leverages both locations and 

flow similarity. Under the assumption of ordinary kriging, variations between stations at a fixed 

range of distance is only decided by the distance ℎ. Thus, distances between each pair of origin 

and destination are grouped into bins (Right-bottom section, Figure 3 left) and the variance in each 

group is calculated and plotted (Right-bottom section, Figure 3 right). It is proved that the 

optimization equation can be solved using the semivariogram only. Details can be found in 

geostatistical textbooks such as Le & Zidek (2006). So, it is essential to interpolate the flow 

variance between new stations and any existing stations by fitting a covariance function (black 

solid line in Right-bottom section, Figure 3 right). Finally, the optimized alpha that minimizes the 

prediction error is calculated and used for the flow interpolation results just as the natural neighbor 

interpolation method. 

 
 34 

2.5 Data 

2.5.1 Study area 

The Citi Bike system in New York City started operation in 2013 and now becomes the biggest 

bike share system in the US. Among the 809 stations added to the system since 2015, 55 of them 

had too few trips to produce reliable output in any one of the methods, specifically producing a 

null or negative correlation index value. Thus, the following results come from the remaining 

sample of stations ( 𝑛 = 754 ) with 313 interpolation stations, 123 margin stations, and 318 

extrapolation stations using the (0.05,0.95) classification threshold. The spatial distribution of the 

stations is mapped in Figure 4 along with the original (pre-2015) set of stations. All the trips are 

aggregated into unique spatial and time tuples by their original stations, destination stations, and 

the week of the trips starting from Jan 1st, 2015. In the regression-based models, the following 

expression is used: “flows ~ cost + origin attributes (15) + destination attributes (15)” Origin and 

destination attributes including 15 variables each listed in Table 3 where POI includes 11 

categories, e.g., care, education, finance, food, housing, recreation, religious, shopping, travel, 

professional, and other services. In the interpolation-based models, OD-matrices are formatted as 

the demonstration in Figure 3. 


 35 

 
Figure 4. Spatial distribution of bike stations color-coordinated by the three derived classes for 

newly added stations and the original stations. 

 
2.5.2 Study design 

Each new station is associated with a unique time frame for training and testing. So iterating 

station-wise, SI models were trained for each added station 𝑆𝑥, with a training set including all the 

OD flows available in the time period before 𝑆𝑥 is added, and with prediction test set from the time 

period after 𝑆𝑥 is rolled out. Training data consisted of flows one week before a new station was 

added at time 𝑡, (i.e.,  𝑡 − 1) while the evaluation data was set to flows from one week after the 

roll out of a new station (i.e., 𝑡 + 2). This one-unit period provides a chance for demand to stabilize 

after a new station is added to the system for a complete week. Bike trips with duration more than 


 36 

3 hours are excluded as abnormal trips. To evaluate the performance of each prediction method 

for all 𝑆𝑥, Pearson’s R correlation coefficient is employed7. 

 
2.6 Results 

2.6.1 Gravity-type spatial interaction model calibration 

Before producing the prediction results, it is necessary to calibrate the gravity-type SI models. An 

advantage of the SI model over the interpolation methods is the ability to elucidate how 

independent variables affect bike trip flows based on the parameter coefficients. To highlight the 

weight of each independent variable, the top of Figure 5 shows the average parameter magnitude 

and 95% confidence intervals based on the models for each station. The bottom of Figure 5 

captures the top 11 most important features over time. 

 
7 Zero flows were not included when calculating the performance metrics. 


 37 

 
Figure 5. Top: Feature weight from the SI model results averaged over the time periods. The X-

axis represents the absolute value of the parameter estimates and the color represents the sign of 

the average magnitude. The dark green strip shows the average range. Bottom: Time series of top 

11 largest feature weights. The X axis is the number of weeks elapsed since 01/01/2015. 

 
 38 

Generally, parameter estimates associated with a variable have the same direction, either negative 

or positive, and similar magnitudes for the origin stations and the destination stations. The cost 

factor, which is the average trip completion time of the OD pair and is typically called the distance-

decay effect, is the most important factor, attesting to the First Law of Geography. The next most 

important factors included recreation and other services POIs. The former POI group suggested 

the purpose of the bike share trips for leisure activities. The latter group, other services, includes 

many parking facilities. It could be interpreted as a potential common commuting pattern: people 

park their cars and then take bike trips. Surprisingly, population was not a top factor, suggesting 

the residential population around stations is not necessarily a strong indicator of the size of the 

potential bike share user base. In terms of the parameter estimate stability over time, the top 11 

important factors typically have consistently positive or negative signs. Some general trends can 

also be observed over time. First, the distance-decay factor associated with trip duration became 

more negative (i.e., stronger) over time as the system expanded. There is also a periodic fluctuation 

that captures the seasonal trend where distance-decay is less negative in the winter (the spikes 

around 50th, 110th, 210th weeks). Second, the impact of the COVID-19 lockdown was a decrease 

in the magnitude for the parameter estimates of most factors, likely because the regular trips of the 

public were essentially halted during this time and the alternative behavior was either less 

associated with these factors or generally more random. Besides, the gravity SI model also showed 

robustness in the parameter importance when changing the distance band from 1500m to 1000m 

to compute spatially lagged explanatory variables (i.e., POIs and accessibility), providing that the 

direction and magnitude of coefficients remain the same over the two distances. 

 
 39 

2.6.2 Prediction results 

Table 4. A summary of the prediction results based on Pearson’s R for each method using data 

for each classification category and using all data. Standard deviations are in parentheses after 

the mean values. 

Pearson’s R   

(Standard deviation) 

Interpolation Margin Extrapolation All 

Natural neighbor 0.53 (0.22) 0.47 (0.22) - - 

Ordinary Kriging 0.52 (0.20) 0.45 (0.22) 0.41 (0.20) 0.46 (0.21) 

Gravity SI 

(Poisson) 

0.44 (0.19) 0.41 (0.24) 0.40 (0.24) 0.42 (0.22) 

Gravity SI 

(NegBin) 

0.42 (0.18) 0.37 (0.22) 0.39 (0.22) 0.40 (0.20) 

Regression Kriging 

(Poisson) 

0.54 (0.21) 0.49 (0.24) 0.41 (0.22) 0.47 (0.23) 

 
The prediction results based on the correlation index Pearson’s R between predicted and actual 

trip counts were recorded in Table 4 for the five methods (natural neighbor, ordinary kriging, 

regression kriging, gravity-type SI model, and negative binomial model) using data for each 

classification category (interpolation, margin, extrapolation), as well as the entire dataset. Overall, 

the results demonstrate that regression kriging is probably the most well-rounded model to capture 

the correlation in all three station types. The results also provide evidence in support of the primary 

hypothesis that different mechanisms in spatial interaction and spatial interpolation will be 

different depending on the location and the regression kriging can outperform the single methods 

of ordinary kriging or SI model. Meanwhile, the standard deviations around 0.2 indicate the 


 40 

correlation varies much across stations. It is hard to say regression kriging can always outperform 

other methods.  

  
2.6.3 Spatial trends 

The relative comparison of the candidate methods within one location is more meaningful than the 

quantitative comparison of metrics across stations. Figure 6 reveals the spatial distribution of 'cold 

start' stations rendered with the best method in the individual station. Warm colors are associated 

with interpolation methods (Red: Natural Neighbor; Orange: Ordinary Kriging) and cold colors 

stand for regression methods (Cyan: Gravity SI; Blue: Negative Binomial). Green dots represent 

the compound model: regression kriging method. Overall, there is a substantial portion (518 out 

of 754) of interpolation methods topped as the best methods over the two regression-only methods, 

which supports flow interpolation as flow prediction tools. Then focusing on the lower Manhattan 

Island where stations are mostly added as interpolation types. Interpolation methods are commonly 

rated as the best methods there which comply with the intuition that nearby flow patterns facilitate 

interpolation methods. Two regression methods as best methods could hardly be spotted in lower 

Manhattan, but located in North Manhattan, Queens, and Brooklyn (north, east, and south of the 

system, respectively). 

 
 41 

 
Figure 6. Spatial distribution of the best method for flow prediction at each 'cold start' station. 

 
On the edge of the bike share system, where added stations are usually of the extrapolation type, a 

mixture of best methods can be seen. Even though extrapolation stations are out of the coverage 

of the existing system with limited nearby spatial dependence, regression models are not 

necessarily superior to interpolation methods in all cases, which is unexpected. It implies the 

necessity of considering the spatial dependence or spatial structure of the bike share system at all 

times regardless of when locations are added. Another explanation could be that fewer trips at 

extrapolation stations introduce more noise than signal, having a negative impact on all methods. 

When comparing two gravity models, the Negative Binomial (NegBin) model seems to do better 


 42 

in the outskirts of the system where there are probably more stations with a few numbers of rides 

and therefore more overdispersion. 

 
2.7 Discussion      

Spatial interaction demand for bike-sharing trips has caught the attention of many researchers in 

recent years as new bike share systems are installed at an astonishing speed around the world. Yet 

the prediction of demand at 'cold start' stations is an overlooked issue. Previous studies work 

around the issue either by filtering out the new stations or using a different spatial aggregation, 

such as grids. This chapter leveraged spatial interaction models and flow interpolation techniques 

to propose an approach to compare different models through a case study using the NYC Citi Bike 

system. Specifically, the proposed approach attempts to predict OD flows of newly added stations 

after the system infrastructure changes. 

 
After comparing all candidate methods on each identified 'cold start' station, results show that the 

gravity SI model comes with the advantage of the interpretability of the parameter estimates. For 

example, the interpretation of gravity SI models shows that the leading trip generation factor is the 

number of recreation sites nearby, which provides useful additional information to advise the 

planning of bike share systems. Alternatively, interpolation methods, including natural neighbor 

and ordinary kriging, show better predictability for stations classified as interpolation but are less 

performant for stations classified as extrapolation. Regression kriging combines both gravity SI 

and ordinary kriging and yields the best performance. Finally, spatial trends reveal some 

heterogeneity in the prediction performance of different methods. Specifically, the downtown 

Manhattan region shows higher predictability for interpolation methods, but stations added in 


 43 

Brooklyn are less performant for regression methods only. Interpolation methods unexpectedly 

outperform regression models in many extrapolation cases, implying the presence of long-range 

spatial dependence and suggesting the use of spatial interaction models that incorporate spatial 

dependence (Oshan, 2021) in future work. 

 
One significance of this work is that it begins to fill the gap of the missing historical data that is 

needed for many popular methods. For example, the methodology could be used to generate 

pseudo historical flows so that methods dependent upon them can be used to make predictions 

over time even for 'cold start' stations. Therefore, the proposed approach facilitates a more robust 

demand modeling framework. 

 
However, trip prediction for stations classified as extrapolation is still limited using the proposed 

approach because neither the gravity-type spatial interaction model nor the interpolation 

techniques work as well as they do for the interpolation stations. For future work, the idea of 

geostatistical transfer learning may help overcome the issue (Hoffimann et al., 2021). Meanwhile, 

the evolving techniques of graph convolutional neural networks (such as Pareja et al., 2020) may 

be adapted as an alternative of flow interpolation method to be compared with the proposed 

Kriging methods in future work. Another limitation that needs to be more fully explored is the 

weak predictability of 2020 stations, which is possibly due to the low trip counts or drastic 

behavioral changes during the pandemic, or both. When there are low trip counts, the methods 

used here tend to underestimate the few stations with higher activity and more effort is needed to 

develop methods that can handle this scenario. 

 
 44 

The methods included here only provide an initial investigation of the 'cold start' issue. Future 

work may unfold differences in the performance of methods at station level to show the magnitude 

of the differences and further investigate differences in spatial dependence at a finer spatial scale. 

Future studies could also further investigate and leverage the temporal dependencies of the OD 

flows. In this work, a weekly time interval was used, though other intervals could also be explored, 

as well as the temporal windows used to characterize demand before and after stations are added 

to the system. Future work could therefore explore different time parameters or even combine 

them to increase predictive accuracy. These are just a few suggestions that could be on the line of 

inquiry initiated here to grapple with the issues caused by 'cold start' stations in dynamic 

transportation systems. 

 
2.8 Conclusion 

This chapter focuses on the overlooked 'cold start' issue as historical flows are available at newly 

added stations. Instead, flow interpolation methods based on Kriging are proposed in the flow 

domain. Specifically, Ordinary Kriging and Regression Kriging with the adaptations in the flow 

domain are investigated and compared with existing flow interpolation of areal natural neighbor, 

as well as spatial interaction models that calibrate relationships between independent variables and 

spatial interaction flows. Based on experiments conducted on a bike-sharing system in NYC, the 

regression kriging model exhibits superior performance compared to other models regardless of 

the locations of 'cold start' stations by leveraging the strengths of both gravity-type spatial 

interaction regression models, which are robust and interpretable, and ordinary kriging, which is 

capable of capturing spatial dependence. The distribution of best methods at each station also 


 45 

suggests a hybrid strategy to use interpolation-based methods at stations added within the system 

coverage and regression-based methods at stations added at the new areas of the system. 

  
 46 

Chapter 3: Modeling the t