algorithms

Article

Estimating the Tour Length for the Close Enough Traveling
Salesman Problem

Debdatta Sinha Roy 1,* , Bruce Golden 2 , Xingyin Wang 3 and Edward Wasil 4

����������
�������

Citation: Sinha Roy, D.; Golden, B.;

Wang, X.; Wasil, E. Estimating the

Tour Length for the Close Enough

Traveling Salesman Problem.

Algorithms 2021, 14, 123.

https://doi.org/10.3390/a14040123

Academic Editor: Marc Sevaux

Received: 28 February 2021

Accepted: 8 April 2021

Published: 12 April 2021

Publisher’s Note: MDPI stays neutral

with regard to jurisdictional claims in

published maps and institutional affil-

iations.

Copyright: © 2021 by the authors.

Licensee MDPI, Basel, Switzerland.

This article is an open access article

distributed under the terms and

conditions of the Creative Commons

Attribution (CC BY) license (https://

creativecommons.org/licenses/by/

4.0/).

1 Staples Inc., Framingham, MA 01702, USA
2 Robert H. Smith School of Business, University of Maryland, College Park, MD 20742, USA;

bgolden@umd.edu
3 Engineering Systems and Design, Singapore University of Technology and Design,

Singapore 487372, Singapore; xingyin_wang@sutd.edu.sg
4 Kogod School of Business, American University, Washington, DC 20016, USA; ewasil@american.edu
* Correspondence: debsroy@terpmail.umd.edu

Abstract: We construct empirically based regression models for estimating the tour length in the
Close Enough Traveling Salesman Problem (CETSP). In the CETSP, a customer is considered visited
when the salesman visits any point in the customer’s service region. We build our models using as
many as 14 independent variables on a set of 780 benchmark instances of the CETSP and compare the
estimated tour lengths to the results from a Steiner zone heuristic. We validate our results on a new
set of 234 instances that are similar to the 780 benchmark instances. We also generate results for a new
set of 72 larger instances. Overall, our models fit the data well and do a very good job of estimating
the tour length. In addition, we show that our modeling approach can be used to accurately estimate
the optimal tour lengths for the CETSP.

Keywords: close enough traveling salesman problem; tour-length estimation; regression models

1. Introduction

Operations researchers have long been interested in estimating the length of tours and
routes in the traveling salesman problem (TSP) and the vehicle routing problem (VRP). One
of the earliest papers by Beardwood et al. [1] developed analytically derived formulas for
the TSP. Christofides and Eilon [2], Hindle and Worthington [3], and Cavdar and Sokol [4]
improved these formulas by using empirically estimated parameters. Golden and Alt [5]
constructed interval estimates of the optimal solution value. Chien [6] and Kwon et al. [7]
used parameters for the shape of the area covering the customers and the depot, the
distance between customers, and the coordinates of the customers. Basel and Willemain [8]
estimated the optimal tour length for 17 TSP instances using the square root of the number
of cities and the variability of tour lengths. Nicola et al. [9] developed empirically based
regression models for estimating the travel distance in the TSP, in the capacitated VRP with
time windows, and in the multi-region, multi-depot pickup and delivery problem.

As pointed out by Nicola et al. [9], carriers and logistics companies may need to
solve a large number of routing problems which might require a significant computational
effort. Approaches that generate fast, accurate estimates for the travel distance are highly
desirable in practice for a wide range of real-world routing problems. These estimates can
be used by companies to make strategic, tactical, and operational decisions quickly [9].

In this paper, we construct empirically based regression models for estimating the tour
length in the Close Enough Traveling Salesman Problem (CETSP). In the TSP, a salesman
must visit the exact customer location. In the CETSP, a customer has a service region and
is considered visited when the salesman visits any point in the customer’s service region.
The CETSP arises in practical applications including utility meter reading at residential
locations using radio frequency identification and using drones for aerial surveillance. In

Algorithms 2021, 14, 123. https://doi.org/10.3390/a14040123 https://www.mdpi.com/journal/algorithms

https://www.mdpi.com/journal/algorithms
https://www.mdpi.com
https://orcid.org/0000-0002-4375-1387
https://orcid.org/0000-0002-5270-6094
https://orcid.org/0000-0002-4021-3954
https://orcid.org/0000-0002-8397-4809
https://doi.org/10.3390/a14040123
https://doi.org/10.3390/a14040123
https://creativecommons.org/
https://creativecommons.org/licenses/by/4.0/
https://creativecommons.org/licenses/by/4.0/
https://doi.org/10.3390/a14040123
https://www.mdpi.com/journal/algorithms
https://www.mdpi.com/article/10.3390/a14040123?type=check_update&version=2


Algorithms 2021, 14, 123 2 of 9

another application, during multi-visit drone routing [10] it becomes important to quickly
assess whether the duration to cover a region of interest would exceed a drone’s battery life.

In the CETSP, a service region is assumed to be a circular disk centered at the customer
location with a specified radius. The objective is to visit all customers in the shortest
distance traveled starting and ending at the depot. The TSP is a special case of the CETSP
when all radii of the circular disks are zero, making the CETSP at least as difficult to solve
as the TSP. In order to solve an instance of the CETSP, it is not enough to determine the
sequence in which the customers are visited. We must also determine the locations at which
these customers are visited within their respective service regions. In Figure 1, we show an
instance of a CETSP with 12 customers denoted by C1, . . . , C12 and a depot denoted by C0.
The service region is specified by a circle centered at the customer’s location. A feasible
CETSP tour is shown by the solid lines with arrows. The tour passes through at least one
point in the service region of each customer. If a tour passes through an overlap of several
disks, all customers that define those disks are served. A Steiner zone is an overlap of
disks. If a Steiner zone is contained in at most k disks, it has degree k. For example, the
location of each of customers C1, C6, C8, and C11 is within a degree 1 Steiner zone. Similarly,
the location of each of customers C3, C4, C5, C9, and C12 is within a degree 2 Steiner zone.
Whereas, the location of each of customers C2, C7, and C10 is within a degree 3 Steiner zone.
For more details on Steiner zones, see Wang et al. [11].

Figure 1. An instance of a CETSP with 12 customers [11].


Algorithms 2021, 14, 123 3 of 9

Progress in computational integer programming has been remarkable over the past
30 years; see Bertsimas et al. [12] for details. In particular, exact solvers for the TSP (e.g.,
Concorde) can now generate solutions quickly to most large instances with hundreds or
thousands of customers. However, there are very few exact approaches for the CETSP.
Exact approaches have been developed by Behdani and Smith [13] and Coutinho et al. [14].
Tight bounds have been developed by Carrabs et al. [15]. Coutinho et al. [14] proposed a
branch-and-bound algorithm and applied it to instances from the literature. Computation
times ranged from seconds for small instances to four hours for large instances with
hundreds of customer locations.

In order to generate high-quality solutions quickly, heuristics have been developed
and tested for the CETSP. These include the use of supernodes by Gulczynski et al. [16]
and Dong et al. [17], Steiner zones by Mennell [18], Mennell et al. [19], and Wang et al. [11],
and genetic algorithms by Silberholz and Golden [20], Yuan et al. [21], and Yang et al. [22].

The remainder of the paper is organized as follows. In Section 2, we give the regression
models and discuss the fitness measures to test the performance of the models. In Section 3,
we show the results of the regression models, best subset model selection, and model
validation. In Section 4, we present our conclusions and future directions.

2. Regression Models and Fitness Measures

The Steiner zone variable neighborhood search (SZVNS) heuristic developed by
Wang et al. [11] finds high-quality solutions to instances of the CETSP. We use SZVNS tour
lengths in our regression models to estimate CETSP tour lengths.

Our regression model can be represented by yi = β̂1 + β̂2 × ni + β̂3 × Ai + β̂4 ×
MinPi + β̂5 ×MaxPi + β̂6 × VarPi + β̂7 × SumMinPi + β̂8 × SumMaxPi + β̂9 ×MinMi +
β̂10×MaxMi + β̂11×SumMi + β̂12×VarMi + β̂13× (VarX×VarY)i + β̂14×AvgRi + β̂15×
SZi, where yi = E(Yi), β̂k = E(βk), k ∈ {1, . . . , 15}, i denotes a CETSP instance, and E()
denotes the expected value. The dependent variable Yi is the tour length generated by the
SZVNS heuristic. In Table 1, we give the definitions of the independent variables for the
regression model. Nodes represent customers and the depot. The size of an instance is
captured by n and A. MinP, MaxP, VarP, SumMinP, and SumMaxP capture the distances
between nodes. MinM, MaxM, SumM, and VarM capture the distances to the average node
represented by the mean x-coordinate and the mean y-coordinate. VarX×VarY captures
the spread of the instance across the two axes. AvgR captures the mean of the radii of the
customer service regions. The service region radius of a depot is always zero. SZ captures
the feature of the instance that is exploited by the SZVNS heuristic. AvgR and SZ are used
to capture the geometric features unique to the CETSP. These two independent variables
would not be used in a regression model that estimates TSP tour lengths.

We use the 780 CETSP benchmark instances and their tour lengths produced by the
SZVNS heuristic given in Wang et al. [11]. The node locations (depot and the customers)
are generated randomly and all customers in an instance have the same radius for the
service regions. The instances have 6, 8, 10, 12, 14, 16, 18, 20, 25, or 30 customers. The
radii for the customer service regions are 0.25, 0.50, or 1.00. There are 30 instances for each
combination of the radius of the customer service regions and the number of customers up
to 20. There are 10 instances for each combination of the radius of the customer service
regions and the number of customers greater than 20.

We use mean percentage error (MPE) and mean absolute percentage error (MAPE)
to assess the quality of the approximation of the CETSP tour lengths from the regression
model (yi) with respect to the tour lengths from the SZVNS heuristic (Yi). MPE and MAPE
are defined by 100× (∑N

i=1(Yi− yi)/Yi)/N and 100× (∑N
i=1 | Yi− yi | /Yi)/N, respectively,

where N denotes the number of instances. A value of MPE close to zero indicates that there
is almost an equal distribution of instances with tour lengths being overestimated (Yi < yi)
and underestimated (Yi > yi). The value of MAPE is always non-negative, and a small
value indicates that the tour-length estimates are close to the SZVNS tour lengths for most
of the instances.


Algorithms 2021, 14, 123 4 of 9

Table 1. Definitions of the independent variables for the regression model.

Independent Variable Definition

n Number of nodes
A Area of the smallest rectangle covering all nodes

MinP Minimum distance across all pairs of nodes
MaxP Maximum distance across all pairs of nodes
VarP Variance of distances across all pairs of nodes

SumMinP Sum of distances to the nearest neighbor of each node
SumMaxP Sum of distances to the farthest neighbor of each node

MinM Minimum distance to the average node
MaxM Maximum distance to the average node
SumM Sum of distances to the average node
VarM Variance of distances to the average node

VarX×VarY Product of variances of the nodes across two axes
AvgR Average radius of the customer service regions

SZ Number of Steiner zones of degree three and less that are not
dominated by other Steiner zones of degree three and less

We use adjusted R2, Studentized residuals, Mallows’s Cp, and Bayesian information
criterion (BIC) to assess the quality of the model fits. Residuals (Yi − yi) have a mean of
zero. Studentized residuals are scaled residuals with unit variance. Mallows’s Cp and
BIC are used in the context of model selection where the goal is to find the best model
involving a subset of the independent variables. Mallows’s Cp addresses the issue of model
overfitting by penalizing for adding extra variables. The value of Mallows’s Cp should be
close to the number of independent variables in the model (p) to indicate the absence of
overfitting. BIC penalizes a model for having more independent variables, and the penalty
increases as the number of instances (size of the data set) increases. The lower the value of
BIC, the better is the model fit.

3. Regression Results

In Table 2, we present the regression results for the 780 instances using all 14 inde-
pendent variables in our model. The regression model has an adjusted R2 value of 0.921
which indicates a very good model fit. The variables n and SZ are not significant at the 10%
level. The remaining 12 variables are significant at various levels. In Figure 2a, we give
the histogram of Studentized residuals for the model with 14 variables. This histogram
shows that there is almost an equal distribution of instances with positive and negative
residuals. This is also indicated by the MPE value of −0.192% which is close to zero. The
MAPE value indicates that the tour-length estimates from the regression model differ by
an average of 3.984% from the SZVNS tour lengths.

Figure 2. Histograms of Studentized residuals for four models.


Algorithms 2021, 14, 123 5 of 9

Table 2. Regression results.

Coefficient Mean Values

Intercept (β1) 15.231 ****
n (β2) 0.225
A (β3) 0.048 ****

MinP (β4) −0.654 ****
MaxP (β5) 0.224 **
VarP (β6) 0.106 *

SumMinP (β7) 0.361 ****
SumMaxP (β8) 0.036 **

MinM (β9) 0.459 ***
MaxM (β10) −0.418 **
SumM (β11) −0.067 **
VarM (β12) 0.683 ****

VarX×VarY (β13) 0.014 ****
AvgR (β14) −9.092 ****

SZ (β15) −0.026

Adjusted R2 0.921
MPE −0.192%

MAPE 3.984%
* p < 0.1; ** p < 0.05; *** p < 0.01; **** p < 0.001.

3.1. Best Subset Model Selection

Although the regression model shown with all 14 variables performed well in esti-
mating the SZVNS tour lengths, two of the variables were not significant at the 10% level
and one variable was significant only at the 10% level. We now examine whether we can
remove some variables to create a parsimonious model without compromising much on
the model fit.

We create models with 1 to 14 independent variables in the following way. We start
by constructing models with only one independent variable and identifying the 1-variable
model that produced the largest adjusted R2 value. Then we find the 2-variable model
that produced the largest adjusted R2 value, and so on until we consider the 14-variable
model. In Table 3, we show the best subset models based on the adjusted R2 value. For
example, out of all possible models with two independent variables, the model containing
the variables SumMaxP and AvgR produced the largest adjusted R2 value. Following this
process, we create 14 models that contain 1 to 14 variables. In Table 4, we show the adjusted
R2, Mallows’s Cp, and BIC values of the best subset models from Table 3.

Over these 14 models, we select the model that has the largest adjusted R2 value. This
is the model with 13 variables and an adjusted R2 value of 0.921. There are other models
with the same adjusted R2 value when rounded to three decimal places.

We show the coefficients of the first model in Table 5 under the column labeled “Best
adjusted R2”. We note that the adjusted R2 value of 0.921 indicates a very good model
fit. The variable SZ is not in this model and the variable n is not significant at the 10%
level. The remaining 12 variables are significant at various levels. In Figure 2b, we give
the histogram of Studentized residuals for this model. This histogram shows that there is
almost an equal distribution of instances with positive and negative residuals. This is also
indicated by the MPE value of −0.192% which is close to zero. The MAPE value indicates
that the tour-length estimates from this regression model differ by an average of 3.983%
from the SZVNS tour lengths.


Algorithms 2021, 14, 123 6 of 9

Table 3. Best subset models based on adjusted R2.

Variable
Number of Variables

1 2 3 4 5 6 7 8 9 10 11 12 13 14

n * * *
A * * * * * * * * * * * *
MinP * * * * * * *
MaxP * * * * * *
VarP * * *
SumMinP * * * * * * * * * * * *
SumMaxP * * * * * * * * * * *
MinM * * * * * * * *
MaxM * * * * *
SumM * * * * *
VarM * * * * * * * * * *
VarX×VarY * * * * * * * * *
AvgR * * * * * * * * * * * * *
SZ *

Table 4. Adjusted R2, Mallows’s Cp, and BIC values of the best subset models from Table 3.

Best Subset Model Adjusted R2 Mallows’s Cp BIC

1-variable 0.598 3189.7 −698
2-variable 0.787 1320.5 −1189
3-variable 0.876 447.1 −1605
4-variable 0.899 218.1 −1762
5-variable 0.911 107.7 −1850
6-variable 0.918 42.1 −1906
7-variable 0.919 30.2 −1912
8-variable 0.920 16.9 −1921
9-variable 0.921 15.0 −1918

10-variable 0.921 13.0 −1916
11-variable 0.921 14.5 −1911
12-variable 0.921 15.7 −1906
13-variable 0.921 16.9 −1901
14-variable 0.921 18.0 −1895

Next, from the 14 models in Table 3, we select the model that has its Mallows’s Cp
value closest to the number of independent variables in the model. This is the model with
10 independent variables and a Mallows’s Cp value of 13.0.

We show the coefficients of the second model in Table 5 under the column labeled
“Best Mallows’s Cp”. We note that the adjusted R2 value of 0.921 indicates a very good
model fit. The four variables n, VarP, SumM, and SZ are not in this model. The remaining 10
variables are significant at various levels. In Figure 2c, we give the histogram of Studentized
residuals for this model. This histogram shows that there is almost an equal distribution of
instances with positive and negative residuals. This is also indicated by the MPE value of
−0.194% which is close to zero. The MAPE value indicates that the tour-length estimates
from this regression model differ by an average of 3.995% from the SZVNS tour lengths.

Finally, from the 14 models in Table 3, we select the model that has the smallest BIC
value. This is the model with eight independent variables and a BIC value of −1921.


Algorithms 2021, 14, 123 7 of 9

Table 5. Regression results on the three selected best subset models.

Coefficient Best Adjusted R2 Best Mallows’s Cp Best BIC

Intercept (β1) 15.320 **** 15.485 **** 16.334 ****
n (β2) 0.188
A (β3) 0.048 **** 0.046 **** 0.044 ****

MinP (β4) −0.668 **** −0.734 **** −0.674 ****
MaxP (β5) 0.224 ** 0.230 **
VarP (β6) 0.101 *

SumMinP (β7) 0.359 **** 0.362 **** 0.362 ****
SumMaxP (β8) 0.036 ** 0.025 **** 0.027 ****

MinM (β9) 0.455 *** 0.508 **** 0.498 ****
MaxM (β10) −0.410 ** −0.273 **
SumM (β11) −0.064 **
VarM (β12) 0.685 **** 0.805 **** 0.797 ****

VarX×VarY (β13) 0.014 **** 0.011 **** 0.012 ****
AvgR (β14) −9.059 **** −9.059 **** −9.059 ****

SZ (β15)

Number of variables 13 10 8
Adjusted R2 0.921 0.921 0.920

Mallows’s Cp 13.0
BIC −1921

MPE −0.192% −0.194% −0.202%
MAPE 3.983% 3.995% 4.008%

* p < 0.1; ** p < 0.05; *** p < 0.01; **** p < 0.001.

We show the coefficients of the third model in Table 5 under the column labeled “Best
BIC”. We note that the adjusted R2 value of 0.920 indicates a very good model fit. The
six variables n, MaxP, VarP, MaxM, SumM, and SZ are not in this model. The remaining
eight variables are significant at the 0.1% level. In Figure 2d, we give the histogram of
Studentized residuals for this model. This histogram shows that there is almost an equal
distribution of instances with positive and negative residuals. This is also indicated by
the MPE value of −0.202% which is close to zero. The MAPE value indicates that the
tour-length estimates from this regression model differ by an average of 4.008% from the
SZVNS tour lengths.

All three best subset models based on adjusted R2, Mallows’s Cp, and BIC performed
well. In fact, their adjusted R2 values are nearly the same (about 0.921), as are their MAPE
values (about 4%). We recommend the model with the least number of independent
variables (Best BIC model with eight variables) to estimate the SZVNS tour lengths for
CETSP instances with random node locations. The coefficients of all eight variables in the
Best BIC model are highly significant (p < 0.001).

3.2. Model Validation

We generated 234 new CETSP instances, similar to the 780 instances we used to
develop the regression models, to test the Best BIC model with eight variables. The node
locations (depot and the customers) are generated randomly and all customers in an
instance have the same radius for the service regions. The instances have 6, 8, 10, 12, 14, 16,
18, 20, 25, or 30 customers. The radii for the customer service regions are 0.25, 0.50, or 1.00.
There are nine new instances for each combination of the radius of the customer service
regions and the number of customers up to 20. There are three new instances for each
combination of the radius of the customer service regions and the number of customers
greater than 20. We estimated the SZVNS tour lengths for the 234 new instances using
the Best BIC model. The out-of-sample MPE and MAPE values are −0.344% and 4.233%,
respectively, which indicate that the Best BIC model performs well in estimating the SZVNS
tour lengths on the new CETSP instances.


Algorithms 2021, 14, 123 8 of 9

We generated the optimal tour lengths for the 234 new instances using a branch-and-
bound algorithm [14] in order to test the usefulness of the eight independent variables
selected in the Best BIC model in estimating the optimal tour lengths. SZ is the only
independent variable specific to the SZVNS algorithm and is not a part of any of the
three models shown in Table 5. Therefore, the eight independent variables in the Best
BIC model should have more general usage in estimating tour lengths. The regression
model built using the eight independent variables in the Best BIC model and trained on
the optimal tour lengths of the 234 new instances is given by: optimali = 13.394 + 0.054×
Ai − 0.226×MinPi + 0.316× SumMinPi + 0.033× SumMaxPi + 0.640×MinMi + 0.852×
VarMi + 0.018× (VarX×VarY)i − 8.774×AvgRi, where optimali denotes the optimal tour
length for instance i. The signs and the orders of magnitude of the coefficients are the same
as the coefficients in the Best BIC model. The regression model for estimating the optimal
tour lengths with eight independent variables has an adjusted R2 value of 0.937, indicating
a very good model fit. The MPE value of −0.161%, which is close to zero, indicates that
there is almost an equal distribution of instances with positive and negative residuals. The
MAPE value of 3.735% also indicates the usefulness of this model in estimating optimal
tour lengths.

The eight independent variables in the Best BIC model adequately capture the geo-
metric properties of the CETSP instances where the node locations are generated randomly.
The model can be trained on the tour lengths generated from any heuristic or optimal
algorithm to accurately estimate the tour lengths for that specific heuristic or optimal
algorithm on other similar CETSP instances.

We generated a new set of 72 larger CETSP instances to further assess the Best BIC
model with eight variables. The node locations (depot and the customers) are generated
randomly and all customers in an instance have the same radius for the service regions.
The instances have 35, 40, 45, or 50 customers. The radii for the customer service regions
are 0.25, 0.50, or 1.00. There are six new instances for each combination of the radius of
the customer service regions and the number of customers. We estimated the SZVNS tour
lengths for the 72 new larger instances using the Best BIC model. The out-of-sample MPE
and MAPE values are −0.356% and 4.389%, respectively, which indicate that the Best BIC
model performs well in estimating the SZVNS tour lengths on the larger CETSP instances.

We tried generating the optimal tour lengths for the new set of 72 larger instances
using the branch-and-bound algorithm from Coutinho et al. [14] with a maximum run time
of one hour. We were unable to find the optimal solution to seven instances, so we do not
report regression results based on optimal tour lengths. This experiment reinforces the
need for a regression-based model to quickly estimate the CETSP tour lengths of larger and
more difficult instances for a specific heuristic or optimal algorithm. All CETSP instances
used in our experiments are given in Sinha Roy et al. [23].

4. Conclusions and Future Directions

We applied regression models to an important problem in the routing literature–the
CETSP. We demonstrated that it is possible to quickly and accurately estimate tour lengths
using a regression model without generating the actual tours. We showed that the tour
lengths generated by the SZVNS heuristic or by an optimal algorithm could be estimated
with an average error of about 4% by a regression model with eight independent variables
where the node locations are generated randomly. In the future, we would like to develop
models for instances with node locations that are generated in different structured ways.

Author Contributions: Conceptualization, B.G.; Data curation, D.S.R. and X.W.; Formal analysis,
D.S.R. and X.W.; Supervision, B.G. and E.W.; Writing—original draft, D.S.R.; Writing—review &
editing, B.G. and E.W. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Data Availability Statement: The data presented in this study are openly available in zenodo at
http://doi.org/10.5281/zenodo.4632436.

http://doi.org/10.5281/zenodo.4632436
http://doi.org/10.5281/zenodo.4632436


Algorithms 2021, 14, 123 9 of 9

Conflicts of Interest: The authors declare no conflict of interest.

References
1. Beardwood, J.; Halton, J.H.; Hammersley, J.M. The shortest path through many points. Math. Proc. Camb. Philos. Soc. 1959, 55,

299–327. [CrossRef]
2. Christofides, N.; Eilon, S. Expected distances in distribution problems. J. Oper. Res. Soc. 1969, 20, 437–443. [CrossRef]
3. Hindle, A.; Worthington, D. Models to estimate average route lengths in different geographical environments. J. Oper. Res. Soc.

2004, 55, 662–666. [CrossRef]
4. Cavdar, B.; Sokol, J. A distribution-free TSP tour length estimation model for random graphs. Eur. J. Oper. Res. 2015, 243, 588–598.

[CrossRef]
5. Golden, B.; Alt, F. Interval estimation of a global optimum for large combinatorial problems. Nav. Res. Logist. Q. 1979, 26, 69–77.

[CrossRef]
6. Chien, T.W. Operational estimators for the length of a traveling salesman tour. Comput. Oper. Res. 1992, 19, 469–478. [CrossRef]
7. Kwon, O.; Golden, B.; Wasil, E. Estimating the length of the optimal TSP tour: An empirical study using regression and neural

networks. Comput. Oper. Res. 1995, 22, 1039–1046. [CrossRef]
8. Basel, J.; Willemain, T.R. Random tours in the traveling salesman problem: Analysis and application. Comput. Optim. Appl. 2001,

20, 211–217. [CrossRef]
9. Nicola, D.; Vetschera, R.; Dragomir, A. Total distance approximations for routing solutions. Comput. Oper. Res. 2019, 102, 67–74.

[CrossRef]
10. Poikonen, S.; Golden, B. Multi-visit drone routing problem. Comput. Oper. Res. 2020, 113, 104802. [CrossRef]
11. Wang, X.; Golden, B.; Wasil, E. A Steiner zone variable neighborhood search heuristic for the close-enough traveling salesman

problem. Comput. Oper. Res. 2019, 101, 200–219. [CrossRef]
12. Bertsimas, D.; King, A.; Mazumder, R. Best subset selection via a modern optimization lens. Ann. Stat. 2016, 44, 813–852.

[CrossRef]
13. Behdani, B.; Smith, J.C. An integer-programming-based approach to the close-enough traveling salesman problem. INFORMS J.

Comput. 2014, 26, 415–432. [CrossRef]
14. Coutinho, W.P.; do Nascimento, R.Q.; Pessoa, A.A.; Subramanian, A. A branch-and-bound algorithm for the close-enough

traveling salesman problem. INFORMS J. Comput. 2016, 28, 752–765. [CrossRef]
15. Carrabs, F.; Cerrone, C.; Cerulli, R.; Gaudioso, M. A novel discretization scheme for the close-enough traveling salesman problem.

Comput. Oper. Res. 2017, 78, 163–171. [CrossRef]
16. Gulczynski, D.; Heath, J.; Price, C. The close enough traveling salesman problem: A discussion of several heuristics. In Perspectives

in Operations Research: Papers in Honor of Saul Gass’ 80th Birthday; Springer: New York, NY, USA, 2006; pp. 271–283.
17. Dong, J.; Yang, N.; Chen, M. Heuristic approaches for a TSP variant: The automatic meter reading shortest tour problem. In

Extending the Horizons: Advances in Computing, Optimization, and Decision Technologies; Springer: New York, NY, USA, 2007;
pp. 145–163.

18. Mennell, W.K. Heuristics for Solving Three Routing Problems: Close-Enough Traveling Salesman Problem, Close-Enough
Vehicle Routing Problem, Sequence-Dependent Team Orienteering Problem. Ph.D. Thesis, Decision, Operations & Information
Technologies, University of Maryland, College Park, MD, USA, 2009.

19. Mennell, W.K.; Golden, B.; Wasil, E. A Steiner-zone heuristic for solving the close-enough traveling salesman problem. In Opera-
tions Research, Computing, and Homeland Defense; INFORMS: Catonsville, MD, USA, 2011; pp. 162–183.

20. Silberholz, J.; Golden, B. The generalized traveling salesman problem: A new genetic algorithm approach. In Extending the
Horizons: Advances in Computing, Optimization, and Decision Technologies; Springer: New York, NY, USA, 2007; pp. 165–181.

21. Yuan, B.; Orlowska, M.; Sadiq, S. On the optimal robot routing problem in wireless sensor networks. IEEE Trans. Knowl. Data Eng.
2007, 19, 1252–1261. [CrossRef]

22. Yang, Z.; Xiao, M.-Q.; Ge, Y.-W.; Feng, D.-L.; Zhang, L.; Song, H.-F.; Tang, X.-L. A double-loop hybrid algorithm for the traveling
salesman problem with arbitrary neighbourhoods. Eur. J. Oper. Res. 2018, 265, 65–80. [CrossRef]

23. Sinha Roy, D.; Golden, B.; Wang, X.; Wasil, E. Instances for the Close Enough Traveling Salesman Problem. Data Set. 2021.
Available online: http://doi.org/10.5281/zenodo.4632436 (accessed on 8 April 2021).

http://dx.doi.org/10.1017/S0305004100034095
http://dx.doi.org/10.1057/jors.1969.101
http://dx.doi.org/10.1057/palgrave.jors.2601751
http://dx.doi.org/10.1016/j.ejor.2014.12.020
http://dx.doi.org/10.1002/nav.3800260108
http://dx.doi.org/10.1016/0305-0548(92)90002-M
http://dx.doi.org/10.1016/0305-0548(94)00093-N
http://dx.doi.org/10.1023/A:1011263204536
http://dx.doi.org/10.1016/j.cor.2018.10.008
http://dx.doi.org/10.1016/j.cor.2019.104802
http://dx.doi.org/10.1016/j.cor.2018.07.023
http://dx.doi.org/10.1214/15-AOS1388
http://dx.doi.org/10.1287/ijoc.2013.0574
http://dx.doi.org/10.1287/ijoc.2016.0711
http://dx.doi.org/10.1016/j.cor.2016.09.003
http://dx.doi.org/10.1109/TKDE.2007.1062
http://dx.doi.org/10.1016/j.ejor.2017.07.024
http://doi.org/10.5281/zenodo.4632436

	Introduction
	Regression Models and Fitness Measures
	Regression Results
	Best Subset Model Selection
	Model Validation

	Conclusions and Future Directions
	References