148

Detock: High Performance Multi-region Transactions at Scale

CUONG D. T. NGUYEN, University of Maryland, USA
JOHANN K. MILLER, University of Maryland, USA
DANIEL J. ABADI, University of Maryland, USA

Many globally distributed data stores need to replicate data across large geographic distances. Since syn-
chronously replicating data across such distances is slow, those systems with high consistency requirements
often geo-partition data and direct all linearizable requests to the primary region of the accessed data. This
significantly improves performance for workloads where most transactions access data close to where they
originate from. However, supporting serializable multi-geo-partition transactions is a challenge, and they often
degrade the performance of the whole system. This becomes even more challenging when they conflict with
single-partition requests, where optimistic protocols lead to high numbers of aborts, and pessimistic protocols
lead to high numbers of distributed deadlocks. In this paper, we describe the design of concurrency control and
deadlock resolution protocols, built within a practical, complete implementation of a geographically replicated
database system called Detock, that enables processing strictly-serializable multi-region transactions with
near-zero performance degradation at extremely high conflict and order of magnitude higher throughput
relative to state-of-the art geo-replication approaches, while improving latency by up to a factor of 5.

CCS Concepts: • Information systems → Distributed database transactions; Deadlocks; Database
transaction processing.

Additional Key Words and Phrases: multi-region database, deterministic database, deadlock resolution

ACM Reference Format:
Cuong D. T. Nguyen, Johann K. Miller, and Daniel J. Abadi. 2023. Detock: High Performance Multi-region
Transactions at Scale. Proc. ACM Manag. Data 1, 2, Article 148 (June 2023), 27 pages. https://doi.org/10.1145/
3589293

1 INTRODUCTION
Modern data stores typically replicate data for improved availability, durability, read throughput,
and/or latency. Data stores designed for global applications typically replicate data across large
geographic distances, which further improves robustness to region failure, and can allow reads to
occur locally to an application client.
Data stores that allow replicas to temporarily diverge in a manner visible to the client are

termed weakly consistent, and those for which such divergence does not exist or is kept invisible
to the client are termed strongly consistent. The gold standard for strongly consistent guarantees
in the context of transactional systems is strict serializability [11, 12, 24, 42], which ensures that
transactions submitted after earlier transactions complete never observe a state prior to those
completed transactions, even if they are processed on a different replica.

There are two common approaches for implementing strict serializability in practice. The simplest
approach is to have a primary copy of each data item. All writes are performed by that primary

Authors’ addresses: Cuong D. T. Nguyen, ctring@umd.edu, University of Maryland, College Park, MD, USA; Johann K.
Miller, jkmiller@umd.edu, University of Maryland, College Park, MD, USA; Daniel J. Abadi, abadi@umd.edu, University of
Maryland, College Park, MD, USA.

This work is licensed under a Creative Commons Attribution International 4.0 License.

© 2023 Copyright held by the owner/author(s).
2836-6573/2023/6-ART148
https://doi.org/10.1145/3589293

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.

HTTPS://ORCID.ORG/0000-0003-4250-2896
HTTPS://ORCID.ORG/0000-0001-5290-7796
HTTPS://ORCID.ORG/0000-0003-3771-2995
https://doi.org/10.1145/3589293
https://doi.org/10.1145/3589293
https://orcid.org/0000-0003-4250-2896
https://orcid.org/0000-0001-5290-7796
https://orcid.org/0000-0001-5290-7796
https://orcid.org/0000-0003-3771-2995
https://creativecommons.org/licenses/by/4.0/
https://creativecommons.org/licenses/by/4.0/
https://doi.org/10.1145/3589293


148:2 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

copy, and strongly consistent reads are directed either to that primary copy or other copies that are
replicated to synchronously from that primary copy [18, 33, 45, 46]. The second approach allows
writes to occur at any replica, but performs a consensus protocol to avoid replica divergence [9, 17,
52, 54, 58, 60].

Both of these approaches can support geographic partitioning of data for improved performance.
In the first approach, the primary copy of different data items can be stored in different regions [15,
18, 33]. In the second approach, separate consensus groups can be formed in different regions [49, 52].
Either way, geo-partitioning decreases latency of transactions that initiate near the region of their
accessed data and can therefore increase the overall performance of a workload if such transactions
are prevalent.
Unfortunately, even for workloads where such transactions are common, there still exist some

transactions which must access data in more than one partition. Such transactions necessarily
require coordination across partitions (and therefore across geographic regions) to ensure strict
serializability, and this increases latency. The bigger problem, however, is that when they conflict
with single-partition transactions, they become hard to complete: optimistic concurrency control
approaches result in extremely high abort rates under high contention, and pessimistic approaches
result in high amounts of deadlock. Even single-region transactions can end up getting involved
in deadlocks or OCC aborting because of the presence of these slow multi-region transactions.
Therefore, it is extraordinarily difficult to achieve high throughput under high contention and a
non-trivial number of multi-region transactions.
One approach to avoiding these issues is to use deadlock avoidance techniques to enable pes-

simistic concurrency, providing high-throughput transaction processing under high contention.
For example, the work on SLOG creates separate consensus groups per region, geographically
partitions data across these regions, and runs every multi-region transaction through a global
ordering mechanism to completely avoid deadlock [49]. However, this approach adds the latency
of the global ordering layer in addition to the latency required for coordination during normal
processing of strictly serializable multi-region transactions, further impacting the performance of
conflicting single-region transactions.

Instead, in this paper, we present a new graph-based concurrency control protocol that enables
multi-region transactions (in addition to single-region transactions) to be scheduled deterministi-
cally at each region such that all regions involved in processing a transaction will construct the
same graph independently and process transactions completely without cross-region coordination
after receiving all parts of the transaction. The graphs constructed by each region are formed based
on conflicting accesses by different transactions, and indeed may contain deadlocks. However,
since each region constructs the same graph, deadlocks can be resolved by dynamically reordering
accesses by deadlocked transactions to resolve the deadlock deterministicallywithout ever having
to resort to aborting transactions and without having to communicate this reordering with other
regions.

Nevertheless, high network delays between regions can cause the size and number of deadlocks
to grow unbounded. We therefore implement a practical version of this algorithm within a new
system called Detock that annotates transactions with real-time based timestamps, which are
used to strategically schedule transactions to reduce the probability of deadlock. Detock also
implements a novel protocol for migrating data to other geo-partitions using a simpler approach
than used in previous work [49].

When comparing Detock to several alternative state-of-the-art systems that support geographic
partitioning such as SLOG and CockroachDB, we find thatDetock can lessen throughput reductions
caused by multi-region transactions under high conflict by several orders of magnitude, while
simultaneously reducing latency by avoiding unnecessarily cross-region coordination.

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


Detock: High Performance Multi-region Transactions at Scale 148:3

…

Partition 2

Partition 1

Scheduler

Deadlock 
Resolver

Log Manager

Workers

Sequencer

Paxos

Region A

Local Log

Region B

Forwarder

Home Directory

Ta Ta Ta

T

Ta Ta Ta Ta

Tb Tb Tb

…

Partition 2

Partition 1

Scheduler

Deadlock 
Resolver

Log Manager

Workers

Sequencer

Paxos

Region B

Region A

Local Log

Forwarder

Home Directory

Tb Tb Tb

T

Ta Ta Ta Ta

Tb Tb Tb

B

A

B

A

Thread Ta
Annotated 
transaction

Unannotated 
transaction

T
Cross-region 
communication

Fig. 1. Architecture of Detock

This paper thus makes the following contributions:
• A concurrency control protocol for geo-distributed transactions.
• An abort-free deterministic deadlock resolution protocol that enables replicas to resolve deadlocks
independently.

• A practical implementation of these protocols within an open source system that leverages
real-time-based timestamps.

• Experiments that investigate the impact of these practical optimizations and compares to two
state of the art systems.

• A novel, simple, data migration protocol.

2 SYSTEM ARCHITECTURE
Detock is a geo-partitioned database system, and uses a similar high-level architecture to re-
cent state-of-the-art geo-partitioned database systems, while introducing novel approaches to
concurrency control, deadlock resolution, and data migration. This section overviews the basic
architectural approach that Detock shares with other geo-partitioned database systems — most
notably SLOG and CockroachDB — and we defer the discussion of the unique aspects of Detock’s
design to the following section.

The system is deployed across multiple geographic regions. A region consists of servers connected
via a low-latency network. These servers typically reside within a single data-center or multiple
data-centers that are in close proximity with each other. Each item is assigned to exactly one
geographic region. This is called a home region in both SLOG and CockroachDB. The identifier of
the home region for a data item is stored in the header of that data item.
A region may store local data for which it is designated as the home region, and remote data

which are materialized by replaying logs asynchronously replicated from other regions. Data is
also partitioned locally within a region independently of home status: each partition might contain
a mix of local and remote data.

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


148:4 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

Similar to SLOG, Detock relies on deterministic transaction execution to substantially reduce
cross-region coordination. Fig. 1 shows the architecture of this style of deterministic system in a
deployment over two regions A and B. Clients send transactions to their closest region. The first
server that receives a transaction becomes its coordinator, which first resolves non-deterministic
commands in the transaction (e.g. random() and time()), then attempts to extract its read/write
set. When this is not possible via static analysis [25], the OLLP protocol is used, which obtains an
initial estimate of the transaction’s read/write set via a reconnaissance query [54]. Each region
maintains a distributed index called a Home Directory that contains the cached value of the current
home for each known data item. The Forwarder of the coordinator uses this index to augment the
read/write set with the home information of every data item. It then annotates the transaction with
this augmented read/write set and forwards the transaction to its home region(s).
We denote in Fig. 1 annotated transactions housed by region A as Ta, and by region B as Tb.

Once these transactions reach their home regions, they are put into batches and inserted into a
Paxos-maintained local log by the Sequencers. This log is synchronously replicated within a region
to tolerate failure of individual servers, and optionally to nearby regions to increase availability
during (rare) failures of an entire region.
A region can deterministically replay a local log from any other region 𝑅 to obtain the state

of 𝑅’s local data. Therefore, persisting the local logs is sufficient for durability, and replication is
performed by shipping these local logs. While it is not required for a region to hold any remote
data, having a possibly stale copy of the remote data allows local snapshot reads of data from other
regions and makes executing multi-home transactions (including the home-movement transactions
in Section 4) faster. To this end, regions in Detock and SLOG asynchronously exchange their
local logs to each other, so each region eventually receives and replays the complete local log from
every other region, as can be seen in the Log Managers of both regions in Fig. 1.
To replay the logs, a Scheduler constructs a dependency graph for the transactions in the logs

with the help of the Deadlock Resolver (Section 3.2), and schedules them to be executed by the
Workers.

If all data accessed by a transaction belong to a single region, it is called a single-home transaction;
otherwise, it is multi-home. Multi-home transactions insert records into the local logs of each home
region for the data accessed by that transaction. As mentioned above, SLOG globally orders multi-
home transactions to avoid inconsistently ordering them across regions (e.g. T1 before T2 at region
1, but T2 before T1 at region 2) which could result in serializability violations, OCC aborts, or
deadlock [49]. Detock eliminates this global ordering, but must therefore deal with the problems
that arise from inconsistent ordering (discussed in the next section). By eliminating multi-home
transaction ordering, Detock is able to guarantee that each transaction, regardless of its type, only
needs a single-round trip from the initiating region to the participating regions: the initiating
region sends the multi-home transaction to each region which houses data that it accesses, waits to
receive the local log records back from those regions through which it can derive the state at those
regions over which that transaction must run, and can then process that transaction to completion
locally. Every other region, including the ones that write local data, see the same local logs and
also process that transaction locally.

3 TRANSACTION PROCESSING
When a new transaction arrives at the system, its coordinator invokes the function StartTxn shown
in Algorithm 1. Although most deterministic systems such as SLOG and Calvin do not require
assigning a globally unique identifier to a transaction upon arrival, most other highly consistent
ACID-compliant distributed systems — including CockroachDB and Spanner — give transactions
globally unique identifiers. Detock takes the latter approach despite being a deterministic system

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


Detock: High Performance Multi-region Transactions at Scale 148:5

since the identifier will be used in the concurrency control protocol. The globally unique ID is
generated by concatenating (in binary) a local transaction counter with a globally unique ID
statically assigned to the coordinator’s server (Line 2). The Home Directory is used to find the
cached values of the home regions for all data items in the read/write set (Line 3-10).

Algorithm 1: Starting a new transaction
1 function StartTxn(txn)
2 txn.id = new globally unique ID

3 if txn.isHomeMovement then /* see Section 4 */

4 key = txn.movedKey

5 txn.oldHome = HomeDirectory(key)

6 txn.homeInfo = {(key, txn.oldHome), (key, txn.newHome)}

7 else
8 txn.homeInfo = ∅ /* set of (key, region) pairs */

9 foreach key in txn.readSet
⋃

txn.writeSet do
10 Add HomeDirectory(key) to txn.homeinfo

11 regions = unique regions in txn.homeInfo

12 txn.isMultiHome = size of regions is larger than 1

13 if txn.isMultiHome then
14 /* oneway[r] is estimated one-way network delay to region r */

15 maxOneWay = Max(oneway[r] : ∀r in regions)

16 txn.timestamp = Now() + maxOneWay + overshoot

17 Call AppendLocalLog(txn) for every region in regions

The number of unique home regions retrieved is used to determine whether the transaction
is single-home or not (Line 11-12). Incorrect read/write sets (from the OLLP protocol) or home
information (from stale values in the Home Directory) are deterministically detected later during
execution and cause the transaction to abort and restart. However, these restarts are expected
to be uncommon in practice: OLLP aborts only occur when the access set of data depends on
the current state of the database [50, 54], while home information aborts only occur for a short
period of time after data is rehoused in a different region. The transaction is then forwarded to
the participating regions (Line 17). [The timestamp assigned in Line 16 is an optimization that is
described in Section 3.2.]

3.1 Single-home Transactions
The initial steps of transaction processing of single-home transactions Detock are identical to
those of SLOG: When a single-home transaction reaches a node at its presumed home region, the
Sequencer of that node runs the code in Algorithm 2, which puts the transaction into an in-memory
batch along with other concurrent transactions that arrive at the same node (Line 8). The size of
the batch window is configurable, and defaults to 5ms. The Sequencer then appends the batch to
the region’s local input log via Paxos (Line 10). After this point, the transaction is durably logged
for recovery. The position in the Paxos log, along with the contents of the batch is asynchronously
replicated from that region to every other region, such that every region eventually receives the
complete set of ordered batches from every other region (Line 11). Detock and SLOG also support
synchronous replication to near-by regions for improved robustness to region-failure.
At each region, the Paxos logs from all regions (including its own) are interleaved arbitrarily

by the Log Managers to form that region’s view of the global log (Line 14-19). Each region may

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


148:6 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

Algorithm 2: Appending transactions to the logs
1 function AppendLocalLog(txn)
2 localTxn = txn

3 if txn.isMultiHome then
4 Sleep until Now() ≥ txn.timestamp

5 exclude = keys in txn.homeInfo where region ≠ curRegion

6 localTxn.readSet = localTxn.readSet \ exclude
7 localTxn.writeSet = localTxn.writeSet \ exclude
8 Append localTxn to batch

9 upon batch is ready do
10 pos = Append batch to local Paxos log and get its position

11 Asynchronously call AppendGlobalLog(curRegion, pos, batch) for every region

12 batch = ∅
13 function AppendGlobalLog(reg, pos, batch)
14 localLogs[reg, pos] = batch

15 lastPos = position of the last batch in localLogs[reg] that was added to globalLog

16 while localLogs[reg, lastPos + 1] ≠ null do
17 foreach txn in localLogs[reg, lastPos + 1] do
18 Append txn to globalLog

19 lastPos = lastPos + 1

interleave transactions from different local logs into the global log in different ways; however, if
transaction B is after A in any region’s local log, B will be after A in every region’s global log, since
logs records from the same region are numbered by that region and never reordered.
From this point forward, Detock’s processing of single-home transactions differs from SLOG.

In both systems, each region executes all transactions found in its global log in parallel, but in a
manner equivalent to if they had been executed sequentially in log order. However, SLOG uses a
locking based mechanism to achieve this, while Detock uses an approach based on dependency
graphs in order to facilitate deadlock detection and resolution. Algorithm 3 presents the pseudocode
of the Scheduler where the dependency graph is constructed (Line 2-6).

Definition 3.1. Two transactions 𝑇𝑖 and 𝑇𝑗 are said to conflict on a tuple (𝑑, 𝑟 ), where 𝑑 is a data
item and 𝑟 is a region, if both transactions access 𝑑 , at least one of the transactions writes to 𝑑 , and
both of them expect 𝑟 to be 𝑑’s home region.

A dependency graph is a directed graph where vertices correspond to transactions, and an edge
(𝑇𝑖 ,𝑇𝑗 ) exists if and only if:
• 𝑇𝑖 is at a position earlier than 𝑇𝑗 in the global log1, and
• There exists a tuple (𝑑, 𝑟 ) such that both 𝑇𝑖 and 𝑇𝑗 conflict on (𝑑, 𝑟 ), and there does NOT exist a
transaction 𝑇𝑘 such that 𝑇𝑘 is between 𝑇𝑖 and 𝑇𝑗 in the global log and 𝑇𝑘 conflicts with both 𝑇𝑖
and 𝑇𝑗 on (𝑑, 𝑟 ).
Despite each region having slightly different versions of the global log, they are all guaranteed to

(eventually) construct the same dependency graph, since the definition of conflict prevents conflicts
across regions, and thus the order of interleaving logs from different regions does not impact the
ultimate structure of the dependency graph. Therefore, each region can process the transactions
1Traditionally, a wait-for graph is constructed with directed edges pointing away from the waiting transaction. However,
reversing the edges simplifies our implementation.

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


Detock: High Performance Multi-region Transactions at Scale 148:7

Algorithm 3: Scheduling transactions for execution
1 upon a new transaction txn is appended to globalLog do
2 newEdges = ∅
3 foreach (k, r) in txn.homeInfo do
4 prev = latest transaction that conflicts with txn on (k, r)

5 if prev ≠ null then Add (prev.id, txn.id) to newEdges

6 Add txn.id and newEdges to dependency graph G
7 Broadcast txn.id and newEdges to all local partitions

8 upon a transaction txn becomes ready; in a Worker thread do
9 if not txn.isHomeMovement then

10 foreach (key, region) in txn.homeInfo do
11 if region ≠ storage.getHome(key) then Abort txn

12 else if txn.oldHome ≠ storage.getHome(txn.movedKey) then
13 Abort txn

14 Execute the code in txn

15 Remove txn.id and its associated edges from G
16 /* Called periodically in a background thread */

17 function FindAndResolveDeadlocks()
18 G′

= FindStableSubgraph(G) /* see Section 3.2 */

19 foreach scc in FindSCCs(G′) do
20 Deterministically serialize scc in G

in its global log independently, without any communication with other regions, and arrive at the
same final state.

If all transactions are single-home, the dependency graph constructed from the global log will be
a directed acyclic graph (DAG). This is because edges can only arise among transactions within a
region, which are strictly ordered. Therefore, processing of the transactions follows the topology
order of that graph. A finished transaction is removed from the graph along with its outgoing edges.
A transaction is executed only when there are no more incoming edges pointing to it. Different
partitions in the same replica may only see partial views of the DAG. However, since there is
no cycle in a DAG (because all transactions are single-home), the transactions can be processed
without a distributed deadlock detection mechanism.

Transactions accessing multiple partitions in the same replica follow a deterministic execution
protocol similar to Calvin [54] and thus do not require two-phase commit. Unlike Calvin, before
accessing a data item, a transaction needs to check whether the home region identifier stored
alongside the data item matches with the expected home region retrieved previously from the Home
Directory. If they do not match, the transaction must be aborted and restarted (Line 9-13). Since all
regions eventually receive the same set of logs and the transactions are processed deterministically,
the regions apply the same sequence of updates to each data item. The home region identifier, being
part of a data item, is thus updated at the same point in that update sequence. Consequently, all
regions make the same decision as to whether to abort a transaction or not based on the comparison
between the actual home region identifier stored alongside the data item and the assumed home
region identifier stored in the transaction. Once a transaction finishes its execution, the scheduler
removes it from the graph and schedules transactions that become ready as a result of this removal
(Line 14-15).

Fig. 2 shows an example of single-home transaction processing. There are 3 regions: US, EU, and
AP, each of which holds a complete copy of the database. In this example, there is only one partition

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


148:8 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

T1: W(A)
T2: R(A)
T3: R(A)
T4: W(A)
T5: W(B)
T6: W(B)

T1

T2

T3

Dependency graph

A B  C

Local log

A B C

A B  C

US

EU

AP

Global log

T1 T2 T3

T5 T6

T1 T2 T3T5 T6

T5 T1 T3T2T6

T5 T1 T6T3T2

T4 T5 T6

T4

T4

T4

T4

Fig. 2. Single-home transaction processing

and one replica in each region. The local data of each region is shown in red and underlined.
The 4 transactions T1-T4 access data item A so they are ordered in the local log of the US region.
Transactions T5 and T6 access data item B so they are ordered in the EU region. Each region
eventually obtains the local log from every other region and interleaves them to form its view
of the global log. As noted above, the generated dependency graphs for the regions eventually
become identical, despite each region having a different version of the global log. The associated
home regions of the data items are not changed in this example, so conflict of two transactions is
determined based solely on their read and write operations on the data items, shown on the top
left of the figure.

3.2 Multi-home Transactions
When a multi-home transaction reaches a participating region, it follows a different protocol than
that of a single-home transaction. Each region uses the home information stored in the transaction
to generate a special kind of transaction called a GraphPlacementTxn that contains a list of the
keys from the original multi-home transaction that are local to the current region (Algorithm 2,
Line 3-7). One GraphPlacementTxn per transaction, designated by the coordinator, also contains
the original code for that transaction.
GraphPlacementTxns are initially treated like single-home transactions: they are put into the

local logs at their home regions, make their way to the global logs through local log replication,
and finally get added to the dependency graphs at every region.
For each transaction, 𝑇 , each region will eventually receive GraphPlacementTxns from each

region that the coordinator expected to house relevant data. The first GraphPlacementTxn for 𝑇
that is placed in a region’s global log causes a new vertex representing 𝑇 to be created in that
region’s graph. Subsequent GraphPlacementTxns of 𝑇 share this same vertex. Edges created by
these GraphPlacementTxns are added to that vertex.
GraphPlacementTxns establish an order between multi-home and single-home transactions at

the region that generated the GraphPlacementTxn. However, they do not globally order multi-home
transactions, since two different regions may generate GraphPlacementTxns for a set of multi-home

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


Detock: High Performance Multi-region Transactions at Scale 148:9

transactions in different orders. There is thus a concern that the generated graph may contain
cycles, which would lead to deadlock during processing.

T1:
R(A, B)
W(A, B)

T2:
R(A, B, C)
W(A, B, C)

T3:
R(A, C)
W(A, C)

T1 T2 T3Dependency graph 
with a deadlock

A B  C

Local log

A B C

A B  C

US

EU

AP

T2EUT1EU

T2APT3AP

Global log

T2US T1US T3US

T2EUT1EU T2US T1US T3US T2APT3AP

T2APT3AP T2EUT1EU T2US T1US T3US

T2US T1US T3US T2APT3AP T2EUT1EU

T1 T2 T3
Dependency graph with 

the deadlock resolved

Fig. 3. Deadlocks from multi-home transactions

Fig. 3 shows an example scenario that would lead to deadlock, using the same setup as Fig. 2.
Three multi-home transactions arrive at the system: T1 and T2 both access data item A and B,
so they are sent to the US and EU regions. T3 accesses data item A and C, so it is sent to the
US and AP regions. All accesses are read-modify-write operations. Each region generates the
GraphPlacementTxns for the multi-home transactions and places them in its local log; however,
the order they are placed differs across regions. Every region eventually receives all local logs and
constructs the dependency graph. The deadlocks manifest themselves as cycles in the dependency
graph. In this case, all three transactions are in a deadlock.
In order for the transactions to progress, the deadlocks must be eliminated, either by aborting

transactions or modifying the dependencies such that the graph is free of cycles. We chose the
latter approach because aborting and restarting transactions increase latency of those restarted
transactions. However, a key constraint is that this modification must be deterministic: every region
must independently make the same decision on how to resolve the deadlock without runtime
communication across regions.

One dependency modification strategy is to serialize the transactions following the order of their
IDs (see Section 3.1 for how IDs are generated). For example, the dependencies in the graph in
Fig. 3 can be changed such that the processing order of the transactions is T1, T2, and T3, as shown
at the bottom of the figure.
However, making this change deterministically is more complicated than it initially appears:

performing the deadlock resolution as soon as a cycle is detected may cause divergence. For example,
if a server in the EU region in Fig. 3 resolved the deadlock between T1 and T2 as soon as it saw
the first four log entries in the global log, only a single edge (T1, T2) would remain between T1
and T2. When the rest of the log arrived, the edges (T1, T3) and (T3, T2) were added to the graph.
This final graph is a DAG whose topological order is T1, T3, and T2, which is different from what

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


148:10 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

the order would have been if the deadlock was resolved only after receiving all the log entries.
Therefore, both the timing for when to run deadlock resolution along with the resolution itself
must be deterministic. This is complicated by the fact that each region interleaves local log records
into its global log differently and waiting too long to resolve deadlocks increases latency.

Deterministic deadlock resolution (DDR).We give an intuition for our deadlock resolution
algorithm by considering its naïve version over a finite set of transactions. Each region waits until
all transactions arrive then constructs a condensation of the dependency graph. A condensation of a
directed graph G is formed by contracting each strongly connected components (SCC) into a super
vertex, and adding a directed edge between two super vertices𝑈 and𝑉 if there is a directed edge in
G that starts in 𝑈 and ends in 𝑉 (e.g. Fig. 4a). A condensation is a DAG since it does not have any
SCC over its super vertices with a size larger than one. Therefore, we can find a topological order
on the condensation. We additionally impose an order agreed upon by every region on the vertices
within each super vertex (e.g. by their IDs). Determinism across regions can then be achieved by
executing the transactions following both of the these orders.

(a)

X

Y

(b)

Fig. 4. Example dependency graphs (in orange) and their condensations (in blue)

In reality, it is extremely inefficient or impossible to wait for all transactions to arrive as the
set of transactions may not be finite. On the other hand, running the algorithm at arbitrary time
may cause the regions to see different condensations. For example, another region will observe a
different condensation as shown in Fig. 4b if it runs the algorithm after the transaction 𝑋 arrives,
merging two of the SCCs together. To avoid this problem, the DDR algorithm identifies a stable
subgraph then finds and executes the SCCs only within this subgraph. These SCCs are guaranteed
to never mutate as new transactions arrive, thus ensuring convergence across the regions.

For clarity, we initially assume that the data at each region is not partitioned and will remove this
assumption shortly. For every vertex corresponding a multi-home transaction 𝑇 in the dependency
graph, let𝐺𝑃𝑡𝑜𝑡𝑎𝑙 (𝑇 ) be the total number of GraphPlacementTxns generated for𝑇 , a counter𝐺𝑃 (𝑇 )
is associated with𝑇 to keep track of the number of GraphPlacementTxns of𝑇 that have been added
to the graph so far. We define two types of vertices:
• A complete vertex 𝑇 is either a single-home transaction or a multi-home transaction with𝐺𝑃 (𝑇 )
equal to 𝐺𝑃𝑡𝑜𝑡𝑎𝑙 (𝑇 ).

• A stable vertex 𝑇 is a complete vertex and there does not exist a path going from an incomplete
vertex to 𝑇 .
For example, before 𝑋 is added to the graph in Fig. 4b, 𝑌 was not a complete vertex because

the edge (𝑋,𝑌 ) being missing implies at least one GraphPlacementTxn of 𝑌 was not added to the
graph. As a result, any vertex that 𝑌 had a path to was not stable.
Let G(T , E) be a dependency graph comprised of a vertex set T and an edge set E. The stable

subgraph of G is the graph G′ (T ′, E′) such that T ′ is the subset of T that contains all stable vertices
and E′ is the subset of E that contains all edges whose both incident vertices are in T ′. The stable

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


Detock: High Performance Multi-region Transactions at Scale 148:11

vertices set T ′ can be found with breadth-first search, starting from the set of incomplete vertices;
any traversed vertices (including the starting vertices) are marked as unstable; the remaining
untraversed vertices are stable vertices.

The DDR algorithm finds all SCCs in the stable subgraph G′; for each found SCC, it removes all
its edges and adds a new simple chain of edges between the vertices found within that SCC ordered
by their transaction IDs. After running this algorithm, the stable subgraph G′ will become a DAG
and, as an optimization, can be ignored in subsequent runs of the algorithm. We run this algorithm
in a background thread (Algorithm 3, Line 17-20) at a configurable interval, which we set to 40 ms.
This thread builds its own copy of the graph and communicates with the Scheduler via a message
queue to avoid access conflict on an otherwise shared graph.
When the data is partitioned within a replica, each partition may only have a partial view of

the dependency graph. Therefore, we make two modifications to the above algorithm. First, each
partition will periodically broadcast its view of the graph to all other partitions (Algorithm 3,
Line 7). Second, for every vertex 𝑇 , let 𝑃𝑎𝑟𝑡𝑡𝑜𝑡𝑎𝑙 (𝑇 ) be the number of partitions participating in 𝑇
and 𝑃𝑎𝑟𝑡 (𝑇 ) be the number of partitions participating in 𝑇 that have sent their views that included
𝑇 to the current server; for the vertex 𝑇 to be considered complete, regardless of whether it is
single-home or multi-home, it must satisfy that 𝑃𝑎𝑟𝑡 (𝑇 ) is equal to 𝑃𝑎𝑟𝑡𝑡𝑜𝑡𝑎𝑙 (𝑇 ), in addition to the
conditions stated above.

3.3 Proof of Correctness
We now prove that Detock achieves determinism and strict serializability. We use “component”
as short for “strongly connected component”. To simplify the proof, we slightly modify the DDR
and transaction execution algorithms. After reordering the vertices within a component C, the
deadlock resolver adds a virtual edge visible only to the deadlock resolver from the last vertex
to the first vertex of the series, so that the vertices in C continue to form a strongly connected
component after reordering. The transactions are executed following the non-virtual edges. After a
transaction’s execution, it is marked as executed instead of being removed from the graph, and its
outgoing edges become virtual edges.

Definition 3.2. A region 𝑅 determines a vertex𝑇 to be in a component C at a prefix 𝑝 of the global
log in 𝑅 if and only if the DDR algorithm computes𝑇 to be in C in the stable subgraph of the graph
constructed from 𝑝 .
Lemma 3.3. For any two regions 𝑅𝐴 and 𝑅𝐵 and two conflicting transactions 𝑇𝑖 and 𝑇𝑗 , if 𝑅𝐴

determines 𝑇𝑖 and 𝑇𝑗 to be in the same component at some prefix 𝑝𝐴 of the global log in 𝑅𝐴, then if
𝑅𝐵 determines 𝑇𝑖 and 𝑇𝑗 to be in some component(s) at some prefix 𝑝𝐵 of the global log in 𝑅𝐵 , it also
determines that 𝑇𝑖 and 𝑇𝑗 to be in the same component.

Proof. (By contradiction) Assume that 𝑅𝐵 determines𝑇𝑖 and𝑇𝑗 to be in two different components
C𝑖 and C𝑗 , respectively, at 𝑝𝐵 . In 𝑅𝐵 , since𝑇𝑖 and𝑇𝑗 are determined to be in two different components,
there does not exist a path either from 𝑇𝑖 to 𝑇𝑗 or from 𝑇𝑗 to 𝑇𝑖 in the graph constructed from 𝑝𝐵 .
Without loss of generality, we assume that the path from 𝑇𝑖 to 𝑇𝑗 does not exist.

In 𝑅𝐴, since 𝑇𝑖 and 𝑇𝑗 are determined to be in the same component, there exists a path 𝜌 from 𝑇𝑖
to 𝑇𝑗 in the graph at 𝑝𝐴.
While 𝜌 does not exist in 𝑅𝐵 at 𝑝𝐵 , there always exists in 𝑅𝐵 at 𝑝𝐵 a subpath of 𝜌 ending in 𝑇𝑗

(the subpath containing only 𝑇𝑗 is one such subpath). Let 𝑇𝑘 be the starting vertex of the longest
such subpath. 𝑇𝑘 cannot be 𝑇𝑖 because the whole path 𝜌 does not exists in 𝑅𝐵 at 𝑝𝐵 . Hence, there
exists a vertex 𝑇ℎ that is immediately precedes 𝑇𝑘 on 𝜌 .
The edge (𝑇ℎ , 𝑇𝑘 ) does not exist in 𝑅𝐵 at 𝑝𝐵 because the subpath from 𝑇𝑘 to 𝑇𝑗 is already the

longest. Per Detock’s protocol, 𝑅𝐵 will eventually construct the edge (𝑇ℎ , 𝑇𝑘 ) at some extension

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


148:12 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

of 𝑝𝐵 . This means 𝑇𝑘 is an an incomplete vertex at 𝑝𝐵 , which makes 𝑇𝑗 an unstable vertex at 𝑝𝐵
because there is a path from 𝑇𝑘 to 𝑇𝑗 . This is a contradiction because 𝑅𝐵 cannot determine 𝑇𝑗 to be
in the component 𝐶 𝑗 at 𝑝𝐵 if 𝑇𝑗 is still not stable. Therefore, 𝑅𝐵 must determine 𝑇𝑖 and 𝑇𝑗 to be in
the same component at 𝑝𝐵 . □

Lemma 3.4. If a region 𝑅 determines a vertex 𝑇 to be in a component C at some global log prefix 𝑝
in 𝑅, then 𝑅 determines 𝑇 to be in C at any extension of 𝑝 .

Proof. Let A𝑝 be the set of vertices each of which has a path leading to 𝑇 (A𝑝 contains 𝑇 ) in
the graph constructed from 𝑝 . Let 𝑝′ be some extension of 𝑝 .

|A𝑝′ | ≥ |A𝑝 | because we don’t remove vertices. |A𝑝′ | > |A𝑝 | only if there exists a vertex 𝑆 at
𝑝′ such that 𝑆 ∉ A𝑝 and 𝑆 has an outgoing edge pointing to a vertex in A𝑝 . An edge pointing to
some vertex 𝑉 can only come from a transaction preceding 𝑉 in the global log, but all vertices in
A𝑝 are already complete because 𝑇 is a stable vertex at 𝑝 . Therefore, such a vertex 𝑆 does not exist.
Hence, |A𝑝′ | = |A𝑝 |.

Consequently, the size of C does not grow as the global log extends from 𝑝 to 𝑝′. C also does not
shrink because we don’t remove edges and vertices as viewed by the deadlock resolver. As a result,
𝑅 still determines 𝑇 to be in C at 𝑝′. □

Lemma 3.3 implies that two regions eventually agree on which vertices constitute a component.
Lemma 3.4 implies that once a determination is made, it stays true forever. Together these lemmas
assert that the DDR algorithm is deterministic. Next, we show that Detock guarantees strict
serializability.

Lemma 3.5. The execution order of any two conflicting transactions 𝑇𝑖 and 𝑇𝑗 is the same in every
region.

Proof. It follows from lemmas 3.3 and 3.4 that eventually all regions agree on one of the following
cases:
𝑇𝑖 and𝑇𝑗 are in the same component. The DDR algorithm deterministically reorders them by their

IDs, and they are executed following this order in every region.
𝑇𝑖 and𝑇𝑗 are in different components. Since they conflict with each other, without loss of generality,

there is a path from 𝑇𝑖 to 𝑇𝑗 in the dependency graph in every region. By the execution algorithm,
𝑇𝑖 is executed before 𝑇𝑗 , and this order is the same in every region. □

Proposition 3.6. Transaction schedules are strictly serializable.

Proof. Detock eliminates cycles in the dependency graph using the DDR algorithm, thus its
execution schedule follows the topological order of a DAG, which can be written as a serial schedule.
Because of Lemma 3.5, all regions follow the same execution order, hence Detock guarantees
one-copy serializability.
Furthermore, non-concurrent transactions are executed according to their temporal order in

this serializable schedule: Let 𝑇𝑖 and 𝑇𝑗 be two conflicting transactions such that 𝑇𝑗 is sent to
Detock after𝑇𝑖 is executed and returned to a client.𝑇𝑖 getting executed means that the component
containing𝑇𝑖 (which may include only𝑇𝑖 ) is part of a stable subgraph. Therefore,𝑇𝑗 cannot be in the
same component as𝑇𝑖 , thus must be executed strictly after𝑇𝑖 . As a result, execution of transactions
in Detock is strictly serializable. □

3.4 Avoiding Livelock
The DDR algorithm finds and resolves stable SCCs that will never grow as new transactions
are added to the graph. These stable SCCs correspond to deadlocks which it deterministically

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


Detock: High Performance Multi-region Transactions at Scale 148:13

resolves. However, unstable SCCs also correspond to deadlocks, which the DDR algorithm cannot
immediately resolve. In theory, it is possible for an SCC to grow indefinitely and never become
stable, which results in livelock. Although such livelock is easy to prevent by forcing coordinators
to run an admission control algorithm that temporarily holds back transactions that conflict with
transactions tied up in a large unstable SCC until that SCC becomes stable, such admission control
increases transaction latency, and should only be used as a last resort. Preferably, large unstable
SCCs should be prevented in the first place. This is equivalent to taking measures to limit situations
where deadlock is likely to occur.

T1 T2 T3 T4

RA
T1 T3 T2 T5 T4 T7 T6

RB
T2 T1 T4 T3 T6 T5 …

T6 T7T5

Fig. 5. An example where an SCC can grow indefinitely.

Deadlock occurs when different regions insert multi-home transactions into their respective
local logs in different orders. High network delay between two regions increases the probability of
this occurring. For example, assume there are 2 regions 𝑅𝐴 and 𝑅𝐵 that are hundreds of milliseconds
apart in terms of network delay. The whole database consists of two data items 𝐴 and 𝐵, local to
𝑅𝐴 and 𝑅𝐵 , respectively. All transactions in this example access both of these keys and thus are
multi-home transactions. The first transaction T1 starts in 𝑅𝐴 and enters the local log of 𝑅𝐴 almost
immediately, while it takes hundreds of milliseconds for T1 to reach 𝑅𝐵 . In the meantime, another
transaction T2 starts in 𝑅𝐵 and enters the local log of 𝑅𝐵 before T1 reaches 𝑅𝐵 . It takes some time
for T2 to reach 𝑅𝐴, but before that happens, T3 starts in 𝑅𝐴 and enters the local log of 𝑅𝐴, and so on.
Fig. 5 shows the local logs and dependency graph of up to 7 transactions that got into this scenario.
At any time, the SCC cannot be resolved because the last transaction on this chain is always an
incomplete transaction and has a path leading to every other transaction (T7 in Fig. 5).
The best way to avoid deadlock and livelock is to attempt to have multi-home transactions be

inserted into every relevant region’s local log in the same order. However, Detock’s performance
requirements prevents it from being able to globally order multi-home transactions before they
begin, as done in other systems (such as SLOG, Fauna [4], and Calvin [54]). Instead Detock uses a
best-effort scheme called opportunistic ordering that merely reduces the probability of conflicting
orders of multi-home transactions (and thus deadlock), but does not eliminate it entirely.

When a transaction𝑇 first entersDetock, its coordinator assigns it a future (real time) timestamp
based on its local clock (Algorithm 1, Line 16). Each participating region inserts 𝑇 into its local
log as soon as possible after its local time exceeds this timestamp (Algorithm 2, Line 4). Thus, if
two transactions reach a region before their designated start times, they can be inserted into the
local log at that region by this timestamp order. This is true, even if the clock at that region is not
synchronized with the clocks at the regions which originally generated the timestamps of these
transactions. Therefore, if these two transactions arrive everywhere such that their timestamps
are later than the local clock at the location of their arrival, they are guaranteed to be consistently
ordered everywhere.

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


148:14 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

However, if a region receives 𝑇 at a local time later than the timestamp assigned to 𝑇 , all it can
do is to immediately insert 𝑇 to its local log. Therefore, two transactions can be placed into the
local log at a region out of (timestamp) order if at least one of them arrives after its designated
time. To reduce the probability of this occurring, the coordinator attempts to assign a timestamp
far enough into the future so that it will arrive everywhere prior to its designated start time. To
accomplish this, the future timestamp is computed by adding to the coordinator’s local time the
one-way delay to the farthest participating region (delay-wise) plus a small overshoot (2 ms in our
implementation) (Algorithm 1, Line 16).

The one-way delay from a region 𝑅𝐴 to another region 𝑅𝐵 is estimated by periodically sending
a message from 𝑅𝐴 to 𝑅𝐵 containing the sending time 𝑡𝐴, then 𝑅𝐵 responses with the time offset
𝑡𝐵 − 𝑡𝐴 where 𝑡𝐵 is the time when 𝑅𝐵 receives the message. The servers at 𝑅𝐴 compute a moving
average of this offset to smooth out noise and use this value as the one-way delay from 𝑅𝐴 to
𝑅𝐵 . This offset also incorporates the clock difference between the two regions (and thus can be
negative) so this scheme does not require highly synchronized clocks. Inaccurate estimation does
not affect the correctness of the system but potentially degrades its performance due to increased
number of deadlocks.

4 HOME-MOVEMENT TRANSACTIONS
When the locality of a workload changes (e.g. a user moves to a new continent), data migration
between regions is needed to keep up with access pattern changes. Detock carries out such data
migration by using home-movement transactions. It performs home movement within a transaction
to ensure strict serializability and avoid down time. Our description focuses on home-movement
transactions that involves only one data item at a time, but it is straightforward to extend this to
multiple data items.

As mentioned previously, the identifier of a data item’s home region is physically stored next to
the data itself. A home-movement transaction’s only action is to modify this identifier. In Algo-
rithm 1, a home-movement transaction 𝑇ℎ𝑚 has three self-explanatory fields: movedKey, oldHome,
and newHome. The values of movedKey and newHome are determined by the Detock system
component (or system administrator) that decides that the data item should be located at a particu-
lar region and submits the home-movement transaction. The value of oldHome is retrieved from
the HomeDirectory index at the transaction’s coordinator (Algorithm 1, Line 5). Unlike ordinary
transactions, a home-movement transaction does not only store the current home of a data item in
the homeInfo map but also its soon-to-be new home (Line 6). Consequently, a home-movement
transaction is always a multi-home transaction.

The home-movement transaction 𝑇ℎ𝑚 is treated exactly like an ordinary multi-home transaction
from this point onward. Each region processes the transaction after receiving its two component
GraphPlacementTxns,𝑇𝑜𝑙𝑑

ℎ𝑚
and𝑇𝑛𝑒𝑤

ℎ𝑚
, and then updates its HomeDirectory accordingly. Concurrent

transactions which access the moved data itemmay see the old or new home location when entering
the system. If they see the old location, they will be sent there and inserted into the old region’s
local log. If it gets inserted into that local log after 𝑇𝑜𝑙𝑑

ℎ𝑚
, it will abort and restart at runtime during

home validation. If it gets inserted prior to 𝑇𝑜𝑙𝑑
ℎ𝑚

, it logically occurs before the data item moved, and
will succeed.

According to Definition 3.1, two transactions conflict not only on the key they access but also on
the expected home region. If 𝑇 is a transaction that sees the old location and is placed before 𝑇𝑜𝑙𝑑

ℎ𝑚

but after 𝑇𝑛𝑒𝑤
ℎ𝑚

in the global log, this definition prevents 𝑇 from being blocked by 𝑇𝑛𝑒𝑤
ℎ𝑚

because
although they access the same key, 𝑇 expects the key’s home region to be oldHome whereas 𝑇𝑛𝑒𝑤

ℎ𝑚

expects that to be newHome.

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


Detock: High Performance Multi-region Transactions at Scale 148:15

Discussion. SLOG also introduces an algorithm for data home-movement [49]. However, the
algorithm described here is both easier to reason about and simpler to implement. Additionally,
SLOG’s home-movement algorithm requires storing a counter in the header of every data item, thus
increasing the size of the database, while this new algorithm does not require such a counter. The
key difference that enables these advantages is that SLOG’s algorithm makes the home-movement
transactions single-home, whereasDetock’s algorithm constructs them as multi-home transactions.

5 EVALUATION
We implemented Detock in C++ with ZeroMQ [1] for message passing between processes on
different nodes and between threads in the same process. Detock supports pluggable storage
layers, and currently defaults to an in-memory key-value store. Transactions are implemented as
stored procedures containing read and write operations over a set of keys2
The goal of Detock is to achieve high throughput and low latency for strictly serializable

transactions over a geo-replicated and geo-partitioned database. Hence, we compare Detock to
four other systems that also support globally distributed transactions: Calvin [54], SLOG [49], Janus
[40], and CockroachDB [52]. To reduce performance artifacts that are unrelated to the architectural
designs discussed in this paper, we re-implemented Calvin, SLOG, and Janus inside the Detock
codebase so that all four systems can use the same storage layer, communication library, local
consensus code, and logging infrastructure. Unlike Detock, Calvin globally orders all transactions,
and SLOG globally orders all multi-home transactions. The SLOG paper discusses two ways to do
this: (1) sending them all to the same region/ordering service and (2) performing global consensus
via Paxos or Raft. Option (2) increases the latency of every multi-home transaction by at least the
latency of the global consensus protocol (hundreds of milliseconds for a truly global deployment).
Option (1) only increases the latency of multi-home transactions that initiate far from the ordering
service. We experiment with both versions in Section 5.2, but since option (1) yields better latency,
we use it for both Calvin and SLOG in the other experimental sections, in order to present their
latency in the best possible light, even though it is less robust to region failure than option (2). Janus
generalizes the EPaxos protocol [38] to process distributed transactions, which (similar to Detock)
includes a reordering technique over a dependency graph and execution of transactions across
all replicas and shards deterministically following the graph order. However, unlike Detock’s
asynchronous protocol, Janus synchronously replicates data to every region so it needs at least one
WAN round-trip to all regions to commit.

2The source code is available at https://github.com/umd-dslam/Detock

Table 1. Round-trip time for all pairs of regions (ms)

apse2 apse1 apne2 apne1 euw2 euw1 usw2* usw1* use2
use1 197 211 173 148 75 67 66 61 12
use2 187 197 160 132 85 77 52 50

usw1* 137 169 134 107 145 136 20
usw2* 139 174 124 95 128 127
euw1 254 183 229 202 11
euw2 263 171 236 209
apne1 128 71 32
apne2 148 72
apse1 91
* Regions used only in the CockroachDB experiment.

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.

https://github.com/umd-dslam/Detock


148:16 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

We choose CockroachDB as another comparison point because it has support for state-of-
the-art geo-partitioning that uses a non-deterministic approach. CockroachDB inherits many
of its architectural principles from Spanner [17]. Both CockroachDB and Spanner are widely
used; however CockroachDB is the better comparison point since it is available as independent
downloadable code and can be deployed on the same cluster as our other experimental systems, and
supports geo-partitioning. Nonetheless, it is far more production-ready than the research prototype
of Detock, uses a more robust and fully-featured storage layer, and any raw performance numbers
with Detock would be oranges to apples. Therefore, we only report relative measurements and
observe the performance trends of each system.
Unless stated otherwise, we ran our experiments on Amazon EC2 using r5.4xlarge instances.

Each machine has 16 vCPU and 128GB of memory. We deployed the systems over 8 AWS regions:
us-east-1 (N. Virginia), us-east-2 (Ohio), eu-west-1 (Ireland), eu-west-2 (London), ap-northeast-1
(Tokyo), ap-northeast-2 (Seoul), ap-southeast-1 (Singapore), and ap-southeast-2 (Sydney). Table 1
contains the round-trip time for every pair of regions. In each region, a replica of the database is
partitioned across 4 machines.

The clients generating the workloads were deployed on separate machines spread evenly across
all regions, and had enough capacity to avoid being bottlenecks. Each client thread issued one
transaction at a time. Transactions are either single-home (SH) or multi-home (MH); separately,
as an unrelated consideration, they can be either single-partition (SP) or multi-partition (MP). SH
transactions access data in the region closest to the client that generates it. MH transactions access
data from two regions. Calvin and Janus do not assign home regions to data items, and do not
support geo-partitioning so their transactions can only be SP or MP. Each client weights the regions
following a Zipfian distribution such that regions closer to the client are more likely to be selected
for a MH transaction. We vary the percentage of MH and MP transactions.

5.1 Microbenchmark experiments
In our first set of experiments, we use a version of the Yahoo! Cloud Serving Benchmark (YCSB)
[16] adapted for transactions. The data consists of a single table containing a billion rows and two
columns: a 64-bit integer key and a value consisting of 100 random bytes. Identically to previous
work running experiments on this same dataset [49], contention of the workload is varied by
dividing the table into “hot records” and “cold records”. A transaction performed read-modify-write
on 2 hot records and 8 cold records; all records were uniformly selected at random. Contention is
varied by changing the size of the hot record set. We define HOT to be the reciprocal of the size
of the hot re cord set per partition. Thus, contention increases with the value of HOT. Our initial
set of experiments places all machines in the us-east-2 (Ohio) region, and uses tc [5] to simulate
the network round-trip time of the 8-region deployment with symmetric network paths and a
jitter uniformly distributed within 1 ms. This will allow us to directly vary network conditions in
Section 5.1.3. In Section 5.2 we remove the simulation and run over the real full 8-region deployment.

5.1.1 Throughput. Fig. 6a shows the peak throughput of Calvin, Janus, SLOG and Detock, with
and without opportunistic ordering. We varied the % MP and % MH parameters at two HOT settings
corresponding to a low contention workload (HOT = 0.0001) and a high contention workload (HOT
= 0.01).

Calvin and Janus do not distinguish between SH and MH transactions (since they do not support
geo-partitioning); thus their throughput stays constant as % MH is varied. In contrast, Detock and
SLOG are able to benefit from the presence of SH transactions in workload by processing them at
different regions in parallel, while Calvin and Janus have to order all transactions within a single
global log. Therefore, for workloads which geo-partition well (e.g. 15% or fewer MH transactions),

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


Detock: High Performance Multi-region Transactions at Scale 148:17

0 5 15 25 5075 100
0

25

50

75

100

125

th
ou

sa
nd

 tx
n/

s

HOT = 0.0001, MP = 0%

0 5 15 25 5075 100

HOT = 0.0001, MP = 50%

0 5 15 25 5075 100

HOT = 0.0001, MP = 100%

0 5 15 25 5075 100
% multi-home

0

25

50

75

100

125

th
ou

sa
nd

 tx
n/

s

HOT = 0.01, MP = 0%

0 5 15 25 5075 100
% multi-home

HOT = 0.01, MP = 50%

0 5 15 25 5075 100
% multi-home

HOT = 0.01, MP = 100%

Detock
Detock (w/o opportunistic ordering)

SLOG
SLOG (slow)

Calvin
Janus

(a) Throughput

0

200

400

600

800

nu
m

be
r o

f d
ea

dl
oc

ks

5 15 25 50 75 100
% multi-home

101

102

103

104

siz
e 

of
 a

 d
ea

dl
oc

k

Detock
Detock (w/o opportunistic ordering)

(b) Deadlocks

Fig. 6. Microbenchmark results

Detock and SLOG significantly outperform Calvin and Janus. However, as MH% increases past
30%, the geo-partitioning advantage of Detock and SLOG disappears and all systems perform
similarly. At extremely high levels of MH transactions, they even perform slightly worse than
Calvin, since they incur additional overhead to process MH transactions (i.e. dividing the transaction
into its region-local components, and inserting in the local log of each region). However, this lower
throughput relative to Calvin only occurs when the MP% is 0. When MP transactions are common,
all systems have to pay overhead processing the transaction across the different local machines that
own the partitions of data accessed. Therefore, Calvin only outperforms Detock in the scenario of
high MH% and low MP% — an unlikely scenario since if a workload is partitionable, it is generally
locally geo-partitionable as well.
Janus requires traversal of the dependency graph, including communication with other shards

for missing information, on the critical path of every transaction to determine when it is ready to
be executed. This overhead causes Janus to perform worse than other systems. Conversely, Detock
traverses the dependency graph and communicates with other partitions in a background thread
that wakes up periodically, thus its cost is amortized across the periodic runs. The throughput of
Janus drops further under high contention, especially when there are MP transactions, because the
graph grows faster and more cross-shard messages are needed.

Under low contention (top row), the best versions of Detock and SLOG have similar throughput.
However, under high contention (bottom row), Detock outperforms SLOG when there are MH
transactions in the workload. This is because SLOG must globally order all MH transactions.
Different regions involved in the transaction find out the order (and then insert the transaction
into their local log) at different points in time. The closer they are to the region that determines the
order, the faster they get started with the transaction. However, a transaction cannot release locks
until they receive local logs from all regions involved in the transaction, even remote regions. The
slowest transactions in SLOG therefore must hold locks for longer than the slowest transactions in
Detock. Under low contention, this does not affect performance. But under high contention, this
longer hold time reduces throughput. In contrast, Detock uses opportunistic ordering to insert the

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


148:18 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

GraphPlacementTxn to the local log at roughly the same time, thus distributing the blocking time
evenly across the regions.

SLOG’s performance depends on resources allocated for its ordering service. To show this, we ran
a version of SLOG where we cut down the number of threads used for deserializing and batching
the transactions in the ordering service from 8 to 1. The throughput of this version, shown as SLOG
(slow) in Fig. 6a, quickly drops as the amount of MH transactions increases. Detock is not bounded
by this constraint because it does not need an ordering service.

Surprisingly, even at large numbers of multi-home transactions, Detock’s performance is almost
unchanged relative to its performance at low contention. Such independence from performance
degradation for strictly serializable multi-region transactions at extremely high contention is rare
amongst current state-of-the-art systems and is an important advantage of Detock’s approach. The
comparison to CockroachDB’s state-of-the-art geo-partitioning system in Section 5.4 will further
highlight the Detock’s exceptional resilience to high contention workloads.
To understand the drop of throughput of the Detock version without opportunistic ordering,

we plotted the number and size of deadlocks (SCCs) of the two Detock versions for HOT = 0.01
and MP = 100% in Fig. 6b. The reason for the performance drop is thus due to the growth of the
number and size of deadlocks. When the contention is high, the probability that new incomplete
dependencies emerging while a deadlock is forming increases, preventing the DDR algorithm to
immediately resolve the deadlock.

0 25 50 75 100
% multi-home

0
50

100
150
200
250
300
350
400

la
te

nc
y 

(m
s)

Single-Home, HOT = 0.0001

0 25 50 75 100
% multi-home

Single-Home, HOT = 0.01

0 25 50 75 100
% multi-home

Multi-Home, HOT = 0.01

Detock p50
Detock p99

SLOG p50
SLOG p99

Calvin p50
Calvin p99

Janus p50
Janus p99

Fig. 7. Microbenchmark latency (MP = 100%)

5.1.2 Latency. We measured the end-to-end latency using a smaller number of clients to avoid
including queuing time. Fig. 7 presents the p50 and p99 latency at 100% MP. Under low contention,
SLOG and Detock3 achieve low latency for SH transactions as expected. Meanwhile, Calvin must
globally order all transactions so it results in an order of magnitude worse latency. Under high
contention, it becomes more likely for SH transactions to conflict with the longer-running MH
transactions, so SLOG and Detock have higher p99 latency for SH transactions in the presence of
MH transactions. However, the impact of this on Detock is much lower than SLOG.

For MH transactions, SLOG has a latency higher than Calvin’s and Detock’s because every MH
transaction in SLOG needs (1) a round-trip to the ordering region and (2) additional communication
to exchange the local logs across regions. Calvin also must pay cost (1) but avoids cost (2). Detock
must pay cost (2) but avoids cost (1). Therefore they have similar latency at p50. However, cost (2)
3The latency difference between Detock with and without opportunistic ordering is the amount of the overshoot (2ms).
All other delays caused by opportunistic ordering is overlapped with transaction processing and do not cause a latency
increase. To avoid cluttering the graph, we only show Detock with opportunistic ordering.

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


Detock: High Performance Multi-region Transactions at Scale 148:19

has a longer tail latency when locks are held during this communication; therefore, Calvin achieves
better p99 latency for MH transactions.

Janus has the highest latency because its fast quorums contain all replicas, thus every transaction
coordinator generally has to wait for the response from the slowest region.

5.1.3 Performance under different network conditions. The effectiveness of opportunistic ordering
and consequently the overall performance of Detock is affected by the accuracy of its estimation
of the one-way delay between two regions, and irregular network conditions may affect such
estimation. Therefore, we now study the effect of asymmetric network delay and jitter on Detock
performance. In the following experiments, we ran a workload with HOT = 0.01 (high contention),
MP = 100% and MH = 10%.

50:50 60:40 70:30 80:20 90:10
asymmetry ratio

0

10000

20000

30000

40000

50000

th
ro

ug
hp

ut
 (t

xn
/s

)

throughput
0

50

100

150

200

250

300

350

la
te

nc
y 

(m
s)

p50 latency
p90 latency
p99 latency

Fig. 8. Detock’s performance under asymmetric delay

A network path is asymmetric if the forward and backward one-way delays differ. We varied
the ratio between the one-way delays on the same network path. This ratio was applied to all
paths of every pair of servers across two different regions, with the direction of asymmetry
randomly chosen. Fig. 8 shows that delay asymmetry does not affect throughput. This is because
the opportunistic ordering scheme constantly monitors the one-way latency between regions and
adjusts its predictions accordingly. Accurate one-way predictions are sufficient to avoid deadlock.

In contrast to throughput, latency increases as the ratio becomes more extreme. This is because
in an asymmetric network, the region with the largest one-way delay from the coordinator might
not be the one with the largest round-trip time and vice-versa. Therefore, opportunistic ordering
scheme might cause the farthest region from the coordinator to hold off the transactions as if it is a
closer region, increasing the overall latency of the transaction.
Nonetheless, the impact on overall latency is insignificant except at extreme asymmetries, yet

studies have shown that 90% of the measured one-way delay on the Internet are within 40%–60%
of the round-trip time [43]. Furthermore, consistently severe asymmetry is unlikely in real-world
data center environments [57].
Network jitter is variance in network delay. We simulated different uniform jitter values in all

inter-region paths. Fig. 9a shows peak throughput, p99 latency at peak throughput (blue line), and
p99 latency at a lower load (red line). Increase in jitter impacts peak throughput more than latency
at an unsaturated load. Fig. 9b shows the reason for this result: at 15ms or more jitter, opportunistic
ordering’s delay estimations become inaccurate, and it becomes increasingly ineffective at pre-
venting deadlocks, which reduces throughput. Detock deployments with high network jitter need
higher overshoots. Fig. 9b showed that, with the default overshoot of 2 ms, Detock is robust to
jitter of up to 15 ms. Fig. 9c and 9d show that increasing to a 10 ms overshoot significantly reduces
the number of deadlocks, thereby raising throughput.

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


148:20 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

0 5 10 15 20 25 30 35 40
jitter (ms)

0

10

20

30

40

50

60

th
ro

ug
hp

ut
 (t

ho
us

an
ds

 tx
n/

s) throughput

102

103

104

M
H 

la
te

nc
y 

(m
s)

p99 latency
p99 latency (peak)

(a) Throughput and latency (2 ms)

0 5 10 15 20 25 30 35 40
jitter (ms)

0

50

100

150

200

250

300

to
ta

l s
ize

 (t
ho

us
an

ds
)

size

0

200

400

600

800

1000

1200

1400

co
un

t

count

(b) Deadlocks (2 ms)

0 5 10 15 20 25 30 35 40
jitter (ms)

0

10

20

30

40

50

60

th
ro

ug
hp

ut
 (t

xn
/s

)

throughput

102

103

104

M
H 

la
te

nc
y 

(m
s)

p99 latency
p99 latency (peak)

(c) Throughput and latency (10 ms)

0 5 10 15 20 25 30 35 40
jitter (ms)

0

50

100

150

200

250

300

to
ta

l s
ize

 (t
ho

us
an

ds
)

size

0

200

400

600

800

1000

1200

1400

co
un

t

count

(d) Deadlocks (10 ms)

Fig. 9. Network delay jitter experiments (numbers in parentheses are the opportunistic ordering overshoots)

5.2 TPC-C
Next, we evaluate Detock on the TPC-C benchmark [3] that is designed based on the activities of a
wholesale supplier with 9 tables and 5 types of transactions. TPC-C data is typically partitioned by
the warehouse table and we follow this partitioning by assigning different warehouses across the
eight physical regions in our deployment. We initialized the database with 1200 warehouses and 10
districts per warehouse. We followed the specification for the transaction mix ratio, displayed in
Table 2. However, transactions whose access set are dependent on a read were modified to remove
these dependencies, since the Detock codebase does not currently support dependent transactions.
For example, payment transactions select customers only by their IDs (instead of combination of
IDs and last names). Some new-order and payment transactions may access “remote” warehouses
in addition to their default warehouses. The specification only requires a remote warehouse to be
any warehouse other than the default one. However, we redefine a remote warehouse to specifically
be a warehouse that resides in a remote region; hence, these transactions become multi-home
transactions.

Table 2. TPC-C transactions mix ratio

new-order payment order-status delivery stock-level

SH 40.7% 42.3% 4.1% 4.1% 4.0%
MH 4.4% 0.4% 0% 0% 0%
total 45.1% 42.7% 4.1% 4.1% 4.0%

We ran the TPC-C workload while increasing the number of clients until reaching peak through-
put and plotted the p50 and p99 latency at different throughputs in Fig. 10a. Detock and SLOG

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


Detock: High Performance Multi-region Transactions at Scale 148:21

25000 30000 35000 40000 45000
throughput (txn/s)

101

102

103

la
te

nc
y 

(m
s)

Detock p50
Detock p99
SLOG p50

SLOG p99
Calvin p50
Calvin p99

Janus p50
Janus p99

(a) Throughput vs. latency with

increasing number of clients

0.00

0.25

0.50

0.75

1.00
us-east-2 us-east-1 eu-west-1 eu-west-2

100 101 102 103

latency (ms)

0.00

0.25

0.50

0.75

1.00
ap-northeast-1

100 101 102 103

latency (ms)

ap-northeast-2

100 101 102 103

latency (ms)

ap-southeast-1

100 101 102 103

latency (ms)

ap-southeast-2

Detock (MH) Detock (SH) SLOG (MH) SLOG (SH) Calvin Janus

(b) CDF of latency per region

Fig. 10. TPC-C results

have the same median latency because the majority of transactions in the TPC-C workload are
single-home. Calvin and Janus have much higher latency since every transaction has to be globally
ordered. Detock’s 99% latency is 66ms lower than SLOG’s because of its ability to avoid the global
ordering step for multi-home transactions which constitute 4.8% of the workload (see above, Section
5.1.2).

Detock reaches a higher peak throughput than SLOG, Calvin, and Janus do for the same reasons
discussed in Section 5.1.1.
To further explore the advantage of Detock for MH transactions, we plotted in Fig. 10b the

CDF of SH and MH transaction latency in every region. Note that the x-axis is log scaled. In
SLOG, the ordering service was in us-east-2. Therefore, the farther a region is from us-east-2 (US
East Coast), the more benefit is accrued from Detock’s ability to serve MH transactions without
making a round-trip to the ordering service. This results in many transactions having a factor of 5
better latency than SLOG’s. Us-east-1 and us-east-2 also have improvement in their transaction
latency because the ordering service incurs queuing and processing delays, which are not present
in Detock. The Calvin version in this experiment uses one region us-east-2 for ordering, thus the
latency of transactions increases as they originate farther away from the ordering region. On the
other hand, every transaction in Janus generally has to wait for responses from all regions for its
fast quorum, hence every region experiences high latency.

5.3 Scalability
We evaluate the scalability of Detock by running the microbenchmark as the number of machines
per region increases from 3 to 21. Fig. 11 shows the results of Detock in comparison to SLOG
under different settings of the parameters HOT, % MP, and% MH.

When MP and MH are 0, SLOG scales better than Detock due to the overhead of the background
thread in Detock which periodically scans the dependency graph for deadlocks. This overhead
can be mitigated by dynamically adjusting the activity of the background thread based on the
frequency of deadlocks. The cost of dependency graph management in Detock grows under high
contention when there are MH transactions in the workload and even more so when there are also
MP transactions. Thus, Detock reaches scalability limitations earlier in the lower graph where
contention is high. Figure 12 shows the reason for this is that the unstable part of the graph takes
longer to be resolved and thus grows large in the presence of MP transactions, as each partition
only has a partial view of the graph and needs to wait for more information from other partitions

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


148:22 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

3 5 7 9 11 13 15 17 19 21
machines per region

100

150

200

250

300

th
ou

sa
nd

 tx
n/

s

SLOG | MP = 0 | MH = 0
Detock | MP = 0 | MH = 0

SLOG | MP = 5 | MH = 5

SLOG | MP = 0 | MH = 5

Detock | MP = 0 | MH = 5
Detock | MP = 5 | MH = 5

HOT = 0.0001

3 5 7 9 11 13 15 17 19 21
machines per region

SLOG | MP = 0 | MH = 0
Detock | MP = 0 | MH = 0

Detock | MP = 5 | MH = 5

SLOG | MP = 0 | MH = 5

Detock | MP = 0 | MH = 5

SLOG | MP = 5 | MH = 5

HOT = 0.01

Fig. 11. Scalability of Detock and SLOG

to proceed. Either way, when there are multi-home transactions in the workload, the throughput of
both Detock and SLOG cease to increase after 15 machines per region. This is because multi-home
transactions generate extra GraphPlacementTxns in Detock and LockOnlyTxns in SLOG, the
routing of which becomes more complex as the cluster size increases. An improved routing layer
would increase the scalability of both systems.

0

50

100

th
ou

sa
nd

 v
er

tic
es

HOT = 0.0001, 0% MP 5% MH HOT = 0.0001, 5% MP 5% MH

0

50

100

th
ou

sa
nd

 v
er

tic
es

HOT = 0.01, 0% MP 5% MH HOT = 0.01, 5% MP 5% MH

stable subgraph unstable subgraph

Fig. 12. Graph size over time at 19 machines per region

5.4 Comparison to CockroachDB
CockroachDB is a distributed transactional database system that allows users to control the locality
of individual rows, and thus compares directly with Detock. Although, CockroachDB does not
guarantee strict serializability because it is susceptible to the causal reverse anomaly [27], it
guarantees strict serializability for the vast majority of practical workloads and its architecture is
based on Spanner which does guarantee strict serializability for all workloads. As discussed above,
the absolute performance numbers between Detock and CockroachDB are incomparable because
the two systems come from separate codebases. Nonetheless, information about the consequences
of the architectural differences can still be gleaned from their relative performance trends.
We deployed the two systems in 6 regions: us-east-1 (N. Virginia), us-east-2 (Ohio), us-west-1

(N. California), us-west-2 (Oregon), eu-west-1 (Ireland), and eu-west-2 (London); the inter-region
latency is included in Table 1. Each region had 3 c5.4xlarge EC2 instances (16 vCPU and 32GB
memory), as recommended by CockroachDB [6]. We used CockroachDB v21.1, which was the latest
version when we ran the experiment. To eliminate as many differences between the two systems
as possible, we configured CockroachDB such that it used the in-memory storage engine, had
only one copy per replica within a region, and parsed the SQL queries only once using prepared
statements. We sent every transaction in one shot, with automatic retry turned off so that we could

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


Detock: High Performance Multi-region Transactions at Scale 148:23

1e-06 1e-05 0.0001 0.001 0.01 0.1
HOT

10−2

10−1

100

no
rm

al
ize

d 
th

ro
ug

hp
ut

Detock
CockroachDB

MH = 0
MH = 50

MH = 100

(a) Normalized throughput

1e-06 1e-05 0.0001 0.001 0.01 0.1
HOT

0

20

40

60

80

100

%
 tr

an
sa

ct
io

ns

Committed
MH = 0

Aborted (deadlock)
MH = 50

Aborted (other)
MH = 100

(b) % committed/aborted of CockroachDB

Fig. 13. Comparison to CockroachDB

collect abort information. We ran the YCSB workload while varying the % MH and HOT parameters.
Since CockroachDB distributed data uniformly across servers in each region, most transactions are
multi-partition. Therefore, we compared against 100% MP for Detock.

The results of this experiment is normalized in Fig. 13a such that the throughput of Detock and
CockroachDB at different contention levels is relative to their throughput at the lowest contention
level. CockroachDB’s throughput plunges at high contention, and even more at high MH%. In the
worst case, the throughput of CockroachDB drops to less than 1% relative to its throughput at the
lowest contention. Conversely, Detock’s throughput decreases more gradually and slowly. It is
able to retain at least 76% of the original throughput at the highest contention level.
CockroachDB partitions the database into multiple consensus groups. Its geo-partitioning fea-

ture places each group within a single region so that nearby reads and writes have low latency.
CockroachDB uses a form of locking to handle write-write conflicts and thus is susceptible to
deadlocks. It breaks a deadlock by randomly aborting one of the transactions. Additionally, its
use of two-phase commit exacerbates the time a transaction needs to hold locks. Fig. 13b shows
the percentage of CockroachDB’s transactions that are committed, aborted due to deadlocks, and
aborted due to other reasons. When contention increases, CockroachDB aborts more transactions
because of deadlocks, causing wasted work. In contrast, Detock does not abort transactions due to
deadlocks.

6 RELATEDWORK
Graph-based concurrency control. Previous work has proposed analyzing transaction dependen-
cies to improve the performance of concurrency control protocols [22, 39, 40, 56, 59]. Furthermore,
dependency graphs have been used in practice for decades for deadlock detection [10]. Unlike
traditional deadlock detection algorithms, DDR guarantees deterministic deadlock resolution, and
therefore must generate a more complete dependency graph than traditional algorithms that only
need to identify and destroy simple cycles. DDR minimizes the cost of this extra work by using
opportunistic ordering to reduce the probability of deadlock.
Geo-replication. To achieve good latency and throughput, many geo-replicated systems use

asynchronous replication and opt for weak consistency models such as eventual consistency [2, 13,
20, 29, 32, 44, 53], strong session serializability [19], timeline consistency [15], or causal consistency
[21, 34, 35]. For stronger consistency (e.g. linearizability) over wide area network (WAN), the Paxos
[30, 31] and Raft [41] consensus protocols are commonly implemented [4, 17, 37, 39, 49, 52, 54, 58].

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.


148:24 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

These protocols require clients to send commands to a stable leader, causing a remote client to
pay for extra WAN round-trip time. EPaxos [38] is a leaderless consensus protocol which involves
tracking dependencies between commands and reordering strongly connected components, and has
similarities to Detock’s DDR algorithm. However, EPaxos has to rate limit to reduce the effect of
livelock by prioritizing executing old commands over starting new commands. In contrast, Detock
targets high conflict transactional workloads in which livelock would be common, and therefore
uses opportunistic ordering instead.

Distributed database systems.MDCC [28], Replicated Commit [37], TAPIR [60], Carousel [58],
and Ocean Vista [23] are globally distributed database systems that aim to cut down the number of
WAN round-trips but still incur cross-region latency for every transaction. In contrast, Detock
processes transactions with strict serializability without incurring cross-region latency (except for
multi-home transactions).

CockroachDB [52, 55], Spanner [17], Ocean Vista [23], and Dast [14] order transactions strictly
by their timestamps. In contrast, Detock does not require global ordering, and only generates
timestamps to reduce the probability of livelock, and can tolerate much large error bounds clock
accuracy. Also, unlike Spanner, inaccurate clocks never affect the correctness of Detock.
RingBFT [48] uses a deadlock avoidance technique where it passes cross-shard transactions

around the shards in a predetermined ring order, hence avoiding deadlocks and achieving high
throughput. Detock instead allows distributed deadlocks to occur and resolves them on the fly,
providing a new point in the tradeoff space when considering deadlock avoidance vs. detection
for geo-partitioned systems. RingBFT can tolerate Byzantine failures, which comes with high
latency, while Detock focuses on applications that benefit from low latency in non-Byzantine
environments.
G-Store [18], L-Store [33], DynaMast [7], and MorphoSys [8] co-locate data in a single node

using data migration or dynamic remastering to guarantee single-partition transactions. In contrast,
home movement in Detock is a rare event, and is never required during transaction processing.
Instead, it supports efficient multi-partition transactions.

Most proposed deterministic database systems cannot provide low-latency geo-distributed trans-
actions due to reliance on a centralized global sequencing layer [26, 36, 47, 51, 54].

7 CONCLUSION
While the related work described above must trade off consistency for latency, Detock is able to
completely side-step this trade-off for geographically partitionable workloads. Furthermore, even
for non-partitionable workloads, Detock is able to process strictly-serializable multi-partition
transactions with single round-trip latency and high throughput. Even under extremely high con-
tention we observed near-zero performance degradation, and orders of magnitude better throughput
robustness than CockroachDB.

ACKNOWLEDGMENTS
We would like to thank Pooja Nilangekar and the anonymous reviewers for their insightful com-
ments and suggestions that helped us improve the quality of this paper. This work is supported by
the National Science Foundation under grants DGE-1840340 and IIS-1910613.

REFERENCES
[1] 2007. ZeroMQ. https://zeromq.org/.
[2] 2009. MongoDB. https://mongodb.com.
[3] 2010. TPC Benchmark C. http://www.tpc.org/tpcc/.
[4] 2012. Fauna. https://fauna.com.

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.

https://zeromq.org/
https://mongodb.com
http://www.tpc.org/tpcc/
https://fauna.com


Detock: High Performance Multi-region Transactions at Scale 148:25

[5] 2021. tc(8) — Linux manual page. https://man7.org/linux/man-pages/man8/tc.8.html.
[6] 2022. Production checklist | CockroachDB Docs. https://www.cockroachlabs.com/docs/stable/recommended-

production-settings.htm.
[7] Michael Abebe, Brad Glasbergen, and Khuzaima Daudjee. 2020. DynaMast: Adaptive Dynamic Mastering for Replicated

Systems. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). 1381–1392. https://doi.org/10.1109/
ICDE48307.2020.00123

[8] Michael Abebe, Brad Glasbergen, and Khuzaima Daudjee. 2020. MorphoSys: Automatic Physical Design Metamorphosis
for Distributed Database Systems. Proc. VLDB Endow. 13, 13 (sep 2020), 3573–3587. https://doi.org/10.14778/3424573.
3424578

[9] Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei
Li, Alexander Lloyd, and Vadim Yushprakh. 2011. Megastore: Providing Scalable, Highly Available Storage for
Interactive Services. In Proceedings of the Conference on Innovative Data system Research (CIDR). 223–234. http:
//www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf

[10] Philip A Bernstein, Vassos Hadzilacos, and Nathan Goodman. 1986. Concurrency Control and Recovery in Database
Systems. Addison-Wesley Longman Publishing Co., Inc., USA.

[11] P. A. Bernstein, D. W. Shipman, and W. S. Wong. 1979. Formal Aspects of Serializability in Database Concurrency
Control. IEEE Trans. Softw. Eng. 5, 3 (may 1979), 203–216. https://doi.org/10.1109/TSE.1979.234182

[12] Yuri Breitbart, Hector Garcia-Molina, and Avi Silberschatz. 1992. Overview of Multidatabase Transaction Management.
The VLDB Journal 1, 2 (oct 1992), 181–240. https://doi.org/10.1007/BF01231700

[13] Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo,
Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, and Venkat Venkataramani.
2013. TAO: Facebook’s Distributed Data Store for the Social Graph. In Proceedings of the 2013 USENIX Conference on
Annual Technical Conference (San Jose, CA) (USENIX ATC’13). USENIX Association, USA, 49–60.

[14] Xusheng Chen, Haoze Song, Jianyu Jiang, Chaoyi Ruan, Cheng Li, Sen Wang, Gong Zhang, Reynold Cheng, and
Heming Cui. 2021. Achieving Low Tail-Latency and High Scalability for Serializable Transactions in Edge Computing.
In Proceedings of the Sixteenth European Conference on Computer Systems (Online Event, United Kingdom) (EuroSys ’21).
Association for Computing Machinery, New York, NY, USA, 210–227. https://doi.org/10.1145/3447786.3456238

[15] Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen,
Nick Puz, Daniel Weaver, and Ramana Yerneni. 2008. PNUTS: Yahoo!’s Hosted Data Serving Platform. Proc. VLDB
Endow. 1, 2 (aug 2008), 1277–1288. https://doi.org/10.14778/1454159.1454167

[16] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking Cloud
Serving Systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing (Indianapolis, Indiana, USA)
(SoCC ’10). Association for Computing Machinery, New York, NY, USA, 143–154. https://doi.org/10.1145/1807128.
1807152

[17] James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat,
Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li,
Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito,
Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. 2013. Spanner: Google’s Globally Distributed
Database. 31, 3, Article 8 (aug 2013), 22 pages. https://doi.org/10.1145/2491245

[18] Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. 2010. G-Store: A Scalable Data Store for Transactional Multi Key
Access in the Cloud. In Proceedings of the 1st ACM Symposium on Cloud Computing (Indianapolis, Indiana, USA) (SoCC
’10). Association for Computing Machinery, New York, NY, USA, 163–174. https://doi.org/10.1145/1807128.1807157

[19] K. Daudjee and K. Salem. 2004. Lazy database replication with ordering guarantees. In Proceedings. 20th International
Conference on Data Engineering. 424–435. https://doi.org/10.1109/ICDE.2004.1320016

[20] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin,
Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: Amazon’s Highly Available
Key-Value Store (SOSP ’07). Association for Computing Machinery, New York, NY, USA, 205–220. https://doi.org/10.
1145/1294261.1294281

[21] Diego Didona, Rachid Guerraoui, Jingjing Wang, and Willy Zwaenepoel. 2018. Causal Consistency and Latency
Optimality: Friend or Foe? Proc. VLDB Endow. 11, 11 (jul 2018), 1618–1632. https://doi.org/10.14778/3236187.3236210

[22] Jose M. Faleiro, Alexander Thomson, and Daniel J. Abadi. 2014. Lazy Evaluation of Transactions in Database Systems. In
Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (Snowbird, Utah, USA) (SIGMOD
’14). Association for Computing Machinery, New York, NY, USA, 15–26. https://doi.org/10.1145/2588555.2610529

[23] Hua Fan and Wojciech Golab. 2019. Ocean Vista: Gossip-Based Visibility Control for Speedy Geo-Distributed Transac-
tions. Proc. VLDB Endow. 12, 11 (jul 2019), 1471–1484. https://doi.org/10.14778/3342263.3342627

[24] Maurice P. Herlihy and Jeannette M. Wing. 1990. Linearizability: A Correctness Condition for Concurrent Objects. 12,
3 (jul 1990), 463–492. https://doi.org/10.1145/78969.78972

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.

https://man7.org/linux/man-pages/man8/tc.8.html
https://www.cockroachlabs.com/docs/stable/recommended-production-settings.htm
https://www.cockroachlabs.com/docs/stable/recommended-production-settings.htm
https://doi.org/10.1109/ICDE48307.2020.00123
https://doi.org/10.1109/ICDE48307.2020.00123
https://doi.org/10.14778/3424573.3424578
https://doi.org/10.14778/3424573.3424578
http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
https://doi.org/10.1109/TSE.1979.234182
https://doi.org/10.1007/BF01231700
https://doi.org/10.1145/3447786.3456238
https://doi.org/10.14778/1454159.1454167
https://doi.org/10.1145/1807128.1807152
https://doi.org/10.1145/1807128.1807152
https://doi.org/10.1145/2491245
https://doi.org/10.1145/1807128.1807157
https://doi.org/10.1109/ICDE.2004.1320016
https://doi.org/10.1145/1294261.1294281
https://doi.org/10.1145/1294261.1294281
https://doi.org/10.14778/3236187.3236210
https://doi.org/10.1145/2588555.2610529
https://doi.org/10.14778/3342263.3342627
https://doi.org/10.1145/78969.78972


148:26 Cuong D. T. Nguyen, Johann K. Miller, & Daniel J. Abadi

[25] Shady Issa, Miguel Viegas, Pedro Raminhas, Nuno Machado, Miguel Matos, and Paolo Romano. 2020. Exploiting
Symbolic Execution to Accelerate Deterministic Databases. In 2020 IEEE 40th International Conference on Distributed
Computing Systems (ICDCS). 678–688. https://doi.org/10.1109/ICDCS47774.2020.00040

[26] Bettina Kemme and Gustavo Alonso. 2000. Don’t Be Lazy, Be Consistent: Postgres-R, A New Way to Implement
Database Replication. In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB ’00). Morgan
Kaufmann Publishers Inc., San Francisco, CA, USA, 134–143.

[27] Spencer Kimball and Irfan Sharif. 2022. Living Without Atomic Clocks. https://www.cockroachlabs.com/blog/living-
without-atomic-clocks/.

[28] Tim Kraska, Gene Pang, Michael J. Franklin, Samuel Madden, and Alan Fekete. 2013. MDCC: Multi-Data Center
Consistency. In Proceedings of the 8th ACM European Conference on Computer Systems (Prague, Czech Republic) (EuroSys
’13). Association for Computing Machinery, New York, NY, USA, 113–126. https://doi.org/10.1145/2465351.2465363

[29] Avinash Lakshman and Prashant Malik. 2010. Cassandra: A Decentralized Structured Storage System. SIGOPS Oper.
Syst. Rev. 44, 2 (apr 2010), 35–40. https://doi.org/10.1145/1773912.1773922

[30] Leslie Lamport. 1998. The Part-Time Parliament. ACM Trans. Comput. Syst. 16, 2 (may 1998), 133–169. https:
//doi.org/10.1145/279227.279229

[31] Leslie Lamport. 2001. Paxos Made Simple. ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121,
December 2001) (December 2001), 51–58. https://www.microsoft.com/en-us/research/publication/paxos-made-simple/

[32] Cheng Li, Daniel Porto, Allen Clement, Johannes Gehrke, Nuno Preguiça, and Rodrigo Rodrigues. 2012. Making
Geo-Replicated Systems Fast as Possible, Consistent When Necessary. In Proceedings of the 10th USENIX Conference on
Operating Systems Design and Implementation (Hollywood, CA, USA) (OSDI’12). USENIX Association, USA, 265–278.

[33] Qian Lin, Pengfei Chang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, and Zhengkui Wang. 2016. Towards a Non-2PC
Transaction Management in Distributed Database Systems (SIGMOD ’16). Association for Computing Machinery, New
York, NY, USA, 1659–1674. https://doi.org/10.1145/2882903.2882923

[34] Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. 2011. Don’t Settle for Eventual: Scalable
Causal Consistency for Wide-Area Storage with COPS (SOSP ’11). Association for Computing Machinery, New York,
NY, USA, 401–416. https://doi.org/10.1145/2043556.2043593

[35] Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. 2013. Stronger Semantics for Low-
Latency Geo-Replicated Storage. In Proceedings of the 10th USENIX Conference on Networked Systems Design and
Implementation (Lombard, IL) (nsdi’13). USENIX Association, USA, 313–328.

[36] Yi Lu, Xiangyao Yu, Lei Cao, and Samuel Madden. 2020. Aria: A Fast and Practical Deterministic OLTP Database. Proc.
VLDB Endow. 13, 12 (jul 2020), 2047–2060. https://doi.org/10.14778/3407790.3407808

[37] Hatem Mahmoud, Faisal Nawab, Alexander Pucher, Divyakant Agrawal, and Amr El Abbadi. 2013. Low-Latency
Multi-Datacenter Databases Using Replicated Commit. Proc. VLDB Endow. 6, 9 (jul 2013), 661–672. https://doi.org/10.
14778/2536360.2536366

[38] Iulian Moraru, David G. Andersen, and Michael Kaminsky. 2013. There is More Consensus in Egalitarian Parliaments
(SOSP ’13). Association for Computing Machinery, New York, NY, USA, 358–372. https://doi.org/10.1145/2517349.
2517350

[39] Shuai Mu, Yang Cui, Yang Zhang, Wyatt Lloyd, and Jinyang Li. 2014. Extracting More Concurrency from Distributed
Transactions. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (Broomfield,
CO) (OSDI’14). USENIX Association, USA, 479–494.

[40] Shuai Mu, Lamont Nelson, Wyatt Lloyd, and Jinyang Li. 2016. Consolidating Concurrency Control and Consensus for
Commits under Conflicts. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation
(Savannah, GA, USA) (OSDI’16). USENIX Association, USA, 517–532.

[41] Diego Ongaro and John Ousterhout. 2014. In Search of an Understandable Consensus Algorithm. In Proceedings of
the 2014 USENIX Conference on USENIX Annual Technical Conference (Philadelphia, PA) (USENIX ATC’14). USENIX
Association, USA, 305–320.

[42] Christos H. Papadimitriou. 1979. The Serializability of Concurrent Database Updates. J. ACM 26, 4 (oct 1979), 631–653.
https://doi.org/10.1145/322154.322158

[43] Abhinav Pathak, Himabindu Pucha, Ying Zhang, Y. Charlie Hu, and Z. Morley Mao. 2008. A Measurement Study of
Internet Delay Asymmetry. In Proceedings of the 9th International Conference on Passive and Active Network Measurement
(Cleveland, OH, USA) (PAM’08). Springer-Verlag, Berlin, Heidelberg, 182–191.

[44] Karin Petersen, Mike J. Spreitzer, Douglas B. Terry, Marvin M. Theimer, and Alan J. Demers. 1997. Flexible Update
Propagation for Weakly Consistent Replication. In Proceedings of the Sixteenth ACM Symposium on Operating Systems
Principles (Saint Malo, France) (SOSP ’97). Association for Computing Machinery, New York, NY, USA, 288–301.
https://doi.org/10.1145/268998.266711

[45] Seth Proctor. 2013. Exploring the Architecture of the NuoDB Database, Part 1. https://www.infoq.com/articles/nuodb-
architecture-1.

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 148. Publication date: June 2023.

https://doi.org/10.1109/ICDCS47774.2020.00040
https://www.cockroachlabs.com/blog/living-without-atomic-clocks/
https://www.cockroachlabs.com/blog/living-without-atomic-clocks/
https://doi.org/10.1145/2465351.2465363
https://doi.org/10.1145/1773912.1773922
https://doi.org/10.1145/279227.279229
https://doi.org/10.1145/279227.279229
https://www.microsoft.com/en-us/research/publication/paxos-made-simple/
https://doi.org/10.1145/2882903.2882923
https://doi.org/10.1145/2043556.2043593
https://doi.org/10.14778/3407790.3407808
https://doi.org/10.14778/2536360.2536366
https://doi.org/10.14778/2536360.2536366
https://doi.org/10.1145/2517349.2517350
https://doi.org/10.1145/2517349.2517350
https://doi.org/10.1145/322154.322158
https://doi.org/10.1145/268998.266711
https://www.infoq.com/articles/nuodb-architecture-1
https://www.infoq.com/articles/nuodb-architecture-1


Detock: High Performance Multi-region Transactions at Scale 148:27

[46] Seth Proctor. 2013. Exploring the Architecture of the NuoDB Database, Part 2. https://www.infoq.com/articles/nuodb-
architecture-2.

[47] Thamir Qadah, Suyash Gupta, and Mohammad Sadoghi. 2020. Q-Store: Distributed, Multi-partition Transactions via
Queue-oriented Execution and Communication.. In EDBT (Copenhagen, Denmark). 73–84.

[48] Sajjad Rahnama, Suyash Gupta, Rohan Sogani, Dhruv Krishnan, and Mohammad Sadoghi. 2022. RingBFT: Resilient
Consensus over Sharded Ring Topology. In EDBT (Edinburgh, UK). 298–311.

[49] Kun Ren, Dennis Li, and Daniel J. Abadi. 2019. SLOG: Serializable, Low-Latency, Geo-Replicated Transactions. Proc.
VLDB Endow. 12, 11 (jul 2019), 1747–1761. https://doi.org/10.14778/3342263.3342647

[50] Kun Ren, Alexander Thomson, and Daniel J. Abadi. 2014. An Evaluation of the Advantages and Disadvantages of
Deterministic Database Systems. Proc. VLDB Endow. 7, 10 (jun 2014), 821–832. https://doi.org/10.14778/2732951.2732955

[51] Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. 2007.
The End of an Architectural Era: (It’s Time for a Complete Rewrite). In Proceedings of the 33rd International Conference
on Very Large Data Bases (Vienna, Austria) (VLDB ’07). VLDB Endowment, 1150–1160.

[52] Rebecca Taft, Irfan Sharif, Andrei Matei, Nathan VanBenschoten, Jordan Lewis, Tobias Grieger, Kai Niemi, Andy
Woods, Anne Birzin, Raphael Poss, Paul Bardea, Amruta Ranade, Ben Darnell, Bram Gruneir, Justin Jaffray, Lucy
Zhang, and Peter Mattis. 2020. CockroachDB: The Resilient Geo-Distributed SQL Database. In Proceedings of the 2020
ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD ’20). Association for
Computing Machinery, New York, NY, USA, 1493–1509. https://doi.org/10.1145/3318464.3386134

[53] D. B. Terry, M. M. Theimer, Karin Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser. 1995. Managing Update
Conflicts in Bayou, a Weakly Connected Replicated Storage System. In Proceedings of the Fifteenth ACM Symposium on
Operating Systems Principles (Copper Mountain, Colorado, USA) (SOSP ’95). Association for Computing Machinery,
New York, NY, USA, 172–182. https://doi.org/10.1145/224056.224070

[54] Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi. 2012. Calvin:
Fast Distributed Transactions for Partitioned Database Systems (SIGMOD ’12). Association for Computing Machinery,
New York, NY, USA, 1–12. https://doi.or