Practical Hurdles In Crab Latching Concurrency
This post covers the challenges I faced in implementing a thread-safe, concurrent B+Tree. You can find the code here and the tests here. A lot of engineering effort went into code reviews, tests, and analyzers (ThreadSanitizer) to finally arrive at an implementation free of deadlocks and data races.
You will see that the same thinking and techniques apply to implementing any thread-safe, concurrent data structure.
What is Crab Latching?
A B+Tree node embeds a Readers/Writer Latch, which allows either shared read-only access (multiple readers) or exclusive write access (a single writer). When a thread holds a latch on a node, it can traverse down the tree by acquiring a latch on a child node.
Only after the latch is acquired on the child node is the parent latch released. Holding the parent latch is a safety requirement: it ensures that the reference to the child node is valid and not a dangling reference. Since exclusive access is only ever granted to a single writer, we can be certain that while we hold the latch on the parent node, the child references cannot be modified by another writer thread.
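Here is a minimal sketch of that hand-over-hand descent for a reader. The Node interface used here (latch, is_leaf, ChildFor) is illustrative, not the actual implementation:
// Hedged sketch of a read-only crab-latched descent. The `Node`
// interface (`latch`, `is_leaf()`, `ChildFor()`) is illustrative,
// not the post's actual code.
template <typename Node>
Node* DescendForRead(Node* root, int key) {
  Node* node = root;
  node->latch.lock_shared();
  while (!node->is_leaf()) {
    Node* child = node->ChildFor(key);  // Pointer is valid: parent latch held.
    child->latch.lock_shared();         // 1. Latch the child first.
    node->latch.unlock_shared();        // 2. Only then release the parent.
    node = child;
  }
  return node;  // The leaf is still latched; the caller releases it.
}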
A writer thread performs an additional check to see if the current operation will result in either an overflow (the node has to split into two) or an underflow (sibling nodes have to be merged into one). If the answer is "yes", then the exclusive write latch on the parent node is not released immediately.
This is expected to happen rarely, and it is the only time a path segment in the B+Tree stays latched for the duration of a write operation.
If the writer later finds a descendant node which will not cause an overflow or underflow, then all the ancestor latches are released after acquiring a latch on that descendant node.
Crab latching is a fine-grained concurrency protocol which allows multiple readers and writers to concurrently and safely access the B+Tree data structure with minimal contention.
The Need for Top-Down Latch Ordering
There are broadly two ways to handle deadlocks. The first approach is to avoid them through careful programming techniques, guaranteeing by construction that deadlocks will not happen. The other is to detect a deadlock when it happens and resolve it at runtime. For concurrent data structures, which do not embed a runtime, the preferred approach is the first one: prevent deadlocks from ever occurring.
In the crab latching concurrency protocol, a top-down latch ordering is enforced to avoid deadlocks.
In the example shown above, a deadlock happens when two threads cross each other in opposite directions.
- The writer Thread A holds an exclusive latch on the leaf node (green) and is blocked waiting to acquire an exclusive latch on its parent inner node (yellow).
- The reader Thread B holds a shared latch on the parent inner node (yellow) and is blocked waiting to acquire a shared latch on its child leaf node (green).
Remember that in crab latching, we release the currently held latch only after we acquire a latch on the next node. Here Thread A cannot acquire a latch on the parent inner node until Thread B releases its latch on the parent inner node. Similarly, Thread B cannot acquire a latch on the child leaf node until Thread A releases its latch on the child leaf node. The end result is that both threads block indefinitely, causing a deadlock.
If a strict top-down latch ordering is enforced, this situation will never happen. In a correct implementation, Thread A will release its exclusive latch on the leaf node, abandoning its current traversal. It will restart the traversal from the root node, now with the knowledge that the parent node needs to be exclusively latched before the leaf node. All readers and writers proceed in a top-down direction, never creating a deadlock.
Crab Latching is Performant
Latches are held only during a critical section, that is, while a data structure is read or updated. Deadlocks are avoided by appropriate coding disciplines, for example, requesting multiple latches in carefully designed sequences. Deadlock resolution requires a facility to rollback prior actions, whereas deadlock avoidance does not. Thus, deadlock avoidance is more appropriate for latches, which are designed for minimal overhead and maximal performance and scalability. Latch acquisition and release may require tens of instructions only, usually with no additional cache faults since a latch can be embedded in the data structure it protects.
Goetz Graefe, A Survey of B-Tree Locking Techniques (2010)
As Graefe notes, latches are held only during a critical section, for a very short duration. This requires a careful review of the critical sections in the implementation. You can see here an implementation of the Readers-Writer Latch, which is a lightweight wrapper over std::shared_mutex.
Here you can see what it means to embed the latch in a B+Tree node for better cache locality.
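As a minimal sketch (field names are illustrative, not the actual code), embedding means the latch is a data member of the node itself, so acquiring it touches memory the operation is about to read anyway rather than a separate lock-table entry:
#include <shared_mutex>
#include <vector>

// Illustrative sketch: the latch lives inside the node, typically on
// the same cache lines as the keys it protects.
struct BPlusTreeNode {
  std::shared_mutex latch;   // Embedded readers-writer latch.
  bool is_leaf = false;
  std::vector<int> keys;     // Sorted keys stored in this node.
  std::vector<int> values;   // Payloads (leaf nodes only).
};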
The Optimistic Fast Path for Writers
The chance of a node splitting (overflow) from inserting an entry, or merging with a sibling node (underflow) from removing an entry, is small. An optimistic approach banks on this property of a B+Tree to execute a fast path for insertions and deletions.
In the fast path, a writer thread acquires shared latches starting at the root node of the B+Tree. Only when it reaches the leaf node does it upgrade to an exclusive latch for modifying the contents of the leaf node. In the average case, the operation ends without either a node overflow or underflow.
However, the optimistic approach is abandoned if the leaf node will split after inserting an entry or it has to merge with a sibling after removing an entry. Then a pessimistic traversal is restarted from the root node, acquiring exclusive latches.
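A hedged sketch of this fast path follows. The names (is_leaf, ChildFor, WillOverflow, InsertIntoLeaf) are illustrative, and since std::shared_mutex cannot atomically upgrade a shared latch, the sketch latches the leaf exclusively as the final hand-over-hand step, which is one way to realize the "upgrade":
// Illustrative sketch of the optimistic write path. Inner nodes are
// latched in shared mode; only the leaf is latched exclusively.
template <typename Node>
bool OptimisticInsert(Node* root, int key, int value) {
  Node* node = root;
  if (node->is_leaf()) {
    node->latch.lock();              // Single-leaf tree: latch exclusively.
  } else {
    node->latch.lock_shared();
  }
  while (!node->is_leaf()) {
    Node* child = node->ChildFor(key);
    if (child->is_leaf()) {
      child->latch.lock();           // Exclusive latch on the leaf only.
    } else {
      child->latch.lock_shared();    // Shared latches on inner nodes.
    }
    node->latch.unlock_shared();
    node = child;
  }
  if (node->WillOverflow()) {        // Unsafe: a split touches ancestors.
    node->latch.unlock();
    return false;                    // Caller restarts pessimistically.
  }
  node->InsertIntoLeaf(key, value);  // Safe: modify the leaf in place.
  node->latch.unlock();
  return true;
}
When OptimisticInsert returns false, the caller retries with the pessimistic, exclusive-latching descent described next.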
For performance reasons, even in the pessimistic approach, care is taken to release exclusive latches wherever possible. This is done by categorizing each descendant node as either safe or unsafe. An unsafe node is expected to either overflow or underflow during the current write operation. In that case, the exclusive latch on the parent node is held for the duration of the write.
In the pessimistic approach, if we encounter an unsafe descendant followed by a safe descendant, then the exclusive latches acquired on the ancestors are immediately released. This optimization is necessary to minimize the duration for which path segments in the B+Tree are exclusively latched. An exclusively latched path forces sequential access, blocking other threads until the current write operation completes.
The worst-case scenario is when the path from the root to a leaf node is exclusively latched. This blocks all other threads from entering the B+Tree until the exclusive latch on the root node is released.
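A hedged sketch of the safe-node test and the early release of ancestor latches follows; the names (WillOverflow, WillUnderflow, ChildFor) and the latch stack are illustrative:
#include <vector>

// Illustrative sketch. A node is "safe" for the current operation if
// the operation cannot propagate a split or merge above it.
template <typename Node>
bool IsSafe(const Node* node, bool is_insert) {
  return is_insert ? !node->WillOverflow()    // Insert: room for one more entry.
                   : !node->WillUnderflow();  // Delete: can afford to lose one.
}

// Pessimistic descent: exclusive latches are taken top-down, but all
// latched ancestors are released as soon as a safe descendant
// guarantees that changes cannot propagate above it.
template <typename Node>
Node* PessimisticDescend(Node* root, int key, bool is_insert,
                         std::vector<Node*>& held) {
  Node* node = root;
  node->latch.lock();
  held.push_back(node);
  while (!node->is_leaf()) {
    Node* child = node->ChildFor(key);
    child->latch.lock();
    if (IsSafe(child, is_insert)) {
      // Any split/merge stops at `child`: unlatch every ancestor now.
      for (Node* ancestor : held) ancestor->latch.unlock();
      held.clear();
    }
    held.push_back(child);
    node = child;
  }
  return node;  // Caller modifies the leaf, then unlatches `held`.
}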
The Conflict with Range Scans
The B+Tree data structure stores the keys in sorted order in its leaf nodes. The leaf nodes are connected to each other like a doubly linked list. This makes it possible to perform efficient key range operations.
What happens if we have an iterator going forwards and another going backwards? We know that acquiring latches in opposite directions is a recipe for deadlocks. Mixing a regular top-down traversal with a sideways scan can also create deadlocks.
The main challenge here is to support bi-directional scans without creating deadlocks. This is a critical feature necessary for supporting index scans. Here is some code to demonstrate the API:
// A forward index scan
for (auto iter = index.Begin(); iter != index.End(); ++iter) {
  int key = (*iter).first;
  int value = (*iter).second;
}

// A reverse index scan
for (auto iter = index.RBegin(); iter != index.REnd(); --iter) {
  int key = (*iter).first;
  int value = (*iter).second;
}

/**
 * index.Begin()  : first element
 * index.End()    : one-past-the-last element
 * index.RBegin() : last element
 * index.REnd()   : one-past-the-first element
 */
Non-blocking Shared Latch
Earlier we avoided deadlocks in traversals because they follow a natural top-down direction. Here, unfortunately, it is not possible to restrict scans to either a left-right or a right-left order. Solving this problem requires adding constraints to the iterator.
If the iterator cannot be used to modify a B+Tree node, we only need to acquire a shared read-only latch for bidirectional scans. Next, we need to introduce a non-blocking latch which always returns immediately. It returns true if the latch is acquired and false otherwise.
The implementation of a non-blocking shared latch looks like this:
#include <shared_mutex>

class SharedLatch {
 public:
  // Blocking
  //
  // If another thread holds the latch in exclusive mode, execution
  // blocks until the latch is acquired.
  //
  // Used in insert, delete & search index operations
  void LockShared() {
    latch_.lock_shared();
  }

  // Non-blocking
  //
  // Tries to acquire the latch. If successful returns `true`,
  // otherwise returns `false`.
  //
  // Used in ascending & descending index scans
  bool TryLockShared() {
    return latch_.try_lock_shared();
  }

  // Releases a previously acquired shared latch.
  void UnlockShared() {
    latch_.unlock_shared();
  }

 private:
  // A wrapper around std::shared_mutex
  std::shared_mutex latch_;
};
An Explicit Scan Retry State
If the non-blocking shared latch on the next sibling node cannot be acquired, then we set the iterator's state to RETRY. The shared latch held on the current leaf node is also released immediately.
enum IteratorState {
  VALID, INVALID, END, REND, RETRY
};
It is now the responsibility of the caller to decide how to proceed when the RETRY state happens during index scans.
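A hedged sketch of what this looks like at the call site; State() and the restart policy are assumptions about the iterator API, not the actual implementation:
// Illustrative only: `State()` is an assumed accessor on the iterator.
auto iter = index.Begin();
while (iter != index.End()) {
  if (iter.State() == RETRY) {
    // The shared latch on the next leaf could not be acquired and the
    // current latch was dropped. The caller chooses the policy; the
    // simplest (shown here) is to restart the whole scan.
    iter = index.Begin();
    continue;
  }
  int key = (*iter).first;
  int value = (*iter).second;
  ++iter;
}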
Deadlock due to Lock Order Inversion
The problem with iterators is that the shared latch they hold on the current leaf node is not released until the current node is consumed through iteration. The following code can therefore result in a lock order inversion (deadlock).
auto iter1 = index.Begin();
// The second iterator creates a lock order inversion.
auto iter2 = index.Begin();
The first iterator already holds a shared latch on a leaf node. The second iterator has to begin traversal from the root. This violates the strict top-down ordering because in this thread we already hold a latch on the leaf node, and now we are attempting to acquire a latch on the root node.
If we replace the second iterator in the code above with a search, insert, or delete operation, the lock order inversion still applies.
To prevent this, we have to ensure that the iterator's lifetime does not overlap with any other tree operation, including other scans. The code above, rewritten like this, avoids deadlocks:
// SAFE: Iterators are used in separate, non-overlapping scopes.
{
  auto iter1 = index.Begin();
  // ... use iter1
}  // iter1 is destroyed here, its latch is released.

{
  auto iter2 = index.Begin();
  // ... use iter2
}  // iter2 is destroyed here, its latch is released.
Deadlock Avoidance in Symmetric Deletion
If a node underflows after removing an entry, you can either merge it with a sibling or redistribute entries from a sibling without merging. I have previously written about node underflows.
In symmetric deletion, you can merge with either the left or the right sibling node. This is applicable to both inner nodes and leaf nodes. Since a merge can happen in either a left-right or right-left direction, it is important to enforce a single ordering to prevent deadlocks.
Say the ordering we choose is left-right. So in a right-left merge, we need to release the exclusive latch on the current (right) node. Then we acquire an exclusive latch on the previous (left) node, followed by reacquiring the exclusive latch on the current (right) node. Now we have acquired the latches in the correct order: left-right.
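A hedged sketch of that re-latching dance; the names are illustrative, and the exclusive latch on the parent is assumed to be held throughout, which is what makes the release-and-reacquire window safe:
// Illustrative sketch. The parent's exclusive latch is held for the
// whole merge, so no other writer can descend to either sibling while
// `current` is briefly unlatched.
template <typename Node>
void LatchForRightToLeftMerge(Node* left, Node* current) {
  // We hold an exclusive latch on `current` (the right node), but the
  // chosen global order is left-before-right.
  current->latch.unlock();  // 1. Give up the out-of-order latch.
  left->latch.lock();       // 2. Latch the left sibling first.
  current->latch.lock();    // 3. Re-latch the right node: the order is
                            //    now left-right, same as every thread.
}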
Summary
- A latch is embedded in the B+Tree node for better cache locality. Latches are lightweight and performant.
- Avoiding deadlocks requires enforcing a global ordering in how latches are acquired within a single thread as well as across multiple threads.
- Careful programming is required to ensure that latches are held only for a short duration, in the critical sections where node data is read or updated.
- Index scans use a non-blocking shared latch to avoid deadlocks.
- The crab latching concurrency protocol avoids blocking at the root and degrading to serial execution. Even in the pessimistic case, it ensures that exclusive latches on path segments are held only when strictly necessary.