How Frontrun Works
Frontrun provides three approaches for controlling thread interleaving. They operate at different levels of granularity and make different trade-offs between interpretability, coverage, and the kinds of bugs they can find.
DPOR (Systematic Exploration)
DPOR (Dynamic Partial Order Reduction) systematically explores every meaningfully different thread interleaving. Where the bytecode explorer samples schedules randomly, DPOR guarantees completeness: two interleavings count as the same when they differ only in the order of non-conflicting operations, and DPOR tries each distinct interleaving exactly once without re-running redundant orderings.
Like the bytecode explorer, DPOR instruments at the opcode level and needs no manual markers. It automatically detects attribute reads/writes, subscript accesses, and lock operations. The exploration engine is written in Rust for performance.
```python
from frontrun.dpor import explore_dpor


class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        temp = self.value
        self.value = temp + 1


result = explore_dpor(
    setup=Counter,
    threads=[lambda c: c.increment(), lambda c: c.increment()],
    invariant=lambda c: c.value == 2,
)
# increment() is a non-atomic read-modify-write, so DPOR finds the lost
# update and this assertion fails with the interleaving in its message.
assert result.property_holds, result.explanation
```
When a race is found, result.explanation contains an interleaved source-line
trace showing which threads accessed which shared state and the conflict pattern.
DPOR’s traces are more interpretable than the bytecode explorer’s because DPOR
knows why the interleaving matters — it detected specific conflicting accesses
to the same object and chose to explore the alternative ordering.
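To make "meaningfully different" concrete, the sketch below enumerates every interleaving of the two increment() calls, abstracted to one read and one write of value per thread, and groups them by the order of their conflicting accesses (different threads, same object, at least one write). It is a brute-force illustration of the equivalence DPOR relies on, not how frontrun's Rust engine works: a real DPOR engine never enumerates all interleavings up front; it adds alternative orderings only where it observes conflicts during execution. All helper names here are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# An access is (thread_id, op, obj); op is "r" (read) or "w" (write).
# In this toy example every access is a distinct tuple.
Access = Tuple[int, str, str]


def interleavings(a: List[Access], b: List[Access]) -> List[List[Access]]:
    """Every merge of two access sequences that preserves each thread's program order."""
    n = len(a) + len(b)
    runs = []
    for slots in combinations(range(n), len(a)):  # positions taken by thread a
        ia, ib = iter(a), iter(b)
        runs.append([next(ia) if i in slots else next(ib) for i in range(n)])
    return runs


def conflicts(x: Access, y: Access) -> bool:
    """Different threads, same object, and at least one write."""
    return x[0] != y[0] and x[2] == y[2] and "w" in (x[1], y[1])


def signature(run: List[Access]) -> FrozenSet[Tuple[Access, Access]]:
    """The order of every conflicting pair; runs with equal signatures are equivalent."""
    return frozenset(
        (run[i], run[j])
        for i in range(len(run))
        for j in range(i + 1, len(run))
        if conflicts(run[i], run[j])
    )


def final_value(run: List[Access]) -> int:
    """Replay increment(): each thread reads value, then writes what it read + 1."""
    value, temp = 0, {}
    for tid, op, _ in run:
        if op == "r":
            temp[tid] = value
        else:
            value = temp[tid] + 1
    return value


t0 = [(0, "r", "value"), (0, "w", "value")]
t1 = [(1, "r", "value"), (1, "w", "value")]
runs = interleavings(t0, t1)
classes: Dict[FrozenSet[Tuple[Access, Access]], List[Access]] = {}
for run in runs:
    classes.setdefault(signature(run), run)  # keep one representative per class

print(len(runs), len(classes))  # 6 4
print(sorted(final_value(r) for r in classes.values()))  # [1, 1, 2, 2]
```

Six raw interleavings collapse to four classes, and two of those classes lose an update, which is exactly what the invariant c.value == 2 catches.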
Scope: DPOR explores alternative schedules where it detects a
conflict — either two threads accessing the same Python object (with at
least one write) or two threads performing I/O on the same network
endpoint or file path. When run under the frontrun CLI, a native
LD_PRELOAD library intercepts C-level I/O operations (send,
recv, read, write, etc.) and feeds them into the DPOR
engine, so even opaque database drivers (libpq, mysqlclient) and Redis
clients (hiredis) are covered — see C-Level I/O Interception
below.
DPOR does detect many C-level operations: list.append,
dict.__setitem__, and similar mutating methods are seen via
sys.setprofile; builtins like sorted(), sum(), min(),
max(), and str.join() are registered as passthrough reads on
their arguments; and container constructors (list(), dict(),
enumerate(), zip(), etc.) are tracked as reads on their inputs.
The gap is C-level iteration interleaving. DPOR treats each C call
as a single atomic operation, but under PEP 703 (free-threaded Python),
C functions that iterate via PyIter_Next — such as
list(od.keys()) — acquire and release the per-object lock on each
element, allowing another thread to mutate the collection between
iterations. When both sides of a race are single C opcodes (e.g.
list(od.keys()) vs od.move_to_end("a")), no bytecode-level tool
can expose the interleaving. See PEP-703-REPORT.md for details.
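The torn-iteration hazard can be modelled in plain Python by stepping an iterator by hand: each loop iteration stands in for one PyIter_Next call, and the callback run between two iterations stands in for another thread that takes the lock in between. This is a hypothetical, single-threaded model: a plain list replaces the OrderedDict (the exact symptom with a real OrderedDict may differ), and it does not exercise free-threaded CPython at all.

```python
from typing import Callable, List


def stepwise_copy(items: List[str], between: Callable[[], None], pause_after: int) -> List[str]:
    """Copy `items` one element at a time, like a C loop calling PyIter_Next.

    `between` runs once `pause_after` elements have been copied, standing in for
    another thread that runs while the per-object lock is released.
    """
    out: List[str] = []
    for item in iter(items):
        out.append(item)
        if len(out) == pause_after:
            between()
    return out


def move_to_end(items: List[str], key: str) -> None:
    # Plain-list stand-in for OrderedDict.move_to_end(key).
    items.remove(key)
    items.append(key)


keys = ["a", "b", "c"]
snapshot = stepwise_copy(keys, lambda: move_to_end(keys, "a"), pause_after=1)
print(snapshot)  # ['a', 'c', 'a']: neither the old order nor the new one
```

The copy observes a state that never existed at any single instant, which is why treating the whole C call as one atomic step can hide this class of race.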
DPOR also cannot see shared state managed entirely inside a C extension without any I/O or Python-visible operations — for example, in-process mutations of NumPy arrays or C-level caches with no Python API calls.
For a practical guide see DPOR in Practice. For the algorithm details and theory see DPOR: Dynamic Partial Order Reduction.
Bytecode Instrumentation
Bytecode instrumentation automatically instruments functions at the opcode level — no markers needed.
How It Works:
Each thread is run with a sys.settrace callback that sets
f_trace_opcodes = True on every frame, so the callback fires at every
bytecode instruction rather than every source line. At each opcode the thread
calls scheduler.wait_for_turn(), which blocks until the schedule says it’s
that thread’s turn. Only user code is traced — stdlib and threading internals
are skipped.
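A minimal sketch of the tracing half of this mechanism, assuming nothing beyond the standard sys.settrace API: the global trace function enables per-opcode events only for frames of the target function and installs a local tracer that receives one "opcode" event per instruction. The spot where frontrun would call scheduler.wait_for_turn() is marked with a comment. trace_opcodes is a hypothetical name, and it traces only the calling thread because sys.settrace is per-thread.

```python
import sys
from types import FrameType
from typing import Any, Callable, List, Optional


def trace_opcodes(fn: Callable[..., Any], *args: Any) -> List[int]:
    """Run fn(*args) and record the bytecode offset of every instruction executed in fn's frame."""
    offsets: List[int] = []

    def local_tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable[..., Any]]:
        if event == "opcode":
            # frontrun would block here (scheduler.wait_for_turn()) before each instruction.
            offsets.append(frame.f_lasti)
        return local_tracer

    def global_tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable[..., Any]]:
        # Only the target function is traced; every other frame is skipped.
        if frame.f_code is not fn.__code__:
            return None
        frame.f_trace = local_tracer  # install the per-frame tracer first,
        frame.f_trace_opcodes = True  # then request per-opcode events
        return local_tracer

    sys.settrace(global_tracer)  # applies to the calling thread only
    try:
        fn(*args)
    finally:
        sys.settrace(None)
    return offsets


class Counter:
    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> None:
        temp = self.value
        self.value = temp + 1


c = Counter()
offsets = trace_opcodes(Counter.increment, c)
print(f"{len(offsets)} opcode events, value={c.value}")
```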
Because the scheduler controls which thread runs each opcode, any blocking call
that happens in C code (like threading.Lock.acquire()) would deadlock — the
blocked thread holds a scheduler turn but can’t make progress. To prevent this,
all standard threading and queue primitives (Lock, RLock,
Semaphore, BoundedSemaphore, Event, Condition, Queue,
LifoQueue, PriorityQueue) are monkey-patched with cooperative versions
that spin-yield via the scheduler instead of blocking. The patching is scoped
to each test run: primitives are replaced before setup() and restored
afterwards.
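The spin-yield idea can be sketched as a lock wrapper that never blocks inside C code: it makes non-blocking acquire attempts and calls a yield hook between them. CooperativeLock and its yield_fn parameter are hypothetical names, and time.sleep(0) stands in for handing the turn back to frontrun's scheduler.

```python
import threading
import time
from typing import Callable


class CooperativeLock:
    """Hypothetical stand-in for a patched threading.Lock."""

    def __init__(self, yield_fn: Callable[[], None] = lambda: time.sleep(0)) -> None:
        self._lock = threading.Lock()  # the real, unpatched primitive
        self._yield = yield_fn

    def acquire(self, blocking: bool = True, timeout: float = -1) -> bool:
        if not blocking:
            return self._lock.acquire(blocking=False)
        deadline = None if timeout < 0 else time.monotonic() + timeout
        while not self._lock.acquire(blocking=False):
            if deadline is not None and time.monotonic() >= deadline:
                return False
            self._yield()  # spin-yield instead of blocking in C
        return True

    def release(self) -> None:
        self._lock.release()

    def __enter__(self) -> "CooperativeLock":
        self.acquire()
        return self

    def __exit__(self, *exc: object) -> None:
        self.release()


lock = CooperativeLock()
total = 0


def worker() -> None:
    global total
    for _ in range(1000):
        with lock:
            total += 1


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)  # 4000
```

Because the waiting thread returns control on every failed attempt, a scheduler that owns the yield hook can always pick another thread to run, so the lock holder eventually gets a turn to release it.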
explore_interleavings() does property-based exploration in the style of
Hypothesis: it generates random
opcode-level schedules and checks that an invariant holds under each one,
returning any counterexample schedule.
```python
from frontrun.bytecode import explore_interleavings


class Counter:
    def __init__(self, value=0):
        self.value = value

    def increment(self):
        temp = self.value
        self.value = temp + 1


def test_counter_is_atomic():
    result = explore_interleavings(
        setup=lambda: Counter(value=0),
        threads=[
            lambda c: c.increment(),
            lambda c: c.increment(),
        ],
        invariant=lambda c: c.value == 2,
        max_attempts=200,
        max_ops=200,
        seed=42,
    )
    # increment() is a non-atomic read-modify-write, so a failing
    # schedule is likely to be found and reported here.
    assert result.property_holds, result.explanation
```
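To illustrate the random-schedule approach without frontrun, the toy model below reduces each increment() to two abstract steps (read, then write), draws random schedules of thread ids, and returns the first schedule that violates the invariant. run_schedule and explore are hypothetical helpers; frontrun schedules real opcodes rather than abstract steps.

```python
import random
from typing import Dict, List, Optional

# Each thread's increment() reduced to two steps on the shared counter.
STEPS = ["read", "write"]


def run_schedule(schedule: List[int], n_threads: int = 2) -> int:
    """Replay a schedule of thread ids; a finished thread's turn passes to the next runnable one."""
    value = 0
    pc = [0] * n_threads  # next step index per thread
    temp: Dict[int, int] = {}
    queue = list(schedule)
    while any(p < len(STEPS) for p in pc):
        tid = queue.pop(0) if queue else 0
        while pc[tid] >= len(STEPS):  # skip threads that have finished
            tid = (tid + 1) % n_threads
        if STEPS[pc[tid]] == "read":
            temp[tid] = value
        else:
            value = temp[tid] + 1
        pc[tid] += 1
    return value


def explore(max_attempts: int = 200, seed: int = 42) -> Optional[List[int]]:
    """Try random schedules; return the first one that breaks the invariant value == 2."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        schedule = [rng.randrange(2) for _ in range(4)]
        if run_schedule(schedule) != 2:
            return schedule
    return None


counterexample = explore()
print(counterexample, run_schedule(counterexample))
```

In this model a schedule is bad whenever its second entry names a different thread than its first (both threads read before either writes), so half of all random schedules expose the race.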
explore_interleavings() often finds races very quickly — sometimes on the
first attempt — because even a single random schedule has a reasonable chance
of interleaving the critical section. It can also catch races that are invisible
to DPOR: if a C extension mutates shared state without any I/O (e.g.
in-process C-level mutations), bytecode exploration may stumble into the
bad interleaving through random scheduling even though neither tool can
see the C-level conflict directly.
The trade-off is interpretability. When a race is found, result.explanation
contains an interleaved source-line trace and a best-effort conflict
classification, but the bytecode explorer doesn’t know why the interleaving
matters the way DPOR does. The reproduce_on_failure parameter (default 10)
controls how many times the counterexample schedule is replayed to measure
reproducibility.
Controlled Interleaving (Internal/Advanced):
The controlled_interleaving context manager and run_with_schedule function allow
running threads under a specific opcode-level schedule. These are primarily intended for
debugging this library or building tooling on top of it, rather than for general use in tests.
Note
Opcode-level schedules are not stable across Python versions. CPython does not guarantee
that the same source code will compile to the same bytecode between minor releases, so a
specific schedule that reproduces a race on Python 3.12 may not reproduce the same
interleaving on 3.13. Counterexample schedules returned by explore_interleavings
are likewise best treated as ephemeral debugging artifacts rather than long-lived test fixtures.
The async variant (frontrun.async_bytecode) uses await_point() markers rather
than opcodes, so its schedules are stable — see that module for details.
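A quick way to see why opcode schedules are version-specific is to list the instructions an opcode-level schedule would step through. The sketch uses only the standard dis module; run it on two CPython minor versions and the printed length and opcode names will typically differ.

```python
import dis
import sys


class Counter:
    def __init__(self, value=0):
        self.value = value

    def increment(self):
        temp = self.value
        self.value = temp + 1


# The instruction sequence an opcode-level schedule indexes into. Its length
# and opcode names depend on the interpreter version that compiled it.
ops = [ins.opname for ins in dis.get_instructions(Counter.increment)]
print(sys.version_info[:2], len(ops))
for name in ops:
    print(name)
```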
Trace Markers
Trace Markers use lightweight comment-based markers to define synchronization points in your code, requiring no semantic code changes.
How It Works:
Each thread is run with a sys.settrace callback that fires on every source
line. The callback scans each line for # frontrun: <name> comments using a
MarkerRegistry that caches marker locations per file. When a marker is hit,
the thread calls ThreadCoordinator.wait_for_turn() which blocks until the
schedule says it’s that thread’s turn to proceed past that marker. This gives
deterministic control over the order threads reach each synchronization point,
without changing any executable code — markers are just comments.
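A minimal sketch of the marker-scanning step, assuming markers are matched by a regular expression over comment tokens. scan_markers and MARKER_RE are hypothetical names, not frontrun's MarkerRegistry. Using tokenize means a "# frontrun:" sequence inside a string literal is not mistaken for a marker, and lru_cache stands in for the per-file cache.

```python
import io
import re
import tokenize
from functools import lru_cache
from typing import Dict

# Matches "# frontrun: <name>" at the start of a comment token.
MARKER_RE = re.compile(r"#\s*frontrun:\s*(\w+)")


@lru_cache(maxsize=None)
def scan_markers(source: str) -> Dict[int, str]:
    """Map 1-based line numbers to marker names found in comments."""
    found: Dict[int, str] = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            match = MARKER_RE.match(tok.string)
            if match:
                found[tok.start[0]] = match.group(1)
    return found


SOURCE = '''\
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        temp = self.value  # frontrun: read_value
        temp += 1
        self.value = temp  # frontrun: write_value
'''

print(scan_markers(SOURCE))  # {6: 'read_value', 8: 'write_value'}
```

A line-level trace callback would then only need a dictionary lookup on frame.f_lineno to decide whether the current line is a synchronization point.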
A marker gates the code that follows it. Name markers after the operation
they gate (e.g. read_value, write_balance) rather than with temporal
prefixes like before_ or after_.
```python
from frontrun.common import Schedule, Step
from frontrun.trace_markers import TraceExecutor


class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        temp = self.value  # frontrun: read_value
        temp += 1
        self.value = temp  # frontrun: write_value
```
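The sketch below does not use frontrun's Schedule, Step or TraceExecutor API. It is a hypothetical MiniCoordinator that illustrates the turn-taking contract with explicit calls in place of comment markers: a thread that passes a marker keeps the turn until it reaches its next marker (or calls done()), so the code a marker gates finishes before the next step is granted.

```python
import threading
from typing import List, Optional, Tuple


class MiniCoordinator:
    """Hypothetical turn-taking coordinator driven by (thread_name, marker) steps."""

    def __init__(self, schedule: List[Tuple[str, str]], timeout: float = 5.0) -> None:
        self._schedule = schedule
        self._index = 0
        self._holder: Optional[str] = None  # thread currently running gated code
        self._cond = threading.Condition()
        self._timeout = timeout

    def _finish_turn(self, thread: str) -> None:
        # Caller holds self._cond. Advance the schedule if `thread` had the turn.
        if self._holder == thread:
            self._holder = None
            self._index += 1
            self._cond.notify_all()

    def _is_my_turn(self, thread: str, marker: str) -> bool:
        return (
            self._holder is None
            and self._index < len(self._schedule)
            and self._schedule[self._index] == (thread, marker)
        )

    def wait_for_turn(self, thread: str, marker: str) -> None:
        with self._cond:
            self._finish_turn(thread)
            if not self._cond.wait_for(lambda: self._is_my_turn(thread, marker), self._timeout):
                raise TimeoutError(f"{thread} never got a turn at {marker!r}")
            self._holder = thread

    def done(self, thread: str) -> None:
        with self._cond:
            self._finish_turn(thread)


class Counter:
    def __init__(self) -> None:
        self.value = 0


counter = Counter()
coord = MiniCoordinator([
    ("t1", "read_value"), ("t2", "read_value"),
    ("t1", "write_value"), ("t2", "write_value"),
])


def increment(name: str) -> None:
    coord.wait_for_turn(name, "read_value")   # stands in for "# frontrun: read_value"
    temp = counter.value
    temp += 1
    coord.wait_for_turn(name, "write_value")  # stands in for "# frontrun: write_value"
    counter.value = temp
    coord.done(name)


threads = [threading.Thread(target=increment, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 1: both threads read 0 before either wrote
```

With the schedule read, read, write, write, the lost update reproduces on every run regardless of how the OS schedules the two threads.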
Async Support:
Async trace markers use the same comment-based syntax. Each async task runs in
its own thread (via asyncio.run), with the same sys.settrace mechanism
controlling interleaving between tasks.
The synchronization contract:
- A marker gates the next await expression (or the line it is on, if inline). When a task reaches a marker, it pauses until the scheduler grants it a turn; only then does the gated await execute.
- Between two markers, the task runs without interruption from other scheduled tasks. Any intermediate await calls within that span complete normally.
- Because async code can only interleave at await points, place markers so that they gate the await expressions whose ordering you want to control.
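Why markers belong on await expressions can be seen in plain asyncio, without frontrun: tasks on one event loop switch only where an await actually suspends. In this sketch Box and increment are hypothetical, and asyncio.sleep(0) provides the suspension point. Note that under plain asyncio, awaiting a coroutine that never suspends (such as get_count() in the AsyncCounter example) does not by itself let other tasks run.

```python
import asyncio


class Box:
    def __init__(self) -> None:
        self.value = 0


async def increment(box: Box, yield_between: bool) -> None:
    current = box.value
    if yield_between:
        # A suspending await is the only place another task can run.
        await asyncio.sleep(0)
    box.value = current + 1


async def run(yield_between: bool) -> int:
    box = Box()
    await asyncio.gather(increment(box, yield_between), increment(box, yield_between))
    return box.value


print(asyncio.run(run(yield_between=False)))  # 2: no suspension between read and write
print(asyncio.run(run(yield_between=True)))   # 1: both tasks read 0 before either writes
```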
```python
from frontrun.async_trace_markers import AsyncTraceExecutor
from frontrun.common import Schedule, Step


class AsyncCounter:
    def __init__(self):
        self.value = 0

    async def get_count(self):
        return self.value

    async def set_count(self, value):
        self.value = value

    async def increment(self):
        # frontrun: read_counter
        current = await self.get_count()
        # frontrun: write_counter
        await self.set_count(current + 1)
```
Automatic I/O Detection
Both the bytecode explorer and DPOR automatically detect socket and file
I/O operations. This is enabled by default (detect_io=True) and works
by monkey-patching socket.socket methods and builtins.open to
report resource accesses to the scheduler.
Python-level detection (monkey-patching):
- Sockets: connect, send, sendall, sendto, recv, recv_into, recvfrom
- Files: open() (read vs. write determined by mode)
Resource identity is derived from the socket’s peer address
(host:port) or the file’s resolved path. Two threads accessing the
same endpoint or file are treated as conflicting; different endpoints are
independent.
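A minimal sketch of the builtins.open half of this detection, assuming a simple (kind, resource) event tuple: the patched open classifies the mode as read or write and identifies the resource by its resolved path. record_file_io and the "file:" prefix are hypothetical, and the sketch handles only path arguments, not integer file descriptors.

```python
import builtins
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, List, Tuple

WRITE_FLAGS = set("wax+")  # any of these in the mode means the file may be written


@contextmanager
def record_file_io(events: List[Tuple[str, str]]) -> Iterator[None]:
    """Hypothetical sketch: patch builtins.open to log (kind, resource) pairs."""
    real_open = builtins.open

    def patched_open(file, mode="r", *args, **kwargs):
        kind = "write" if WRITE_FLAGS & set(mode) else "read"
        # Resource identity: the resolved path, so aliases of one file compare equal.
        events.append((kind, "file:" + os.path.realpath(file)))
        return real_open(file, mode, *args, **kwargs)

    builtins.open = patched_open
    try:
        yield
    finally:
        builtins.open = real_open  # always restore the original


events: List[Tuple[str, str]] = []
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with record_file_io(events):
        with open(path, "w") as f:
            f.write("hello")
        with open(os.path.join(tmp, ".", "data.txt")) as f:  # same file, different spelling
            f.read()

print([kind for kind, _ in events])  # ['write', 'read']
print(events[0][1] == events[1][1])  # True
```

Because both opens resolve to the same path, the two events name the same resource and would be treated as conflicting.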
C-Level I/O Interception
When run under the frontrun CLI, a native LD_PRELOAD library
(libfrontrun_io.so) intercepts libc I/O functions directly. This
covers opaque C extensions — database drivers (libpq, mysqlclient),
Redis clients, HTTP libraries, and anything else that calls libc’s
send(), recv(), read(), write(), etc.
Intercepted functions: connect, send, sendto, sendmsg,
write, writev, recv, recvfrom, recvmsg, read,
readv, close
The library maintains a process-global file-descriptor → resource map:
```text
connect(fd, sockaddr{127.0.0.1:5432}, ...) → fd=7 → "socket:127.0.0.1:5432"
send(fd=7, ...) → report write to "socket:127.0.0.1:5432"
recv(fd=7, ...) → report read from "socket:127.0.0.1:5432"
close(fd=7) → remove fd=7 from map
```
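The table logic above can be modelled in a few lines of Python. FdResourceMap is a hypothetical illustration of the bookkeeping, not the C library; it assumes a descriptor maps to a resource only after connect() and reports nothing for unknown descriptors.

```python
from typing import Dict, List, Tuple


class FdResourceMap:
    """Hypothetical Python model of a process-global fd -> resource table."""

    def __init__(self) -> None:
        self._resources: Dict[int, str] = {}
        self.events: List[Tuple[str, str]] = []

    def on_connect(self, fd: int, host: str, port: int) -> None:
        self._resources[fd] = f"socket:{host}:{port}"

    def _report(self, kind: str, fd: int) -> None:
        resource = self._resources.get(fd)
        if resource is not None:  # unknown descriptors are ignored in this model
            self.events.append((kind, resource))

    def on_send(self, fd: int) -> None:
        self._report("write", fd)

    def on_recv(self, fd: int) -> None:
        self._report("read", fd)

    def on_close(self, fd: int) -> None:
        self._resources.pop(fd, None)


m = FdResourceMap()
m.on_connect(7, "127.0.0.1", 5432)
m.on_send(7)
m.on_recv(7)
m.on_close(7)
m.on_send(7)  # after close, fd 7 is unknown again, so nothing is reported
print(m.events)  # [('write', 'socket:127.0.0.1:5432'), ('read', 'socket:127.0.0.1:5432')]
```

Removing the entry on close matters because the kernel reuses descriptor numbers: a later connect() that happens to get fd 7 must not inherit the old endpoint.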
Events are communicated to the Python side via a pipe
(FRONTRUN_IO_FD). An IOEventDispatcher reads the pipe on a
background thread and delivers events to registered listeners. When
DPOR is active, a _PreloadBridge listener routes events to the DPOR
engine for conflict analysis.
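The pipe-to-listener flow can be sketched with os.pipe and a daemon reader thread. MiniDispatcher is a hypothetical stand-in for IOEventDispatcher, and the line-oriented "<kind> <resource>" wire format is an assumption; the actual format used on FRONTRUN_IO_FD is not described here.

```python
import os
import threading
from typing import Callable, List, Tuple

Listener = Callable[[str, str], None]


class MiniDispatcher:
    """Reads events from a pipe on a background thread and fans them out to listeners."""

    def __init__(self, read_fd: int) -> None:
        self._read_fd = read_fd
        self._listeners: List[Listener] = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def start(self) -> None:
        self._thread.start()

    def join(self, timeout: float = 5.0) -> None:
        self._thread.join(timeout)

    def _run(self) -> None:
        # Assumed wire format: one "<kind> <resource>" event per line.
        with os.fdopen(self._read_fd, "r") as stream:
            for line in stream:  # blocks until data arrives; ends at EOF
                kind, resource = line.rstrip("\n").split(" ", 1)
                for listener in self._listeners:
                    listener(kind, resource)


read_fd, write_fd = os.pipe()
received: List[Tuple[str, str]] = []
dispatcher = MiniDispatcher(read_fd)
dispatcher.add_listener(lambda kind, res: received.append((kind, res)))
dispatcher.start()
# The preload library would write from inside send()/recv(); here we write by hand.
os.write(write_fd, b"write socket:127.0.0.1:5432\nread socket:127.0.0.1:5432\n")
os.close(write_fd)  # EOF lets the reader thread finish
dispatcher.join()
print(received)
```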
Building:
```shell
make build-io  # builds and copies libfrontrun_io.so into the frontrun package
```
Usage:
```shell
frontrun pytest -vv tests/              # I/O interception + monkey-patching
frontrun python examples/orm_race.py    # same, for scripts
```