SQLAlchemy Lost-Update Race Condition

This walkthrough shows frontrun detecting a real lost-update race in SQLAlchemy ORM code running against Postgres. The full source is in examples/orm_race.py.

The bug

Any ORM that computes new column values in Python is vulnerable to lost updates. Two concurrent request handlers both read the same row, compute a new value from what they read, and write it back. When their transactions overlap, the second commit silently overwrites the first:

Handler A:  SELECT → login_count = 0
Handler B:  SELECT → login_count = 0
Handler A:  UPDATE SET login_count = 1, COMMIT   ← correct
Handler B:  UPDATE SET login_count = 1, COMMIT   ← stale! should be 2

The fix is to push the arithmetic into SQL (SET login_count = login_count + 1) or to serialize access with SELECT FOR UPDATE.

The model and the handler

A minimal SQLAlchemy model and the buggy handler that increments login_count in Python:

from sqlalchemy import String, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(100))
    login_count: Mapped[int] = mapped_column(default=0)

engine = create_engine("postgresql:///frontrun_test")

The handler that every demo runs — one per thread, each with its own session:

def handle_login(engine):
    with Session(engine) as session:
        user = session.get(User, 1)          # SELECT
        user.login_count = user.login_count + 1  # Python-side increment
        session.commit()                      # UPDATE … SET login_count=<n>

Two concurrent calls should leave login_count == 2. The race makes it 1.
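The interleaving can be forced deterministically with the standard library alone, which is worth seeing before the frontrun demos: a threading.Barrier at the read step guarantees both handlers read the stale value before either writes. A minimal sketch, with a plain dict standing in for the Postgres row:

```python
import threading

account = {"login_count": 0}
both_have_read = threading.Barrier(2)  # forces both reads before any write

def handle_login():
    value = account["login_count"]      # "SELECT"
    both_have_read.wait()               # neither thread proceeds until both read
    account["login_count"] = value + 1  # "UPDATE" with a now-stale value

threads = [threading.Thread(target=handle_login) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account["login_count"])  # 1, not 2: one increment was lost
```

The barrier plays the role the Schedule plays in Demo 1: it pins the bad interleaving in place so the lost update reproduces on every run.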

Demo 1 — Trace markers (deterministic)

Trace markers (# frontrun: orm_read / # frontrun: orm_write) on the session.get() and session.commit() lines let TraceExecutor force both SELECTs to run before either COMMIT:

from frontrun.common import Schedule, Step
from frontrun.trace_markers import TraceExecutor

def handler_a():
    with Session(engine) as session:
        user = session.get(User, 1)            # frontrun: orm_read
        user.login_count = user.login_count + 1
        session.commit()                        # frontrun: orm_write

handler_b = handler_a  # handler B runs the same code path on a second thread

schedule = Schedule([
    Step("a", "orm_read"),   # Handler A SELECTs (login_count=0)
    Step("b", "orm_read"),   # Handler B SELECTs (login_count=0, stale)
    Step("a", "orm_write"),  # Handler A COMMITs (login_count=1)
    Step("b", "orm_write"),  # Handler B COMMITs (login_count=1, should be 2!)
])

executor = TraceExecutor(schedule)
executor.run("a", handler_a)
executor.run("b", handler_b)
executor.wait(timeout=10.0)

Output:

======================================================================
Demo 1: SQLAlchemy lost update  (TraceExecutor — deterministic)
======================================================================

  Scenario:
    Two handlers each read login_count and increment it in Python.
    Expected final login_count: 2

  Final login_count: 1  (expected 2)

  LOST UPDATE confirmed: one increment was silently lost.
  Handler B's commit wrote login_count=1 based on a stale
  read, overwriting handler A's increment.

  Reproducibility: 100% — the Schedule deterministically forces
  both SELECTs to run before either UPDATE on every execution.

Demo 2 — Bytecode exploration (automatic)

explore_interleavings generates random opcode-level schedules, running both handlers against the real database on each attempt. No trace markers are needed — the explorer finds the bad interleaving on its own:

from frontrun.bytecode import explore_interleavings

class _State:
    def __init__(self):
        with Session(engine) as session:
            user = session.get(User, 1)
            user.login_count = 0
            session.commit()

def _thread_fn(_state):
    with Session(engine) as session:
        user = session.get(User, 1)
        user.login_count = user.login_count + 1
        session.commit()

result = explore_interleavings(
    setup=_State,
    threads=[_thread_fn, _thread_fn],
    invariant=lambda s: _read_count() == 2,   # _read_count() re-reads the row (helper in the full example)
    max_attempts=50,
    seed=42,
    detect_io=True,    # C-level sockets detected via LD_PRELOAD
)

Output:

======================================================================
Demo 2: SQLAlchemy lost update  (bytecode exploration — automatic)
======================================================================

  Generating random opcode-level schedules and running both
  handlers against real Postgres.  Checking whether both
  increments persist after each interleaving.

  property_holds    : False
  attempts_explored : 1
  counterexample found after 1 attempt(s)  (5/5 reproductions)

  Race condition found after 1 interleavings.

    Lost update: threads 0 and 1 both read login_count before either wrote it back.


    Thread 1 | orm_race.py:238           user = session.get(User, 1)  [read .get]
    Thread 0 | orm_race.py:238           user = session.get(User, 1)  [read .get]
    Thread 0 | orm_race.py:240           user.login_count = user.login_count + 1  [read .login_count]
    Thread 1 | orm_race.py:240           user.login_count = user.login_count + 1  [read .login_count]
    Thread 0 | orm_race.py:240           user.login_count = user.login_count + 1  [write .login_count]
    Thread 1 | orm_race.py:240           user.login_count = user.login_count + 1  [write .login_count]
    Thread 1 | orm_race.py:241           session.commit()  [read .commit]
    Thread 0 | orm_race.py:241           session.commit()  [read .commit]

    Reproduced 5/5 times (100%)

  LOST UPDATE confirmed via bytecode exploration.

The explorer found the race on its very first attempt and reproduced it 5 out of 5 times. The trace shows the exact interleaving: both threads read login_count (via session.get) before either writes, so both compute 0 + 1 = 1.
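The exploration idea itself can be sketched with the standard library: cooperative "threads" written as generators, whose yield points stand in for opcode boundaries, and an explorer that draws random schedules until the invariant fails. Everything below is an illustrative toy, not frontrun's implementation:

```python
import random

def handler(shared, log, name):
    """Cooperative 'thread': each yield is a preemption point."""
    value = shared["login_count"]             # read
    log.append((name, "read", value))
    yield                                     # scheduler may switch here
    shared["login_count"] = value + 1         # write back (possibly stale)
    log.append((name, "write", value + 1))

def run_schedule(seed):
    rng = random.Random(seed)
    shared = {"login_count": 0}
    log = []
    threads = {"t0": handler(shared, log, "t0"),
               "t1": handler(shared, log, "t1")}
    runnable = list(threads)
    while runnable:
        name = rng.choice(runnable)           # one random interleaving
        try:
            next(threads[name])
        except StopIteration:
            runnable.remove(name)
    return shared["login_count"], log

def explore(invariant, max_attempts=50):
    for attempt in range(1, max_attempts + 1):
        final, log = run_schedule(seed=attempt)
        if not invariant(final):
            return attempt, final, log        # counterexample schedule
    return None

result = explore(lambda final: final == 2)
```

Any schedule in which the second read lands before the first write violates the invariant, so a counterexample turns up within a few attempts; the log it returns is the toy analogue of the read/write trace shown above.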

Demo 3 — Naive threading (intermittent)

Plain threads against real Postgres, with a random 0–15 ms start offset modelling realistic request-arrival timing. The race reproduces only a fraction of the time — exactly the kind of flaky bug that slips through CI:

======================================================================
Demo 3: Naive threading + SQLAlchemy  (500 trials)
======================================================================

  Running both handlers in plain threads against real Postgres.
  Threads start with a random 0-15ms offset to model realistic
  request arrival timing.  Counting how often the race manifests...

  Trials:   500
  Failures: 23
  Rate:     4.6%

  The race manifested in 23/500 trials (4.6%).
  frontrun makes it 100% reproducible.

With ordinary threads the bug surfaces only a small fraction of the time (4.6% in this run; the exact rate varies with system load). The two frontrun approaches above both catch it deterministically, every time.
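A trial loop in the spirit of Demo 3 can be sketched with the standard library; a plain counter stands in for the database, and a short sleep between read and write models the round-trip that opens the race window (the offsets, window, and trial count below are illustrative):

```python
import random
import threading
import time

def run_trial():
    state = {"login_count": 0}

    def handle_login():
        time.sleep(random.uniform(0, 0.015))   # 0-15 ms start offset
        value = state["login_count"]           # read
        time.sleep(0.005)                      # stand-in for a DB round-trip
        state["login_count"] = value + 1       # write back (possibly stale)

    threads = [threading.Thread(target=handle_login) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["login_count"]

trials = 100
failures = sum(1 for _ in range(trials) if run_trial() != 2)
print(f"Failures: {failures}/{trials}")
```

The failure count swings run to run with the offsets and machine load, which is exactly why a flaky-by-nature bug like this needs a deterministic scheduler to pin down.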

Running the example yourself

# Build the virtualenv with SQLAlchemy + psycopg2, and the I/O library
make build-examples-3.14t build-io   # or build-examples-3.10

# Create the test database (if it doesn't exist)
createdb frontrun_test

# Run via the frontrun CLI for C-level I/O interception
frontrun .venv-3.14t/bin/python examples/orm_race.py