Welcome to racecondition’s blog. Since this is the first post, let’s start with the idea behind the name: the race condition.

A race condition happens when the result of your program depends on the timing/order of concurrent operations that access shared state. If two things “race” to read/modify the same data, the outcome can change run-to-run—leading to flaky bugs, security issues, and production pain.

Below are two practical examples: one in Python (threads) and one on microcontrollers (ISR vs main loop). Each includes the bug and a fix.

Python: When i += 1 Isn’t Atomic

In CPython there’s a Global Interpreter Lock (GIL), but it does not make compound operations like x += 1 atomic. That statement expands to read → add → write, which can interleave across threads.
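
You can watch the expansion happen with the dis module. A minimal sketch (exact opcode names vary across CPython versions, but the read → add → write shape is always there):

import dis

counter = 0

def inc():
    global counter
    counter += 1

dis.dis(inc)
# Typical output shape (names vary by CPython version):
#   LOAD_GLOBAL  counter    <- read
#   BINARY_OP    (+=)       <- add
#   STORE_GLOBAL counter    <- write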

The Python example below launches four threads that increment a shared counter concurrently. Because no synchronization is used, some increments are lost, and the final counter value is lower than expected.

import threading

N_THREADS = 4
N_INCREMENTS = 100_000

counter = 0  # shared state

def worker():
    global counter
    for _ in range(N_INCREMENTS):
        # Not atomic: read -> add -> write
        counter += 1

threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Expected:", N_THREADS * N_INCREMENTS)
print("Actual  :", counter)  # Often less than expected

You’ll often see Actual != Expected, because increments get lost when threads interleave.
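
Here is one losing interleaving, with counter starting at 0:

Thread A: reads counter   -> sees 0
Thread B: reads counter   -> sees 0
Thread A: writes 0 + 1    -> counter is 1
Thread B: writes 0 + 1    -> counter is 1 (A's increment is lost)

Two increments ran, but the counter advanced by one.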

Fix 1: Use a Lock

The fix below wraps each increment in a threading.Lock, so only one thread at a time can run the read → add → write sequence and no updates are lost.

import threading

N_THREADS = 4
N_INCREMENTS = 100_000

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(N_INCREMENTS):
        with lock:            # critical section
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Expected:", N_THREADS * N_INCREMENTS)
print("Actual  :", counter)  # Matches expected

Fix 2: Avoid shared mutability (Queues / Actors)

The solution below removes the shared counter entirely: each worker puts a token on a thread-safe queue.Queue for every increment, and the main thread tallies the tokens once all workers have finished.

import threading, queue

N_THREADS = 4
N_INCREMENTS = 100_000

q = queue.Queue()

def worker():
    for _ in range(N_INCREMENTS):
        q.put(1)  # no shared mutable counter in threads

threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = 0
# Draining with empty() is safe here: all producer threads have joined.
while not q.empty():
    total += q.get()

print("Expected:", N_THREADS * N_INCREMENTS)
print("Actual  :", total)

Queues serialize access and remove the need for a shared counter entirely.

Key takeaways (Python)

x += 1 is not atomic.

Use threading.Lock, higher-level concurrency primitives (Queue, concurrent.futures), or design out shared state.
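
For instance, a sketch of the "design out shared state" approach with concurrent.futures: each worker counts locally and returns its result, so there is no shared counter to protect.

from concurrent.futures import ThreadPoolExecutor

N_THREADS = 4
N_INCREMENTS = 100_000

def worker():
    local = 0                      # thread-local state: nothing to race on
    for _ in range(N_INCREMENTS):
        local += 1
    return local

with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
    futures = [pool.submit(worker) for _ in range(N_THREADS)]
    total = sum(f.result() for f in futures)

print("Expected:", N_THREADS * N_INCREMENTS)
print("Actual  :", total)  # Always matches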

Embedded C (Microcontrollers): ISR vs Main Loop

On MCUs, a common race is between an interrupt service routine (ISR) and the main loop manipulating the same variable. Classic failure: a read-modify-write in main interleaves with an ISR update.

The bug: Lost events due to interleaving

Imagine an ISR increments a tick counter each millisecond. The main loop consumes pending ticks by decrementing the counter:

#include <stdint.h>
#include <stdbool.h>

volatile uint32_t tick_count = 0;  // updated in ISR

// Called by SysTick or a timer interrupt at 1 kHz
void SysTick_Handler(void) {
    tick_count++;   // producer
}

int main(void) {
    // init systick/timer...

    for (;;) {
        if (tick_count > 0) {  // read
            // --- RACE WINDOW ---
            // ISR could fire here: tick_count++ happens
            tick_count--;      // write (consume one tick)
            // -------------------
            // If an ISR runs between read and write, an increment can be lost.
            // Symptom: you "miss" ticks -> drift, timing jitter, or slow loops.
        }

        // ... do other work ...
    }
}

What goes wrong? The sequence if (tick_count > 0) { tick_count--; } is a non-atomic read-modify-write: the decrement compiles to a load, a subtract, and a store. If the ISR fires between the load and the store, its increment is overwritten and the event is dropped.

Note: volatile only addresses visibility/optimization, not atomicity.

Fix 1: Make the decrement atomic (short critical section)

Disable the interrupt briefly around the read-modify-write. Keep it as short as possible.

#include <stdint.h>
#include <stdbool.h>

volatile uint32_t tick_count = 0;

void SysTick_Handler(void) {
    tick_count++;
}

static inline uint32_t irq_save(void) {
    // Cortex-M example:
    uint32_t primask;
    __asm volatile ("MRS %0, PRIMASK" : "=r" (primask) );
    __asm volatile ("CPSID i"); // disable IRQs
    return primask;
}

static inline void irq_restore(uint32_t primask) {
    __asm volatile ("MSR PRIMASK, %0" :: "r" (primask) );
}

int main(void) {
    for (;;) {
        uint32_t key = irq_save();          // enter critical section
        if (tick_count > 0) {
            tick_count--;                   // atomic now w.r.t. ISR
        }
        irq_restore(key);                   // exit critical section

        // ... rest of loop ...
    }
}

On AVR you’d use cli() / sei(). On some SDKs (STM32 HAL, ESP-IDF, Zephyr, FreeRTOS) there are helpers/macros for critical sections—prefer those.
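
For example, on AVR with avr-libc, the Fix 1 critical section reduces to the following sketch (consume_one_tick is a made-up helper name):

#include <stdint.h>
#include <util/atomic.h>  // avr-libc critical-section helpers

volatile uint32_t tick_count = 0;

static void consume_one_tick(void) {
    // Saves SREG, runs cli(), and restores SREG when the block exits.
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        if (tick_count > 0) {
            tick_count--;
        }
    }
}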

Fix 2: Swap-and-drain (minimize IRQ mask time)

Grab the whole count at once atomically, then process without interrupts masked:

#include <stdint.h>

// irq_save()/irq_restore() as defined in Fix 1
volatile uint32_t tick_count = 0;

void SysTick_Handler(void) {
    tick_count++;
}

int main(void) {
    for (;;) {
        // atomically copy and reset
        uint32_t key = irq_save();
        uint32_t pending = tick_count;
        tick_count = 0;
        irq_restore(key);

        // handle all pending ticks with interrupts enabled
        while (pending--) {
            // ... do 1-tick worth of work ...
        }
    }
}

This pattern reduces interrupt-off time, improving latency.

Fix 3: Use true atomics if available

On some toolchains/architectures you can use C11 atomics or compiler builtins:

#include <stdatomic.h>
#include <stdint.h>

_Atomic uint32_t tick_count = 0;

void SysTick_Handler(void) {
    atomic_fetch_add(&tick_count, 1);
}

int main(void) {
    for (;;) {
        // Decrement only if positive, atomically:
        uint32_t old = atomic_load(&tick_count);
        while (old > 0 && 
               !atomic_compare_exchange_weak(&tick_count, &old, old - 1)) {
            // old reloaded by the CAS loop
        }
        if (old > 0) {
            // consumed one tick
        }
    }
}

Support for lock-free atomics on small MCUs varies; check your compiler and core.
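
One way to find out early, assuming a C11 toolchain: fail the build (or check at runtime) if the atomics would fall back to a lock, which is exactly what you cannot afford inside an ISR.

#include <stdatomic.h>
#include <stdint.h>

// Per C11: 2 = always lock-free, 1 = sometimes, 0 = never.
#if ATOMIC_INT_LOCK_FREE != 2
#error "int-width atomics may take a lock on this target: unsafe in an ISR"
#endif

_Atomic uint32_t tick_count;
// atomic_is_lock_free(&tick_count) gives the runtime answer for one object.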

Key takeaways (MCU)

ISRs and main must not do unsynchronized read-modify-write on shared data.

volatile is necessary for visibility, but not sufficient—you also need atomicity.

Use brief critical sections, swap-and-drain, or C11 atomics where supported.

Design Patterns That Prevent Races

Protect shared state with locks/critical sections or true atomics.

Prefer message passing (queues/mailboxes): ISRs push events; the main loop drains them (see the ring-buffer sketch after this list).

Immutable data and ownership transfer reduce shared mutable state.

Time-bounded critical sections: keep interrupts masked for as short a time as possible.
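
As a sketch of the message-passing pattern on a single-core MCU (names are illustrative; multi-core parts or weakly ordered cores need real memory barriers or C11 atomics instead of plain volatile):

#include <stdbool.h>
#include <stdint.h>

#define QSIZE 16u  // must be a power of two for the index mask

// Single-producer (ISR) / single-consumer (main loop) ring buffer.
// Lock-free because each index has exactly one writer.
static volatile uint8_t q_buf[QSIZE];
static volatile uint8_t q_head;  // written only by the ISR
static volatile uint8_t q_tail;  // written only by main

bool q_push(uint8_t ev) {        // call from the ISR
    uint8_t next = (uint8_t)((q_head + 1u) & (QSIZE - 1u));
    if (next == q_tail) {
        return false;            // full: drop the event or count overflows
    }
    q_buf[q_head] = ev;          // write the data first...
    q_head = next;               // ...then publish the new head
    return true;
}

bool q_pop(uint8_t *ev) {        // call from the main loop
    if (q_tail == q_head) {
        return false;            // empty
    }
    *ev = q_buf[q_tail];
    q_tail = (uint8_t)((q_tail + 1u) & (QSIZE - 1u));
    return true;
}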

Why This Blog Exists

I named this place racecondition.blog because modern systems—from Python services to bare-metal firmware—are full of tiny, invisible “races.” Spotting and fixing them is a superpower. Here you’ll find hands-on posts spanning software, embedded, robotics, and edge AI, always with real code and reproducible patterns.
