
The problem with blocking threads in a reactive world — lessons from ARGUS

2024 · 8 min read

When I started building ARGUS, I made a classic mistake. I had Spring WebFlux running on Netty — a non-blocking HTTP server that handles thousands of concurrent connections on a handful of threads — and I was calling JPA repositories directly on those threads. The app worked fine in development and collapsed under any real load.

Understanding why requires understanding what "non-blocking" actually means.

Netty's threading model

Netty uses an event loop with a small, fixed pool of threads — typically one per CPU core. Each thread is responsible for handling I/O events for many connections simultaneously. This is what makes reactive servers efficient. A single thread can juggle thousands of concurrent HTTP connections because it never blocks waiting for any one of them.

The contract is simple: never block an event-loop thread. If you block — even for 10ms waiting for a database query — that thread can't serve any of the other connections it's responsible for. Multiply that by concurrent requests and your server grinds to a halt.
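The effect is easy to reproduce in plain Java. The sketch below stands in for Netty (it is not Netty): a single-threaded executor plays the event loop, one task blocks for 100 ms the way a synchronous database call would, and five cheap "requests" queued behind it cannot even start until it finishes. All names and durations are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockedLoopDemo {
    // Returns, for each cheap request, how many ms it waited before starting.
    static List<Long> runDemo() throws Exception {
        // One thread standing in for a single event-loop thread.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();

        // A handler that blocks for 100 ms -- say, a synchronous DB query.
        eventLoop.submit(() -> {
            try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        // Five cheap requests queued behind it. None can start until the
        // blocking task finishes, because the loop has only one thread.
        List<Future<Long>> starts = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            starts.add(eventLoop.submit(() -> (System.nanoTime() - start) / 1_000_000));
        }

        List<Long> waitedMs = new ArrayList<>();
        for (Future<Long> f : starts) waitedMs.add(f.get());
        eventLoop.shutdown();
        return waitedMs;
    }

    public static void main(String[] args) throws Exception {
        for (long ms : runDemo()) System.out.println("cheap request started after " + ms + " ms");
    }
}
```

Every cheap request reports a wait of at least 100 ms — one blocked handler delayed all of them.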

The JPA problem

JPA and JDBC are inherently blocking. When you call `apiKeyRepository.findByKeyValue(key)`, that call goes out to PostgreSQL and waits. On a traditional servlet server this is fine — each request has its own thread, and blocking it only affects that one request. On Netty, you've just stalled a thread that was serving potentially hundreds of connections.
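To make "goes out and waits" concrete, here is a stand-in for that repository call. The `findByKeyValue` below is hypothetical and the 50 ms sleep merely simulates the PostgreSQL round trip, but the key property is real: the calling thread is pinned for the full latency and can do nothing else.

```java
import java.util.concurrent.TimeUnit;

public class CallerPinnedDemo {
    // Hypothetical stand-in for apiKeyRepository.findByKeyValue(key);
    // the 50 ms sleep simulates the database round trip.
    static String findByKeyValue(String key) {
        try { TimeUnit.MILLISECONDS.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "key-row:" + key;
    }

    // Returns how long the calling thread was pinned, in ms.
    static long timeBlockingCall() {
        long start = System.nanoTime();
        String row = findByKeyValue("abc123"); // the caller sits here, doing nothing
        System.out.println("got " + row);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("caller pinned for " + timeBlockingCall() + " ms");
    }
}
```

On a thread-per-request servlet server that pinned thread costs you one request; on an event loop it costs you every connection that thread was serving.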

The fix

The solution is thread isolation. Push the blocking call off the event-loop thread and onto a separate pool designed for blocking I/O:

return Mono.fromCallable(() -> apiKeyRepository.findByKeyValue(key)) // the blocking JPA call
    .subscribeOn(Schedulers.boundedElastic())   // run it on a pool meant for blocking work
    .map(apiKey -> validateAndEnrich(apiKey));  // continue the reactive pipeline

`Schedulers.boundedElastic()` is a Reactor scheduler backed by a bounded, elastic pool of threads intended specifically for blocking work (by default capped at ten times the number of CPU cores). The event-loop thread kicks off the work and moves on. When the database responds, the result is handed back to the reactive pipeline.
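The same isolation pattern can be expressed without Reactor, using only `CompletableFuture` and an explicit executor. This is a sketch of the idea, not the Reactor implementation: the pool size, method names, and the `toUpperCase` stand-in for `validateAndEnrich` are all illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadDemo {
    // A bounded pool for blocking work, playing the role of
    // Schedulers.boundedElastic(). The size is illustrative.
    static final ExecutorService blockingPool = Executors.newFixedThreadPool(4);

    // Hypothetical stand-in for the blocking repository lookup.
    static String findByKeyValue(String key) {
        try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "apiKey:" + key;
    }

    // An event-loop thread would call this: it submits and returns immediately.
    static CompletableFuture<String> lookup(String key) {
        return CompletableFuture
                .supplyAsync(() -> findByKeyValue(key), blockingPool) // blocks here, off the loop
                .thenApply(apiKey -> apiKey.toUpperCase());           // stand-in for validateAndEnrich
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> f = lookup("abc123");
        System.out.println("submitted without blocking");
        System.out.println(f.get()); // prints APIKEY:ABC123
        blockingPool.shutdown();
    }
}
```

The caller gets a future back immediately; only a thread from `blockingPool` ever waits on the database.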

The deeper lesson

This problem exists at the boundary between two concurrency models. Reactive programming assumes non-blocking I/O throughout. The moment you mix in a blocking call without isolation, you've broken the contract. The system still compiles and runs — it just falls over at load.

The principle applies beyond WebFlux. Any time you're crossing a concurrency boundary — reactive to blocking, async to sync, event-loop to thread-per-request — you need to explicitly manage the handoff. Don't let the framework hide it from you.
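The handoff in the other direction — async back to sync — deserves the same explicitness. A minimal sketch, with illustrative names and pool sizes: `join()` is itself a blocking call, so bridging from a future back to a plain value must happen on a thread that is allowed to block, never on the event loop.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BoundaryDemo {
    // Bridging async to sync: block for the result on a pool meant for
    // blocking work, never on an event-loop thread.
    static int bridge() throws Exception {
        ExecutorService blockingPool = Executors.newFixedThreadPool(2);
        try {
            // Async side: a result produced somewhere else.
            CompletableFuture<Integer> async = CompletableFuture.supplyAsync(() -> 42, blockingPool);
            // Sync side: join() blocks, so run it on the blocking pool.
            return blockingPool.submit(async::join).get();
        } finally {
            blockingPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bridged value: " + bridge());
    }
}
```

The point is not the mechanism but the visibility: the blocking step is named, placed on a dedicated pool, and impossible to miss in review.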

Written by Basit Tijani. Find me on GitHub or LinkedIn.