ARGUS
2024 · Backend Engineer
A production-grade, multi-tenant behavioral and system event ingestion platform. Built to handle high-velocity telemetry data — page views, clicks, API latency, health metrics — via a fully reactive, non-blocking pipeline. Inspired by the architecture of PostHog and Datadog.
KEY METRIC
8,000–15,000 events/sec on a single 4-vCPU VM
Stack
Overview
ARGUS is a real-time behavioral analytics platform built around a strict decoupled producer-consumer model. The ingestion edge uses Spring WebFlux (Netty) to accept events and immediately publish them to Kafka — the HTTP thread returns 202 Accepted before any database write occurs.
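The accept-then-enqueue hand-off can be sketched without any framework. In this minimal model, a `BlockingQueue` stands in for the Kafka producer and the handler returns its status before any persistence happens; class and method names are illustrative, not taken from the project.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the decoupled ingestion edge: the handler hands the event to a
// buffer (standing in for the Kafka producer) and returns 202 immediately.
// Persistence is the job of a separate downstream consumer.
public class IngestSketch {
    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(10_000);

    /** Returns the HTTP status the edge would send, without touching a database. */
    public String accept(String eventJson) {
        boolean queued = buffer.offer(eventJson);            // non-blocking hand-off
        return queued ? "202 Accepted" : "429 Too Many Requests"; // backpressure if full
    }

    /** What a downstream consumer (Kafka to DB) would drain. */
    public String poll() {
        return buffer.poll();
    }

    public static void main(String[] args) {
        IngestSketch edge = new IngestSketch();
        System.out.println(edge.accept("{\"type\":\"page_view\"}")); // 202 Accepted
        System.out.println(edge.poll()); // the event, still unpersisted at response time
    }
}
```

The point of the pattern is that the caller's latency is bounded by the queue hand-off, not by the slowest storage write.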
A Kafka Streams processor handles routing, aggregation, and storage. Valid events increment live Redis counters for real-time dashboards and are upserted into TimescaleDB hypertables for time-series analytics. Malformed payloads are caught and routed to a Dead Letter Queue, preventing crash loops on the primary topology.
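Upserting into a hypertable presupposes the table has been converted from a plain PostgreSQL table. A minimal sketch of such a schema (table and column names are assumptions, not the project's actual DDL):

```sql
-- Illustrative schema only: names are assumed, not taken from ARGUS.
CREATE TABLE events (
    time        TIMESTAMPTZ NOT NULL,
    tenant_id   TEXT        NOT NULL,
    event_type  TEXT        NOT NULL,
    payload     JSONB
);

-- Converts the plain table into a TimescaleDB hypertable partitioned by time.
SELECT create_hypertable('events', 'time');
```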
Rate limiting is enforced via an atomic Redis Lua script operating concurrently at both IP level (50 req/sec) and API key level (5,000 req/sec), with zero race conditions under load. The entire stack is containerized and deployed on Microsoft Azure.
Architecture
System architecture overview
Technical Challenges
Bridging Reactive and Blocking Paradigms
Spring WebFlux runs on a small pool of Netty event-loop threads. Calling a standard blocking JPA repository on one of those threads stalls the event loop and can freeze the entire server. I solved this by wrapping all RDBMS lookups in Mono.fromCallable().subscribeOn(Schedulers.boundedElastic()), isolating the blocking calls on a dedicated thread pool while preserving the reactive pipeline.
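The same offloading idea can be shown framework-free: a dedicated pool plays the role of `Schedulers.boundedElastic()`, and the caller's thread is never parked on the blocking lookup. All names here are illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Plain-Java analogue of Mono.fromCallable(...).subscribeOn(Schedulers.boundedElastic()):
// the blocking call runs on a pool reserved for blocking work, so the
// (event-loop) caller thread returns immediately with a composable future.
public class OffloadSketch {
    // Stand-in for boundedElastic: a pool dedicated to blocking work.
    private static final ExecutorService BLOCKING_POOL =
            Executors.newFixedThreadPool(4, r -> new Thread(r, "blocking-pool"));

    public static String blockingJpaLookup(String id) {
        try { Thread.sleep(50); } catch (InterruptedException ignored) {} // simulated DB latency
        return "row-" + id;
    }

    public static CompletableFuture<String> lookup(String id) {
        // The caller gets a future at once; the blocking work happens elsewhere.
        return CompletableFuture.supplyAsync(() -> blockingJpaLookup(id), BLOCKING_POOL);
    }

    public static void main(String[] args) {
        System.out.println(lookup("42").join()); // row-42
        BLOCKING_POOL.shutdown();
    }
}
```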
Atomic Rate Limiting Under Concurrency
Doing GET → calculate → SET as separate Redis calls opens a race window in which concurrent requests all read the same counter and slip past the limit. I moved the entire token-bucket logic into a single Lua script that Redis executes atomically: because Redis runs scripts serially on its single command thread, the check-and-decrement cannot interleave, eliminating the race entirely.
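The property the Lua script buys is that the whole read-compute-write runs as one indivisible unit. A plain-Java model of the same token-bucket logic makes that explicit: the `synchronized` block plays the role of Redis's serial execution. Parameters and names are illustrative, not the project's actual script.

```java
// Model of the atomic rate-limit check: the critical section runs serially,
// exactly as the Lua script runs as one unit inside single-threaded Redis.
public class TokenBucketSketch {
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefill = System.nanoTime();

    public TokenBucketSketch(long capacity, long refillPerSecond) {
        this.capacity = capacity;
        this.tokens = capacity;                  // start full
        this.refillPerNano = refillPerSecond / 1e9;
    }

    /** synchronized = serial execution; no GET/SET race window can open. */
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) { tokens -= 1; return true; }
        return false;
    }

    public static void main(String[] args) {
        TokenBucketSketch bucket = new TokenBucketSketch(5, 0); // no refill: 5 tokens total
        int allowed = 0;
        for (int i = 0; i < 10; i++) if (bucket.tryAcquire()) allowed++;
        System.out.println(allowed); // 5
    }
}
```

In production the state lives in Redis keyed per IP and per API key, so one script enforces both limits without any coordination between application instances.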
Kafka Poison Pills
When Kafka Streams receives a byte array it cannot deserialize, the default behavior is a fatal exception followed by an infinite crash loop on restart. I pre-filtered the stream before typed deserialization: records that fail to deserialize are routed to a DLQ topic, while the primary analytical topology continues uninterrupted.
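The branch-before-crash idea reduces to wrapping deserialization in a try/catch and routing failures sideways. A minimal model (the integer "record format" here is purely illustrative, as are all names):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of poison-pill handling: each raw record is parsed inside a try/catch,
// and records that fail to deserialize go to a DLQ list instead of killing
// the pipeline.
public class DlqSketch {
    public static void route(List<byte[]> records, List<Integer> valid, List<byte[]> dlq) {
        for (byte[] raw : records) {
            try {
                // Stand-in for the real deserializer: parse an int from UTF-8 bytes.
                valid.add(Integer.parseInt(new String(raw, StandardCharsets.UTF_8)));
            } catch (NumberFormatException poisonPill) {
                dlq.add(raw); // forward to the DLQ topic; primary topology keeps running
            }
        }
    }

    public static void main(String[] args) {
        List<byte[]> in = List.of("17".getBytes(), "not-json".getBytes(), "42".getBytes());
        List<Integer> valid = new ArrayList<>();
        List<byte[]> dlq = new ArrayList<>();
        route(in, valid, dlq);
        System.out.println(valid.size() + " valid, " + dlq.size() + " dead-lettered"); // 2 valid, 1 dead-lettered
    }
}
```

Keeping the failed record's raw bytes in the DLQ also preserves it for later inspection or replay.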