
ARGUS

2024

Backend Engineer

A production-grade, multi-tenant behavioral and system event ingestion platform. Built to handle high-velocity telemetry data — page views, clicks, API latency, health metrics — via a fully reactive, non-blocking pipeline. Inspired by the architecture of PostHog and Datadog.

KEY METRIC

8,000–15,000 events/sec on a single 4-vCPU VM

Stack

Java 21 · Spring WebFlux · Apache Kafka · Kafka Streams · TimescaleDB · Redis · Docker · Azure · Prometheus

Overview

ARGUS is a real-time behavioral analytics platform built around a strict decoupled producer-consumer model. The ingestion edge uses Spring WebFlux (Netty) to accept events and immediately publish them to Kafka — the HTTP thread returns 202 Accepted before any database write occurs.
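The ingestion edge can be sketched roughly as below. This is an illustrative reconstruction, not the ARGUS source: the controller name, route, and topic wiring are assumptions based on the description above (the `raw-events` topic name comes from the architecture diagram).

```java
import org.springframework.http.ResponseEntity;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
class EventController {

    private final KafkaTemplate<String, String> kafka;

    EventController(KafkaTemplate<String, String> kafka) {
        this.kafka = kafka;
    }

    @PostMapping("/events")
    Mono<ResponseEntity<Void>> ingest(@RequestBody String payload) {
        // KafkaTemplate.send() is asynchronous (it returns a future),
        // so the Netty event-loop thread never blocks on the broker.
        kafka.send("raw-events", payload);
        // 202 Accepted goes back before any database write occurs.
        return Mono.just(ResponseEntity.accepted().build());
    }
}
```

The key property is that the HTTP handler does no I/O of its own: it hands the payload to the Kafka producer's internal buffer and returns immediately.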

A Kafka Streams processor handles routing, aggregation, and storage. Valid events increment live Redis counters for real-time dashboards and are upserted into TimescaleDB hypertables for time-series analytics. Malformed payloads are caught and routed to a Dead Letter Queue, preventing crash loops on the primary topology.

Rate limiting is enforced via an atomic Redis Lua script operating concurrently at both IP level (50 req/sec) and API key level (5,000 req/sec), with zero race conditions under load. The entire stack is containerized and deployed on Microsoft Azure.

Architecture

Client / SDK → Spring WebFlux (Netty) → Redis Rate Limiter → Apache Kafka (raw-events; DLQ: raw-events-dlq) → Kafka Streams Processor → Redis Live Counters / TimescaleDB Hypertables

System architecture overview

Technical Challenges

Bridging Reactive and Blocking Paradigms

Spring WebFlux runs on a small pool of Netty event-loop threads. A blocking call — such as a standard JPA repository lookup — on one of those threads stalls the event loop and starves every request it serves. I solved this by wrapping all RDBMS lookups in Mono.fromCallable().subscribeOn(Schedulers.boundedElastic()), isolating blocking calls on a dedicated thread pool while preserving the reactive pipeline.
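The wrapping pattern looks like this. `ApiKeyRepository` and `findByKey` are hypothetical stand-ins for whatever blocking Spring Data JPA repository ARGUS actually uses:

```java
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

class ApiKeyLookup {

    private final ApiKeyRepository repo; // standard blocking JPA repository (hypothetical)

    ApiKeyLookup(ApiKeyRepository repo) {
        this.repo = repo;
    }

    Mono<ApiKey> find(String key) {
        // The callable runs the blocking JDBC query; subscribeOn moves that
        // work onto Reactor's bounded-elastic pool, so the Netty event-loop
        // thread that subscribes is never blocked.
        return Mono.fromCallable(() -> repo.findByKey(key))
                   .subscribeOn(Schedulers.boundedElastic());
    }
}
```

boundedElastic() is capped (by default at 10× the CPU core count), so a slow database degrades gracefully instead of spawning unbounded threads.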

Atomic Rate Limiting Under Concurrency

Doing GET → calculate → SET via standard Redis calls creates a race condition window where hundreds of requests slip through simultaneously. I pushed the entire token-bucket logic into a single Lua script executed atomically by Redis — since Redis is single-threaded, the script is guaranteed to run serially, eliminating the race entirely.
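A minimal sketch of the pattern, using Spring Data Redis. The class name is illustrative, and for brevity the script shown is a fixed-window counter rather than the token bucket ARGUS uses — the atomicity argument is identical either way:

```java
import java.util.List;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;

class RedisRateLimiter {

    // INCR + EXPIRE + limit check run as one atomic unit inside Redis;
    // no other command can interleave between the calls.
    private static final DefaultRedisScript<Long> SCRIPT = new DefaultRedisScript<>(
        """
        local count = redis.call('INCR', KEYS[1])
        if count == 1 then
            redis.call('EXPIRE', KEYS[1], ARGV[2])
        end
        if count > tonumber(ARGV[1]) then
            return 0
        end
        return 1
        """, Long.class);

    private final StringRedisTemplate redis;

    RedisRateLimiter(StringRedisTemplate redis) {
        this.redis = redis;
    }

    boolean allow(String key, int limit, int windowSeconds) {
        Long ok = redis.execute(SCRIPT, List.of(key),
                String.valueOf(limit), String.valueOf(windowSeconds));
        return ok != null && ok == 1L;
    }
}
```

The same script object is evaluated for both the IP-level key and the API-key-level key, just with different limits.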

Kafka Poison Pills

When Kafka Streams receives a byte array it cannot deserialize, the default behavior is a fatal exception and an infinite crash loop. I pre-filtered the stream with a non-typed branch — failed deserializations are gracefully piped to a DLQ topic while the primary analytical topology continues uninterrupted.
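The branch can be sketched as follows. `parseOrNull` is a hypothetical stand-in for the project's JSON deserializer; the topic names come from the architecture diagram:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();

// Consume as raw bytes so a bad payload can never throw during deserialization.
KStream<String, byte[]> raw = builder.stream(
        "raw-events", Consumed.with(Serdes.String(), Serdes.ByteArray()));

KStream<String, byte[]>[] branches = raw.branch(
        (key, bytes) -> parseOrNull(bytes) != null, // deserializable → main topology
        (key, bytes) -> true);                      // everything else → DLQ

// Poison pills are forwarded verbatim for offline inspection.
branches[1].to("raw-events-dlq",
        Produced.with(Serdes.String(), Serdes.ByteArray()));
```

Because the topology only ever materializes typed objects after the predicate has vetted the bytes, a malformed record can no longer take down the stream thread.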

View on GitHub ↗ · Live Demo ↗