Phalanx
2026 · Distributed Systems Engineer
A high-availability distributed consensus engine based on the Raft protocol, built from scratch in Go. Features a 5-node global mesh with double-fault tolerance, specialized production tuning for cross-continental latency, and a single-threaded event loop for lock-free concurrency.
KEY METRIC
5-node global mesh, double-fault tolerance (Quorum=3), <2ms p99 write latency
Overview
Phalanx is a production-grade implementation of the Raft consensus algorithm, engineered to provide a reliable 'source of truth' across a globally distributed cluster. The engine ensures that all nodes agree on a single, linearizable log of operations, even in the face of network partitions or hardware failures.
The system utilizes a custom gossip-based peer discovery mechanism over Fly.io's private IPv6 network and features a "terminal-noir" technical manual for operational transparency. It implements Pre-Vote extensions (§9.6), lease-based linearizable reads, and a deterministic, tick-based state machine orchestrated through a single-threaded event loop.
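The tick-based, single-threaded design can be sketched as below. All consensus state is owned by one goroutine; ticks and inbound events are serialized through channels, so no mutexes are needed. The type and method names (`node`, `handleTick`, `handleEvent`) are illustrative, not Phalanx's actual API.

```go
package main

import (
	"fmt"
	"time"
)

// event is anything the node must react to (RPCs, client requests).
type event struct{ kind string }

// node owns all consensus state. Only the run loop touches it, so
// concurrency is serialized through channels rather than locks.
type node struct {
	ticks  int
	events chan event
}

// handleTick advances the deterministic time base that elections
// and heartbeats are measured against.
func (n *node) handleTick() { n.ticks++ }

// handleEvent applies one inbound event to the state machine.
func (n *node) handleEvent(ev event) { fmt.Println("handling", ev.kind) }

// run is the single-threaded event loop: every state mutation
// happens here, driven either by a tick or by an inbound event.
func (n *node) run(ticker <-chan time.Time, done <-chan struct{}) {
	for {
		select {
		case <-ticker:
			n.handleTick()
		case ev := <-n.events:
			n.handleEvent(ev)
		case <-done:
			return
		}
	}
}

func main() {
	n := &node{events: make(chan event, 16)}
	tick := time.NewTicker(10 * time.Millisecond)
	defer tick.Stop()
	done := make(chan struct{})
	go n.run(tick.C, done)

	n.events <- event{kind: "AppendEntries"}
	time.Sleep(30 * time.Millisecond)
	close(done)
}
```

Because every mutation funnels through `run`, the state machine is deterministic with respect to its input sequence, which also makes it easy to replay in tests.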
Architecture
phalanx global mesh — 5-node consensus cluster across 5 continents
Technical Challenges
Taming Cross-Continental Latency
Standard Raft timeouts (100ms) cause 'election flapping' when nodes are spread across continents due to speed-of-light constraints. I recalibrated the state machine with a 200ms tick and a randomized 4-8s election window, absorbing global RTT spikes without compromising cluster stability.
Dynamic Peer Discovery in Mesh Networks
Static IP configurations are fragile in cloud environments. I engineered a startup sequence that resolves peers via Fly.io's internal AAAA DNS records, allowing nodes to dynamically join the cluster and exchange gossip seeds regardless of their geographic region.
Proving Double-Fault Tolerance
Theoretical reliability doesn't guarantee production safety. I built a 'Chaos Mesh' runner that randomly terminates up to 2 nodes simultaneously while under heavy write load. Phalanx successfully maintained a Quorum of 3, proving it can survive the loss of two entire global regions without data corruption.