Sharded media pipeline
Re-architected sequential video analysis into a sharded parallel pipeline
(receiver → analyzer → uploader) with per-channel ownership and file-level locking.
- Backlog: 60+ min → ~5 min
- Storage: 6.48 TB/day → 540 GB/day
- Continuous stream: 0.6 Gbps
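A minimal sketch of the two coordination ideas above, per-channel ownership and file-level locking. The names `shard_for` and `file_lock`, the CRC32 hash, and the `.lock` sidecar file are illustrative assumptions, not the production design.

```python
import os
import zlib
from contextlib import contextmanager


def shard_for(channel: str, num_shards: int) -> int:
    """Map a channel to exactly one shard using a hash that is stable across processes."""
    return zlib.crc32(channel.encode("utf-8")) % num_shards


@contextmanager
def file_lock(path: str):
    """Hold an exclusive lock on `path` via a sidecar lock file (hypothetical scheme).

    O_CREAT | O_EXCL makes creation atomic: a second holder gets FileExistsError.
    A crashed holder leaves a stale lock file; cleanup is out of scope here.
    """
    lock_path = path + ".lock"
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

Because every file of a channel hashes to the same shard, stages within a shard only need the file lock to avoid touching a file that another stage is still writing.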
Telemetry detection at scale
Designed a stateful streaming detection architecture for nationwide network telemetry
(~4 Tbps across a 100-node cluster) with session tracking and a Kafka-based rule engine.
- Botnet C2 identification: <60s
- Incidents detected: ~100–200/month → ~10K/month
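As an illustration of stateful detection over a sliding window, the sketch below keeps per-destination state and fires a single rule. The fan-in rule, the class name `FanInDetector`, and its thresholds are hypothetical; the actual rules and the Kafka wiring are not shown.

```python
from collections import defaultdict, deque


class FanInDetector:
    """Flag a destination once enough distinct sources contact it inside the window.

    Assumes events arrive with non-decreasing timestamps (in seconds).
    """

    def __init__(self, window_s: float = 60.0, min_sources: int = 3):
        self.window_s = window_s
        self.min_sources = min_sources
        self.sessions = defaultdict(deque)  # dst -> deque of (timestamp, src)

    def observe(self, ts: float, src: str, dst: str) -> bool:
        events = self.sessions[dst]
        events.append((ts, src))
        # Evict events that have fallen out of the sliding window.
        while events and events[0][0] <= ts - self.window_s:
            events.popleft()
        distinct_sources = {s for _, s in events}
        return len(distinct_sources) >= self.min_sources
```

Keeping only the events inside the window bounds the state held per destination, which matters when the tracker is partitioned across many nodes.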
Temporal deduplication window
Designed temporal deduplication for high-frequency RF streams (8K pkt/s across 1,000 channels),
aggregating packets per frame and selecting the highest-confidence signal.
- DB writes reduced ~80%
- Accuracy improved 6×
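The windowing idea can be sketched as follows: packets are bucketed by channel and time frame, and only the highest-confidence packet in each bucket is kept. The `Packet` fields and the fixed `window_s` frame length are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    channel: int
    timestamp: float  # seconds
    confidence: float
    payload: str


def dedupe(packets, window_s: float = 1.0):
    """Keep one packet per (channel, frame): the one with the highest confidence."""
    best = {}
    for p in packets:
        key = (p.channel, int(p.timestamp // window_s))
        current = best.get(key)
        if current is None or p.confidence > current.confidence:
            best[key] = p
    return list(best.values())
```

Only the survivors would be written to the database, which is where a reduction in writes would come from.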
Crash-safe task orchestration
Replaced a volatile Redis queue with a durable Postgres-backed task ledger so that in-flight work survives crashes,
adding transactional task claiming and semaphore-based rate limiting.
- Parallel processing: 10-item batches → 100 concurrent tasks
- Eliminated queue-related data loss
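A rough sketch of transactional claiming plus semaphore-bounded concurrency. SQLite from the standard library stands in for Postgres so the example is self-contained; on Postgres, a row-level `SELECT ... FOR UPDATE SKIP LOCKED` would typically play the role that `BEGIN IMMEDIATE` plays here. The table layout and function names are hypothetical.

```python
import asyncio
import sqlite3


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None: transactions are controlled explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending', owner TEXT)"
    )
    return conn


def claim_task(conn: sqlite3.Connection, worker: str):
    """Claim the oldest pending task inside one transaction; return (id, payload) or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, payload FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE tasks SET status = 'running', owner = ? WHERE id = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


async def run_bounded(jobs, limit: int = 100):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job):
        async with sem:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Because the claim is recorded in the ledger rather than popped from a queue, a crashed worker leaves a `running` row that can be found and retried instead of a task that has vanished.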
Consistency under concurrency
Fixed real-time counter inconsistencies in high-traffic telecom monitoring by introducing queued aggregation
with atomic batched writes.
- ~3M events/day (peaks ~1M in 2 hours)
- Write frequency reduced 100×
- Counter deviation bounded to <100 during peak load
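One way to sketch queued aggregation with atomic batched writes: events go into an in-memory queue, and a periodic flush folds them into per-key deltas applied in a single transaction. SQLite is used only to keep the example self-contained; the `counters` table and the `CounterAggregator` class are assumptions.

```python
import queue
import sqlite3
from collections import Counter


class CounterAggregator:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.pending = queue.Queue()  # thread-safe handoff from event producers
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS counters "
                "(key TEXT PRIMARY KEY, value INTEGER NOT NULL)"
            )

    def record(self, key: str, delta: int = 1) -> None:
        """Cheap hot-path call: enqueue only, no database work."""
        self.pending.put((key, delta))

    def flush(self) -> int:
        """Drain the queue and apply summed deltas in one transaction.

        Returns the number of distinct keys written.
        """
        deltas = Counter()
        while True:
            try:
                key, delta = self.pending.get_nowait()
            except queue.Empty:
                break
            deltas[key] += delta
        if not deltas:
            return 0
        with self.conn:  # commits on success, rolls back on error
            for key, delta in deltas.items():
                self.conn.execute(
                    "INSERT OR IGNORE INTO counters (key, value) VALUES (?, 0)", (key,)
                )
                self.conn.execute(
                    "UPDATE counters SET value = value + ? WHERE key = ?", (delta, key)
                )
        return len(deltas)
```

Readers see counters that lag by at most the events still waiting in the queue, so the flush interval bounds the deviation.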
Split-brain prevention
Resolved a split-brain failure in an active-active control system by introducing external epoch-based leader
arbitration to ensure a single authoritative writer under network partitions.
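The arbitration idea can be sketched with fencing epochs: an external arbiter hands out a strictly increasing epoch on every leadership grant, and the shared resource rejects writes that carry an older epoch. The in-memory `EpochArbiter` is a stand-in for an external coordination service; all class names are hypothetical.

```python
class StaleEpochError(Exception):
    """Raised when a writer presents an epoch older than one already seen."""


class EpochArbiter:
    """Stand-in for an external arbiter: each grant bumps the epoch."""

    def __init__(self):
        self.epoch = 0
        self.leader = None

    def acquire(self, node: str) -> int:
        self.epoch += 1
        self.leader = node
        return self.epoch


class FencedStore:
    """Accepts a write only if its epoch is at least the highest epoch seen so far."""

    def __init__(self):
        self.highest_epoch = 0
        self.data = {}

    def write(self, epoch: int, key: str, value: str) -> None:
        if epoch < self.highest_epoch:
            raise StaleEpochError(f"epoch {epoch} < {self.highest_epoch}")
        self.highest_epoch = epoch
        self.data[key] = value
```

A node cut off by a partition may still believe it is the leader, but once the new leader has written with a newer epoch, the old node's writes are refused rather than silently diverging.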