Caliper: Quantitative ML Trading Platform

A modular quant platform for US equities plus Polymarket BTC market-making: unified FeatureSnapshot layer, simulation + evaluation engines, regime detection and allocation, cross-sectional ranking, wallet intelligence, and a Next.js Model Observatory — 17 sprints shipped through v2.7.0 (Jan–Apr 2026).

Highlights

  • Dual path: Alpaca equities + Polymarket CLOB/Binance features behind one strategy and risk surface
  • Sprints 15–17: regime/HRP allocation, 5-factor cross-sectional ranker + paper fleet, reward density + wallet KMeans + signal aggregation
  • 550+ pytest tests (Sprint 14 AC-9 probability suite still tracked as open)
  • Simulation + evaluation: determinism, fill-rate checks, regime partitions, baseline regret; API routes partially stubbed pending full DB wiring
  • Dashboard: 10+ explorer routes, FeatureSnapshot inspector, mobile sidebar — aligned with workflow-core extraction

Overview

Caliper is a private, full-stack quantitative ML trading monorepo: Python services and shared packages for market data, features, backtest, execution, risk, ML (including probability_model), simulation, evaluation, regime allocation, cross-sectional ranking, and wallet intelligence — plus a Next.js 14 Model Observatory dashboard. Work from Jan–Apr 2026 delivered 17 sprints in main through v2.7.0, from core equity tooling through Polymarket, unified FeatureSnapshot features, simulation/evaluation, probability modeling, regime + HRP allocation, fleet strategies, and on-chain-informed signals.

Design priority is correctness and safety over raw PnL: paper mode by default, strict RiskManager gating, and observability for every automated decision.

Problem & Context

Most retail-facing algo stacks hide risk and model behavior. I wanted an end-to-end system that could:

  • Ingest and store time-series data efficiently (TimescaleDB hypertables).
  • Backtest with realistic slippage and commissions, plus walk-forward optimization.
  • Enforce layered automated risk (kill switch, circuit breaker, limits).
  • Integrate ML with confidence gating, drift detection, SHAP explainability, and human-in-the-loop approvals.
  • Extend to prediction-market execution without forking the risk story — via a shared FeatureSnapshot abstraction and the same allocator/risk path.

Constraints

  • Paper trading by default; live mode requires explicit env validation.
  • No secrets in git: .env.example only, with a Doppler-style workflow for real keys.
  • All orders through RiskManager — no bypass of kill switch or circuit breaker.
  • Python 3.11: some TA libraries target 3.12+, so indicators are implemented with pandas/numpy where needed.
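
The paper-by-default and RiskManager constraints above can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the class layout, the limit names, and the environment variable names (TRADING_MODE, LIVE_CONFIRM, BROKER_API_KEY) are assumptions, not Caliper's actual code.

```python
from dataclasses import dataclass


@dataclass
class Order:
    symbol: str
    qty: int
    price: float


class RiskManager:
    """Hypothetical gate: every order is checked before it reaches a broker."""

    def __init__(self, max_notional: float, max_consecutive_losses: int = 3):
        self.max_notional = max_notional
        self.max_consecutive_losses = max_consecutive_losses
        self.kill_switch = False
        self.consecutive_losses = 0

    def record_fill_pnl(self, pnl: float) -> None:
        # Circuit breaker input: count losing fills in a row, reset on a win.
        self.consecutive_losses = self.consecutive_losses + 1 if pnl < 0 else 0

    def check(self, order: Order) -> tuple[bool, str]:
        # Checks run in order of severity; the first failure wins.
        if self.kill_switch:
            return False, "kill_switch"
        if self.consecutive_losses >= self.max_consecutive_losses:
            return False, "circuit_breaker"
        if order.qty * order.price > self.max_notional:
            return False, "notional_limit"
        return True, "ok"


def resolve_mode(env: dict[str, str]) -> str:
    """Return 'paper' unless live is requested AND explicitly confirmed.

    The variable names are assumed for this sketch.
    """
    if env.get("TRADING_MODE", "paper") != "live":
        return "paper"
    if env.get("LIVE_CONFIRM") != "yes" or not env.get("BROKER_API_KEY"):
        raise ValueError("live mode requested without explicit confirmation and key")
    return "live"
```

In this sketch an execution adapter would call check() before every submit and refuse to send anything when the first element of the result is False, so there is no code path around the kill switch or circuit breaker.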

Approach & Design Decisions

  • Monorepo (Python + Next.js): atomic schema and consumer changes; one Docker Compose for API + Timescale + Redis.
  • TimescaleDB for bars, pm.features, simulation/evaluation tables, and probability predictions (Alembic through revision 005).
  • BFF pattern: dashboard calls FastAPI; Vercel rewrites keep the backend URL off the client.
  • Adapter execution: BrokerClient → AlpacaClient; Polymarket path uses session orchestration + PolymarketMMStrategy.
  • ML safety first: drift (PSI, KL, mean shift), ABSTAIN outputs, baselines/regret, and HITL before trusting production models.
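
As an illustration of the PSI drift check named above, here is a minimal standard-library sketch of the Population Stability Index between a reference sample and a live sample. The equal-width binning, the bin count, and the epsilon floor are assumptions chosen for the example, not the project's actual settings.

```python
import math


def psi(expected: list[float], actual: list[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the reference range; fall back to 1.0 if it is flat.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are clamped to the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share at eps so the log ratio is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]
drift_score = psi(reference, shifted)
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift; where a gate like this would sit relative to ABSTAIN outputs and HITL approval is a design choice the sketch does not cover.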

Implementation Highlights

  • Equities: DataProvider → PriceBar feature pipeline; event-driven backtest; OMS with client_order_id idempotency.
  • Polymarket (Sprint 10): Gamma/CLOB clients, fee engine, session orchestrator, quoting strategy, DB schema for orders/trades.
  • Sprints 11–12: UnifiedSignal, FeatureSnapshot (four feature families), CLOBSource + BinanceSource, FeatureBuilder + FeatureStore, GET /v1/features/{market_id}/latest|history.
  • Sprint 13: SimulatedOrderBook, ExecutionSimulator, FeeEngine, AdverseSelectionModel, ReplayEngine, SimulationRunner, SimulationValidator, evaluation metrics + regime matrix + baselines; /v1/simulation/* and /v1/evaluation/* (some responses still stub-backed until full DB wiring).
  • Sprint 14: probability_model — calibration, lead-lag tests, /v1/probability/* (AC-9 test wiring still open per project status).
  • Sprints 15–16: regime detection + HRP allocator (/v1/regime/*, /v1/allocation/*); cross-sectional 5-factor ranker, cooldown selection, four paper fleet strategies; dashboard overhaul.
  • Sprint 17: reward density, wallet intelligence (KMeans k=4), smart-money signals, composite aggregation with weight learning.
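
The client_order_id idempotency mentioned for the equities OMS can be sketched as follows. This is a hypothetical in-memory version for illustration: the class and field names are assumptions, and a real OMS would persist the records and call the broker where the counter is incremented.

```python
from dataclasses import dataclass, field


@dataclass
class OrderRecord:
    client_order_id: str
    symbol: str
    qty: int
    status: str = "submitted"


@dataclass
class InMemoryOMS:
    """Hypothetical OMS: a repeated client_order_id returns the original record."""

    orders: dict[str, OrderRecord] = field(default_factory=dict)
    broker_calls: int = 0

    def submit(self, client_order_id: str, symbol: str, qty: int) -> OrderRecord:
        existing = self.orders.get(client_order_id)
        if existing is not None:
            # Retry or duplicate request: return the first record, no second send.
            return existing
        self.broker_calls += 1  # stand-in for the real broker request
        record = OrderRecord(client_order_id, symbol, qty)
        self.orders[client_order_id] = record
        return record


oms = InMemoryOMS()
first = oms.submit("order-001", "AAPL", 10)
retry = oms.submit("order-001", "AAPL", 10)  # e.g. after a network timeout
```

Keying on an ID generated by the caller means a timed-out request can be retried safely: the second submit returns the same record instead of placing a duplicate order.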

Results & Evaluation

  • 17 sprints shipped through v2.7.0; 550+ pytest tests in repo (per workflow-core portfolio extraction). SMA crossover backtest math verified on sample AAPL bars.
  • Polymarket bot: session orchestrator and quoting implemented; extended paper PnL validation still on the roadmap (no fabricated production metrics).
  • Simulation + evaluation: determinism, fill-rate, and regime test criteria exercised; some API responses still stub-backed until full DB integration.
  • Probability stack: library + migration + router merged; AC-9 and live DB reads called out as remaining work in source docs.
  • Roadmap: further live/paper validation and out-of-sample ML metrics depend on training runs — not claimed here.

Tradeoffs & Limitations

  • Simulation/evaluation/probability APIs: some routes still stub or mock until persisted runs/reports are fully read from pm.* tables.
  • Sprint 14 AC-9 (probability module test suite) not landed per quant ticket index.
  • No CI/CD in repo at last extraction; tests run locally.
  • Dashboard uses polling, not WebSockets.
  • Repo and detailed metrics stay private until a deliberate public-safe review.

Notes / Redactions

Private project: no live credentials, no real-money results, and no fabricated performance numbers in this case study.