Overview
Caliper is a private, full-stack quantitative ML trading monorepo: Python services and shared packages for market data, features, backtest, execution, risk, ML (including probability_model), simulation, evaluation, regime allocation, cross-sectional ranking, and wallet intelligence — plus a Next.js 14 Model Observatory dashboard. Work from Jan–Apr 2026 delivered 17 sprints in main through v2.7.0, from core equity tooling through Polymarket, unified FeatureSnapshot features, simulation/evaluation, probability modeling, regime + HRP allocation, fleet strategies, and on-chain-informed signals.
Design priority is correctness and safety over raw PnL: paper mode by default, strict RiskManager gating, and observability for every automated decision.
Problem & Context
Most retail-facing algo stacks hide risk and model behavior. I wanted an end-to-end system that could:
- Ingest and store time-series data efficiently (TimescaleDB hypertables).
- Backtest with realistic slippage and commissions, plus walk-forward optimization.
- Enforce layered automated risk (kill switch, circuit breaker, limits).
- Integrate ML with confidence gating, drift detection, SHAP explainability, and human-in-the-loop approvals.
- Extend to prediction-market execution without forking the risk story, via a shared FeatureSnapshot abstraction and the same allocator/risk path.
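The backtest goals above (slippage, commissions, walk-forward optimization) can be illustrated with a minimal sketch. The function names, the basis-point slippage model, and the per-share commission are assumptions made for illustration, not Caliper's actual backtest API.

```python
from typing import Iterator


def fill_price(side: str, price: float, slippage_bps: float = 5.0) -> float:
    """Hypothetical slippage model: buys fill higher, sells fill lower."""
    sign = 1.0 if side == "buy" else -1.0
    return price * (1.0 + sign * slippage_bps / 10_000.0)


def commission(qty: float, per_share: float = 0.005) -> float:
    """Hypothetical flat per-share commission."""
    return abs(qty) * per_share


def walk_forward_windows(
    n_bars: int, train: int, test: int
) -> Iterator[tuple[range, range]]:
    """Yield (train_idx, test_idx) ranges, rolling forward one test window at a time."""
    start = 0
    while start + train + test <= n_bars:
        train_idx = range(start, start + train)
        test_idx = range(start + train, start + train + test)
        yield train_idx, test_idx
        start += test  # the next fold begins one test window later


if __name__ == "__main__":
    for tr, te in walk_forward_windows(n_bars=10, train=4, test=2):
        print(list(tr), list(te))
```

Each fold tunes parameters on the training range only and scores them on the test range that follows it, so no test bar is seen during optimization.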
Constraints
- Paper trading by default; live mode requires explicit env validation.
- No secrets in git: .env.example only; Doppler-style workflow for real keys.
- All orders through RiskManager: no bypass of kill switch or circuit breaker.
- Python 3.11: some TA libraries target 3.12+; indicators implemented with pandas/numpy where needed.
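As a rough sketch of how these constraints might be enforced in code, the example below resolves the trading mode from the environment and runs every order through one gate. `RiskGate` is a hypothetical stand-in for RiskManager. The environment variable names, the limits, and the loss-streak circuit breaker rule are all assumptions, not the project's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Order:
    symbol: str
    qty: float
    price: float


def resolve_mode(env: dict[str, str]) -> str:
    """Paper unless live is explicitly requested AND validated (hypothetical env names)."""
    if env.get("TRADING_MODE", "paper").lower() != "live":
        return "paper"
    if env.get("LIVE_TRADING_CONFIRMED") != "yes" or not env.get("BROKER_API_KEY"):
        raise RuntimeError("live mode requested but environment validation failed")
    return "live"


@dataclass
class RiskGate:
    """Illustrative stand-in for RiskManager; checks run from most to least severe."""
    max_notional: float = 10_000.0
    max_consecutive_losses: int = 3
    kill_switch: bool = False
    consecutive_losses: int = 0

    def record_pnl(self, pnl: float) -> None:
        # A losing trade extends the streak; any non-losing trade resets it.
        self.consecutive_losses = self.consecutive_losses + 1 if pnl < 0 else 0

    @property
    def circuit_open(self) -> bool:
        return self.consecutive_losses >= self.max_consecutive_losses

    def check(self, order: Order) -> tuple[Decision, str]:
        if self.kill_switch:
            return Decision.REJECTED, "kill switch engaged"
        if self.circuit_open:
            return Decision.REJECTED, "circuit breaker open"
        if abs(order.qty * order.price) > self.max_notional:
            return Decision.REJECTED, "notional limit exceeded"
        return Decision.APPROVED, "ok"
```

Returning a reason string alongside the decision gives every rejection something to log, which fits the goal of observability for automated decisions.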
Approach & Design Decisions
- Monorepo (Python + Next.js): atomic schema and consumer changes; one Docker Compose for API + Timescale + Redis.
- TimescaleDB for bars, pm.features, simulation/evaluation tables, and probability predictions (Alembic through revision 005).
- BFF pattern: dashboard calls FastAPI; Vercel rewrites keep the backend URL off the client.
- Adapter execution: BrokerClient → AlpacaClient; Polymarket path uses session orchestration + PolymarketMMStrategy.
- ML safety first: drift (PSI, KL, mean shift), ABSTAIN outputs, baselines/regret, and HITL before trusting production models.
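To make the drift check concrete, here is a minimal sketch of the Population Stability Index (PSI) between a reference feature sample and a recent one. The quantile binning, the bin count, and the epsilon floor are assumptions chosen for illustration. The source does not describe how Caliper bins features or which thresholds it applies.

```python
import numpy as np


def population_stability_index(
    expected: np.ndarray, actual: np.ndarray, bins: int = 10, eps: float = 1e-6
) -> float:
    """PSI = sum((a - e) * ln(a / e)) over bins defined by the reference quantiles."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior quantile cut points from the reference sample; the outer bins are
    # open-ended, so out-of-range values in `actual` still land in a bin.
    cuts = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1)[1:-1])

    e_counts = np.bincount(np.searchsorted(cuts, expected, side="right"), minlength=bins)
    a_counts = np.bincount(np.searchsorted(cuts, actual, side="right"), minlength=bins)

    # Floor the proportions so empty bins do not produce log(0) or division by zero.
    e_pct = np.clip(e_counts / e_counts.sum(), eps, None)
    a_pct = np.clip(a_counts / a_counts.sum(), eps, None)

    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(0.0, 1.0, 5_000)
    print(population_stability_index(reference, reference))        # identical sample
    print(population_stability_index(reference, reference + 1.0))  # mean-shifted sample
```

A common rule of thumb treats PSI above roughly 0.25 as significant drift. In a setup like the one described, a score over the chosen threshold could push the model toward an ABSTAIN output or a human review instead of a trade.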
Implementation Highlights
- Equities: DataProvider → PriceBar feature pipeline; event-driven backtest; OMS with client_order_id idempotency.
- Polymarket (Sprint 10): Gamma/CLOB clients, fee engine, session orchestrator, quoting strategy, DB schema for orders/trades.
- Sprints 11–12: UnifiedSignal, FeatureSnapshot (four feature families), CLOBSource + BinanceSource, FeatureBuilder + FeatureStore, GET /v1/features/{market_id}/latest|history.
- Sprint 13: SimulatedOrderBook, ExecutionSimulator, FeeEngine, AdverseSelectionModel, ReplayEngine, SimulationRunner, SimulationValidator, evaluation metrics + regime matrix + baselines; /v1/simulation/* and /v1/evaluation/* (some responses still stub-backed until full DB wiring).
- Sprint 14: probability_model with calibration, lead-lag tests, and /v1/probability/* (AC-9 test wiring still open per project status).
- Sprints 15–16: regime detection + HRP allocator (/v1/regime/*, /v1/allocation/*); cross-sectional 5-factor ranker, cooldown selection, four paper fleet strategies; dashboard overhaul.
- Sprint 17: reward density, wallet intelligence (KMeans k=4), smart-money signals, composite aggregation with weight learning.
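The client_order_id idempotency mentioned for the equities OMS can be sketched as follows. The hashing scheme, the in-memory store, and the class and function names are hypothetical. The source only states that the OMS uses client_order_id for idempotency, not how the id is derived or persisted.

```python
import hashlib
from dataclasses import dataclass, field


def make_client_order_id(
    strategy: str, symbol: str, side: str, qty: float, signal_ts: str
) -> str:
    """Derive a deterministic id so a retried signal maps to the same order."""
    raw = f"{strategy}|{symbol}|{side}|{qty}|{signal_ts}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]


@dataclass
class InMemoryOMS:
    """Illustrative order store keyed by client_order_id (not the real OMS)."""
    orders: dict[str, dict] = field(default_factory=dict)

    def submit(self, client_order_id: str, order: dict) -> tuple[dict, bool]:
        """Return (stored_order, created); a duplicate id never creates a second order."""
        existing = self.orders.get(client_order_id)
        if existing is not None:
            return existing, False
        self.orders[client_order_id] = order
        return order, True


if __name__ == "__main__":
    oms = InMemoryOMS()
    coid = make_client_order_id("sma_cross", "AAPL", "buy", 10, "2026-01-15T14:30:00Z")
    order = {"symbol": "AAPL", "side": "buy", "qty": 10}
    print(oms.submit(coid, order))  # first submission is stored
    print(oms.submit(coid, order))  # retry returns the stored order unchanged
```

Deriving the id from the signal rather than generating a random one means a retry after a timeout or process restart cannot double-submit the same intent.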
Results & Evaluation
- 17 sprints shipped through v2.7.0; 550+ pytest tests in repo (per workflow-core portfolio extraction). SMA crossover backtest math verified on sample AAPL bars.
- Polymarket bot: session orchestrator and quoting implemented; extended paper PnL validation still on the roadmap (no fabricated production metrics).
- Simulation + evaluation: determinism, fill-rate, and regime test criteria exercised; some API responses still stub-backed until full DB integration.
- Probability stack: library + migration + router merged; AC-9 and live DB reads called out as remaining work in source docs.
- Roadmap: further live/paper validation and out-of-sample ML metrics depend on training runs — not claimed here.
Tradeoffs & Limitations
- Simulation/evaluation/probability APIs: some routes still stub or mock until persisted runs/reports are fully read from pm.* tables.
- Sprint 14 AC-9 (probability module test suite) not landed per quant ticket index.
- No CI/CD in repo at last extraction; tests run locally.
- Dashboard uses polling, not WebSockets.
- Repo and detailed metrics stay private until a deliberate public-safe review.
Notes / Redactions
Private project: no live credentials, no real-money results, and no fabricated performance numbers in this case study.