Design a Payment System (Stripe / PayPal-class)
Idempotent payment intents, two-phase authorize/capture with PSPs, ledger-based double-entry accounting, fraud + chargeback flows, and PCI scope minimization.
Intro
A payment system mediates between merchants, end users, and the processors and card networks behind them (e.g. Stripe as a PSP; Visa and Mastercard as card networks). The core challenge is correctness: every cent must be accounted for, retries must never double-charge, and the system must survive partial failures across the merchant API, the bank, and the network. This is the canonical 'consistency at a distance' design.
Functional
- Create a payment intent (charge with currency + amount).
- Authorize + capture against a Payment Service Provider (PSP).
- Refund + chargeback flows.
- Reconciliation against PSP and bank statements.
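The authorize, capture, and refund requirements imply a fixed lifecycle per intent (the Payment Intent saga component names Pending → Authorized → Captured → Settled). A minimal Python sketch of that lifecycle as an explicit state machine follows. It assumes full refunds only and adds hypothetical Failed, Canceled, and Refunded terminal states that the original list does not name. Amounts are integers in minor units (e.g. cents), so no cent is lost to floating-point rounding.

```python
from enum import Enum


class IntentState(Enum):
    PENDING = "pending"
    AUTHORIZED = "authorized"
    CAPTURED = "captured"
    SETTLED = "settled"
    # Hypothetical terminal states, not named in the original design.
    REFUNDED = "refunded"
    FAILED = "failed"
    CANCELED = "canceled"


# Allowed transitions; anything not listed here is rejected.
TRANSITIONS = {
    IntentState.PENDING: {IntentState.AUTHORIZED, IntentState.FAILED},
    IntentState.AUTHORIZED: {IntentState.CAPTURED, IntentState.CANCELED},
    IntentState.CAPTURED: {IntentState.SETTLED, IntentState.REFUNDED},
    IntentState.SETTLED: {IntentState.REFUNDED},
    IntentState.REFUNDED: set(),
    IntentState.FAILED: set(),
    IntentState.CANCELED: set(),
}


class PaymentIntent:
    def __init__(self, intent_id: str, amount_minor: int, currency: str) -> None:
        if amount_minor <= 0:
            raise ValueError("amount must be a positive integer in minor units")
        self.intent_id = intent_id
        self.amount_minor = amount_minor  # e.g. cents; never a float
        self.currency = currency
        self.state = IntentState.PENDING
        self.history = [IntentState.PENDING]

    def transition(self, new_state: IntentState) -> None:
        # Reject illegal moves such as Settled -> Authorized.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)
```

Making transitions explicit means a late or duplicated PSP callback cannot move an intent backwards; it fails loudly instead.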
Non-functional
- Availability ≥ 99.99% on the charge endpoint.
- Idempotency on every mutating call (Idempotency-Key header).
- Strong consistency on the ledger; eventual on read replicas.
- PCI DSS: never store the raw PAN (primary account number); tokenise it immediately.
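Idempotency-Key handling can be sketched as a lookup keyed on (merchant, key) that replays the stored response instead of re-running the side effect. This is a minimal in-memory Python sketch. The class and method names are hypothetical, and a real deployment would back it with a strongly consistent table carrying a unique constraint so the dedupe holds across regions. Reusing a key with a different request body is treated as a conflict. Failed calls are not cached, so a client can retry them.

```python
import hashlib
import json
import threading
from typing import Any, Callable, Dict, Tuple


class IdempotencyConflict(Exception):
    """The same key was reused with a different request body."""


class IdempotencyStore:
    """Hypothetical in-memory stand-in for a durable table with a
    unique key on (merchant_id, idempotency_key)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._records: Dict[Tuple[str, str], Tuple[str, Any]] = {}

    @staticmethod
    def _fingerprint(body: dict) -> str:
        # Canonical JSON so key order in the request does not matter.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def execute(self, merchant_id: str, key: str, body: dict,
                handler: Callable[[dict], Any]) -> Any:
        fp = self._fingerprint(body)
        with self._lock:
            existing = self._records.get((merchant_id, key))
            if existing is not None:
                stored_fp, stored_response = existing
                if stored_fp != fp:
                    raise IdempotencyConflict(key)
                return stored_response  # replay: no second charge
            # Holding the lock across the handler keeps the sketch simple;
            # a real system would insert an "in progress" row instead.
            response = handler(body)
            self._records[(merchant_id, key)] = (fp, response)
            return response
```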
Components
Payment API
Idempotent endpoints; orchestrates the saga.
Payment Intent saga
Pending → Authorized → Captured → Settled.
PSP adapter
Per-PSP integration (Stripe, Adyen, …).
Ledger
Double-entry accounting; append-only.
Token vault
PCI scope; stores card tokens only.
Fraud + risk
Real-time scoring; chargeback handling.
Reconciler
Daily batch against PSP + bank reports.
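The Ledger component's double-entry, append-only rule can be illustrated with a small Python sketch. Every transaction is a set of postings that must sum to zero per currency, rows are only ever appended, and balances are derived by summing postings. The sign convention (debit positive, credit negative), the duplicate-transaction check, and the account names in the usage note are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Posting:
    account_id: str
    amount_minor: int  # debit positive, credit negative (convention assumed here)
    currency: str


class Ledger:
    def __init__(self) -> None:
        self._entries: List[Tuple[str, Tuple[Posting, ...]]] = []  # append-only
        self._txn_ids: Set[str] = set()

    def post(self, txn_id: str, postings: List[Posting]) -> None:
        if txn_id in self._txn_ids:
            raise ValueError(f"duplicate transaction {txn_id}")
        totals: Dict[str, int] = defaultdict(int)
        for p in postings:
            totals[p.currency] += p.amount_minor
        # Double-entry invariant: at least two legs, zero sum per currency.
        if len(postings) < 2 or any(v != 0 for v in totals.values()):
            raise ValueError(f"unbalanced transaction {txn_id}: {dict(totals)}")
        self._entries.append((txn_id, tuple(postings)))
        self._txn_ids.add(txn_id)

    def balance(self, account_id: str, currency: str) -> int:
        # Balances are derived from postings, never stored and mutated.
        return sum(
            p.amount_minor
            for _, postings in self._entries
            for p in postings
            if p.account_id == account_id and p.currency == currency
        )
```

For example, a 10.00 USD capture with a 0.30 fee could post +1000 to a hypothetical `psp_receivable`, -970 to `merchant_payable`, and -30 to `fee_revenue`. An unbalanced transaction is rejected before anything is appended.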
Trade-offs
2PC vs saga
Pros
- Saga: distributed-friendly, no global lock.
- 2PC: atomic commit across all participants.
Cons
- Saga: compensations can be complex.
- 2PC: blocks if the coordinator stalls; doesn't scale across PSPs, which are external APIs that don't act as 2PC participants.
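The saga side of this trade-off can be sketched as a runner that executes steps in order and, on failure, runs the compensations of the already-completed steps in reverse. This Python sketch is a minimal illustration. The step names in the usage note (authorize, write ledger, capture) and their compensations are hypothetical, and a production saga would persist its progress and retry compensations, which this sketch does not do.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class SagaFailed(Exception):
    pass


def run_saga(steps: List[Step], log: List[str]) -> None:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"failed:{name}")
            # Undo in reverse order, e.g. void an authorization that succeeded.
            for comp_name, _, compensate in reversed(done):
                compensate()
                log.append(f"compensated:{comp_name}")
            raise SagaFailed(name) from exc
        log.append(f"done:{name}")
        done.append(step)
```

For instance, if an authorize step succeeds but a ledger write fails, the runner voids the authorization instead of leaving a dangling hold on the card. No global lock is held at any point, which is the saga's main advantage over 2PC.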
Synchronous vs asynchronous capture
Pros
- Sync: instant confirmation.
- Async: better availability; a slow or degraded PSP doesn't block the charge request.
Cons
- Sync: tied to PSP latency.
- Async: needs polling / webhook for finalization.
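Asynchronous capture needs a finalizer that accepts results from webhooks or polling. Webhooks may be redelivered or arrive out of order, so the finalizer has to dedupe them and must never regress a payment's status. A minimal Python sketch follows. The status names, their ranking, and the `fetch_status` callable used for polling are hypothetical, since each PSP defines its own webhook schema.

```python
from typing import Callable, Dict, Set

# Hypothetical statuses; higher rank = further along. Both outcomes are terminal.
RANK = {"pending": 0, "capture_pending": 1, "captured": 2, "capture_failed": 2}


class CaptureFinalizer:
    def __init__(self) -> None:
        self.status: Dict[str, str] = {}
        self._seen_events: Set[str] = set()

    def start_async_capture(self, payment_id: str) -> None:
        self.status[payment_id] = "capture_pending"

    def _apply(self, payment_id: str, new_status: str) -> bool:
        current = self.status.get(payment_id, "pending")
        if RANK[new_status] <= RANK[current]:
            return False  # stale or out-of-order update; ignore it
        self.status[payment_id] = new_status
        return True

    def on_webhook(self, event_id: str, payment_id: str, new_status: str) -> bool:
        if event_id in self._seen_events:
            return False  # duplicate delivery
        self._seen_events.add(event_id)
        return self._apply(payment_id, new_status)

    def poll(self, payment_id: str, fetch_status: Callable[[str], str]) -> bool:
        # Fallback when a webhook never arrives; same monotonic rule applies.
        return self._apply(payment_id, fetch_status(payment_id))
```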
Scale concerns
- Idempotency keys must dedupe across regions.
- Ledger must never lose a row; partition by account_id.
- PSP outages: fail over to a secondary PSP if available.
- Chargebacks arrive days later — must be reconciled into the original payment.
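The Reconciler's daily batch and the late-chargeback concern can be combined in one matching pass. The pass compares the PSP report against internal records by the PSP's payment reference, flags amount mismatches and rows missing on either side, and attaches chargeback rows to the original payment. This Python sketch assumes a hypothetical report schema (`psp_ref`, `kind`, `amount_minor`). It treats anything absent from the report as missing, whereas a real reconciler would allow a settlement window before flagging.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReportRow:
    psp_ref: str       # PSP's reference for the original payment
    kind: str          # "capture" or "chargeback" (hypothetical schema)
    amount_minor: int


def reconcile(internal: Dict[str, int], report: List[ReportRow]) -> Dict[str, List[str]]:
    """internal maps psp_ref -> captured amount recorded in our ledger."""
    result: Dict[str, List[str]] = {
        "matched": [], "amount_mismatch": [], "missing_internally": [],
        "missing_at_psp": [], "chargebacks": [],
    }
    seen = set()
    for row in report:
        if row.kind == "chargeback":
            # Link the late chargeback back to the original payment.
            bucket = "chargebacks" if row.psp_ref in internal else "missing_internally"
            result[bucket].append(row.psp_ref)
            continue
        seen.add(row.psp_ref)
        if row.psp_ref not in internal:
            result["missing_internally"].append(row.psp_ref)
        elif internal[row.psp_ref] != row.amount_minor:
            result["amount_mismatch"].append(row.psp_ref)
        else:
            result["matched"].append(row.psp_ref)
    # Payments we recorded that the PSP report does not mention.
    result["missing_at_psp"] = sorted(set(internal) - seen)
    return result
```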