API Design for Real-Time AML Integration
Best practices for building high-performance, scalable APIs that integrate AML detection seamlessly into transaction processing pipelines with sub-100ms latency.
API Design Requirements
Real-time AML APIs must balance competing demands: low latency, high throughput, reliability, and ease of integration. Get the API design wrong, and you'll either bottleneck transaction processing or compromise detection accuracy.
Core API Endpoints
1. Transaction Screening
POST /v1/transactions/screen
Synchronous risk scoring for real-time transaction approval/decline
Request:
{
"transaction_id": "txn_abc123",
"amount": 9850.00,
"currency": "USD",
"timestamp": "2025-09-03T14:23:45Z",
"sender": {
"entity_id": "ent_sender_456",
"account_id": "acc_789"
},
"receiver": {
"entity_id": "ent_receiver_321",
"account_id": "acc_654"
},
"metadata": {
"channel": "mobile",
"ip_address": "192.0.2.1",
"device_id": "dev_xyz"
}
}
Response (< 100ms):
{
"risk_score": 87,
"risk_level": "HIGH",
"decision": "REVIEW",
"explanation": {
"primary_factors": [
"Amount just below $10K threshold (+32 points)",
"Transaction velocity 12x higher than average (+25 points)",
"New counterparty (+18 points)"
]
},
"case_id": "case_2025_09_00145"
}
2. Batch Screening
POST /v1/transactions/batch
Process multiple transactions asynchronously
Request:
{
"transactions": [ /* array of 1-10,000 transactions */ ],
"callback_url": "https://your-system.com/aml-results"
}
Immediate Response:
{
"batch_id": "batch_2025_09_001",
"status": "PROCESSING",
"estimated_completion": "2025-09-03T14:28:00Z"
}
Callback (when complete):
{
"batch_id": "batch_2025_09_001",
"results": [ /* risk scores for each transaction */ ]
}
3. Entity Risk Lookup
GET /v1/entities/{entity_id}/risk
Retrieve current risk assessment for an entity
Response:
{
"entity_id": "ent_456",
"risk_score": 42,
"risk_level": "MEDIUM",
"factors": {
"historical_sars": 1,
"avg_transaction_size": 2500,
"velocity_90d": 47,
"network_centrality": 0.23
},
"last_updated": "2025-09-03T12:00:00Z"
}
Performance Optimization
Latency Targets
- p50 latency: < 50ms
- p95 latency: < 100ms
- p99 latency: < 200ms
- Timeout: 500ms (fail open with alert)
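The 500ms timeout with fail-open behavior can also be enforced on the client side. Below is a minimal sketch using the requests library; the endpoint URL, the Authorization header scheme, and the APPROVE-on-timeout policy are assumptions for illustration, and the right fallback decision ultimately depends on your compliance policy.
import logging
import requests

SCREEN_URL = "https://api.example.com/v1/transactions/screen"  # placeholder endpoint

def screen_with_fail_open(payload: dict, api_key: str) -> dict:
    """Call the screening endpoint with a 500ms budget; fail open on timeout."""
    try:
        resp = requests.post(
            SCREEN_URL,
            json=payload,
            headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
            timeout=0.5,  # 500ms budget from the latency targets above
        )
        resp.raise_for_status()
        return resp.json()
    except requests.Timeout:
        # Fail open: let the transaction proceed but alert for compliance review.
        logging.warning("AML screening timed out for %s; failing open",
                        payload.get("transaction_id"))
        return {"decision": "APPROVE", "fail_open": True}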
Caching Strategy
- Entity Profiles: Redis cache, 5-minute TTL (see the sketch after this list)
- Network Features: Pre-computed, updated hourly
- Sanctions Lists: Cached locally, refreshed daily
- Model Weights: Loaded in memory, hot-swapped on update
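As an illustration of the entity-profile cache, here is a minimal sketch using redis-py with a 5-minute TTL. The key format and the feature-store loader are hypothetical placeholders, not part of the actual product.
import json
import redis

cache = redis.Redis(host="localhost", port=6379)
TTL_SECONDS = 300  # 5-minute TTL for entity profiles

def load_profile_from_feature_store(entity_id: str) -> dict:
    # Placeholder for a real feature-store lookup.
    return {"entity_id": entity_id, "risk_score": 42}

def get_entity_profile(entity_id: str) -> dict:
    """Return a cached entity profile, falling back to the feature store on a miss."""
    key = f"entity_profile:{entity_id}"  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    profile = load_profile_from_feature_store(entity_id)
    cache.setex(key, TTL_SECONDS, json.dumps(profile))
    return profile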
Circuit Breaker Pattern
Protect downstream services from cascading failures:
Circuit States:
1. CLOSED (normal): All requests processed
2. OPEN (failure): Requests fail fast, no processing
3. HALF_OPEN (recovery): Test requests to check if the service has recovered
Thresholds:
- Open circuit after 10 consecutive failures OR 50% error rate in 30s
- Half-open after 60s
- Close after 5 successful requests in half-open state
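A minimal sketch of a circuit breaker using the thresholds above (10 consecutive failures, 60s recovery window, 5 successes to close). It is illustrative rather than the platform's implementation, and the 50% error-rate trigger is omitted for brevity.
import time

class CircuitBreaker:
    """Simplified circuit breaker with CLOSED / OPEN / HALF_OPEN states."""

    def __init__(self, failure_threshold=10, recovery_timeout=60, success_threshold=5):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self.state = "CLOSED"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "HALF_OPEN"     # allow test requests after 60s
                self.successes = 0
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"          # trip after 10 consecutive failures
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        if self.state == "HALF_OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "CLOSED"        # close after 5 successes in half-open
        return result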
Reliability & Resilience
Graceful Degradation
If the full ML pipeline fails, fall back to simpler models:
- Primary: Full ensemble (GNN + LSTM + Isolation Forest)
- Fallback 1: Simplified model (logistic regression) if GPU unavailable
- Fallback 2: Rule-based scoring if ML infrastructure down
- Fallback 3: Fail open with alert to compliance team
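A sketch of that fallback chain is shown below. The scorer functions and the compliance-alert helper are hypothetical placeholders standing in for the real ensemble, logistic-regression, and rule-based models.
import logging

# score_with_ensemble, score_with_logreg, score_with_rules and
# alert_compliance_team are hypothetical placeholders.

def score_transaction(txn: dict) -> dict:
    """Try scorers in order of fidelity; fail open if every tier is unavailable."""
    scorers = [
        ("ensemble", score_with_ensemble),            # GNN + LSTM + Isolation Forest
        ("logistic_regression", score_with_logreg),   # CPU-only fallback
        ("rules", score_with_rules),                  # rule-based fallback
    ]
    for name, scorer in scorers:
        try:
            result = scorer(txn)
            result["model_used"] = name
            return result
        except Exception:
            logging.exception("scorer %s failed, degrading", name)
    # Final fallback: fail open and alert the compliance team.
    alert_compliance_team(txn)
    return {"decision": "APPROVE", "fail_open": True, "model_used": None}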
Idempotency
Clients can safely retry requests with the same transaction_id:
- API deduplicates based on transaction_id + timestamp
- Return cached result if request already processed
- Idempotency window: 24 hours
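The sketch below illustrates the deduplication rule (transaction_id + timestamp, 24-hour window) on the server side with redis-py. It is not the platform's actual implementation; the key format and the run_screening call are assumptions.
import json
import redis

dedupe_cache = redis.Redis(host="localhost", port=6379)
IDEMPOTENCY_TTL = 24 * 60 * 60  # 24-hour idempotency window

def screen_idempotent(request: dict) -> dict:
    """Return the cached result if this transaction was already processed."""
    key = f"idem:{request['transaction_id']}:{request['timestamp']}"  # assumed key format
    cached = dedupe_cache.get(key)
    if cached is not None:
        return json.loads(cached)          # replayed request: return the prior result
    result = run_screening(request)        # hypothetical scoring call
    dedupe_cache.set(key, json.dumps(result), ex=IDEMPOTENCY_TTL, nx=True)
    return result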
Rate Limiting & Throttling
Rate Limit Tiers
- Basic: 1,000 requests/minute, burst 100
- Standard: 10,000 requests/minute, burst 1,000
- Enterprise: 100,000 requests/minute, burst 10,000
- Unlimited: Custom, dedicated infrastructure
Rate limits are enforced per API key using a token bucket algorithm, sketched below.
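A minimal token bucket sketch, using the Standard tier numbers (10,000 requests/minute with a burst of 1,000) purely as an example.
import time

class TokenBucket:
    """Token bucket: refills at rate_per_minute/60 tokens per second, holds up to `burst`."""

    def __init__(self, rate_per_minute=10_000, burst=1_000):
        self.refill_rate = rate_per_minute / 60.0
        self.capacity = burst
        self.tokens = float(burst)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with HTTP 429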
Security
Authentication
- API Keys: For server-to-server communication
- OAuth 2.0: For user-facing applications
- mTLS: Mutual TLS for high-security deployments
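For mTLS deployments, the client presents its certificate alongside its credentials. A hedged sketch with the requests library is shown below; the file paths, endpoint URL, and Authorization header scheme are placeholders and depend on your deployment.
import requests

resp = requests.post(
    "https://api.example.com/v1/transactions/screen",  # placeholder endpoint
    json={"transaction_id": "txn_123", "amount": 9850, "currency": "USD"},
    headers={"Authorization": "Bearer your_api_key"},   # assumed auth scheme
    cert=("client.crt", "client.key"),                  # client certificate for mutual TLS
    verify="ca_bundle.pem",                             # pin the server CA
    timeout=0.5,
)
resp.raise_for_status()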
Data Protection
- TLS 1.3: All data in transit encrypted
- Field-Level Encryption: Sensitive PII encrypted in requests/responses
- Audit Logging: All API calls logged with retention policy
- Data Residency: Regional endpoints for GDPR compliance
Monitoring & Observability
Comprehensive metrics exposed for monitoring:
Performance Metrics
- Request latency (p50, p95, p99)
- Throughput (requests/sec)
- Error rate (4xx, 5xx)
- Cache hit rate
Business Metrics
- Risk score distribution
- Alert generation rate
- Model version in use
- Fallback activation count
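These metrics can be exported from the integrating service with any metrics library. Below is a brief sketch using prometheus_client; the metric names and the run_screening call are illustrative assumptions.
from prometheus_client import Counter, Histogram, start_http_server

REQUEST_LATENCY = Histogram("aml_request_latency_seconds", "Screening request latency")
ERRORS = Counter("aml_errors_total", "Screening errors", ["status_class"])
FALLBACKS = Counter("aml_fallback_activations_total", "Fallback model activations")

@REQUEST_LATENCY.time()
def handle_screen_request(payload: dict) -> dict:
    try:
        return run_screening(payload)   # hypothetical scoring call
    except Exception:
        ERRORS.labels(status_class="5xx").inc()
        raise

start_http_server(9100)  # expose /metrics for the monitoring stack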
Distributed Tracing
End-to-end request tracking with OpenTelemetry:
- API Gateway → Feature Store → ML Inference → Response
- Identify bottlenecks and slow dependencies
- Track requests across microservices
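A minimal OpenTelemetry sketch of how a screening request might be wrapped in spans along that path; exporter configuration is omitted, and the span names and helper functions are illustrative assumptions.
from opentelemetry import trace

tracer = trace.get_tracer("aml.screening")

def screen_transaction(payload: dict) -> dict:
    with tracer.start_as_current_span("screen_transaction") as span:
        span.set_attribute("transaction.id", payload["transaction_id"])
        with tracer.start_as_current_span("feature_store.lookup"):
            features = load_features(payload)       # hypothetical feature lookup
        with tracer.start_as_current_span("ml.inference"):
            return run_inference(features)          # hypothetical model call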
API Versioning
Multiple API versions supported concurrently:
- URL-based: /v1/, /v2/ in path
- Deprecation Policy: 12-month notice before sunset
- Breaking Changes: Only in major versions
- Backward Compatibility: Maintained within major version
SDK & Client Libraries
Official SDKs for common languages:
Python SDK Example
from nerous import AMLClient
client = AMLClient(api_key="your_api_key")
# Screen single transaction
result = client.screen_transaction({
"transaction_id": "txn_123",
"amount": 9850,
"currency": "USD",
"sender": {"entity_id": "ent_456"},
"receiver": {"entity_id": "ent_789"}
})
print(f"Risk Score: {result.risk_score}")
print(f"Decision: {result.decision}")
# Batch screening
batch = client.screen_batch(transactions)
batch.wait_for_completion()
results = batch.get_results()
Testing & Development
Sandbox Environment
- Test API Keys: No charges, limited rate limits
- Synthetic Data: Pre-loaded test scenarios
- Mock Responses: Simulate different risk levels
- Latency Simulation: Test timeout handling
Webhooks
Subscribe to events for asynchronous updates:
- case.created: New high-risk case generated
- case.updated: Analyst action on case
- model.updated: New model version deployed
- alert.triggered: Threshold breach notification
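A sketch of a webhook receiver using Flask is shown below. The event payload shape and the downstream handlers are assumptions; in production you should also verify webhook signatures as documented by the provider.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/aml-events", methods=["POST"])
def handle_event():
    event = request.get_json(force=True)
    event_type = event.get("type")            # e.g. "case.created" (assumed payload shape)
    if event_type == "case.created":
        enqueue_case_for_review(event)        # hypothetical downstream handler
    elif event_type == "model.updated":
        refresh_cached_model_metadata(event)  # hypothetical cache refresh
    return jsonify({"received": True}), 200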
Integration Best Practices
- Async Where Possible: Use batch API for non-blocking workflows
- Implement Retries: Exponential backoff for transient failures (see the sketch after this list)
- Handle Timeouts: Don't block user experience on slow responses
- Cache Aggressively: Entity risk lookups can be cached locally
- Monitor Closely: Track latency, errors, and fallback activations
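The sketch below illustrates the retry and timeout guidance above: exponential backoff with jitter, retrying only on timeouts and 5xx responses. The endpoint URL and Authorization header are placeholders; because screening is idempotent on transaction_id, retries are safe within the 24-hour window.
import random
import time
import requests

def screen_with_retries(payload: dict, api_key: str, max_attempts: int = 4) -> dict:
    """Retry transient failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            resp = requests.post(
                "https://api.example.com/v1/transactions/screen",  # placeholder URL
                json=payload,
                headers={"Authorization": f"Bearer {api_key}"},    # assumed auth scheme
                timeout=0.5,
            )
            if resp.status_code < 500:
                resp.raise_for_status()     # surface 4xx errors immediately, no retry
                return resp.json()
        except requests.Timeout:
            pass                            # transient failure: fall through to backoff
        # Exponential backoff with jitter: ~0.2s, 0.4s, 0.8s, ...
        time.sleep((0.2 * 2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError("screening failed after retries")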
Conclusion
API design for real-time AML is as much about operational excellence as technical capability. At nerous.ai—where our Finnish name reflects ingenuity and brilliance—we've built APIs that deliver sub-100ms latency at 100M+ requests/day while maintaining 99.99% uptime.
The result: seamless integration into transaction flows that enhances security without compromising user experience or system performance.
Michael Rodriguez
VP of Product at nerous.ai
Michael leads API design and developer experience at nerous.ai, ensuring seamless integration for financial institutions worldwide.
Try Our API
Get started with our sandbox environment and test API integration risk-free.
Get API Access →