Sep 10, 2025 · 11 min read · Dr. Sarah Chen

Time Series Analysis for Transaction Monitoring

Using LSTM and Transformer models to detect temporal patterns and anomalies in transaction sequences, identifying money laundering schemes that evolve over time.

The Temporal Dimension of Money Laundering

Money laundering is inherently sequential. Placement, layering, and integration occur over time, often spanning days or weeks. Traditional AML systems analyze transactions in isolation, missing patterns that only emerge when viewing the temporal sequence. Time series models address this blind spot.

Why Time Series Models?

Certain money laundering patterns are fundamentally temporal:

  • Structuring Over Time: Small deposits made daily to stay below thresholds
  • Layering Sequences: Funds moved through accounts in specific order
  • Velocity Changes: Sudden bursts of activity after dormancy
  • Cyclic Patterns: Repeating schemes with predictable timing
  • Sequential Dependencies: Transaction N depends on transaction N-1

LSTM for Sequence Modeling

Long Short-Term Memory networks excel at learning patterns in sequences. They maintain internal state that captures long-term dependencies.

Architecture

Our LSTM Configuration

Input: Last 180 days of transactions (variable length)
↓
Embedding Layer: Convert transaction features to 128-dim vectors
↓
LSTM Layer 1: 256 hidden units, return sequences
↓
Dropout: 0.3
↓
LSTM Layer 2: 128 hidden units, return sequences
↓
Dropout: 0.3
↓
LSTM Layer 3: 64 hidden units, return final state
↓
Dense Layer: 32 units, ReLU
↓
Output Layer: Risk score (0-1)
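
For readers who want to see the shape of this stack in code, here is a minimal tf.keras sketch. FEATURE_DIM and MAX_SEQ_LEN are illustrative placeholders, and the dense projection stands in for the embedding layer over continuous transaction features; this is a sketch of the layer stack above, not our production code.

# Minimal sketch of the LSTM stack described above (tf.keras).
# FEATURE_DIM and MAX_SEQ_LEN are illustrative assumptions, not production values.
import tensorflow as tf
from tensorflow.keras import layers, models

FEATURE_DIM = 32     # engineered features per transaction (assumed)
MAX_SEQ_LEN = 1024   # padded sequence length covering ~180 days (assumed)

def build_lstm_model() -> tf.keras.Model:
    inputs = layers.Input(shape=(MAX_SEQ_LEN, FEATURE_DIM))
    # Mask padded timesteps so variable-length sequences are handled correctly.
    x = layers.Masking(mask_value=0.0)(inputs)
    # Project raw transaction features into a 128-dim embedding space.
    x = layers.TimeDistributed(layers.Dense(128, activation="relu"))(x)
    x = layers.LSTM(256, return_sequences=True)(x)
    x = layers.Dropout(0.3)(x)
    x = layers.LSTM(128, return_sequences=True)(x)
    x = layers.Dropout(0.3)(x)
    x = layers.LSTM(64)(x)                               # return only the final state
    x = layers.Dense(32, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)   # risk score in [0, 1]
    return models.Model(inputs, outputs)

model = build_lstm_model()
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])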

What LSTM Learns

  • Normal Sequences: Typical transaction ordering for different entity types
  • Anomalous Patterns: Deviations from learned sequences
  • Temporal Dependencies: How current transaction relates to previous ones
  • Long-Range Effects: Events weeks apart that are connected

Example: Structuring Detection

Sequence: Customer makes deposits of $9,800, $9,900, $9,700, $9,850 over 4 days

LSTM Analysis:

  • Recognizes amounts consistently just below the $10K threshold
  • Detects regular timing (daily pattern)
  • Compares to the customer's historical sequence (normally 1-2 deposits/month)
  • Flags the entire sequence as high-risk structuring

Transformer Models for Transactions

Transformers use attention mechanisms to capture relationships between any two points in a sequence, not just adjacent ones.

Advantages Over LSTM

  • Parallel Processing: Transformers process entire sequence simultaneously (faster training)
  • Long-Range Dependencies: Attention can connect transactions months apart
  • Interpretability: Attention weights show which past transactions influenced current prediction

Transformer Architecture

Input: Transaction sequence (up to 512 transactions)
↓
Positional Encoding: Add temporal position information
↓
Multi-Head Attention: 8 attention heads, 128-dim each
↓
Feed-Forward Network: 512 hidden units
↓
Layer Norm + Residual
↓
[Repeat above block 6 times]
↓
Global Average Pooling
↓
Classification Head: Risk score
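
A minimal tf.keras sketch of this encoder follows. It uses learned positional embeddings and the dimensions from the diagram; FEATURE_DIM is a placeholder, and the code is illustrative rather than our production implementation.

# Minimal sketch of the Transformer encoder described above (tf.keras).
# Dimensions follow the diagram; FEATURE_DIM is an illustrative assumption.
import tensorflow as tf
from tensorflow.keras import layers, models

FEATURE_DIM = 32   # engineered features per transaction (assumed)
MAX_TXNS = 512     # up to 512 transactions per sequence
D_MODEL = 128

class PositionalEmbedding(layers.Layer):
    """Adds a learned positional embedding to encode temporal order."""
    def __init__(self, max_len: int, d_model: int, **kwargs):
        super().__init__(**kwargs)
        self.pos_emb = layers.Embedding(input_dim=max_len, output_dim=d_model)

    def call(self, x):
        positions = tf.range(start=0, limit=tf.shape(x)[1], delta=1)
        return x + self.pos_emb(positions)

def encoder_block(x):
    # Multi-head self-attention, then feed-forward, each with residual + layer norm.
    attn = layers.MultiHeadAttention(num_heads=8, key_dim=D_MODEL)(x, x)
    x = layers.LayerNormalization()(x + attn)
    ffn = layers.Dense(512, activation="relu")(x)
    ffn = layers.Dense(D_MODEL)(ffn)
    return layers.LayerNormalization()(x + ffn)

def build_transformer_model() -> tf.keras.Model:
    inputs = layers.Input(shape=(MAX_TXNS, FEATURE_DIM))
    x = layers.Dense(D_MODEL)(inputs)           # project features to model dimension
    x = PositionalEmbedding(MAX_TXNS, D_MODEL)(x)
    for _ in range(6):                          # six stacked encoder blocks
        x = encoder_block(x)
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)   # risk score
    return models.Model(inputs, outputs)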

Attention Visualization

When analyzing Transaction T, the model pays attention to:

  • Transaction T-1 (yesterday): 0.42 weight - immediate predecessor
  • Transaction T-7 (last week): 0.28 weight - similar amount and counterparty
  • Transaction T-30 (last month): 0.18 weight - start of suspicious pattern
  • Other transactions: 0.12 weight combined
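
Scores like these can be read straight out of the attention layers. A hedged sketch, assuming a Keras MultiHeadAttention layer; in practice the layer would be taken from the trained model rather than created fresh, and all names here are illustrative:

# Sketch: extracting per-head attention scores for one sequence so analysts can
# see which past transactions the model attended to. Names are illustrative,
# and in practice `mha` would be a layer pulled from the trained model.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

mha = layers.MultiHeadAttention(num_heads=8, key_dim=128)

def attention_weights(sequence: np.ndarray) -> np.ndarray:
    """sequence: (seq_len, d_model) array for one entity's history."""
    x = tf.convert_to_tensor(sequence[np.newaxis, ...], dtype=tf.float32)
    # return_attention_scores=True yields scores of shape (batch, heads, query, key).
    _, scores = mha(x, x, return_attention_scores=True)
    # Average over heads, keep the row for the most recent transaction T.
    return scores.numpy().mean(axis=1)[0, -1, :]

# weights = attention_weights(seq)  # weights[i] ~ influence of transaction i on T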

Feature Engineering for Time Series

What features do we feed into these models?

Transaction-Level Features

  • Amount: Log-transformed, normalized
  • Transaction Type: Deposit, withdrawal, transfer (one-hot encoded)
  • Counterparty: Hashed entity ID
  • Time of Day: Hour (0-23), business hours flag
  • Day of Week: Cyclic encoding (sin/cos)

Sequence-Level Features

  • Time Gaps: Seconds between transactions
  • Cumulative Amounts: Running total over window
  • Velocity: Transaction count in past 24h, 7d, 30d
  • Direction Changes: Deposits to withdrawals ratio
  • Counterparty Diversity: Unique entities in sequence
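
A sketch of how a few of these features can be computed with pandas for a single entity's history; the column names (amount, txn_type, timestamp, counterparty_id) are assumptions, not our schema:

# Sketch of a few of the features above, computed with pandas for one entity.
# Column names (amount, txn_type, timestamp, counterparty_id) are assumptions.
import numpy as np
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values("timestamp").copy()

    # Transaction-level features
    df["log_amount"] = np.log1p(df["amount"])
    df = pd.get_dummies(df, columns=["txn_type"])          # one-hot transaction type
    df["hour"] = df["timestamp"].dt.hour
    df["business_hours"] = df["hour"].between(9, 17).astype(int)
    dow = df["timestamp"].dt.dayofweek
    df["dow_sin"] = np.sin(2 * np.pi * dow / 7)            # cyclic day-of-week encoding
    df["dow_cos"] = np.cos(2 * np.pi * dow / 7)

    # Sequence-level features (ordered by time within the entity)
    df["gap_seconds"] = df["timestamp"].diff().dt.total_seconds().fillna(0)
    df["cumulative_amount"] = df["amount"].cumsum()
    df["velocity_24h"] = df.rolling("24h", on="timestamp")["amount"].count()
    df["unique_counterparties"] = (~df["counterparty_id"].duplicated()).cumsum()
    return df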

Anomaly Detection in Sequences

LSTM Autoencoders learn to reconstruct normal transaction sequences. High reconstruction error indicates anomalous patterns.

LSTM Autoencoder

Encoder: Transaction sequence → Compressed representation (32-dim)
Decoder: Compressed representation → Reconstructed sequence

Training: Minimize reconstruction error on normal sequences
Inference: High error = anomalous sequence

Example:
Normal sequence reconstruction error: 0.03
Anomalous sequence error: 0.47 → FLAG
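
A minimal tf.keras sketch of this autoencoder; the sequence length, feature count, decoder width, and flag threshold are illustrative placeholders:

# Minimal LSTM autoencoder sketch (tf.keras): reconstruct sequences of engineered
# features and flag high reconstruction error. All sizes and the threshold are illustrative.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, FEATURE_DIM = 256, 32   # assumed padded length and feature count

inputs = layers.Input(shape=(SEQ_LEN, FEATURE_DIM))
encoded = layers.LSTM(32)(inputs)                        # 32-dim compressed representation
decoded = layers.RepeatVector(SEQ_LEN)(encoded)          # repeat the code for each timestep
decoded = layers.LSTM(64, return_sequences=True)(decoded)
decoded = layers.TimeDistributed(layers.Dense(FEATURE_DIM))(decoded)
autoencoder = models.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")

# Training: fit only on sequences labelled normal, e.g.
# autoencoder.fit(normal_sequences, normal_sequences, epochs=10, batch_size=64)

def reconstruction_error(batch: np.ndarray) -> np.ndarray:
    recon = autoencoder.predict(batch, verbose=0)
    return np.mean((batch - recon) ** 2, axis=(1, 2))     # per-sequence MSE

# flags = reconstruction_error(sequences) > 0.1           # illustrative threshold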

Real-World Use Cases

Use Case 1: Layering Detection

Pattern: Funds deposited → immediately split across 5 accounts → consolidated in offshore account 3 days later

Detection: LSTM recognizes unusual split-consolidate temporal pattern

Result: Flagged $2.3M layering scheme, 7 linked accounts identified

Use Case 2: Dormant Account Reactivation

Pattern: Account inactive for 2 years → suddenly receives $500K → funds dispersed in 3 days

Detection: Transformer attention mechanism identifies abrupt change in account behavior

Result: Detected money mule account takeover

Training Strategies

Handling Variable-Length Sequences

  • Padding: Pad short sequences to max length
  • Masking: Mask padded positions so model ignores them
  • Bucketing: Group sequences by similar length for efficient batching
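
A short sketch of the padding and masking steps with tf.keras utilities; the feature count and lengths are illustrative:

# Sketch: pad variable-length sequences, then mask padded positions (tf.keras).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.sequence import pad_sequences

FEATURE_DIM = 32   # assumed feature count per transaction
MAX_LEN = 512

# Dummy variable-length sequences standing in for per-entity transaction histories.
sequences = [np.random.rand(n, FEATURE_DIM) for n in (17, 240, 93)]

# Padding: zero-pad short sequences (post-padding) to a common length.
padded = pad_sequences(sequences, maxlen=MAX_LEN, dtype="float32",
                       padding="post", value=0.0)

# Masking: a Masking layer tells downstream layers to ignore padded timesteps.
model = tf.keras.Sequential([
    layers.Input(shape=(MAX_LEN, FEATURE_DIM)),
    layers.Masking(mask_value=0.0),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),
])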

Addressing Class Imbalance

  • Oversampling: Replicate rare suspicious sequences
  • SMOTE for Sequences: Generate synthetic suspicious sequences
  • Weighted Loss: Penalize false negatives more heavily
  • Focal Loss: Focus learning on hard-to-classify sequences
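
On the loss side, here is a sketch of a simple binary focal loss plus class weighting; the gamma, alpha, and class-weight values shown are illustrative, not tuned:

# Sketch: weight rare suspicious sequences more heavily, or switch to focal loss.
# gamma, alpha, and the class weights below are illustrative values.
import tensorflow as tf

def binary_focal_loss(gamma: float = 2.0, alpha: float = 0.75):
    """Focal loss down-weights easy examples so training focuses on hard ones."""
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, tf.float32)
        eps = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
        alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
        return -tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t))
    return loss

# model.compile(optimizer="adam", loss=binary_focal_loss())
# Or keep binary cross-entropy and penalize missed suspicious sequences instead:
# model.fit(x, y, class_weight={0: 1.0, 1: 20.0})   # illustrative weights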

Performance Metrics

LSTM Model

  • Sequence length: up to 180 days
  • Training time: 12 hours (100M sequences)
  • Inference: 15ms per sequence
  • AUC-ROC: 0.96

Transformer Model

  • Sequence length: up to 512 transactions
  • Training time: 8 hours (GPU parallelization)
  • Inference: 25ms per sequence
  • AUC-ROC: 0.97

Ensemble with Other Models

Time series models work best in combination with other approaches:

  • LSTM: Captures temporal patterns in individual entity sequences
  • GNN: Analyzes network structure and entity relationships
  • Isolation Forest: Detects statistical outliers in aggregated features
  • Rule-Based: Catches known typologies

Final risk score: weighted ensemble of all models, optimized for maximum precision at 95% recall.
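
A minimal sketch of that combination step; the weights below are illustrative, and in practice they are tuned against the precision-at-recall target:

# Sketch: combining component model scores into one risk score.
# The weights are illustrative; in practice they are tuned for maximum
# precision at 95% recall on a held-out set.
WEIGHTS = {"lstm": 0.35, "gnn": 0.30, "isolation_forest": 0.15, "rules": 0.20}

def ensemble_score(scores: dict) -> float:
    """scores: {model_name: risk score in [0, 1]} for one entity/sequence."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

print(ensemble_score({"lstm": 0.91, "gnn": 0.84, "isolation_forest": 0.40, "rules": 1.0}))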

Implementation Considerations

  • Computational Cost: LSTMs and Transformers require GPUs for real-time inference at scale
  • Data Storage: Need to store full transaction history (180 days+)
  • Retraining Frequency: Monthly retraining as patterns evolve
  • Explainability: Attention weights and sequence visualizations for analysts

Conclusion

Money laundering is a sequential process, and time series models capture this temporal dimension that traditional approaches miss. At nerous.ai—where our name embodies the ingenuity and brilliance of Finnish innovation—we've deployed LSTM and Transformer models that detect sophisticated schemes evolving over weeks or months.

The result: 40% improvement in detecting layering schemes, 60% reduction in false positives for velocity-based rules, and analyst tools that visualize exactly how patterns evolved over time.


Dr. Sarah Chen

Chief AI Scientist at nerous.ai

Sarah leads ML research at nerous.ai, specializing in time series models and sequential pattern detection for financial crime prevention.
