
Memory Budget

Global decode memory budget that prevents OOM from concurrent large image decodes by tracking aggregate memory consumption across all in-flight decode operations.

Why

The thread pool controls CPU concurrency but has no awareness of memory. Multiple concurrent full-resolution JP2 decodes (e.g., 20000x30000 pixels) can each consume 1-2GB, exhausting a 4GB container. The per-request pixel limit caps individual requests and the per-client rate limiter throttles per-client throughput, but neither prevents aggregate memory exhaustion from multiple legitimate clients requesting large images simultaneously.

How It Works

  1. Precise estimation from IIIF parameters: Before each decode, the actual decode buffer size is computed from IIIF region/size parameters. For JP2, this accounts for DWT reduce levels and ROI restrictions — a tile request on a 20000x30000 source estimates ~4MB, not 4.8GB.

  2. Pipeline-aware peak estimation: Walks the processing stages (decode → scale → rotate → ICC convert) and returns the maximum concurrent allocation at any point, accounting for 2-stage downscale intermediates and rotation expansion.

  3. Lock-free accounting: Uses std::atomic<size_t> with compare-exchange, so acquire and release never block on a mutex. The budget check costs nanoseconds, against decode times measured in milliseconds.

  4. RAII release: MemoryBudgetGuard releases budget on all exit paths including exceptions. No manual cleanup needed.
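A minimal sketch of how items 3 and 4 could fit together is shown here. It is not Sipi's actual implementation: the MemoryBudget class, its method names, and the guard's constructor signature are assumptions made for illustration. Only the use of std::atomic<size_t>, compare-exchange, and the MemoryBudgetGuard name come from the description above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical budget tracker: one shared counter of bytes in use.
class MemoryBudget {
public:
    explicit MemoryBudget(std::size_t limit_bytes) : limit_(limit_bytes) {}

    // Reserve `bytes` if they fit under the limit; never blocks.
    bool try_acquire(std::size_t bytes) {
        std::size_t used = used_.load(std::memory_order_relaxed);
        do {
            // Written to avoid overflow in `used + bytes`.
            if (bytes > limit_ || used > limit_ - bytes) return false;
            // On failure compare_exchange_weak reloads `used`, so the
            // limit check above is repeated against the fresh value.
        } while (!used_.compare_exchange_weak(used, used + bytes,
                                              std::memory_order_acq_rel,
                                              std::memory_order_relaxed));
        return true;
    }

    void release(std::size_t bytes) {
        const std::size_t prev = used_.fetch_sub(bytes, std::memory_order_acq_rel);
        assert(prev >= bytes);  // releasing more than was acquired is a bug
        (void)prev;
    }

    std::size_t used() const { return used_.load(std::memory_order_relaxed); }

private:
    const std::size_t limit_;
    std::atomic<std::size_t> used_{0};
};

// RAII guard: whatever was reserved in the constructor is returned in the
// destructor, which also runs during stack unwinding after an exception.
class MemoryBudgetGuard {
public:
    MemoryBudgetGuard(MemoryBudget& budget, std::size_t bytes)
        : budget_(budget), bytes_(bytes), acquired_(budget.try_acquire(bytes)) {}
    ~MemoryBudgetGuard() {
        if (acquired_) budget_.release(bytes_);
    }
    MemoryBudgetGuard(const MemoryBudgetGuard&) = delete;
    MemoryBudgetGuard& operator=(const MemoryBudgetGuard&) = delete;

    bool acquired() const { return acquired_; }

private:
    MemoryBudget& budget_;
    std::size_t bytes_;
    bool acquired_;
};
```

In this sketch a decode path would construct the guard with its estimate, check acquired(), and then decode. The reservation is dropped when the guard goes out of scope, whether the decode returns normally or throws.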

Configuration

| Parameter | Default (binary) | Default (ops-deploy) | Description |
| --- | --- | --- | --- |
| max_decode_memory | "0" (auto) | "0" (auto) | Budget in bytes. 0 = auto-detect (75% of container memory). Accepts M/G suffixes: "2G", "500M" |
| decode_memory_mode | "off" | "monitor" | "off", "monitor" (log only), "enforce" (HTTP 503) |

All parameters available via:
- Lua config: max_decode_memory, decode_memory_mode
- CLI flags: --max-decode-memory, --decode-memory-mode
- Environment: SIPI_MAX_DECODE_MEMORY, SIPI_DECODE_MEMORY_MODE
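As an illustration of the suffix handling for max_decode_memory, a value such as "2G" or "500M" could be turned into a byte count roughly as follows. The function name parse_memory_size is hypothetical, and the sketch assumes binary multiples (1 M = 2^20 bytes, 1 G = 2^30 bytes) and uppercase suffixes only. The documentation above does not state either detail, so treat both as assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical parser for values like "0", "500M", "2G".
// Returns std::nullopt for malformed input or on overflow.
std::optional<std::uint64_t> parse_memory_size(const std::string& text) {
    std::uint64_t value = 0;
    std::size_t i = 0;

    // Leading decimal digits.
    while (i < text.size() && text[i] >= '0' && text[i] <= '9') {
        const std::uint64_t digit = static_cast<std::uint64_t>(text[i] - '0');
        if (value > (UINT64_MAX - digit) / 10) return std::nullopt;  // overflow
        value = value * 10 + digit;
        ++i;
    }
    if (i == 0) return std::nullopt;  // no digits at all

    // Optional single-letter suffix (assumed to be binary multiples).
    std::uint64_t multiplier = 1;
    if (i < text.size()) {
        switch (text[i]) {
            case 'M': multiplier = 1ULL << 20; break;
            case 'G': multiplier = 1ULL << 30; break;
            default: return std::nullopt;  // unknown suffix
        }
        ++i;
    }
    if (i != text.size()) return std::nullopt;  // trailing characters

    if (value > UINT64_MAX / multiplier) return std::nullopt;  // overflow
    return value * multiplier;
}
```

A result of 0 would then be handed to auto-detection rather than used as a literal budget.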

Auto-Detection

When max_decode_memory = "0" (default), the budget is set to 75% of detected memory:
1. cgroups v2: /sys/fs/cgroup/memory.max
2. cgroups v1: /sys/fs/cgroup/memory/memory.limit_in_bytes
3. Linux fallback: /proc/meminfo MemTotal
4. macOS: sysctl hw.memsize
5. Fallback: 1 GB if detection fails

The 25% headroom covers kernel buffers, Sipi heap, cache, Lua, and thread stacks.
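The cgroup part of this detection order can be sketched as below. All function names are hypothetical. The sketch covers only the two cgroup files and the final fallback, leaving out /proc/meminfo and the macOS sysctl. It also assumes the 1 GB fallback is applied directly as the budget and means 2^30 bytes, which the text above does not specify. It relies on the fact that cgroups v2 writes the literal string "max" to memory.max when no limit is set. It does not handle the very large sentinel value that cgroups v1 reports for an unlimited group.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <initializer_list>
#include <optional>
#include <string>

// Parse the first line of a cgroup limit file.
// Non-numeric content such as "max" (no limit) yields std::nullopt.
std::optional<std::uint64_t> parse_cgroup_limit(const std::string& contents) {
    std::uint64_t value = 0;
    int digits = 0;
    for (char c : contents) {
        if (c == '\n' || c == ' ') break;
        if (c < '0' || c > '9') return std::nullopt;
        const std::uint64_t d = static_cast<std::uint64_t>(c - '0');
        if (value > (UINT64_MAX - d) / 10) return std::nullopt;  // overflow
        value = value * 10 + d;
        ++digits;
    }
    if (digits == 0) return std::nullopt;
    return value;
}

// Read a limit from `path`; a missing or unreadable file yields std::nullopt.
std::optional<std::uint64_t> read_limit_file(const std::string& path) {
    std::ifstream in(path);
    if (!in) return std::nullopt;
    std::string line;
    std::getline(in, line);
    return parse_cgroup_limit(line);
}

// 75% of the detected total; divides first so the product cannot overflow.
std::uint64_t budget_from_total(std::uint64_t total_bytes) {
    return total_bytes / 4 * 3;
}

// Try each source in order and fall back to 1 GB (assumed: 2^30 bytes).
std::uint64_t detect_budget() {
    for (const char* path : {"/sys/fs/cgroup/memory.max",
                             "/sys/fs/cgroup/memory/memory.limit_in_bytes"}) {
        if (auto limit = read_limit_file(path)) return budget_from_total(*limit);
    }
    return 1ULL << 30;
}
```

With these assumptions, a container limited to 4 GiB ends up with a 3 GiB decode budget.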

Monitor to Enforce Workflow

  1. Deploy in monitor mode (default in ops-deploy):
     - Budget is tracked and logged but requests are never rejected
     - sipi_decode_memory_decisions_total{action="shadow_rejected"} shows what would be blocked

  2. Observe metrics (1-2 weeks):
     - Budget utilization: sipi_decode_memory_used_bytes / sipi_decode_memory_budget_bytes (should be < 0.8 normally)
     - Shadow rejection rate: rate(sipi_decode_memory_decisions_total{action="shadow_rejected"}[5m])
     - Request size distribution: histogram_quantile(0.99, rate(sipi_decode_memory_estimate_bytes_bucket[5m]))

  3. Tune budget if needed:
     - If shadow rejections are frequent on normal tile traffic, the budget is too low
     - Use the histogram to understand what size requests are being served

  4. Switch to enforce: Set SIPI_DECODE_MEMORY_MODE=enforce (or DSP_IIIF_DECODE_MEMORY_MODE=enforce in ops-deploy). Redeploy.
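The difference between the modes can be summarized as a small decision function. This is a sketch of the behavior described in this document, not Sipi's code. The enum and function names are hypothetical, and treating "off" as always recording an acquired decision is an assumption.

```cpp
#include <cassert>

enum class Mode { Off, Monitor, Enforce };

// Mirrors the values of the `action` metric label.
enum class Action { Acquired, ShadowRejected, Rejected };

struct Decision {
    Action action;  // what would be counted in the decisions metric
    bool proceed;   // whether the decode is allowed to run
};

// Decide what happens to a request, given whether its estimate fits
// into the remaining budget.
Decision decide(Mode mode, bool fits_in_budget) {
    if (mode == Mode::Off || fits_in_budget) {
        return {Action::Acquired, true};
    }
    if (mode == Mode::Monitor) {
        // Log-only: the request still runs, but the would-be rejection is counted.
        return {Action::ShadowRejected, true};
    }
    // Enforce: the caller answers with HTTP 503 instead of decoding.
    return {Action::Rejected, false};
}
```

Under this model, switching from monitor to enforce changes only the outcome for requests that were already being counted as shadow_rejected, which is why the shadow rejection rate is the number to watch before switching.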

Prometheus Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| sipi_decode_memory_budget_bytes | Gauge | | Configured budget (set once at startup) |
| sipi_decode_memory_used_bytes | Gauge | | Currently allocated to in-flight decodes |
| sipi_decode_memory_decisions_total | Counter | action | acquired, rejected, shadow_rejected |
| sipi_decode_memory_near_limit_total | Counter | | Acquisitions where usage > 80% of budget |
| sipi_decode_memory_estimate_bytes | Histogram | | Per-request peak memory estimates |

Operational Dashboards

# Budget utilization (should be < 0.8)
sipi_decode_memory_used_bytes / sipi_decode_memory_budget_bytes

# Rejection rate (should be 0 under normal load)
rate(sipi_decode_memory_decisions_total{action="rejected"}[5m])

# Early warning (budget getting tight)
rate(sipi_decode_memory_near_limit_total[5m])

# Largest 1% of requests
histogram_quantile(0.99, rate(sipi_decode_memory_estimate_bytes_bucket[5m]))

Traffic Patterns

| Request Type | Typical Estimate | Budget Impact |
| --- | --- | --- |
| Tile (256x256) | < 1 MB | Negligible; passes instantly |
| Thumbnail (/full/,128/) | < 100 KB | Negligible |
| Medium (/full/,2000/) | 50-120 MB | Moderate |
| Full resolution (/full/max/) | 1-5 GB | High; budget limits concurrency |
| Full + rotation (/full/max/90/) | 2-10 GB | Very high |
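To give a feel for where estimates of this kind could come from, the following sketch computes a buffer size from image dimensions and a JP2 reduce level, then takes the peak over a sequence of pipeline stages. It is a simplified model and not the estimator Sipi uses. All names are hypothetical. It assumes each stage holds its input and output buffers at the same time, and it does not reproduce the figures in the table above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ImageShape {
    std::uint64_t width;
    std::uint64_t height;
    std::uint64_t channels;
    std::uint64_t bytes_per_sample;
};

// Size of one uncompressed pixel buffer.
std::uint64_t buffer_bytes(const ImageShape& s) {
    return s.width * s.height * s.channels * s.bytes_per_sample;
}

// Shape after discarding `reduce` DWT resolution levels: each level
// halves both axes, rounding up.
ImageShape reduced(ImageShape s, unsigned reduce) {
    assert(reduce < 32);
    const std::uint64_t step = 1ULL << reduce;
    s.width = (s.width + step - 1) / step;
    s.height = (s.height + step - 1) / step;
    return s;
}

// Peak allocation over a pipeline, given the output size of each stage
// in order (decode first). Assumed model: while a stage runs, the
// previous stage's output (its input) is still allocated.
std::uint64_t peak_bytes(const std::vector<std::uint64_t>& stage_outputs) {
    std::uint64_t peak = 0;
    for (std::size_t i = 0; i < stage_outputs.size(); ++i) {
        std::uint64_t live = stage_outputs[i];
        if (i > 0) live += stage_outputs[i - 1];
        peak = std::max(peak, live);
    }
    return peak;
}
```

For example, an 8-bit RGB decode of a 1024x1024 region at reduce level 2 needs a 256x256x3 buffer, while the same pixel format at 20000x30000 with no reduction needs 1.8 GB before any scaling or rotation is counted.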

Troubleshooting

Budget seems too restrictive (503s on normal traffic):
- Check histogram_quantile(0.5, rate(sipi_decode_memory_estimate_bytes_bucket[5m])); the median should be < 1MB for tile traffic
- If the median is high, check for clients not using tiles (direct /full/max/ requests)
- Increase the budget or add more container memory

OOM despite budget enabled:
- Check mode is enforce, not monitor
- Check sipi_decode_memory_budget_bytes matches expected container memory
- Memory outside decode pipeline (cache, Lua, HTTP buffers) is not budgeted