PEP Proxy

Wilma 2.0

PEP Proxy — from Node.js to C

Overview

What Wilma Does

Wilma is a Policy Enforcement Point (PEP) Proxy — a reverse proxy that sits between every HTTP client and every protected backend service. It validates authentication tokens and enforces authorization policies on every single request. If Wilma adds 5ms of latency, every API call in the entire platform pays that tax.

Client request → Extract token → Validate JWT → Check cache → Keyrock (cache miss) → AuthZForce (if Level 3) → Proxy to backend → Return response

Three security levels:

- Level 1 (authentication): is the token valid?
- Level 2 (basic authorization): is this user allowed to use this HTTP verb on this path? (checked against Keyrock)
- Level 3 (advanced authorization): full XACML policy evaluation via AuthZForce, including inspection of the request payload

Source: github.com/ging/fiware-pep-proxy (v8.3.0, MIT license)

Current Implementation

Node.js + Express — The Bottlenecks

Technology Stack

| Component | Library | Problem |
|---|---|---|
| HTTP server | Express 4.x | ~50μs per middleware layer, 6+ middlewares per request |
| JWT verification | jsonwebtoken | Synchronous crypto on the event loop — blocks all other requests |
| Token cache | node-cache | JS hash map, subject to GC pressure |
| HTTP proxy client | got | No connection pooling, full body buffering, new TCP connection per request |
| XACML handling | xml2js / xml2json | Full DOM parse for XML — heavy allocation |
| Body handling | Express middleware | Buffer.concat() copies all chunks before processing |
| Logging | morgan + debug | String formatting on every request |
| Clustering | cluster.fork() | N full Node.js processes = N × 100 MB RAM |

The Hot Path — What Happens Per Request

```js
// Step 1: Express middleware chain (~0.5-2ms)
app.use(bodyParser)     // Buffer.concat() all body chunks
app.use(cors)           // CORS header check
app.use(morgan)         // Log formatting

// Step 2: Token extraction (~0.01ms)
token = req.headers['authorization'].split(' ')[1]

// Step 3: JWT verification (~0.1-1ms) ⚠️ BLOCKS EVENT LOOP
jwt.verify(token, secret)  // synchronous RSA/HMAC crypto

// Step 4: Cache check (~0.01ms)
nodeCache.get(token)

// Step 5: Keyrock call on cache miss (~5-50ms) ⚠️ EXTERNAL I/O
got('http://keyrock:3000/user?access_token=...')

// Step 6: Forward to backend (~1-500ms) ⚠️ NEW TCP CONNECTION
got(PROXY_URL + req.url, { method, headers, body, retry: 0 })
```

Performance Numbers

| Metric | Value | Notes |
|---|---|---|
| RAM (single instance) | ~100 MB | Node.js V8 heap baseline |
| RAM (8-core cluster) | ~800 MB | 8 separate Node.js processes |
| Throughput (cache hit) | ~5,000–15,000 req/s | Limited by Express + got proxy overhead |
| Throughput (cache miss) | ~200–1,000 req/s | Limited by Keyrock HTTP round-trip |
| Throughput (Level 3) | ~100–500 req/s | Two sequential HTTP round-trips + XML |
| p99 latency (cache hit) | ~5–20ms | GC pauses + synchronous JWT crypto |
| Startup time | ~1–2s | Node.js + module loading |

The Three Killer Bottlenecks

1. Synchronous JWT on Event Loop

jsonwebtoken.verify() performs RSA or HMAC cryptography synchronously. During the ~0.1–1ms crypto operation, the event loop is frozen. Every other connection — reading, writing, proxying — stalls. Under load, this creates cascading latency spikes.

2. No Connection Pooling

The got HTTP client creates a new TCP connection for every proxied request. TCP handshake (~0.5ms local, ~50ms remote) + no keep-alive means the proxy overhead alone can exceed the backend's response time.
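The fix is a pool of persistent, keep-alive sockets to each upstream. Stripped of the HTTP details, the bookkeeping is tiny; this is an illustrative sketch (the `ConnPool` type is ours, not the actual fwHttp pool, and it omits locking and health checks):

```c
/* Minimal fixed-size connection pool: acquire returns an idle pooled fd,
   or -1 to signal "dial a new connection and pool_add() it". */
#include <stdbool.h>

#define POOL_SIZE 8

typedef struct {
    int  fd[POOL_SIZE];
    bool busy[POOL_SIZE];
    int  count;                      /* live sockets currently in the pool */
} ConnPool;

void pool_init(ConnPool *p) { p->count = 0; }

/* Reuse an idle socket if one exists — this is what skips the handshake. */
int pool_acquire(ConnPool *p) {
    for (int i = 0; i < p->count; i++)
        if (!p->busy[i]) { p->busy[i] = true; return p->fd[i]; }
    return -1;
}

/* Register a freshly connected socket as busy; false when the pool is full. */
bool pool_add(ConnPool *p, int fd) {
    if (p->count == POOL_SIZE) return false;
    p->fd[p->count]   = fd;
    p->busy[p->count] = true;
    p->count++;
    return true;
}

/* Hand the socket back so the next request can reuse it (keep-alive). */
void pool_release(ConnPool *p, int fd) {
    for (int i = 0; i < p->count; i++)
        if (p->fd[i] == fd) { p->busy[i] = false; return; }
}
```

After the first request to a backend, every later request starts on an already-open socket, so the per-request handshake cost disappears.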

3. Full Body Buffering

Every request body is fully buffered in memory (Buffer.concat()) before processing begins. For large NGSI-LD payloads (batch entity creation), this means copying megabytes of data before a single byte is validated or forwarded. No streaming.
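The streaming alternative needs only a fixed-size chunk buffer between the two sockets: memory use stays constant whether the payload is 1 KB or 100 MB. A minimal sketch (illustrative helper, not fwHttp code — a production version would use non-blocking I/O or splice()):

```c
/* Forward a request body from client_fd to backend_fd in fixed-size chunks,
   so memory use is O(1) regardless of payload size. */
#include <unistd.h>
#include <sys/types.h>

ssize_t stream_body(int client_fd, int backend_fd, size_t content_length)
{
    char    buf[16 * 1024];              /* one reusable 16 KB chunk */
    size_t  remaining = content_length;
    ssize_t total = 0;

    while (remaining > 0) {
        size_t  want = remaining < sizeof buf ? remaining : sizeof buf;
        ssize_t n = read(client_fd, buf, want);
        if (n <= 0) return -1;           /* EOF or error mid-body */

        for (ssize_t off = 0; off < n; ) {   /* tolerate short writes */
            ssize_t w = write(backend_fd, buf + off, (size_t)(n - off));
            if (w < 0) return -1;
            off += w;
        }
        remaining -= (size_t)n;
        total     += n;
    }
    return total;
}
```

The first chunk reaches the backend while the client is still uploading the rest, which also improves time-to-first-byte for large batch operations.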

Wilma 2.0

C + fw-libs — The Rewrite

Architecture

```c
// The entire PEP proxy in C
KhServer server;
khInit(&server, 8080, 0);

// All routes go through the same handler
khRegister(&server, KhGet,     "/**", pepHandler, true);
khRegister(&server, KhPost,    "/**", pepHandler, true);
khRegister(&server, KhPatch,   "/**", pepHandler, true);
khRegister(&server, KhDelete,  "/**", pepHandler, true);

// pepHandler hot path:
// 1. Extract token from header (zero-copy pointer into read buffer)
// 2. fwHash lookup in token cache (O(1), no GC)
// 3. If miss: validate JWT via OpenSSL (threaded, non-blocking)
// 4. If miss: HTTP to Keyrock via persistent connection pool
// 5. Forward to backend via persistent connection pool
// 6. Stream response back (zero-copy splice where possible)
```
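Step 1 of the hot path is worth spelling out, because "zero-copy" here just means returning a pointer into the request's own read buffer. A plain-C sketch (the `extract_bearer` helper is ours, for illustration — not an fw-libs function):

```c
/* Zero-copy token extraction: return a (pointer, length) view into the
   "Authorization: Bearer <token>" header value — no allocation, no copy. */
#include <string.h>
#include <stddef.h>

const char *extract_bearer(const char *auth_header, size_t *len_out)
{
    static const char prefix[] = "Bearer ";

    if (!auth_header ||
        strncmp(auth_header, prefix, sizeof prefix - 1) != 0)
        return NULL;

    const char *tok = auth_header + sizeof prefix - 1;
    *len_out = strlen(tok);      /* caller works with (ptr, len), never a copy */
    return *len_out ? tok : NULL;
}
```

Compare this with the Node.js `split(' ')[1]`, which allocates an array and two new strings per request.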

Component-by-Component Replacement

| Wilma Component | Node.js | C + fw-libs | Impact |
|---|---|---|---|
| HTTP server | Express (event loop, JS middleware) | fwHttp (epoll, zero-copy parse) | 10–50× faster request parsing |
| JWT verification | jsonwebtoken (sync crypto) | OpenSSL HMAC/RSA (threaded) | Non-blocking, 2–5× faster crypto |
| Token cache | node-cache (JS object) | fwHash (flat table, no GC) | ~10× faster, zero GC pressure |
| Proxy to backend | got (no pool, full buffer) | fwHttp client + connection pool | Eliminates TCP handshake per request |
| Proxy to Keyrock | got (new conn per call) | Persistent connection to Keyrock | ~10× faster cache-miss path |
| Body handling | Buffer.concat() | Zero-copy (fwHttp read buffer) | No copy, no allocation |
| JSON parsing | JSON.parse() | fwJson (in-place, zero-alloc) | 5–10× faster |
| Memory allocation | V8 heap + GC (~100 MB) | fwAlloc bump allocator (~2–5 MB) | 20–50× less RAM, zero GC |
| Logging | morgan + debug | fwTrace (structured, near-zero cost) | ~100× less logging overhead |
| Clustering | N × Node.js processes | SO_REUSEPORT (fwHttp built-in) | N instances at ~2 MB each, not N × 100 MB |

Performance Projection

The PEP proxy hot path (cache hit) becomes:

| Phase | Node.js (Wilma) | C (Wilma 2.0) |
|---|---|---|
| Parse HTTP request | ~0.5–2ms (Express + middlewares) | ~3–5μs (fwHttp zero-copy) |
| Extract token | ~0.01ms | ~0.01μs (pointer into buffer) |
| Cache lookup | ~0.01ms | ~0.5μs (fwHash) |
| JWT verify (if needed) | ~0.1–1ms (blocks event loop) | ~0.05–0.5ms (threaded OpenSSL) |
| Enrich headers | ~0.05ms (JSON.stringify roles) | ~1μs (pre-cached) |
| Forward to backend | ~0.5–1ms (new TCP conn) | ~0.02–0.05ms (pooled conn) |
| Return response | ~0.1–0.5ms (full buffer) | ~0.01–0.05ms (stream/splice) |
| Total (cache hit) | ~1.5–5ms | ~0.01–0.05ms (+ backend time) |

Summary

| Metric | Wilma (Node.js) | Wilma 2.0 (C) | Improvement |
|---|---|---|---|
| Throughput (cache hit) | ~5,000–15,000 req/s | ~100,000–200,000 req/s | ~10–40× |
| Throughput (cache miss) | ~200–1,000 req/s | ~2,000–5,000 req/s | ~5× (Keyrock-limited) |
| p99 latency (cache hit) | ~5–20ms | ~0.05–0.2ms | ~50–100× |
| RAM per instance | ~100 MB | ~2–5 MB | ~20–50× less |
| RAM (8-core cluster) | ~800 MB | ~5–10 MB | ~80–160× less |
| Startup time | ~1–2s | <10ms | ~100× |
| Proxy overhead added | ~1.5–5ms | ~0.01–0.05ms | ~50–100× less |

The cache-miss throughput is still limited by Keyrock's response time, but with persistent connection pooling the overhead drops from ~50ms (new TLS handshake) to ~5ms (reused connection). A Keyrock 2.0 in C (see Keyrock analysis) would reduce this further.

Development

Effort Estimate with Claude Max

Wilma is the easiest FIWARE GE to rewrite. It's a thin proxy with well-defined behavior. The fw-libs provide ~80% of the infrastructure.

| Component | Work | Estimate |
|---|---|---|
| HTTP proxy core | fwHttp server + client with connection pooling, header forwarding, body streaming | 1 week |
| JWT validation | OpenSSL HMAC-SHA256/RS256 verification, token parsing with fwJson | 3–4 days |
| Token/decision cache | fwHash with TTL expiry, thread-safe access | 2–3 days |
| Keyrock integration | HTTP client for token validation, user info enrichment | 3–4 days |
| Authorization PDPs | Keyrock basic, AuthZForce XACML, OPA HTTP client | 1 week |
| NGSI-LD payload analysis | Level 3 body inspection (entity IDs, attributes, types) with fwJson | 3–4 days |
| Configuration & startup | Config file parsing, environment variable overrides, graceful shutdown | 2–3 days |
| Testing & hardening | Unit tests, integration tests against Keyrock, load testing | 1 week |
| Total | | 3–5 weeks |
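To give a sense of scale for the token/decision cache item: a flat, GC-free cache with TTL expiry is a few dozen lines of C. The sketch below is our illustration of the idea, not fwHash itself; it uses open addressing over a fixed array and omits the tombstone handling and locking a production version needs:

```c
/* Flat token cache with TTL expiry: fixed array, open addressing, no heap
   churn. An entry whose deadline has passed reads as a miss. */
#include <string.h>
#include <time.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_SLOTS 1024             /* power of two for cheap masking */
#define TOKEN_MAX   128

typedef struct {
    char   token[TOKEN_MAX];
    time_t expires;                  /* 0 = slot never used */
} CacheSlot;

typedef struct { CacheSlot slot[CACHE_SLOTS]; } TokenCache;

static uint64_t fnv1a(const char *s)
{
    uint64_t h = 1469598103934665603ULL;
    while (*s) { h ^= (unsigned char)*s++; h *= 1099511628211ULL; }
    return h;
}

void cache_put(TokenCache *c, const char *token, time_t now, int ttl_s)
{
    uint64_t i = fnv1a(token) & (CACHE_SLOTS - 1);
    for (int k = 0; k < CACHE_SLOTS; k++, i = (i + 1) & (CACHE_SLOTS - 1)) {
        CacheSlot *s = &c->slot[i];
        /* Claim an empty slot, an expired slot, or our own old entry. */
        if (s->expires == 0 || s->expires <= now ||
            strcmp(s->token, token) == 0) {
            strncpy(s->token, token, TOKEN_MAX - 1);
            s->token[TOKEN_MAX - 1] = '\0';
            s->expires = now + ttl_s;
            return;
        }
    }
}

bool cache_has(const TokenCache *c, const char *token, time_t now)
{
    uint64_t i = fnv1a(token) & (CACHE_SLOTS - 1);
    for (int k = 0; k < CACHE_SLOTS; k++, i = (i + 1) & (CACHE_SLOTS - 1)) {
        const CacheSlot *s = &c->slot[i];
        if (s->expires == 0) return false;       /* never-used slot: stop */
        if (s->expires > now && strcmp(s->token, token) == 0) return true;
    }
    return false;
}
```

Unlike node-cache, there is no per-entry object allocation, so lookups never trigger a garbage collector and latency stays flat under load.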

Verdict: The Quick Win

Wilma is the ideal first rewrite target. It's small (~2,500 lines of JavaScript), the behavior is well-defined (validate token, proxy request), and the performance impact is massive — because the PEP proxy sits in front of every protected service, making it faster benefits the entire platform.

With 3–5 weeks of effort, you get a PEP proxy that adds 50μs of overhead instead of 5ms, uses 2 MB instead of 100 MB, and handles 100K+ req/s per core. The latency reduction alone justifies the rewrite — every API call in the platform becomes 1–5ms faster.

Bonus: building Wilma 2.0 serves as a test bed for fwHttp's proxy capabilities (connection pooling, streaming, header manipulation) that will also be needed for other GE rewrites.