Language comparison for an NGSI-LD Context Broker
An NGSI-LD context broker is fundamentally a JSON-LD processing engine with a REST API, a database backend, and an async notification system. The performance-critical request pipeline looks like this:

1. Parse the HTTP request
2. Parse the JSON payload body
3. Expand attribute names via the @context (JSON-LD expansion)
4. Validate the payload
5. Build the database query
6. Execute the database read/write
7. Convert the DB result back into a JSON tree
8. Render the response
9. Fan out notifications to matching subscriptions

Steps 1–5 and 7–8 are CPU-bound string/memory manipulation. Step 6 is I/O-bound. Step 9 is async I/O. This distinction is critical for the language choice.
Go's goroutines are often cited as the killer feature for REST servers. They provide lightweight concurrency — millions of goroutines can run on a few OS threads, with the Go runtime scheduler handling multiplexing.
However, the existing Orion-LD 1.x architecture already solves concurrency elegantly:
// Current architecture (with fwHttp replacing MHD)
fwHttp: epoll edge-triggered event loop + connection pool (1024 pre-allocated)
Per-connection fwAlloc // zero locking on hot path
Thread-local OrionldState // no shared mutable state
MongoDB connection // driver-managed
Pernot async notification threads // decoupled from request path
This is essentially the same concurrency model as goroutines — multiplexed I/O with pre-allocated connection state — just explicit instead of runtime-managed. Go's advantage here is developer ergonomics, not performance. Goroutines add a scheduling layer that the C design avoids entirely.
// Actual hot path: 3 operations, pure pointer arithmetic
char* start = kaP->allocPointer;
kaP->allocPointer += size;
kaP->bytesLeft -= size;
return start;
// End of request: single reset, no individual frees
faBufferReset(kaP);
Go cannot replicate this. Go's GC is excellent (sub-millisecond pauses), but it still scans all live objects periodically, adds latency variance (p99 jitter), uses ~2x memory overhead for concurrent mark-sweep, and cannot do bulk deallocation. For a broker doing thousands of requests/sec, each creating hundreds of JSON nodes, fwAlloc's "allocate fast, free everything at once" pattern is unbeatable.
// The parser modifies the input buffer directly
// Replaces quote delimiters with \0 for string termination
// Zero allocation for string values — they point into the input buffer
FtNode* tree = kjParse(fwJsonP, buffer);
// tree->value.s points directly into buffer, no copy
Go strings are immutable. Every JSON string value requires a separate allocation and copy. Even the best Go JSON parsers (sonic, json-iterator) can't match in-place parsing because Go's memory model forbids it. fwJson-style parsing is typically 3–5x faster than the best Go JSON parsers for broker-style workloads.
// CPU branch predictor hint + cached hash for fast rejection
if (likely(itemP->hashCode == hashCode)) // integer compare first
if (compareFunction(...)) // string compare only if needed
Go has no __builtin_expect. For @context expansion (done on every attribute of every entity), this matters at scale.
// No mutex, no GC interaction, fixed-point for atomic float ops
atomic_fetch_add(&counter->value.iValue, 1000);
Even the HTTP layer is a custom fw-lib. fwHttp replaces libmicrohttpd (MHD) with a purpose-built REST server:
// ~1000 lines of C. That's the entire HTTP server.
// epoll edge-triggered event loop
struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET; // edge-triggered for max throughput
ev.data.ptr = conn; // direct pointer to connection state
// Pre-allocated connection pool (1024 connections, zero malloc on accept)
KhConn* conn = khConnGet(server); // O(1) free-list pop
conn->fd = clientFd;
setNonBlocking(clientFd);
setTcpNoDelay(clientFd); // Nagle off for low latency
// Zero-copy HTTP parsing — parses in place, null-terminates inline
// conn->path, conn->headers[].key, conn->body all point INTO the read buffer
khParse(conn);
// Per-connection fwAlloc for response building (reset on keep-alive)
faBufferReset(&conn->alloc, true);
Key design decisions that Go's net/http cannot match:
- **No malloc on the hot path.** Go allocates goroutine stacks dynamically, and `http.Request` copies every header key and value into heap-allocated strings.
- **Wildcard routing.** `*` matches a single path component, `**` matches the rest. Critical for NGSI-LD entity IDs, which are full URIs (e.g., `/ngsi-ld/v1/entities/urn:ngsi-ld:Building:001`).
- **Direct tree rendering.** `khSetJson(conn, tree)` renders a FtNode tree directly into the response buffer. No intermediate serialization step.

With fwHttp, the entire request pipeline from TCP socket to response avoids copying data:
**C 2.0:** fwHttp reads into a pre-allocated buffer → parses HTTP in-place → fwJson parses the body in-place → FtNode names point into the original buffer → DB query built from the FtNode tree → fwJson renders the response → fwAlloc bulk reset. For a 4KB entity with 20 attributes: 3–5 allocations, all from the bump allocator.

**Go:** net/http reads into a []byte → copies request headers into http.Request → json.Unmarshal allocates structs and copies all strings → BSON is built (more copies) → the response is marshaled (more allocations) → the GC cleans up. For the same entity: 60–100+ heap allocations, creating GC pressure.
| Aspect | Go's Advantage |
|---|---|
| MongoDB I/O | Both wait on network. Go's goroutine scheduler is slightly more efficient at I/O multiplexing than epoll+threads, but the difference is <5% |
| HTTP/2 & TLS | Go has built-in HTTP/2, but a REST API broker doesn't benefit from HTTP/2's multiplexing (designed for browsers loading many assets concurrently — not API clients sending one request at a time). TLS is typically terminated at a reverse proxy (nginx, envoy) in production deployments regardless of language |
| Notification fan-out | Goroutines shine for "send 1000 HTTP notifications concurrently" — simpler than managing curl multi handles |
| Development speed | Significantly faster for a Go expert. But irrelevant if you're not one |
| Memory safety | No buffer overflows or use-after-free. But fwAlloc's bulk alloc/free pattern largely eliminates these risks |
| Code maintainability | Go is easier for others to contribute to |
Orion-LD 1.x achieves ~5,000 req/s on a single core with 4–5 GB RAM (3–4 GB of that for MongoDB). At that rate a single core has a 200μs budget per request; the broker's own CPU work accounts for only ~50–65μs of it, and the rest is MongoDB I/O wait. This gives us a grounded projection for both the FiWorks Broker and a hypothetical Go rewrite:
| Metric | C 2.0 (fwHttp + fw-libs) | Go (best libraries + MongoDB) |
|---|---|---|
| Requests/sec (single core) | ~10,000–15,000 | ~3,000–4,000 (GC + MongoDB) |
| System throughput (same HW) | ~15,000–30,000 (multi-core via SO_REUSEPORT) | ~5,000–8,000 |
| p50 latency | ~0.1ms | ~0.3–0.5ms |
| p99 latency | ~0.5–2ms | ~3–8ms (GC jitter + MongoDB tail latency) |
| Memory (total system) | ~4–5GB (incl. MongoDB) | ~5–7GB (Go heap + MongoDB) |
| JSON parse throughput | ~800MB/s (fwJson zero-copy) | ~200–400MB/s |
The p99 latency difference is the killer. Go's GC can cause occasional 1–5ms pauses even with the excellent Go 1.22+ collector. Combined with MongoDB tail latency, the p99 for Go would be significantly worse than the C stack. For IoT platforms processing sensor data at high rates, consistent low latency matters more than average throughput.
Go IS easy to learn syntactically. You could write basic Go in a week. But writing *fast* Go, which means understanding escape analysis, leaning on sync.Pool, and avoiding allocations, is a different skill set from C.

The one scenario where Go makes sense: if you need other people to contribute to and maintain the code, and they know Go but not C. That's a team/project decision, not a technical one.