Language comparison for an NGSI-LD Context Broker
An NGSI-LD context broker is fundamentally a JSON-LD processing engine with a REST API, a database backend, and an async notification system. The performance-critical request pipeline looks like this:

1. Parse the HTTP request
2. Parse the JSON payload body
3. Expand attribute names via the @context (JSON-LD expansion)
4. Validate the payload
5. Build the database query
6. Execute the database read/write
7. Convert the DB result back into a JSON tree
8. Render the response
9. Fan out notifications to matching subscriptions

Steps 1–5 and 7–8 are CPU-bound string/memory manipulation. Step 6 is I/O-bound. Step 9 is async I/O. This distinction is critical for the language choice.
Go's goroutines are often cited as the killer feature for REST servers. They provide lightweight concurrency — millions of goroutines can run on a few OS threads, with the Go runtime scheduler handling multiplexing.
However, the existing Orion-LD 1.x architecture already solves concurrency elegantly:
// Current architecture (with fwHttp replacing MHD)
fwHttp: epoll edge-triggered event loop + connection pool (1024 pre-allocated)
Per-connection fwAlloc // zero locking on hot path
Thread-local OrionldState // no shared mutable state
MongoDB connection // driver-managed
Pernot async notification threads // decoupled from request path
This is essentially the same concurrency model as goroutines — multiplexed I/O with pre-allocated connection state — just explicit instead of runtime-managed. Go's advantage here is developer ergonomics, not performance. Goroutines add a scheduling layer that the C design avoids entirely.
// Actual hot path: 3 operations, pure pointer arithmetic
char* start = kaP->allocPointer;
kaP->allocPointer += size;
kaP->bytesLeft -= size;
return start;
// End of request: single reset, no individual frees
faBufferReset(kaP);
Go cannot replicate this. Go's GC is excellent (sub-millisecond pauses), but it still scans all live objects periodically, adds latency variance (p99 jitter), uses ~2x memory overhead for concurrent mark-sweep, and cannot do bulk deallocation. For a broker doing thousands of requests/sec, each creating hundreds of JSON nodes, fwAlloc's "allocate fast, free everything at once" pattern is unbeatable.
// The parser modifies the input buffer directly
// Replaces quote delimiters with \0 for string termination
// Zero allocation for string values — they point into the input buffer
FtNode* tree = kjParse(fwJsonP, buffer);
// tree->value.s points directly into buffer, no copy
Go strings are immutable. Every JSON string value requires a separate allocation and copy. Even the best Go JSON parsers (sonic, json-iterator) can't match in-place parsing because Go's memory model forbids it. fwJson-style parsing is typically 3–5x faster than the best Go JSON parsers for broker-style workloads.
// CPU branch predictor hint + cached hash for fast rejection
if (likely(itemP->hashCode == hashCode)) // integer compare first
if (compareFunction(...)) // string compare only if needed
Go has no __builtin_expect. For @context expansion (done on every attribute of every entity), this matters at scale.
// No mutex, no GC interaction, fixed-point for atomic float ops
atomic_fetch_add(&counter->value.iValue, 1000);
Even the HTTP layer is a custom fw-lib. fwHttp replaces libmicrohttpd (MHD) with a purpose-built REST server:
// ~1000 lines of C. That's the entire HTTP server.
// epoll edge-triggered event loop
struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET; // edge-triggered for max throughput
ev.data.ptr = conn; // direct pointer to connection state
// Pre-allocated connection pool (1024 connections, zero malloc on accept)
KhConn* conn = khConnGet(server); // O(1) free-list pop
conn->fd = clientFd;
setNonBlocking(clientFd);
setTcpNoDelay(clientFd); // Nagle off for low latency
// Zero-copy HTTP parsing — parses in place, null-terminates inline
// conn->path, conn->headers[].key, conn->body all point INTO the read buffer
khParse(conn);
// Per-connection fwAlloc for response building (reset on keep-alive)
faBufferReset(&conn->alloc, true);
Key design decisions that Go's net/http cannot match:
- **No malloc on the hot path.** Go allocates goroutine stacks dynamically, and `http.Request` copies every header key and value into heap-allocated strings.
- **Wildcard routing.** `*` matches a single path component, `**` matches the rest. Critical for NGSI-LD entity IDs, which are full URIs (e.g., `/ngsi-ld/v1/entities/urn:ngsi-ld:Building:001`).
- **Direct tree rendering.** `khSetJson(conn, tree)` renders a FtNode tree directly into the response buffer. No intermediate serialization step.

With fwHttp, the entire request pipeline from TCP socket to response avoids copying data:
**C 2.0:** fwHttp reads into a pre-allocated buffer → parses HTTP in-place → fwJson parses the body in-place → FtNode names point into the original buffer → DB query built from the FtNode tree → fwJson renders the response → fwAlloc bulk reset. For a 4KB entity with 20 attributes: 3–5 allocations, all from the bump allocator.

**Go:** net/http reads into a []byte → copies request headers into http.Request → json.Unmarshal allocates structs and copies all strings → BSON is built (more copies) → the response is marshaled (more allocations) → the GC cleans up. For the same entity: 60–100+ heap allocations, creating GC pressure.
| Aspect | Go's Advantage |
|---|---|
| MongoDB I/O | Both wait on network. Go's goroutine scheduler is slightly more efficient at I/O multiplexing than epoll+threads, but the difference is <5% |
| HTTP/2 & TLS | Go has built-in HTTP/2, but a REST API broker doesn't benefit from HTTP/2's multiplexing (designed for browsers loading many assets concurrently — not API clients sending one request at a time). TLS is typically terminated at a reverse proxy (nginx, envoy) in production deployments regardless of language |
| Notification fan-out | Goroutines shine for "send 1000 HTTP notifications concurrently" — simpler than managing curl multi handles |
| Development speed | Significantly faster for a Go expert. But irrelevant if you're not one |
| Memory safety | No buffer overflows or use-after-free. But fwAlloc's bulk alloc/free pattern largely eliminates these risks |
| Code maintainability | Go is easier for others to contribute to |
Orion-LD 1.x achieves ~5,000 req/s on a single core with 4–5 GB RAM (3–4 GB of that for MongoDB). At that rate a single core has a 200μs budget per request; the broker's own CPU work accounts for only ~50–65μs of it, and the rest is MongoDB I/O wait. This gives us a grounded projection for both the FiWorks Broker and a hypothetical Go rewrite:
| Metric | C 2.0 (fwHttp + fw-libs) | Go (best libraries + MongoDB) |
|---|---|---|
| Requests/sec (single core) | ~10,000–15,000 | ~3,000–4,000 (GC + MongoDB) |
| System throughput (same HW) | ~15,000–30,000 (multi-core via SO_REUSEPORT) | ~5,000–8,000 |
| p50 latency | ~0.1ms | ~0.3–0.5ms |
| p99 latency | ~0.5–2ms | ~3–8ms (GC jitter + MongoDB tail latency) |
| Memory (total system) | ~4–5GB (incl. MongoDB) | ~5–7GB (Go heap + MongoDB) |
| JSON parse throughput | ~800MB/s (fwJson zero-copy) | ~200–400MB/s |
The p99 latency difference is the killer. Go's GC can cause occasional 1–5ms pauses even with the excellent Go 1.22+ collector. Combined with MongoDB tail latency, the p99 for Go would be significantly worse than the C stack. For IoT platforms processing sensor data at high rates, consistent low latency matters more than average throughput.
Go IS easy to learn syntactically. You could write basic Go in a week. But writing *fast* Go, which means understanding escape analysis, leaning on sync.Pool, and avoiding allocations, is a different skill set from C.

The one scenario where Go makes sense: if you need other people to contribute to and maintain the code, and they know Go but not C. That's a team/project decision, not a technical one.