CGNAT Arena Engine · runtime-allocated unified conntrack · one-command operator rate control
Engineering & Operations Brief · Next-Gen CGNAT

CGNAT conntrack that sizes its memory to real sessions — not to the worst-case ceiling

A carrier-grade NAT keeps a connection-tracking table sized for its peak. The classic design pre-allocates that peak twice, a forward table and a reverse table each reserved for the full session ceiling, so a box that is 20% subscribed still holds the memory of a box at 100%. BNGSOFT's CGNAT Arena Engine rebuilds the conntrack store on a runtime-allocated BPF arena: one unified bidirectional record per session, on pages that fault in on demand, so resident memory tracks the sessions you actually have, not the ones you provisioned for. It sits behind a single config switch; the proven map-based engine remains the default, byte-identical to before.
  • 1 record per session: unified forward + reverse, written and reaped once, so the two directions can never drift.
  • On-demand memory: pages fault in as needed, so the resident footprint tracks real sessions, not the reserved ceiling.
  • 1 switch, maps ↔ arena: engine-gated; the proven map engine stays the default.
  • 1 command for operator rate control: set or cap any slice of subscribers, live, with a dry-run first.
The table no longer reserves the ceiling twice. It grows with the sessions you actually carry — and shrinks back when they leave. Memory becomes a function of load, not of provisioning.

The classic cost: pre-allocate the peak, twice

XDP/eBPF data planes track NAT sessions in hash maps that are fixed-size at load. To guarantee a worst-case subscriber count never overflows the table, the engine pre-allocates the full ceiling — and because a NAT needs to look a flow up from both directions (subscriber→internet for the outbound translation, internet→subscriber for the return), it keeps two such tables.

The map engine (today's default)

Twin pre-allocated tables

  • Forward table + reverse table, each sized to the full session ceiling.
  • Memory is reserved at load and held whether the box is 5% or 100% subscribed.
  • Two independent tables can drift: an entry inserted or aged out in one but not the other is a class of bug that must be defended against.
  • Battle-tested, predictable, and the production default.
The arena engine (next-gen)

One unified record, allocated at runtime

  • A single bidirectional record per session reached from both lookups — written, refreshed, and reaped exactly once.
  • Backing store is a runtime-allocated arena: virtual space reserved, physical pages fault in on demand.
  • Resident memory tracks the real session count; idle capacity costs no physical memory, only reserved address space.
  • Forward and reverse can no longer drift — there is only one record.
Figure (illustrative): resident memory at ~20% subscribed, same worst-case capacity both ways. Map engine: fwd table + back table, ceiling ×2, reserved always. Arena engine: virtual reservation, pages on demand, resident ≈ live sessions.
Same worst-case capacity. The map engine holds both tables at the full ceiling whatever the load; the arena reserves only virtual space and makes pages resident as real sessions arrive, so idle capacity costs no resident memory.

How the arena engine is built

This is data-plane engineering, not a config toggle on someone else's stack. Three decisions make it work inside a single, large XDP program already operating at the kernel verifier's limits:

Runtime allocation, not static reservation

The conntrack store and its two lookup indexes are allocated from the arena at runtime — never declared as giant static tables. That keeps the program object small and lets the kernel fault pages in only as sessions arrive.
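The BPF arena itself is kernel-managed, so the sketch below is only a userspace analogy for the same behavior, not the arena API: a fixed reservation modeled as a directory of page slots, where a page is backed by real memory only the first time it is touched. `arena_t`, `arena_touch`, `arena_release_all` and the 4 KiB page / 1,024-page reservation are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model of a demand-paged arena (not the BPF API).
 * The whole range is reserved up front as a directory of page slots, but
 * a page is only backed by real memory the first time it is touched. */
#define ARENA_PAGE_SIZE 4096u
#define ARENA_MAX_PAGES 1024u /* "virtual" reservation: 4 MiB */

typedef struct {
    uint8_t *pages[ARENA_MAX_PAGES]; /* NULL = reserved but not resident */
    size_t resident;                 /* pages actually backed */
} arena_t;

static void arena_init(arena_t *a) {
    for (size_t i = 0; i < ARENA_MAX_PAGES; i++)
        a->pages[i] = NULL;
    a->resident = 0;
}

/* Return a pointer to byte `off`, backing its page on first touch.
 * Returns NULL if `off` is outside the reservation or allocation fails. */
static void *arena_touch(arena_t *a, size_t off) {
    size_t pg = off / ARENA_PAGE_SIZE;
    if (pg >= ARENA_MAX_PAGES)
        return NULL;
    if (a->pages[pg] == NULL) { /* the modeled "page fault" */
        a->pages[pg] = calloc(1, ARENA_PAGE_SIZE);
        if (a->pages[pg] == NULL)
            return NULL;
        a->resident++;
    }
    return a->pages[pg] + off % ARENA_PAGE_SIZE;
}

/* Release every backed page; the reservation (directory) remains. */
static void arena_release_all(arena_t *a) {
    for (size_t i = 0; i < ARENA_MAX_PAGES; i++)
        free(a->pages[i]);
    arena_init(a);
}
```

In this model, touching 100 hypothetical 64-byte records (offsets 0 through 6,336) backs exactly two pages; the other 1,022 stay reserved but unbacked.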

One record, two indexes

A unified record holds both the private 5-tuple side and the public ip:port side. A forward index (subscriber→net) and a reverse index (net→subscriber) both resolve to the same record — so a session is one allocation, one timeout, one reap.
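A minimal sketch of "one record, two indexes", using plain C arrays rather than the engine's arena-backed indexes: each index slot stores only a record id, so a forward lookup and a reverse lookup land on the same record, and one reap unlinks both. All names (`rec_t`, `session_create`, `session_reap`, `lookup_fwd`, `lookup_rev`) and sizes are hypothetical; keys are passed as partially filled `rec_t` values, and the sketch does not reject duplicate sessions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NREC 64         /* hypothetical record capacity */
#define NIDX 128        /* slots per index, power of two, > NREC */
#define TOMB UINT32_MAX /* deleted index slot; 0 means empty */

/* One unified record: private 5-tuple side and public ip:port side. */
typedef struct {
    uint32_t priv_ip, dst_ip, pub_ip;
    uint16_t priv_port, dst_port, pub_port;
    uint8_t proto, in_use;
} rec_t;

static rec_t recs[NREC];
static uint32_t fwd_idx[NIDX], rev_idx[NIDX]; /* hold record id + 1 */

typedef uint32_t (*hash_fn)(const rec_t *);
typedef bool (*eq_fn)(const rec_t *, const rec_t *);

static uint32_t mix(uint64_t x) {
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL; x ^= x >> 33;
    return (uint32_t)x;
}
static uint32_t fwd_hash(const rec_t *r) {
    return mix(((uint64_t)r->priv_ip << 32 | r->dst_ip) ^
               ((uint64_t)r->priv_port << 24 | (uint64_t)r->dst_port << 8 | r->proto));
}
static uint32_t rev_hash(const rec_t *r) {
    return mix((uint64_t)r->pub_ip << 24 | (uint64_t)r->pub_port << 8 | r->proto);
}
static bool fwd_eq(const rec_t *a, const rec_t *b) {
    return a->priv_ip == b->priv_ip && a->priv_port == b->priv_port &&
           a->dst_ip == b->dst_ip && a->dst_port == b->dst_port && a->proto == b->proto;
}
static bool rev_eq(const rec_t *a, const rec_t *b) {
    return a->pub_ip == b->pub_ip && a->pub_port == b->pub_port && a->proto == b->proto;
}

/* Linear probe; the key itself lives only in the record a slot points to. */
static int idx_find(const uint32_t *idx, hash_fn h, eq_fn eq, const rec_t *key) {
    uint32_t s = h(key) & (NIDX - 1);
    for (int i = 0; i < NIDX; i++, s = (s + 1) & (NIDX - 1)) {
        if (idx[s] == 0) return -1;
        if (idx[s] != TOMB && eq(&recs[idx[s] - 1], key)) return (int)(idx[s] - 1);
    }
    return -1;
}
static void idx_put(uint32_t *idx, hash_fn h, const rec_t *r, uint32_t id) {
    uint32_t s = h(r) & (NIDX - 1);
    while (idx[s] != 0 && idx[s] != TOMB) s = (s + 1) & (NIDX - 1);
    idx[s] = id + 1; /* NIDX > NREC, so a free slot always exists */
}
static void idx_del(uint32_t *idx, hash_fn h, const rec_t *r, uint32_t id) {
    uint32_t s = h(r) & (NIDX - 1);
    for (int i = 0; i < NIDX && idx[s] != 0; i++, s = (s + 1) & (NIDX - 1))
        if (idx[s] == id + 1) { idx[s] = TOMB; return; }
}

/* One allocation, linked from both indexes. Returns record id or -1. */
static int session_create(const rec_t *s) {
    for (uint32_t id = 0; id < NREC; id++) {
        if (recs[id].in_use) continue;
        recs[id] = *s;
        recs[id].in_use = 1;
        idx_put(fwd_idx, fwd_hash, &recs[id], id);
        idx_put(rev_idx, rev_hash, &recs[id], id);
        return (int)id;
    }
    return -1;
}

/* One reap: unlink from both indexes, then clear the single record. */
static void session_reap(uint32_t id) {
    idx_del(fwd_idx, fwd_hash, &recs[id], id);
    idx_del(rev_idx, rev_hash, &recs[id], id);
    memset(&recs[id], 0, sizeof recs[id]);
}

static int lookup_fwd(const rec_t *k) { return idx_find(fwd_idx, fwd_hash, fwd_eq, k); }
static int lookup_rev(const rec_t *k) { return idx_find(rev_idx, rev_hash, rev_eq, k); }
```

Because neither index holds a copy of the session, there is nothing for the two directions to disagree about: deleting the record's index links and the record itself is the whole teardown.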

Verifier-disciplined

The engine loads inside the full monolithic XDP program under the kernel's strict instruction and stack limits — bounded, unrolled lookups; arena-resident tables that consume zero program stack; race-free session claim and timeout-aware reuse.
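"Bounded lookups" can be illustrated with a probe loop whose trip count is a compile-time constant, which is what lets a verifier prove termination (in BPF C such a loop is typically either unrolled or accepted as a bounded loop). This is a plain-C sketch with hypothetical names (`bounded_insert`, `bounded_lookup`, `MAX_PROBES`) and a deliberately trivial hash so collisions are easy to show; it does not reproduce the engine's tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TBL_SLOTS  256u /* power of two, hypothetical */
#define MAX_PROBES 4    /* compile-time bound on every probe loop */

typedef struct { uint64_t key; uint32_t val; bool used; } slot_t;
static slot_t tbl[TBL_SLOTS];

/* Placeholder hash (low bits of the key); a real table would mix the key. */
static uint32_t home_slot(uint64_t key) { return (uint32_t)(key & (TBL_SLOTS - 1)); }

/* Fixed trip count: at most MAX_PROBES slots are ever examined. */
static bool bounded_insert(uint64_t key, uint32_t val) {
    uint32_t base = home_slot(key);
    for (int i = 0; i < MAX_PROBES; i++) {
        slot_t *s = &tbl[(base + i) & (TBL_SLOTS - 1)];
        if (!s->used || s->key == key) {
            s->key = key;
            s->val = val;
            s->used = true;
            return true;
        }
    }
    return false; /* neighborhood full: the caller drops the new session */
}

static bool bounded_lookup(uint64_t key, uint32_t *out) {
    uint32_t base = home_slot(key);
    for (int i = 0; i < MAX_PROBES; i++) {
        const slot_t *s = &tbl[(base + i) & (TBL_SLOTS - 1)];
        if (s->used && s->key == key) {
            *out = s->val;
            return true;
        }
    }
    return false;
}
```

The trade is a small chance of refusing an insert in exchange for a statically provable bound; on a full neighborhood the insert simply fails, which a caller can treat as "drop the new session".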

Lifecycle, done right. Records are claimed race-free across CPUs, reused only when free or idle-expired, and a reused slot has its stale index entries scrubbed before the new session is written — so a public ip:port can never resolve to the wrong subscriber. Conservative by construction: when in doubt the engine drops the new session rather than ever alias a live mapping.
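One way to get the claim and reuse rules described above is sketched below, under stated assumptions: C11 atomics stand in for the engine's cross-CPU primitives, the idle timeout is a hypothetical 300 s, and state plus last-seen time are packed into one 64-bit word (a design choice of this sketch) so that a concurrent refresh makes a reuse attempt's compare-and-swap fail. The reverse index maps only a public port to a slot; all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NSLOTS       8
#define IDLE_TIMEOUT 300u /* hypothetical idle timeout, seconds */
enum { FREE = 0, LIVE = 1, CLAIMING = 2 };

/* State and last-seen time share ONE atomic word, so a refresh and a
 * reuse attempt can never both succeed against the same snapshot. */
typedef struct {
    _Atomic uint64_t word; /* (last_seen << 2) | state */
    uint16_t pub_port;     /* public side of the mapping (port only, for brevity) */
} slot_t;

static slot_t slots[NSLOTS];
static _Atomic uint8_t rev_index[1u << 16]; /* pub port -> slot + 1, 0 = none */

static uint64_t pack(uint64_t ts, uint32_t st) { return ts << 2 | st; }
static uint32_t state_of(uint64_t w) { return (uint32_t)(w & 3u); }
static uint64_t ts_of(uint64_t w) { return w >> 2; }

/* Reusable only if FREE, or LIVE and idle past the timeout. The CAS fails
 * if another CPU claimed or refreshed the slot after we read it. */
static bool try_claim(int i, uint64_t now) {
    uint64_t w = atomic_load(&slots[i].word);
    bool idle = state_of(w) == LIVE && now > ts_of(w) && now - ts_of(w) > IDLE_TIMEOUT;
    if (state_of(w) != FREE && !idle) return false;
    return atomic_compare_exchange_strong(&slots[i].word, &w, pack(now, CLAIMING));
}

/* Data-path refresh: only a LIVE record has its timestamp bumped. */
static bool refresh(int i, uint64_t now) {
    uint64_t w = atomic_load(&slots[i].word);
    while (state_of(w) == LIVE)
        if (atomic_compare_exchange_weak(&slots[i].word, &w, pack(now, LIVE)))
            return true;
    return false;
}

/* Claim a slot, scrub the index entry its old session left behind, then
 * publish the new mapping. Returns the slot, or -1 to drop the session. */
static int session_open(uint16_t pub_port, uint64_t now) {
    for (int i = 0; i < NSLOTS; i++) {
        if (!try_claim(i, now)) continue;
        uint8_t mine = (uint8_t)(i + 1), expected = mine, none = 0;
        /* scrub: the old public port must stop resolving to this slot */
        atomic_compare_exchange_strong(&rev_index[slots[i].pub_port], &expected, 0);
        slots[i].pub_port = pub_port;
        if (!atomic_compare_exchange_strong(&rev_index[pub_port], &none, mine)) {
            atomic_store(&slots[i].word, pack(now, FREE)); /* port still mapped */
            return -1;                                     /* drop, never alias */
        }
        atomic_store(&slots[i].word, pack(now, LIVE));
        return i;
    }
    return -1; /* nothing free or idle-expired: drop the new session */
}
```

The scrub is itself a compare-and-swap that only clears an entry still pointing at the reused slot, so it can never erase another session's live mapping.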
Figure: Map engine, two tables that can drift: fwd key → fwd table → record and back key → back table → record, two stores to keep in step. Arena, one unified record: fwd key → fwd index and back key → back index, both resolving to ONE record, written & reaped once, so it can't drift.
The map engine keeps a forward and a reverse table — two records that can fall out of step. The arena keeps one bidirectional record, reached by both indexes — written, refreshed and reaped exactly once.

Map engine — reserve the ceiling ×2

// fixed at load, held always
forward_table[ MAX_SESSIONS ]   // ceiling
reverse_table[ MAX_SESSIONS ]   // ceiling again
resident memory = f(ceiling)    // ...regardless of real load

Arena engine — grow with real load

// virtual reservation; pages on demand
arena  := reserve(VA)
record := alloc_on_first_touch()
fwd_index → record ← rev_index
resident memory = f(live sessions)   // ...idle capacity is free
Same worst-case capacity. The difference is what it costs you when you are not at worst case — which is almost always.
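The two pseudocode blocks reduce to two cost functions. This is a back-of-envelope sketch with placeholder parameters (entry size, per-session size, page size), not a measurement: it ignores hash-map metadata and assumes records and index entries pack densely into arena pages. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Map engine: forward + reverse table, each sized to the ceiling at load. */
static uint64_t map_engine_resident(uint64_t max_sessions, uint64_t entry_bytes) {
    return 2 * max_sessions * entry_bytes; /* f(ceiling) */
}

/* Arena engine: only the pages that live sessions touch are resident.
 * Assumes records and index entries are packed densely into pages. */
static uint64_t arena_engine_resident(uint64_t live_sessions, uint64_t bytes_per_session,
                                      uint64_t page_bytes) {
    uint64_t used = live_sessions * bytes_per_session;
    return (used + page_bytes - 1) / page_bytes * page_bytes; /* f(live sessions), page-rounded */
}
```

`map_engine_resident` takes no live-session argument at all, which is the whole point of the comparison; `arena_engine_resident` returns zero at zero load and grows in page-sized steps.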

Operator control: rate or state on any subscriber slice, in one command (shipped)

The arena is the long game; this you can use today. sub set-rate gives operations a single, daemon-mediated command to set a downstream/upstream rate — or flip active/inactive state — across one subscriber, a filter, or the whole box, with a dry-run that shows exactly who will be touched before anything changes.

# preview: every subscriber currently parked inactive, what they'd become
bngxdpctl sub set-rate 10M/2M --inactive --state active --dry-run

# apply to the whole box (bulk requires explicit confirmation)
bngxdpctl sub set-rate 50M/50M --all --confirm

# one subscriber, with an auto-expiring emergency override
bngxdpctl sub set-rate 7M/3M --ip 100.64.3.12 --ttl 600

Any scope

A single IP or interface, a selector (e.g. every inactive subscriber), or the entire box — same command.

Safe by default

Bulk changes preview first and require explicit confirmation. An optional TTL makes an emergency override self-expire and revert.
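The safety rules can be modeled as a small selection-and-gate function. This is a hypothetical sketch, not the `bngxdpctl` or daemon implementation: it assumes rates are stored in kbit/s, treats any selector other than a single IP as "bulk", and leaves out the daemon's rate-apply path entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subscriber row; kbit/s rates are an assumption. */
typedef struct { uint32_t ip; bool active; uint32_t down_kbps, up_kbps; } sub_t;

typedef enum { SEL_ALL, SEL_INACTIVE, SEL_IP } sel_kind_t;
typedef struct { sel_kind_t kind; uint32_t ip; } selector_t;

typedef struct {
    uint32_t down_kbps, up_kbps;
    int set_active; /* -1 leave state alone, 0 inactive, 1 active */
    bool dry_run;   /* report matches only, change nothing */
    bool confirm;   /* required for any multi-subscriber selector */
} change_t;

static bool matches(const sub_t *s, selector_t sel) {
    switch (sel.kind) {
    case SEL_ALL:      return true;
    case SEL_INACTIVE: return !s->active;
    case SEL_IP:       return s->ip == sel.ip;
    }
    return false;
}

/* Returns how many subscribers matched (and, unless dry_run, were changed),
 * or -1 when a bulk change arrives without explicit confirmation. */
static int apply_set_rate(sub_t *subs, size_t n, selector_t sel, change_t c) {
    if (sel.kind != SEL_IP && !c.dry_run && !c.confirm)
        return -1;
    int hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (!matches(&subs[i], sel))
            continue;
        hits++;
        if (c.dry_run)
            continue; /* preview: count who would be touched */
        subs[i].down_kbps = c.down_kbps;
        subs[i].up_kbps = c.up_kbps;
        if (c.set_active >= 0)
            subs[i].active = c.set_active == 1;
    }
    return hits;
}
```

A dry run always returns the match count without modifying anything, so the preview and the real run select exactly the same subscribers.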

Authoritative path

Routed through the daemon's own rate-apply logic — the same path RADIUS/CoA uses — so it stays consistent with the live subscriber state.

Why it exists. When an upstream RADIUS or policy source has a bad moment and subscribers land without a proper rate, an operator can give every affected line a sane fallback speed in one reviewed command — and let it revert automatically as the real policy returns. It is an override, not a new source of truth: the next authoritative event re-rates the subscriber.
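The override semantics, an optional TTL plus "the next authoritative event re-rates the subscriber", can be sketched as a tiny state record. Names and the single-direction kbit/s rate are hypothetical; the sketch assumes that an expired override falls back to the last authoritative rate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-subscriber rate state (one direction, kbit/s). */
typedef struct {
    uint32_t policy_kbps;   /* last rate from an authoritative source */
    uint32_t override_kbps; /* operator fallback rate */
    bool has_override;
    uint64_t expires_at;    /* 0 = no TTL: lasts until the next authoritative event */
} rate_state_t;

static void operator_override(rate_state_t *s, uint32_t kbps, uint64_t now, uint64_t ttl) {
    s->override_kbps = kbps;
    s->has_override = true;
    s->expires_at = ttl ? now + ttl : 0;
}

/* RADIUS/CoA/session event: re-rate the subscriber and drop any override. */
static void authoritative_event(rate_state_t *s, uint32_t kbps) {
    s->policy_kbps = kbps;
    s->has_override = false;
}

/* The override wins only while present and unexpired; otherwise policy. */
static uint32_t effective_rate(const rate_state_t *s, uint64_t now) {
    if (s->has_override && (s->expires_at == 0 || now < s->expires_at))
        return s->override_kbps;
    return s->policy_kbps;
}
```

Keeping the authoritative rate separate from the override is what makes the override "not a new source of truth": it never overwrites `policy_kbps`, so expiry or the next authoritative event restores the real policy without any bookkeeping.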

Part of the BNGSOFT subscriber-intelligence suite

The arena engine and operator control sit alongside the measurement and self-driving layers BNGSOFT already runs in the data plane:

Flow Intelligence

Measure

  • Passive per-flow loss + access/transit RTT, measured in XDP — no probes.
Autonomous QoE

Decide

  • Learns each subscriber's own normal; flags & leans on the ones that drift — no fixed thresholds.
CGNAT Arena + Control

Scale & act

  • Memory that tracks real sessions; one-command operator rate/state control.

The honest state of play

The map-based CGNAT engine is the production default and is unchanged — the arena engine is selected by an explicit switch and is in active validation, not yet general availability. We would rather tell you that than ship a number we haven't earned.

What is real today: the unified-record architecture, the runtime-allocation design that keeps memory proportional to load, and the shipped sub set-rate operator control. What is in the lab: live, at-scale soak validation of the arena translation path before it becomes a default-eligible engine.

Want the deep dive? We will walk you through the architecture, the lifecycle correctness model, and the validation plan on a node running real traffic.

Sources & honest framing: This is an engineering and operations brief, not a benchmark report, and no performance, throughput, memory, or session-count figures are claimed. The map-based CGNAT engine (twin pre-allocated forward/reverse session tables) is the shipping production default and is unchanged. The CGNAT Arena Engine — a runtime-allocated BPF-arena unified conntrack store with one bidirectional record per session, on-demand page allocation, race-free session claim, timeout-aware reuse, and stale-index scrubbing on reuse — is selected behind an explicit engine config switch and is in active validation, not general availability; the map engine remains the byte-identical default and live fallback. The conceptual diagrams illustrate the architecture, not measured results. sub set-rate (bulk / filtered / per-interface subscriber rate and active-state control, daemon-mediated via the standard rate-apply path, with dry-run, explicit confirmation for bulk changes, and optional auto-expiring TTL) is shipped operator tooling; an override applied this way persists until the subscriber's next authoritative (RADIUS/CoA/session) event re-rates it. Related per-topic briefs — Flow Intelligence, Autonomous QoE, CGNAT, L4S/AQM, NOC2 — are available alongside this guide.