High-Performance XDP BNG · CGNAT · QoS · Low-Latency · Zero-Downtime Operations

Broadband Gateway · Zero-Downtime ISSU

Upgrade the BNG Without Dropping a Single Subscriber — on One Commodity x86 Server

Hitless In-Service Software Upgrades (ISSU) for the BNGSOFT XDP BNG: a new data-plane version rolls in while the NIC keeps forwarding, all subscriber, CGNAT and QoS state is preserved, and no session is lost. No reconnect storm, no redundant chassis, no dual supervisors. The kind of ISSU that used to need carrier-class hardware is now a routine restart on a single server.

Traditionally, upgrading BNG software means a choice: drop every session and ride out the reconnect storm, or buy a redundant chassis with ISSU-capable dual supervisors. BNGSOFT does neither — it swaps the data plane in place on commodity hardware.

0
subscriber sessions dropped during a daemon/XDP upgrade

Sub-sec
control-plane blip; the data plane never stops forwarding

1 server
commodity x86; no redundant chassis, no dual supervisors

In-place swap
pinned bpf_link program update; the NIC never detaches XDP

The mechanism is simple to state and hard to fake: the XDP forwarding program is attached through a pinned bpf_link, and every piece of subscriber state — customer IP maps, CGNAT conntrack and port-blocks, per-subscriber QoS config, stats — lives in pinned BPF maps that outlive the daemon process. On upgrade, the new bngxdpd daemon re-attaches to those same maps and atomically swaps the XDP program in place. The link is never detached, so the NIC never stops running XDP. Packets keep flowing through the swap; nobody renegotiates a session.

1 · The two traditional ways to upgrade a BNG — and why both hurt

OPTION A · DROP EVERYONE

Restart the software, take the reconnect storm.

Every subscriber session dies at once: thousands of PPPoE/IPoE re-negotiations and re-auths hit the box and the RADIUS/AAA stack in a burst.
Minutes of hard outage for the whole node — and a thundering-herd load spike on the way back up.
Done only in a maintenance window, late at night, with staff on standby and customers warned.
So upgrades get deferred — security fixes and features wait weeks for the next window.

OPTION B · BUY REDUNDANCY

Pay for a chassis that can fail over.

Carrier-class ISSU means an expensive redundant chassis with dual supervisor cards.
Capex and complexity just to earn the privilege of upgrading without dropping subscribers.
Two of everything, a tightly-coupled vendor stack, and a fail-over path you have to trust.
Out of reach for the commodity-x86 economics that make a software BNG attractive in the first place.

BNGSOFT is a third option. Upgrade the data-plane software without dropping sessions, on a single commodity x86 server — no maintenance window for routine upgrades, no redundant chassis to buy. The forwarding plane is decoupled from the daemon's lifecycle, so restarting the daemon is not the same as interrupting traffic.

2 · How the hitless swap works — pinned link, pinned state

The whole trick rests on a clean separation: the forwarding program and its state live in the kernel, pinned, independent of the daemon process. The daemon is just the thing that builds and manages them. Kill the daemon and the XDP program keeps running off the same maps; start a new daemon and it adopts them.

PINNED bpf_link — THE ATTACHMENT SURVIVES

The NIC never stops running XDP

update-in-place

bpf_link__update_program, not detach + re-attach

The XDP program is attached to the NIC via a pinned bpf_link, persisted by the kernel, independently of the daemon process.
On upgrade the link is updated in place to point at the new program — the link is never detached.
There is no window where the NIC is running no XDP program; the swap is atomic from the data plane's view.
Packets in flight keep being forwarded across the swap.

PINNED MAPS — THE STATE SURVIVES

Subscriber state is preserved, not rebuilt

Kernel-pinned

customer IP · CGNAT CT/port-blocks · QoS · stats

All subscriber, CGNAT and QoS state lives in pinned BPF maps that outlive the daemon process.
The new daemon re-attaches to the existing maps — no NAT re-allocation, no table rebuild.
No session is re-authenticated; no public IP:port mapping changes hands.
Auxiliary BPF state (e.g. the BNGSOFT DNS maps) survives the restart too.
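One way to make "adopt, don't rebuild" concrete is a pre-restart check that the expected pins are present in the BPF filesystem. The sketch below is an illustration under stated assumptions, not BNGSOFT tooling: the pin directory and every pin name except customer_ip_map are hypothetical, and it only checks that pin paths exist, not what they contain.

```python
from pathlib import Path

# Hypothetical pin names; only "customer_ip_map" appears in this document.
EXPECTED_PINS = (
    "xdp_link",
    "customer_ip_map",
    "cgnat_ct",
    "cgnat_port_blocks",
    "qos_cfg",
    "stats",
)


def missing_pins(pin_dir, expected=EXPECTED_PINS):
    """Return the expected pin names that are absent from pin_dir."""
    root = Path(pin_dir)
    return [name for name in expected if not (root / name).exists()]


def restart_mode(pin_dir, expected=EXPECTED_PINS):
    """Classify a restart: 'adopt' if every pin exists, 'cold' if none do, else 'partial'."""
    missing = missing_pins(pin_dir, expected)
    if not missing:
        return "adopt"
    if len(missing) == len(expected):
        return "cold"
    return "partial"


if __name__ == "__main__":
    # /sys/fs/bpf is the conventional bpffs mount point; the "bngsoft"
    # subdirectory is an assumption made for this example.
    print(restart_mode("/sys/fs/bpf/bngsoft"))
```

A "partial" result is the interesting one for an operator: some state survived and some did not, so a restart should not be assumed hitless until the cause is understood.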

New daemon starts. It opens the existing pinned maps and loads the new XDP program object — while the old program is still attached and forwarding.

Atomic program swap. The daemon calls bpf_link__update_program on the pinned link, replacing the running program in place. The link is never detached; the NIC never goes XDP-less.

State is adopted, not rebuilt. Because the maps were pinned, the new program reads the same customer IP entries, CGNAT conntrack, port-block allocations, per-subscriber QoS and stats. Nothing re-allocates.

Control plane re-syncs. The daemon re-syncs subscriber state over the internal Unix-socket bus (bngsync), with backoff so that large fleets do not cause a re-sync storm. Forwarding never pauses at any point in the sequence.
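The ordering of that sequence can be sketched as a toy model. This is plain Python, not libbpf code: the Link and Kernel classes, the two program functions and the subscriber entry are all hypothetical stand-ins, and the only point is that adopting the maps and updating the link in place never passes through a state with no program attached.

```python
class Link:
    """Toy stand-in for a pinned link: it points at one program until told otherwise."""

    def __init__(self, program):
        self.program = program
        self.detach_count = 0

    def update_program(self, new_program):
        # One reference assignment: no state exists in which the link has no program.
        self.program = new_program

    def detach(self):
        # Shown only for contrast; the hitless path never calls this.
        self.detach_count += 1
        self.program = None


class Kernel:
    """Toy stand-in for kernel-held objects that outlive any daemon process."""

    def __init__(self, program):
        self.link = Link(program)
        self.maps = {"customer_ip_map": {}}

    def forward(self, src_ip):
        program = self.link.program
        if program is None:
            return "dropped: no XDP program attached"
        return program(src_ip, self.maps)


def program_v1(src_ip, maps):
    return "v1 forwarded" if src_ip in maps["customer_ip_map"] else "v1 unknown subscriber"


def program_v2(src_ip, maps):
    return "v2 forwarded" if src_ip in maps["customer_ip_map"] else "v2 unknown subscriber"


def hitless_upgrade(kernel, new_program):
    """New daemon: adopt the existing maps, then swap the program on the existing link."""
    adopted_maps = kernel.maps               # adopt, do not rebuild
    kernel.link.update_program(new_program)  # swap in place, never detach
    return adopted_maps


kernel = Kernel(program_v1)
kernel.maps["customer_ip_map"]["100.64.0.7"] = {"qos": "hypothetical-plan"}
before = kernel.forward("100.64.0.7")
maps_after = hitless_upgrade(kernel, program_v2)
after = kernel.forward("100.64.0.7")
print(before, "->", after, "| detaches:", kernel.link.detach_count)
# prints: v1 forwarded -> v2 forwarded | detaches: 0
```

The subscriber entry written before the upgrade is still the one the new program reads afterwards, because the map object was adopted rather than recreated.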

The key insight: in a chassis ISSU you keep traffic alive by failing over to a second set of hardware. Here you keep traffic alive because the forwarding plane was never the thing being restarted — it lives in the kernel, pinned, while the user-space daemon that manages it is the only thing that cycles.

3 · Graceful CGNAT restart — public IP:port mappings stay put

For a CGNAT box, "don't drop the session" is only half the promise. The other half is "don't break the NAT." A naive restart that re-allocated port-blocks would change every subscriber's public IP:port — breaking established flows just as surely as a session drop would. BNGSOFT's restart is graceful with respect to CGNAT state.
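To see why re-allocation breaks flows, consider a minimal sketch of a port-block allocator. Everything here is an assumption made for illustration (the lowest-free-block policy, the 1024-port block size, the starting port and the addresses); it is not BNGSOFT's allocator. It shows only that a table rebuilt from scratch depends on arrival order, while an adopted table cannot move.

```python
def allocate_block(port_blocks, subscriber, public_ip, block_size=1024, first_port=1024):
    """Give subscriber the lowest free block on public_ip; reuse its block if it has one."""
    if subscriber in port_blocks:
        return port_blocks[subscriber]
    used = {start for (ip, start, _end) in port_blocks.values() if ip == public_ip}
    start = first_port
    while start in used:
        start += block_size
    if start + block_size - 1 > 65535:
        raise RuntimeError("no free port block on " + public_ip)
    port_blocks[subscriber] = (public_ip, start, start + block_size - 1)
    return port_blocks[subscriber]


def restart_naive(subscribers_in_arrival_order, public_ip):
    """Rebuild the table from empty: blocks follow arrival order, so they can move."""
    fresh = {}
    for subscriber in subscribers_in_arrival_order:
        allocate_block(fresh, subscriber, public_ip)
    return fresh


def restart_adopting(surviving_port_blocks):
    """Adopt the surviving table as-is: nothing is re-allocated."""
    return surviving_port_blocks


PUBLIC_IP = "203.0.113.10"  # documentation address, hypothetical
before = {}
for sub in ("100.64.0.1", "100.64.0.2", "100.64.0.3"):
    allocate_block(before, sub, PUBLIC_IP)

# After a naive restart, subscribers re-attach in a different order.
naive = restart_naive(["100.64.0.3", "100.64.0.1", "100.64.0.2"], PUBLIC_IP)
adopted = restart_adopting(before)

print(before["100.64.0.1"], naive["100.64.0.1"], adopted["100.64.0.1"])
```

In this run the first subscriber holds ports 1024 to 2047 before the restart, lands on 2048 to 3071 after the naive rebuild, and keeps 1024 to 2047 when the table is adopted.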

PORT-BLOCKS SURVIVE

Same public IP:port, before and after.

CGNAT port-block allocations live in pinned maps.
The new daemon adopts them — no re-allocation.
Subscribers keep their public IP:port mapping.

CONNTRACK SURVIVES

Established flows aren't broken.

Connection-tracking state is preserved across the restart.
Existing NAT'd flows continue uninterrupted.
No mid-call / mid-stream resets from the upgrade.

NO EXTERNAL DATASTORE

Nothing to coordinate.

Control-plane IPC runs over Unix sockets (bngsync).
bngsync replaced an earlier external-datastore dependency, so there is nothing external to coordinate.
State re-syncs over the socket with storm-avoiding backoff.
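The text says only that the re-sync uses backoff. One common way to get storm-avoiding behavior is capped exponential backoff with random jitter, sketched below. The schedule, the parameter values and the try_resync callback are assumptions for illustration, not the bngsync implementation.

```python
import random
import time


def backoff_delays(attempts, base=0.5, cap=30.0, rng=None):
    """Yield one delay per attempt, uniform in [0, min(cap, base * 2**n)] seconds."""
    if rng is None:
        rng = random.Random()
    for n in range(attempts):
        ceiling = min(cap, base * (2 ** n))
        yield rng.uniform(0.0, ceiling)


def resync_with_backoff(try_resync, attempts=8, base=0.5, cap=30.0, sleep=None, rng=None):
    """Call try_resync() up to `attempts` times, sleeping a jittered delay after each failure.

    try_resync is a hypothetical callable that returns True once the re-sync succeeded.
    """
    if sleep is None:
        sleep = time.sleep
    for delay in backoff_delays(attempts, base, cap, rng):
        if try_resync():
            return True
        sleep(delay)
    return False
```

The jitter is what spreads retries out: without it, every client that failed at the same moment would retry at the same moment too.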

Why this matters for the subscriber: a customer mid-video-call, mid-download, or holding a long-lived gaming connection notices nothing when you upgrade the box. Their session stays up, their public mapping stays the same, and their existing flows keep flowing. The upgrade is invisible from the subscriber's side of the wire.

4 · Many changes don't need a restart at all

REAL CAPABILITY

The hitless restart is the heavy-duty path for a full daemon/XDP version change. But a large class of operational changes apply live, with no restart whatsoever — the running daemon re-reads its config and updates the data plane in place.

Change you want to make	How it applies	Subscriber impact
CGNAT no-NAT exempt destinations (e.g. IPTV / direct-routed prefixes)	Config-file mtime change is auto-detected; the exempt list reloads live.	None — no restart, no session touch.
Firewall rule reload	Rules re-applied on reload; stale permanent entries pruned, dynamic (BNGSOFT DNS) entries kept.	None — applied in place.
General config reload	SIGHUP / config-file mtime reload applies the change to the running daemon.	None — zero interruption.
New daemon / XDP program version (feature, bugfix, data-plane change)	Hitless ISSU: pinned-link in-place program swap + pinned-map state adoption.	None — sub-second control-plane blip, no session loss.
Full OS / kernel image change	Requires a reboot — a different, planned operation.	Drops sessions — done in a maintenance window or via node fail-over.
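The two live-reload triggers in the table, SIGHUP and a config-file mtime change, can be sketched as follows. This is a minimal illustration, not bngxdpd code: the ReloadTrigger class and its polling model are assumptions, and a missing config file is not handled.

```python
import os
import signal


class ReloadTrigger:
    """Report when a config file should be re-read: on SIGHUP or when its mtime changes."""

    def __init__(self, path):
        self.path = path
        self.last_mtime_ns = os.stat(path).st_mtime_ns
        self.hup_pending = False

    def on_sighup(self, signum, frame):
        # Keep the signal handler minimal: record the request and return.
        self.hup_pending = True

    def should_reload(self):
        """Return True once per detected change; meant to be polled from a main loop."""
        mtime_ns = os.stat(self.path).st_mtime_ns
        changed = mtime_ns != self.last_mtime_ns
        if changed or self.hup_pending:
            self.last_mtime_ns = mtime_ns
            self.hup_pending = False
            return True
        return False


def install(trigger):
    """Route SIGHUP to the trigger (POSIX only; must be called from the main thread)."""
    signal.signal(signal.SIGHUP, trigger.on_sighup)
```

A daemon's main loop would call should_reload() periodically and re-read the file when it returns True, which is why neither trigger needs a restart.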

Honest boundary: the hitless ISSU path covers the bngxdpd daemon and the XDP program — which is the vast majority of upgrades (features, bugfixes, data-plane changes). A full operating-system / kernel image change still requires a reboot, which is a separate planned operation that does drop sessions; operators handle kernel upgrades with a maintenance window or node fail-over. We position ISSU as the routine upgrade path, not as a claim that nothing ever needs a reboot.

5 · What an operator sees during an upgrade

ILLUSTRATIVE FORMAT

The example below illustrates the shape of a hitless restart from the operator's side: the daemon cycles, the pinned link is updated in place, the pinned maps are adopted, and the data plane never reports a forwarding gap. Exact wording and counters vary by release and box.

root@NodeA:~# /etc/init.d/bngxdpd restart

* stopping bngxdpd (leaving pinned link + maps in place) ...
* starting bngxdpd ...
bngxdpd: adopting pinned BPF state (pinned kernel maps)
  [ OK ] pinned link found      eth-uplink (xdp, native)
  [ OK ] customer_ip_map     adopted  (8,538 entries preserved)
  [ OK ] cgnat conntrack     adopted  (port-blocks intact)
  [ OK ] qos / stats maps    adopted
  [ OK ] bng-dns maps       adopted
bngxdpd: swapping XDP program in place
  [ OK ] bpf_link__update_program  link never detached — NIC kept running XDP
  [ OK ] control-plane re-sync over bngsync  (backoff enabled)

  UPGRADE: hitless — data plane never stopped forwarding; 0 sessions dropped

Illustrative format — this block demonstrates the sequence and the design guarantees (pinned-link in-place swap, pinned-map adoption, no detach), not verbatim output from a specific node. The honest framing of what is and isn't captured live is disclosed in the note at the foot of this page.

Robustness, hardened over several releases. An init watchdog plus orphan-bpf-link cleanup defenses guard against a "populate-from-empty hang" failure class on restart — the pathological case where a half-attached link or empty map could otherwise stall startup. The safe-restart path is the result of repeated field-hardening, not a first draft.

6 · ISSU on commodity x86 vs. the traditional alternatives

Property	BNGSOFT XDP ISSU	Traditional software restart	Redundant ISSU chassis
Subscriber sessions during upgrade	Preserved — zero dropped	All dropped — reconnect storm	Preserved (via fail-over)
Data-plane forwarding	Never stops — in-place program swap	Stops for minutes	Continues on standby supervisor
CGNAT public IP:port mappings	Unchanged — port-blocks/CT adopted	Re-allocated — flows break	Preserved if state-synced
Hardware required	One commodity x86 server	One server (but with outage)	Redundant chassis, dual supervisors
Maintenance window for routine upgrades	Not needed — upgrade any night	Required every time	Not needed
External datastore to coordinate	None — Unix-socket bus (bngsync)	N/A	Vendor-internal
Capex to obtain ISSU	None — it's a software property	None (but you pay in outages)	High — redundant hardware
Kernel / OS image change	Planned reboot (window / fail-over)	Planned reboot	Often hitless on chassis

Reading the table honestly: a redundant chassis can often do hitless kernel upgrades as well, whereas BNGSOFT handles a kernel change with a planned reboot. The trade BNGSOFT offers is hitless data-plane software ISSU, the upgrade you actually run most often, on a single commodity server with no redundant-hardware capex. For the routine feature/bugfix/data-plane cadence, that is the upgrade path that matters.

7 · What it means for the business

Zero-Downtime ISSU · operator value

Upgrade any night, no window
Routine daemon/XDP upgrades roll out without a maintenance window or customer notification: the data plane never stops and no session drops.

Faster security & feature rollout
Fixes ship the day they're ready instead of waiting weeks for the next window. Less time exposed, quicker time-to-value on new features.

No redundant-chassis capex
You get ISSU as a software property on one commodity x86 server, without buying dual-supervisor carrier hardware just to upgrade safely.

Lower operational risk
No reconnect storm means no thundering-herd load on RADIUS/AAA, no mass re-auth, and a hardened safe-restart path with watchdog + orphan-link defenses.

Subscriber experience protected
Established flows, CGNAT mappings and live sessions all survive the upgrade; customers don't feel the deploy at all.

No new systems to run
It's built into the daemon and the kernel's pinned BPF objects. No external datastore, no clustering layer, no second box to keep in sync.

The bottom line

Deploying a new BNGSOFT daemon or XDP version is a sub-second control-plane blip with zero data-plane interruption and zero subscriber session loss — on one commodity server, with no redundant chassis. The forwarding plane and all subscriber/CGNAT/QoS state live pinned in the kernel; the daemon that manages them is the only thing that cycles.

That turns upgrades from a dreaded, scheduled, customer-impacting event into a routine operation you can run any night. Faster fixes, lower risk, no special hardware — and the subscriber never knows it happened.

Methodology and honest framing: This document describes the zero-downtime in-service software upgrade (ISSU) behavior of the BNGSOFT XDP BNG. The mechanism it describes is real: the XDP forwarding program is attached via a pinned bpf_link, and subscriber/forwarding state (customer IP maps, CGNAT connection-tracking and port-block allocations, per-subscriber QoS config, and stats, plus auxiliary state such as the BNGSOFT DNS maps) lives in pinned BPF maps held by the kernel, independent of the daemon.

On upgrade the new bngxdpd daemon re-attaches to those existing pinned maps and atomically swaps the XDP program in place via bpf_link__update_program. The link is never detached, so the NIC never stops running XDP and the data plane keeps forwarding through the swap; subscriber state is preserved with no NAT re-allocation and no session loss. CGNAT port-block allocations and connection-tracking state survive the daemon restart, so existing NAT'd flows are not broken and subscribers retain their public IP:port mappings.

Control-plane IPC runs over Unix sockets (an internal bus called bngsync, which replaced an earlier external-datastore dependency); the new daemon re-syncs subscriber state over the socket with backoff to avoid storms on large fleets (4,000+ subscribers). Many configuration changes apply live with no restart at all: SIGHUP / config-file mtime reload applies changes such as CGNAT no-NAT exempt destinations (auto-reload on file change) and firewall rule reloads with zero interruption. Robustness has been hardened over several releases with an init watchdog and orphan-bpf-link cleanup defenses that prevent a "populate-from-empty hang" failure class on restart.

The terminal block in Section 5 is marked ILLUSTRATIVE FORMAT: it demonstrates the sequence and the design guarantees (pinned-link in-place swap, pinned-map adoption, no detach, hitless result) and uses representative entry counts; it is not captured verbatim from a specific production node, and exact log wording and counters vary by release and box.

Honest caveat on scope: the hitless ISSU described here covers the bngxdpd daemon and the XDP program, that is, feature, bugfix and data-plane upgrades, which are the vast majority of upgrades. A full operating-system / kernel image change is a different, planned operation that still requires a reboot and does drop sessions; operators handle kernel upgrades with a maintenance window or node fail-over.

The example node is referred to as "NodeA"; no real customer names are used. Prepared as a management and operations overview for large-scale operators. Zero-Downtime ISSU is a capability of the BNGSOFT XDP BNG product.