XDP BNG · CGNAT — Hardware Sizing · NIC Selection · Power & Cost-per-Subscriber


Pick the Right Server & NIC: Throughput, Subscribers, Power and Cost per Subscriber

Because the BNGSOFT data plane runs in XDP on commodity servers, your capacity is set by three things: the NIC, the PCIe bus, and CPU/RAM for session state — not by a proprietary chassis. This guide compares Intel X710 / XL710 / E810 and NVIDIA ConnectX-6 Dx across 1U and 2U builds, and gives a simple model to size a node to a target subscriber count and traffic.

A BNG node has two ceilings: how much traffic it can move (NIC + PCIe bus) and how many subscribers it can track (CPU + RAM for the session/NAT tables). The right build balances both — and the densest balanced node wins on watts and dollars per subscriber.

10→200G per server: X710 (20G) · XL710 (40G) · E810 / CX-6 Dx (100–200G)

1U & 2U commodity x86: edge node to high-density multi-100G aggregation

~100+ subs / watt: best at high density; consolidation wins TCO

PCIe, the real ceiling: Gen3 ×8 ≈ 50G · Gen4 ×16 ≈ 200G

1 · The NICs — X710, XL710, E810 and ConnectX-6 Dx

For an XDP BNG the NIC choice drives line rate, latency, queue count and how much you can offload. All four are well-supported by mature Linux drivers (Intel i40e/ice, NVIDIA mlx5) and run XDP natively.

Adapter	Speed	PCIe	Typ. power*	Driver	Street price*	Best for
Intel X710-DA2	2 × 10G	Gen3 ×8	~3.3–4.5 W	i40e	~$200–300	Edge / small PoP nodes
Intel XL710-QDA2	2 × 40G (bus-capped)	Gen3 ×8	~7–8 W	i40e	~$300–450	20–40G nodes; mind the bus
Intel E810-CQDA2	2 × 100G	Gen4 ×16	~15–21 W	ice	~$700–900	Mainstream 100G, many queues / ADQ
NVIDIA ConnectX-6 Dx	2 × 100G / 200G	Gen4 ×16	~16–22 W	mlx5	~$965–1,400	Lowest latency + HW conntrack offload (ASAP²)

Watch the PCIe bus, not just the port speed. An XL710 2×40G sits on PCIe Gen3 ×8 (~63 Gbps raw, ~50 Gbps usable), so it cannot drive both 40G ports at line rate (80G in total); the bus caps it near ~50G aggregate. 100G-class cards need PCIe Gen4 ×16 to deliver their ports. This is the most common real-world bottleneck, and we have measured it on production nodes.

Port speed only matters if the bus can carry it. Size the slot (PCIe generation × lanes) to the NIC — or the bus becomes your throughput ceiling.
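The bus arithmetic behind those figures can be sketched in a few lines of Python. The per-lane rates and the 128b/130b encoding come from the PCIe specification; the 0.80 usable fraction is an assumption chosen here only because it reproduces the ~50G and ~200G usable figures quoted in this guide, and the function names are illustrative rather than part of any BNGSOFT tool.

```python
# Per-lane transfer rate in GT/s for each PCIe generation (PCIe specification).
GT_PER_LANE = {3: 8.0, 4: 16.0}

# Gen3 and Gen4 both use 128b/130b line encoding.
ENCODING = 128 / 130

# ASSUMPTION: roughly 80% of raw bandwidth is left after protocol overhead.
# Picked to match the guide's "~50G usable" and "~200G usable" figures.
USABLE_FRACTION = 0.80


def pcie_raw_gbps(gen: int, lanes: int) -> float:
    """Raw one-direction slot bandwidth in Gbps."""
    return GT_PER_LANE[gen] * lanes * ENCODING


def pcie_usable_gbps(gen: int, lanes: int,
                     usable_fraction: float = USABLE_FRACTION) -> float:
    """Raw bandwidth scaled by the assumed usable fraction."""
    return pcie_raw_gbps(gen, lanes) * usable_fraction


def forwarding_ceiling_gbps(port_gbps: float, ports: int,
                            gen: int, lanes: int) -> float:
    """Upper bound on traffic: the smaller of port capacity and usable bus."""
    return min(port_gbps * ports, pcie_usable_gbps(gen, lanes))


if __name__ == "__main__":
    print(f"Gen3 x8  raw: {pcie_raw_gbps(3, 8):.1f} Gbps")
    print(f"Gen4 x16 raw: {pcie_raw_gbps(4, 16):.1f} Gbps")
    print(f"X710  2x10G ceiling: {forwarding_ceiling_gbps(10, 2, 3, 8):.1f} Gbps")
    print(f"XL710 2x40G ceiling: {forwarding_ceiling_gbps(40, 2, 3, 8):.1f} Gbps")
```

Under these assumptions the X710 is port-bound at 20G while the XL710 is bus-bound near 50G. Treat the result as an upper bound only: a card may have its own aggregate limit below what the ports and slot would allow.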

2 · 1U vs 2U — what the form factor buys you

Spec	1U: Edge & mainstream nodes	2U: High-density aggregation
PCIe slots (usable)	1–2 low-profile	2–4 full-height ×16
Sockets	1 (sometimes 2)	1–2 (more cores/RAM)
Typical NIC	X710 / XL710 / one E810	2× E810 or ConnectX-6 Dx
Power draw	~250–450 W	~450–800 W (dual PSU)
Cooling / noise	tight, high-RPM	better thermals, redundancy
Best for	distributed PoP/edge: one NIC, up to ~100G, lowest rack cost per node	consolidation: 200G, more session memory, dual-PSU resilience, best subs/watt

Rule of thumb. 1U = the cheapest way to put a 100G XDP BNG at the edge. 2U = the cheapest way to serve a lot of subscribers per rack-unit and per watt, with PSU redundancy and room for dual-100G or a SmartNIC.

3 · The sizing model — a node has two ceilings

Size every node against both limits and take the smaller:

Ceiling	Set by	How to estimate
① Traffic it can move	NIC line rate ∧ PCIe bus	Gbps = min(NIC ports, PCIe usable). E.g. XL710→~50G, E810→~100–200G.
② Subscribers it can track	CGNAT/session table (CPU + RAM) and a per-node map ceiling	The session/CGNAT table is sized to hold the throughput-driven count with headroom; a hard per-node map ceiling of ~131,000 subs only bites above ~400G.

Effective subscribers = min( ① NIC usable Gbps × 1,000 ÷ busy-hour Mbps , 131,000 map ceiling ). At an assumed ~3 Mbps busy-hour average per active subscriber, a 100G node carries ~33k subs and a 2×100G node ~64k (about 66k raw, rounded down for overhead and headroom). The NIC + PCIe bus is the practical ceiling, and the CGNAT/session table is sized to match it. The data plane forwards in XDP at ~one core per ~55 Gbps, so CPU is not the wall on a right-sized server. Replace the 3 Mbps with your own measured busy-hour average.
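As a worked illustration, the formula translates directly into a small Python function. The 131,000 ceiling and the 3 Mbps default are the figures quoted in this guide; the function and constant names are hypothetical and exist only for this sketch.

```python
# Per-node map ceiling quoted in this guide.
MAP_CEILING_SUBS = 131_000


def effective_subscribers(usable_gbps: float,
                          busy_hour_mbps: float = 3.0,
                          map_ceiling: int = MAP_CEILING_SUBS) -> int:
    """min(traffic-bound subscribers, per-node map ceiling).

    usable_gbps    -- traffic the node can move (NIC and PCIe limited)
    busy_hour_mbps -- planning average per active subscriber
    """
    if busy_hour_mbps <= 0:
        raise ValueError("busy_hour_mbps must be positive")
    # Gbps -> Mbps, then divide by the per-subscriber rate.
    traffic_bound = int(usable_gbps * 1000 / busy_hour_mbps)
    return min(traffic_bound, map_ceiling)


if __name__ == "__main__":
    for gbps in (20, 100, 200, 400):
        print(f"{gbps:>3} Gbps -> {effective_subscribers(gbps):,} subs")
```

At 3 Mbps this gives 33,333 subscribers for 100G and 66,666 for 200G, and the map ceiling takes over at roughly 393G. The raw 200G result sits a little above the ~64k headline because published per-node figures are rounded down for encapsulation overhead and N+1 headroom.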

Size your node — NIC card × CPU → subscribers

Capacity = min( NIC usable Gbps × 1,000 ÷ busy-hour Mbps , CPU data-plane ceiling , ~131k per-node map limit ).

Inputs: NIC card, CPU cores, busy-hour Mbps per subscriber, and an optional target subscriber count.

Figures are raw NIC ÷ busy-hour rate; published per-node headline numbers round down slightly for encapsulation overhead and N+1 headroom. The data plane forwards in XDP at ~1 core per ~55 Gbps; the per-subscriber control thread needs ~1 core per ~45k subs.
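A rough core budget and node count can be derived from those two ratios. The sketch below assumes that data-plane and control cores are budgeted separately and simply summed, and that a fleet adds one spare node for N+1; both are simplifications made for illustration, not documented BNGSOFT behaviour, and the function names are hypothetical.

```python
import math

# Ratios quoted in this guide.
GBPS_PER_DATAPLANE_CORE = 55
SUBS_PER_CONTROL_CORE = 45_000


def cores_needed(gbps: float, subscribers: int) -> int:
    """Estimate cores for a node moving `gbps` and tracking `subscribers`.

    ASSUMPTION: data-plane and control work run on separate cores, so the
    two requirements are rounded up independently and added together.
    """
    dataplane = math.ceil(gbps / GBPS_PER_DATAPLANE_CORE)
    control = math.ceil(subscribers / SUBS_PER_CONTROL_CORE)
    return dataplane + control


def nodes_needed(target_subs: int, subs_per_node: int, spares: int = 1) -> int:
    """Active nodes to cover the target, plus spare nodes (N+1 by default)."""
    return math.ceil(target_subs / subs_per_node) + spares


if __name__ == "__main__":
    print("100G / 33k subs:", cores_needed(100, 33_000), "cores")
    print("200G / 64k subs:", cores_needed(200, 64_000), "cores")
    print("50k subs on ~33k nodes:", nodes_needed(50_000, 33_000), "nodes")
```

Under these assumptions a 100G node needs about 3 cores and a 200G node about 6, well below the 16-core and dual-socket builds discussed in this guide, which is consistent with the claim that a right-sized server is NIC/PCIe-bound rather than CPU-bound.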

4 · Sample configurations (indicative — validate per deployment)

Build	Edge · 1U	Standard · 1U	High-density · 2U	SmartNIC · 2U
Role	Small PoP / edge BNG	Mainstream 100G node	Aggregation / consolidation	Lowest latency + HW offload
CPU	8-core Xeon-E / EPYC	16-core Xeon-SP / EPYC	2× 16–24c (dual socket)	24-core (frees cores)
RAM	32 GB	64 GB	128 GB	128 GB
NIC	Intel X710-DA2 (2×10G)	Intel E810-CQDA2 (2×100G)	2× E810 (up to 200G)	ConnectX-6 Dx + ASAP²
Moves	~20 Gbps	~100 Gbps	~200 Gbps	~200 Gbps
Tracks	up to ~6.7k subs	up to ~33k subs	up to ~64k subs	up to ~64k subs
Power	~250 W	~380 W	~650 W (dual PSU)	~600 W
Summary	~$2.5k · ~$0.37 / sub · ~27 subs/W	~$5k · ~$0.15 / sub · ~87 subs/W	~$9k · ~$0.14 / sub · ~98 subs/W	~$10k · ~$0.16 / sub · HW conntrack offload
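The per-subscriber cost and subscribers-per-watt figures for these builds follow from simple division. The sketch below recomputes them from the indicative cost, power and subscriber numbers given above; the `NodeBuild` class is a hypothetical helper for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeBuild:
    name: str
    cost_usd: float      # indicative whole-node hardware cost
    watts: float         # indicative whole-node power draw
    subscribers: int     # subscribers the node tracks

    @property
    def usd_per_sub(self) -> float:
        return self.cost_usd / self.subscribers

    @property
    def subs_per_watt(self) -> float:
        return self.subscribers / self.watts


# Indicative figures taken from the sample configurations in this guide.
BUILDS = [
    NodeBuild("Edge 1U (X710)", 2_500, 250, 6_700),
    NodeBuild("Standard 1U (E810)", 5_000, 380, 33_000),
    NodeBuild("High-density 2U (2x E810)", 9_000, 650, 64_000),
    NodeBuild("SmartNIC 2U (CX-6 Dx)", 10_000, 600, 64_000),
]

if __name__ == "__main__":
    for b in BUILDS:
        print(f"{b.name:<28} ${b.usd_per_sub:.2f}/sub  "
              f"{b.subs_per_watt:.0f} subs/W")
```

Swapping in your own quotes, measured power draw and busy-hour-derived subscriber counts gives a like-for-like comparison across candidate builds.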

Power efficiency — subscribers per watt (higher is better)

Subscribers tracked ÷ whole-node draw. Density improves efficiency: the 2U builds carry the most subs per watt.

Build	Efficiency	Draw → subscribers
Edge 1U · X710	~27 subs/W	~250 W → 6.7k
Standard 1U · E810	~87 subs/W	~380 W → 33k
High-density 2U · 2×E810	~98 subs/W	~650 W → 64k
SmartNIC 2U · CX-6 Dx	~107 subs/W	~600 W → 64k

Traffic the node can move (forwarding ceiling, Gbps)

min(NIC ports, PCIe bus). Note the XL710 bus cap vs the 100G-class cards.

NIC	Ceiling	Limit / build
X710-DA2 (2×10G)	~20G	NIC-bound
XL710 (2×40G)	~50G	bus-bound
E810-CQDA2 (2×100G)	~100G	1U / single
2× E810 / CX-6 Dx	~200G	2U / dual

5 · Latency & the SmartNIC option

All four NICs run the XDP data plane at low-microsecond latency. Two levers matter for tail latency under load: queue count / steering (E810 ADQ and ConnectX flow steering isolate latency-sensitive traffic) and hardware offload. The ConnectX-6 Dx can offload stateful connection tracking in hardware via ASAP² (up to ~8M rules). This moves CGNAT/firewall flow state into the NIC, which cuts CPU load and trims latency on the hottest flows.

i40e · X710/XL710

Proven & cheap

Rock-solid 10–40G, lowest card cost, mature driver. Great for edge nodes where 100G is overkill.

ice · E810

Mainstream 100G

Many queues, ADQ traffic isolation, PCIe Gen4 — the default choice for new 100G BNG nodes.

mlx5 · ConnectX-6 Dx

Latency + offload

Lowest latency, best AF_XDP, and HW conntrack/NAT offload for the densest, leanest-CPU builds.

6 · 3-Year TCO — a 50,000-subscriber example

Hardware is only the entry fee. Over a typical 3-year refresh, power + cooling, rack space, optics and the operational surface of every extra box dominate. Below is a directional total-cost comparison for 50,000 FTTH subscribers with CGNAT: a consolidated BNGSOFT build versus the common router-appliance design (≈16 NAS routers + a separate CGNAT tier).

3-year cost line (50k subs)	BNGSOFT — consolidated	Appliance fleet (router + CGNAT tier)
Build	3 × 1U E810 nodes (2 active @ ~33k + N+1)	16 × CCR2216 NAS + ~3-node CGNAT cluster
Hardware (CapEx)	~$15,000	~$52,000
Software license	+ BNGSOFT license (per quote)	RouterOS included
Power + cooling (3 yr)*	~$5,400 (~1.1 kW)	~$13,000 (~2.7 kW)
Rack space	3U	~18U
Boxes to operate & spare	3	~19
HW + power subtotal (ex-license)	~$20,000 (~$0.41 / sub)	~$65,000 (~$1.30 / sub)

3-year hardware + power for 50k subscribers (ex-license)

Lower is better. The consolidated build frees ~$45k of headroom — typically more than covers the software license while still winning on TCO, rack and ops.

Appliance fleet: ~$65k · ~18U · 19 boxes · ~$1.30/sub

BNGSOFT consolidated: ~$20k + license · 3U · 3 boxes · ~$0.41/sub

The license is the variable — and consolidation pays for it. Even before software, the consolidated build runs ~$45k less in hardware + power over 3 years for 50k subscribers, in 3U instead of ~18U, with 3 boxes to manage and spare instead of ~19 — and no second hardware tier just for CGNAT. Plug your BNGSOFT quote into the headroom and compare full TCO.
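The hardware + power subtotals can be reproduced from the stated assumptions (~$0.12/kWh, ~1.5× cooling/PUE multiplier, 36 months). The sketch below is illustrative: the function names are hypothetical, the 1.14 kW figure assumes three ~380 W nodes, and the license, optics, cabling and colo fees are left out, as they are in the comparison itself.

```python
HOURS_3YR = 3 * 365 * 24  # 26,280 hours of continuous operation


def power_cost_usd(kw: float, usd_per_kwh: float = 0.12,
                   pue: float = 1.5, hours: int = HOURS_3YR) -> float:
    """Electricity plus cooling cost for a continuous load of `kw`."""
    return kw * usd_per_kwh * pue * hours


def subtotal_per_sub(capex_usd: float, kw: float,
                     subscribers: int = 50_000) -> tuple[float, float]:
    """Return (hardware + power subtotal, subtotal per subscriber)."""
    total = capex_usd + power_cost_usd(kw)
    return total, total / subscribers


if __name__ == "__main__":
    # ASSUMPTION: 3 nodes at ~380 W each = 1.14 kW for the consolidated build.
    consolidated = subtotal_per_sub(15_000, 3 * 0.38)
    appliance = subtotal_per_sub(52_000, 2.7)
    print(f"per continuous kW over 3 years: ${power_cost_usd(1.0):,.0f}")
    print(f"consolidated: ${consolidated[0]:,.0f}  ${consolidated[1]:.2f}/sub")
    print(f"appliance:    ${appliance[0]:,.0f}  ${appliance[1]:.2f}/sub")
    print(f"difference:   ${appliance[0] - consolidated[0]:,.0f}")
```

With these inputs the difference comes out near $44k, in line with the ~$45k headroom quoted above. Changing the electricity price or PUE multiplier shifts both sides proportionally to their power draw, so the gap widens where power is expensive.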

7 · Why this beats a fixed appliance

Hardware strategy · operator value

~$0.14–0.16 per subscriber on 100G+ nodes. Mainstream 100G and dense 2U builds land near ~$0.15/sub HW; edge 10G nodes cost more per sub but suit small PoPs. Densify to lower both $/sub and watts/sub.

Buy the NIC the site needs. 10G edge to 200G aggregation on the same software: no forklift, no proprietary line cards.

Scale by adding commodity nodes. Linear scale-out; each node is a standard server you already know how to buy, rack and spare.

Future-proof to SmartNIC/DPU. Move to ConnectX-6 Dx ASAP² hardware offload when you want lower CPU and latency, on the same platform.

Fewer watts per subscriber. High-density 2U consolidation reaches ~98 subs/W (SmartNIC ~107): less power, cooling and rack per customer served.

One model to size everything. min(traffic ceiling, subscriber ceiling) gives a defensible node spec for any PoP in minutes.

The bottom line

Size a BNGSOFT node against two ceilings — traffic (NIC + PCIe) and the per-node map limit (~131k) — and take the smaller. At ~3 Mbps busy-hour, a 1U E810 node moves ~100G and carries ~33k subs at ~$5k and ~87 subs/W; a 2U dual-100G node reaches ~200G, ~64k subs and ~98 subs/W. Add a ConnectX-6 Dx when you want hardware conntrack offload and the lowest latency.

Right card, right form factor, right node — ~$0.15 per subscriber on hardware you already buy.

Sources & honest framing: This is a sizing and TCO guide, not a benchmark report; all subscriber, throughput, power and cost figures are indicative engineering estimates that depend on CPU, RAM, NIC, traffic mix, per-subscriber busy-hour rate and enabled features, and must be validated per deployment.

NIC specifications:
Intel X710-DA2: PCIe Gen3 ×8, typ. ~3.3–4.5 W (Intel ARK, ServeTheHome).
Intel XL710: 40GbE, PCIe Gen3 ×8 (Intel datasheet).
Intel E810-CQDA2: 2×100G, PCIe Gen4 ×16, ~15.4 W idle to ~20.8 W with optics (Intel ARK, ServeTheHome).
NVIDIA ConnectX-6 Dx: 2×100G/200G, PCIe Gen4, ASAP² hardware connection-tracking offload up to ~8M rules, street price ~US$965–1,400 (NVIDIA datasheet, FS.com).
NIC street prices and server costs are approximate retail (June 2026) and exclude optics, software licensing and operational costs.

PCIe usable bandwidth: Gen3 ×8 ≈ 63 Gbps raw (~50 Gbps usable), Gen4 ×16 ≈ 252 Gbps raw (~200 Gbps usable). The XL710 2×40G bus cap and the multi-100G ceiling reflect this and match BNGSOFT production measurements.

Subscriber capacity: Per-node subscriber capacity is throughput-driven: NIC usable line rate ÷ the busy-hour per-subscriber rate (e.g. 2×100G ÷ ~3 Mbps ≈ ~64k), capped by a hard ~131,000-subscriber per-node map ceiling. The data plane forwards in XDP at roughly one core per ~55 Gbps, so a right-sized server is NIC/PCIe-bound, not CPU-bound, at residential busy-hour rates. The ~3 Mbps busy-hour per-subscriber figure is a planning assumption and should be replaced with your own measured busy-hour average; BNGSOFT XDP forwarding cost (~2.5% CPU at production load) and the ~131k map ceiling are from BNGSOFT deployment data. Power-efficiency (subs/W) and cost-per-subscriber are derived from the indicative whole-node figures above.
3-year TCO example assumptions: 50,000 subscribers over 36 months; electricity ~$0.12/kWh with a ~1.5× cooling/PUE multiplier (~$4,700 per continuous kW over 3 years); BNGSOFT build = 3 × 1U E810 nodes (2 active @ ~33k + N+1); appliance fleet = 16 × MikroTik CCR2216 NAS (max power 80–121 W per MikroTik docs; ~16 × CCR2216 @ 5k users for 50k per the carrier-BNG design referenced in our MikroTik comparison) plus a ~3-node CGNAT cluster; hardware and power are directional and exclude the BNGSOFT software license (shown as a separate line; request a quote) and any optics, cabling, support and rack/colo fees, which apply to both sides.

MikroTik®, Intel®, NVIDIA®/Mellanox® and respective product names are trademarks of their owners; BNGSOFT is not affiliated with them. Prepared as a hardware-planning overview for operators.