XDP BNG · CGNAT — Hardware Sizing · NIC Selection · Power & Cost-per-Subscriber
Deployment Guide · Hardware Sizing & TCO
Pick the Right Server & NIC: Throughput, Subscribers, Power and Cost per Subscriber
Because the BNGSOFT data plane runs in XDP on commodity servers, your capacity is set by three things: the NIC, the PCIe bus, and CPU/RAM for session state — not by a proprietary chassis. This guide compares Intel X710 / XL710 / E810 and NVIDIA ConnectX-6 Dx across 1U and 2U builds, and gives a simple model to size a node to a target subscriber count and traffic.
A BNG node has two ceilings: how much traffic it can move (NIC + PCIe bus) and how many subscribers it can track (CPU + RAM for the session/NAT tables). The right build balances both — and the densest balanced node wins on watts and dollars per subscriber.
1 · The NICs — X710, XL710, E810 and ConnectX-6 Dx
For an XDP BNG the NIC choice drives line rate, latency, queue count and how much you can offload. All four are well-supported by mature Linux drivers (Intel i40e/ice, NVIDIA mlx5) and run XDP natively.
| Adapter | Speed | PCIe | Typ. power* | Driver | Street price* | Best for |
| --- | --- | --- | --- | --- | --- | --- |
| Intel X710-DA2 | 2 × 10G | Gen3 ×8 | ~3.3–4.5 W | i40e | ~$200–300 | Edge / small PoP nodes |
| Intel XL710-QDA2 | 2 × 40G (bus-capped) | Gen3 ×8 | ~7–8 W | i40e | ~$300–450 | 20–40G nodes; mind the bus |
| Intel E810-CQDA2 | 2 × 100G | Gen4 ×16 | ~15–21 W | ice | ~$700–900 | Mainstream 100G, many queues / ADQ |
| NVIDIA ConnectX-6 Dx | 2 × 100G / 200G | Gen4 ×16 | ~16–22 W | mlx5 | ~$965–1,400 | Lowest latency + HW conntrack offload (ASAP²) |
Watch the PCIe bus, not just the port speed. An XL710 2×40G sits on PCIe Gen3 ×8 (~63 Gbps raw, ~50 Gbps usable), so it cannot carry both 40G ports at line rate (80G in total); the bus caps it near ~50G aggregate. 100G-class cards need PCIe Gen4 ×16 to deliver their ports. This is the most common real-world bottleneck, and we have measured it on production nodes.
Port speed only matters if the bus can carry it. Size the slot (PCIe generation × lanes) to the NIC — or the bus becomes your throughput ceiling.
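The bus arithmetic behind those figures can be sketched in a few lines. This is an illustrative calculation, not a BNGSOFT tool: the per-lane rates and 128b/130b encoding are standard PCIe Gen3/Gen4 values, while `USABLE_FRACTION = 0.80` is an assumption chosen only because it reproduces the guide's ~50 Gbps and ~200 Gbps usable numbers. The function names are hypothetical.

```python
# Per-lane signalling rate in GT/s for the two PCIe generations in this guide.
GT_PER_LANE = {3: 8.0, 4: 16.0}
# Gen3 and Gen4 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130
# ASSUMPTION: ~80% of raw bandwidth remains after protocol overhead; picked to
# match the guide's ~50 Gbps (Gen3 x8) and ~200 Gbps (Gen4 x16) usable figures.
USABLE_FRACTION = 0.80


def pcie_raw_gbps(gen: int, lanes: int) -> float:
    """Raw slot bandwidth in Gbps for a PCIe generation and lane count."""
    return GT_PER_LANE[gen] * lanes * ENCODING_EFFICIENCY


def pcie_usable_gbps(gen: int, lanes: int) -> float:
    """Usable slot bandwidth under the assumed overhead fraction."""
    return pcie_raw_gbps(gen, lanes) * USABLE_FRACTION


def forwarding_ceiling_gbps(port_gbps: float, ports: int, gen: int, lanes: int) -> float:
    """Traffic ceiling = min(sum of NIC ports, usable PCIe bandwidth)."""
    return min(port_gbps * ports, pcie_usable_gbps(gen, lanes))


for name, port_gbps, ports, gen, lanes in [
    ("X710-DA2", 10, 2, 3, 8),
    ("XL710-QDA2", 40, 2, 3, 8),
]:
    ceiling = forwarding_ceiling_gbps(port_gbps, ports, gen, lanes)
    bound = "NIC-bound" if ceiling == port_gbps * ports else "bus-bound"
    print(f"{name}: ~{ceiling:.0f}G ({bound})")
```

Under these assumptions the X710 comes out NIC-bound at ~20G and the XL710 bus-bound near ~50G. Measure the real usable fraction on your own platform before relying on it.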
2 · 1U vs 2U — what the form factor buys you
1U: Edge & mainstream nodes
PCIe slots (usable): 1–2 low-profile
Sockets: 1 (sometimes 2)
Typical NIC: X710 / XL710 / one E810
Power draw: ~250–450 W
Cooling / noise: tight, high-RPM
Best for distributed PoP/edge: one NIC, up to ~100G, lowest rack cost per node.
2U: High-density aggregation
PCIe slots (usable): 2–4 full-height ×16
Sockets: 1–2 (more cores/RAM)
Typical NIC: 2× E810 or ConnectX-6 Dx
Power draw: ~450–800 W (dual PSU)
Cooling / noise: better thermals, redundancy
Best for consolidation: 200G, more session memory, dual-PSU resilience, best subs/watt.
Rule of thumb. 1U = the cheapest way to put a 100G XDP BNG at the edge. 2U = the cheapest way to serve a lot of subscribers per rack-unit and per watt, with PSU redundancy and room for dual-100G or a SmartNIC.
3 · The sizing model — a node has two ceilings
Size every node against both limits and take the smaller:
| Ceiling | Set by | How to estimate |
| --- | --- | --- |
| ① Traffic it can move | NIC line rate ∧ PCIe bus | Gbps = min(NIC ports, PCIe usable). E.g. XL710 → ~50G, E810 → ~100–200G. |
| ② Subscribers it can track | CGNAT/session table (CPU + RAM) and a per-node map ceiling | The session/CGNAT table is sized to hold the throughput-driven count with headroom; a hard per-node map ceiling of ~131,000 subs only bites above ~400G. |
Effective subscribers = min( ① NIC usable Gbps ÷ busy-hour Mbps , 131,000 map ceiling ). At an assumed ~3 Mbps busy-hour average per active subscriber, a 100G node carries ~33k subs and a 2×100G node ~64k (200G ÷ 3 Mbps is ~66k raw; the quoted figure rounds down for encapsulation overhead and N+1 headroom). The NIC + PCIe bus is the practical ceiling, and the CGNAT/session table is sized to match it. The data plane forwards in XDP at ~one core per ~55 Gbps, so CPU is not the wall on a right-sized server. Replace the 3 Mbps with your own measured busy-hour average.
Size your node — NIC card × CPU → subscribers
Capacity = min( NIC usable Gbps ÷ busy-hour Mbps , CPU data-plane ceiling , ~131k per-node map limit ).
Figures are raw NIC ÷ busy-hour rate; published per-node headline numbers round down slightly for encapsulation overhead and N+1 headroom. The data plane forwards in XDP at ~1 core per ~55 Gbps; the per-subscriber control thread needs ~1 core per ~45k subs.
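The capacity formula above can be written as a small calculator. This is a minimal sketch using only the guide's planning constants (~3 Mbps busy-hour, ~55 Gbps per data-plane core, ~45k subs per control core, ~131k map limit). The function names, the `dataplane_cores` parameter, and the choice to add data-plane and control cores together are assumptions for illustration, not BNGSOFT's own sizing tool.

```python
import math

MAP_CEILING_SUBS = 131_000       # hard per-node map limit quoted in this guide
GBPS_PER_DATAPLANE_CORE = 55.0   # ~1 core per ~55 Gbps (guide figure)
SUBS_PER_CONTROL_CORE = 45_000   # ~1 core per ~45k subs (guide figure)


def node_capacity(usable_gbps, busy_hour_mbps=3.0, dataplane_cores=None):
    """Return (raw subscriber capacity, name of the binding ceiling)."""
    ceilings = {
        "NIC/PCIe": usable_gbps * 1000 / busy_hour_mbps,
        "map limit": float(MAP_CEILING_SUBS),
    }
    if dataplane_cores is not None:
        # Hypothetical knob: cores reserved for XDP forwarding.
        cpu_gbps = dataplane_cores * GBPS_PER_DATAPLANE_CORE
        ceilings["CPU data plane"] = cpu_gbps * 1000 / busy_hour_mbps
    binding = min(ceilings, key=ceilings.get)
    return int(ceilings[binding]), binding


def cores_needed(usable_gbps, subs):
    """ASSUMPTION: data-plane and control cores are simply added together."""
    dataplane = math.ceil(usable_gbps / GBPS_PER_DATAPLANE_CORE)
    control = math.ceil(subs / SUBS_PER_CONTROL_CORE)
    return dataplane + control


def map_ceiling_breakeven_gbps(busy_hour_mbps=3.0):
    """Traffic level at which the map limit, not the NIC, becomes binding."""
    return MAP_CEILING_SUBS * busy_hour_mbps / 1000


print(node_capacity(100))            # 1U E810 class node
print(node_capacity(200))            # 2U dual-100G class node (raw, before rounding down)
print(map_ceiling_breakeven_gbps())  # where the ~131k limit starts to bite
```

The results are raw figures. As noted above, published per-node numbers round down for encapsulation overhead and N+1 headroom, so a raw ~66k for 200G corresponds to the quoted ~64k.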
4 · Sample configurations (indicative — validate per deployment)
EDGE · 1U
Small PoP / edge BNG
CPU: 8-core Xeon-E / EPYC
RAM: 32 GB
NIC: Intel X710-DA2 (2×10G)
Moves: ~20 Gbps
Tracks: up to ~6.7k subs
Power: ~250 W
~$2.5k · ~$0.37 / sub · ~27 subs/W
STANDARD · 1U
Mainstream 100G node
CPU: 16-core Xeon-SP / EPYC
RAM: 64 GB
NIC: Intel E810-CQDA2 (2×100G)
Moves: ~100 Gbps
Tracks: up to ~33k subs
Power: ~380 W
~$5k · ~$0.15 / sub · ~87 subs/W
HIGH-DENSITY · 2U
Aggregation / consolidation
CPU: 2× 16–24c (dual socket)
RAM: 128 GB
NIC: 2× E810 (up to 200G)
Moves: ~200 Gbps
Tracks: up to ~64k subs
Power: ~650 W (dual PSU)
~$9k · ~$0.14 / sub · ~98 subs/W
SMARTNIC · 2U
Lowest latency + HW offload
CPU: 24-core (frees cores)
RAM: 128 GB
NIC: ConnectX-6 Dx + ASAP²
Moves: ~200 Gbps
Tracks: up to ~64k subs
Power: ~600 W
~$10k · ~$0.16 / sub · HW conntrack offload
Power efficiency — subscribers per watt (higher is better)
Subscribers tracked ÷ whole-node draw. Density improves efficiency: the 2U nodes carry the most subs per watt, led by the SmartNIC build.
Edge 1U · X710: ~27 subs/W (~250 W → 6.7k)
Standard 1U · E810: ~87 subs/W (~380 W → 33k)
High-density 2U · 2×E810: ~98 subs/W (~650 W → 64k)
SmartNIC 2U · CX-6 Dx: ~107 subs/W (~600 W → 64k)
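The cost-per-subscriber and subs-per-watt figures are simple ratios of the indicative node numbers in the sample configurations. The sketch below recomputes them so you can substitute your own quotes and measured draw; the `NodeBuild` class and `BUILDS` list are hypothetical names, and the inputs are the guide's rounded estimates rather than measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeBuild:
    name: str
    cost_usd: float   # indicative whole-node hardware cost
    subs: int         # subscribers tracked at ~3 Mbps busy-hour
    watts: float      # indicative whole-node power draw

    @property
    def usd_per_sub(self) -> float:
        return self.cost_usd / self.subs

    @property
    def subs_per_watt(self) -> float:
        return self.subs / self.watts


# Inputs copied from the sample configurations in this guide.
BUILDS = [
    NodeBuild("Edge 1U · X710", 2_500, 6_700, 250),
    NodeBuild("Standard 1U · E810", 5_000, 33_000, 380),
    NodeBuild("High-density 2U · 2×E810", 9_000, 64_000, 650),
    NodeBuild("SmartNIC 2U · CX-6 Dx", 10_000, 64_000, 600),
]

for b in BUILDS:
    print(f"{b.name:<26} ${b.usd_per_sub:.2f}/sub  {b.subs_per_watt:.0f} subs/W")
```

Because both ratios divide by the subscriber count or the power draw, a small change in the busy-hour assumption moves them noticeably, so rerun the numbers with your own data.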
Traffic the node can move (forwarding ceiling, Gbps)
min(NIC ports, PCIe bus). Note the XL710 bus cap vs the 100G-class cards.
X710-DA2 (2×10G): ~20G (NIC-bound)
XL710 (2×40G): ~50G (bus-bound)
E810-CQDA2 (2×100G): ~100G (1U / single)
2× E810 / CX-6 Dx: ~200G (2U / dual)
5 · Latency & the SmartNIC option
All four NICs run the XDP data plane at low microsecond latency. Two levers matter for tail latency under load: queue count / steering (E810 ADQ and ConnectX flow steering isolate latency-sensitive traffic) and hardware offload. The ConnectX-6 Dx can offload stateful connection tracking in hardware via ASAP² (up to ~8M rules) — moving CGNAT/firewall flow state into the NIC, cutting CPU and trimming latency on the hottest flows.
i40e · X710/XL710: Proven & cheap. Rock-solid 10–40G, lowest card cost, mature driver. Great for edge nodes where 100G is overkill.
ice · E810: Mainstream 100G. Many queues, ADQ traffic isolation, PCIe Gen4; the default choice for new 100G BNG nodes.
mlx5 · ConnectX-6 Dx: Latency + offload. Lowest latency, best AF_XDP, and HW conntrack/NAT offload for the densest, leanest-CPU builds.
6 · 3-Year TCO — a 50,000-subscriber example
Hardware is only the entry fee. Over a typical 3-year refresh, power + cooling, rack space, optics and the operational surface of every extra box dominate. Below is a directional total-cost comparison for 50,000 FTTH subscribers with CGNAT: a consolidated BNGSOFT build versus the common router-appliance design (≈16 NAS routers + a separate CGNAT tier).
| 3-year cost line (50k subs) | BNGSOFT (consolidated) | Appliance fleet (router + CGNAT tier) |
| --- | --- | --- |
| Build | 3 × 1U E810 nodes (2 active @ ~33k + N+1) | 16 × CCR2216 NAS + ~3-node CGNAT cluster |
| Hardware (CapEx) | ~$15,000 | ~$52,000 |
| Software license | + BNGSOFT license (per quote) | RouterOS included |
| Power + cooling (3 yr)* | ~$5,400 (~1.1 kW) | ~$13,000 (~2.7 kW) |
| Rack space | 3U | ~18U |
| Boxes to operate & spare | 3 | ~19 |
| HW + power subtotal (ex-license) | ~$20,000 (~$0.41 / sub) | ~$65,000 (~$1.30 / sub) |
3-year hardware + power for 50k subscribers (ex-license)
Lower is better. The consolidated build frees ~$45k of headroom — typically more than covers the software license while still winning on TCO, rack and ops.
Appliance fleet: ~$65k · ~18U · 19 boxes · ~$1.30/sub
BNGSOFT consolidated: ~$20k + license · 3U · 3 boxes · ~$0.41/sub
The license is the variable — and consolidation pays for it. Even before software, the consolidated build runs ~$45k less in hardware + power over 3 years for 50k subscribers, in 3U instead of ~18U, with 3 boxes to manage and spare instead of ~19 — and no second hardware tier just for CGNAT. Plug your BNGSOFT quote into the headroom and compare full TCO.
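The TCO lines can be re-derived from the stated assumptions (36 months, ~$0.12/kWh, ~1.5× cooling/PUE multiplier, N+1 sparing). The sketch below is illustrative only: the function names are hypothetical, and the final subtotals use the guide's rounded line items rather than recomputed ones, so small differences from an exact calculation are expected.

```python
import math

HOURS_3Y = 3 * 365 * 24  # 26,280 hours in a 36-month period


def power_cost_usd(kw, usd_per_kwh=0.12, cooling_multiplier=1.5):
    """3-year power + cooling cost for a continuous load in kW."""
    return kw * HOURS_3Y * usd_per_kwh * cooling_multiplier


def nodes_needed(subs, subs_per_node, spares=1):
    """Active nodes to carry the subscribers, plus N+1 spare(s)."""
    return math.ceil(subs / subs_per_node) + spares


def subtotal_per_sub(hardware_usd, power_usd, subs):
    """Hardware + power (ex-license) cost per subscriber."""
    return (hardware_usd + power_usd) / subs


SUBS = 50_000
print(nodes_needed(SUBS, 33_000))             # consolidated 1U E810 nodes
print(round(power_cost_usd(1.0)))             # cost of one continuous kW over 3 years
print(subtotal_per_sub(15_000, 5_400, SUBS))  # consolidated build, guide line items
print(subtotal_per_sub(52_000, 13_000, SUBS)) # appliance fleet, guide line items
```

To compare full TCO, add your license quote to the consolidated hardware figure and rerun the per-subscriber calculation.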
7 · Why this beats a fixed appliance
Hardware strategy · operator value
~$0.14–0.16 per subscriber on 100G+ nodes: Mainstream 100G and dense 2U builds land near ~$0.15/sub HW; edge 10G nodes cost more per sub but suit small PoPs. Densify to lower both $/sub and watts/sub.
Buy the NIC the site needs: 10G edge to 200G aggregation on the same software, with no forklift and no proprietary line cards.
Scale by adding commodity nodes: Linear scale-out; each node is a standard server you already know how to buy, rack and spare.
Future-proof to SmartNIC/DPU: Move to ConnectX-6 Dx ASAP² hardware offload when you want lower CPU and latency, on the same platform.
Fewer watts per subscriber: High-density 2U consolidation reaches ~98 subs/W (SmartNIC ~107), which means less power, cooling and rack per customer served.
One model to size everything: min(traffic ceiling, subscriber ceiling) gives a defensible node spec for any PoP in minutes.
The bottom line
Size a BNGSOFT node against two ceilings — traffic (NIC + PCIe) and the per-node map limit (~131k) — and take the smaller. At ~3 Mbps busy-hour, a 1U E810 node moves ~100G and carries ~33k subs at ~$5k and ~87 subs/W; a 2U dual-100G node reaches ~200G, ~64k subs and ~98 subs/W. Add a ConnectX-6 Dx when you want hardware conntrack offload and the lowest latency.
Right card, right form factor, right node — ~$0.15 per subscriber on hardware you already buy.
Sources & honest framing: This is a sizing and TCO guide, not a benchmark report; all subscriber, throughput, power and cost figures are indicative engineering estimates that depend on CPU, RAM, NIC, traffic mix, per-subscriber busy-hour rate and enabled features, and must be validated per deployment.
NIC specifications. Intel X710-DA2: PCIe Gen3 ×8, typ. ~3.3–4.5 W (Intel ARK, ServeTheHome). Intel XL710: 40GbE, PCIe Gen3 ×8 (Intel datasheet). Intel E810-CQDA2: 2×100G, PCIe Gen4 ×16, ~15.4 W idle to ~20.8 W with optics (Intel ARK, ServeTheHome). NVIDIA ConnectX-6 Dx: 2×100G/200G, PCIe Gen4, ASAP² hardware connection-tracking offload up to ~8M rules, street price ~US$965–1,400 (NVIDIA datasheet, FS.com). NIC street prices and server costs are approximate retail (June 2026) and exclude optics, software licensing and operational costs.
PCIe usable bandwidth: Gen3 ×8 ≈ 63 Gbps raw (~50 Gbps usable), Gen4 ×16 ≈ 252 Gbps raw (~200 Gbps usable). The XL710 2×40G bus cap and the multi-100G ceiling reflect this and match BNGSOFT production measurements.
Per-node subscriber capacity is throughput-driven: NIC usable line rate ÷ the busy-hour per-subscriber rate (e.g. 2×100G ÷ ~3 Mbps ≈ ~64k), capped by a hard ~131,000-subscriber per-node map ceiling. The data plane forwards in XDP at roughly one core per ~55 Gbps, so a right-sized server is NIC/PCIe-bound, not CPU-bound, at residential busy-hour rates. The ~3 Mbps busy-hour per-subscriber figure is a planning assumption and should be replaced with your own measured busy-hour average; BNGSOFT XDP forwarding cost (~2.5% CPU at production load) and the ~131k map ceiling are from BNGSOFT deployment data. Power-efficiency (subs/W) and cost-per-subscriber are derived from the indicative whole-node figures above.
3-year TCO example assumptions: 50,000 subscribers over 36 months; electricity ~$0.12/kWh with a ~1.5× cooling/PUE multiplier (~$4,700 per continuous kW over 3 years); BNGSOFT build = 3 × 1U E810 nodes (2 active @ ~33k + N+1); appliance fleet = 16 × MikroTik CCR2216 NAS (max power 80–121 W — MikroTik docs; ~16 × CCR2216 @ 5k users for 50k per the carrier-BNG design referenced in our MikroTik comparison) plus a ~3-node CGNAT cluster; hardware and power are directional and exclude the BNGSOFT software license (shown as a separate line — request a quote) and any optics, cabling, support and rack/colo fees, which apply to both sides. MikroTik®, Intel®, NVIDIA®/Mellanox® and respective product names are trademarks of their owners; BNGSOFT is not affiliated with them. Prepared as a hardware-planning overview for operators.