Self-Healing CGNAT · pool health scoring · automatic migration · no flapping
CGNAT Operations Brief · Self-Healing Pool Steering

CGNAT pool IPs that get pressured automatically shed subscribers to healthier ones

A CGNAT pool has multiple public IP addresses. Over time, pressure distributes unevenly: one pool IP accumulates more subscribers, more sessions, more port-block demand — and subscribers behind it start hitting allocation failures and drops. bngxdpd continuously scores each pool IP's CGNAT health from live data-plane signals and automatically migrates subscribers off pressured IPs onto healthier ones — with hysteresis to prevent flapping. The pool rebalances itself. Default-off, operator-controlled.
  • Live scoring (per pool IP, continuous): alloc-fail rate, block-limit hits, and drops combined into a 0–100 health score
  • Auto-migrate (pressured → healthy): subscribers move off degraded pool IPs before failures compound
  • Hysteresis (no flapping): subscribers stay pinned on the new IP until the health delta justifies another move
  • Default-off (observe → enforce): watch scores and predicted moves before enabling enforcement

CGNAT pools are not static. Under normal traffic churn, pressure concentrates. Self-healing steering detects concentration as it builds — and moves subscribers before they start seeing failures.

The problem: uneven pressure inside a CGNAT pool

A CGNAT pool has N public IP addresses, each with a fixed port budget. Subscribers are distributed across those IPs — but not uniformly, and not stably. Traffic patterns shift. Specific subscriber behaviours (high session counts, P2P, large connection fan-out) concentrate pressure on particular pool IPs. When a pool IP gets oversubscribed, subscribers behind it hit port-block allocation failures and packet drops.
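To make the "fixed port budget" concrete, here is a rough arithmetic sketch. Every constant in it (the reserved port range, the block size, the per-subscriber ceiling) is an illustrative assumption, not a bngxdpd default.

```python
# Illustrative arithmetic only: every constant below is an assumption,
# not a bngxdpd default.
USABLE_PORTS_PER_IP = 65536 - 1024   # skip the well-known port range
BLOCK_SIZE = 512                     # ports per block (hypothetical)
MAX_BLOCKS_PER_SUBSCRIBER = 4        # per-subscriber ceiling (hypothetical)


def blocks_per_ip(usable_ports: int = USABLE_PORTS_PER_IP,
                  block_size: int = BLOCK_SIZE) -> int:
    """Number of whole port blocks one public IP can hand out."""
    return usable_ports // block_size


def subscribers_at_ceiling(total_blocks: int,
                           max_blocks: int = MAX_BLOCKS_PER_SUBSCRIBER) -> int:
    """How many subscribers fit if every one holds its maximum block count."""
    return total_blocks // max_blocks


total = blocks_per_ip()
print(total)                          # 126
print(subscribers_at_ceiling(total))  # 31
```

Under these assumed numbers, a few dozen heavy subscribers landing on the same pool IP would be enough to exhaust its blocks, which is the concentration effect described above.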

Pressure builds

One pool IP accumulates too many high-session subscribers

  • Port blocks fill up faster than they expire
  • Alloc-fail rate climbs — new connections refused
  • Drops start appearing on that pool IP
Subscribers affected

Connection failures for subscribers on the pressured IP

  • App connections refused silently (alloc-fail = no port mapping)
  • High-session apps (streaming, gaming, P2P) hit limits first
  • Support tickets arrive — but the cause is invisible without tooling
Self-healing response

Pressured subscribers migrate to healthy IPs

  • Health score drops below threshold → migration eligible
  • Subscribers pinned on new IP with hysteresis
  • Original IP pressure falls; healthy IP absorbs the load

The health score: what it measures

Each pool IP receives a 0–100 CGNAT health score, computed continuously from three internal data-plane signals. All three are measured natively by bngxdpd as part of normal CGNAT operation — no additional instrumentation.

Signal | What it measures | Why it matters
Alloc-fail rate | Port-block allocation failures per second on this pool IP (subscribers trying to open a new port block and failing) | Direct evidence that subscribers behind this IP are being refused new connections
Block-limit hits | Rate at which subscribers on this pool IP hit their per-subscriber port-block ceiling | Indicates per-subscriber pressure: the IP may have enough total ports, but subscribers are individually capped out
Packet drops | NAT-side drops on this pool IP (packets discarded because no port mapping was available) | The hard consequence: packets that actually failed to transit CGNAT, directly impacting subscriber connectivity

Pool health per IP — bngxdpctl cgnat reputation
    ====== CGNAT Pool Health ==================
    pool 0 (12 public IPs · 1,240 subscribers)

    IP              score  status     alloc-fail/s  blk-hits/s  drops/s
    49.238.60.114   97     healthy    0.0           0           0
    49.238.60.115   94     healthy    0.1           0           0
    49.238.60.116   91     healthy    0.0           1           0
    49.238.60.117   88     healthy    0.2           2           0
    49.238.60.120   61     pressured  3.4           18          1
    49.238.60.122   34     degraded   9.1           47          8
      △ steering eligible: 22 subscribers (enforce mode: would migrate)
    49.238.60.123   96     healthy    0.0           0           0
    ===========================================

Pool health at a glance. Two IPs are under pressure (scores 61 and 34); 22 subscribers are eligible for migration. In observe mode this is advisory. In enforce mode, migration would begin automatically.
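The status column suggests score bands. The sketch below shows one way such a mapping could look. The cut-offs (80 and 50) are assumptions chosen only to be consistent with the sample output above; the brief does not state the real boundaries.

```python
HEALTHY_MIN = 80     # assumed cut-off, not a documented default
PRESSURED_MIN = 50   # assumed cut-off, not a documented default


def status_for(score: int) -> str:
    """Map a 0-100 health score to a status label (hypothetical bands)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= HEALTHY_MIN:
        return "healthy"
    if score >= PRESSURED_MIN:
        return "pressured"
    return "degraded"


# Scores taken from the illustrative sample output.
sample_scores = {
    "49.238.60.117": 88,
    "49.238.60.120": 61,
    "49.238.60.122": 34,
}
for ip, score in sample_scores.items():
    print(ip, score, status_for(score))
```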

How steering works

1. Score each pool IP continuously

The daemon aggregates per-pool-IP alloc-fail rate, block-limit-hit rate, and drop rate into a 0–100 health score on a rolling basis. No operator action needed.
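The brief does not publish the scoring formula or weights. As a minimal sketch of how three rates might fold into one 0–100 number, the example below uses a weighted, saturating penalty. The weights, the saturation points, and the names `PoolIpRates` and `health_score` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolIpRates:
    """Per-pool-IP rates, all in events per second."""
    alloc_fail_per_s: float
    block_limit_hits_per_s: float
    drops_per_s: float


# Hypothetical weights (sum to 1.0) and the rate at which each signal
# is treated as "fully bad". Neither comes from the brief.
WEIGHTS = {"alloc_fail": 0.4, "block_hits": 0.3, "drops": 0.3}
SATURATION = {"alloc_fail": 10.0, "block_hits": 50.0, "drops": 10.0}


def _normalised(value: float, cap: float) -> float:
    """Clamp a rate into [0, 1], where 1 means the signal is saturated."""
    return min(max(value, 0.0), cap) / cap


def health_score(rates: PoolIpRates) -> int:
    """Return 100 for a clean IP, falling towards 0 as the signals saturate."""
    penalty = (
        WEIGHTS["alloc_fail"] * _normalised(rates.alloc_fail_per_s, SATURATION["alloc_fail"])
        + WEIGHTS["block_hits"] * _normalised(rates.block_limit_hits_per_s, SATURATION["block_hits"])
        + WEIGHTS["drops"] * _normalised(rates.drops_per_s, SATURATION["drops"])
    )
    return max(0, min(100, round(100 * (1.0 - penalty))))


print(health_score(PoolIpRates(0.0, 0.0, 0.0)))   # clean IP
print(health_score(PoolIpRates(9.1, 47.0, 8.0)))  # heavily loaded IP
```

This sketch is not expected to reproduce the scores in the sample output; it only illustrates the shape of a rolling, multi-signal score in which any one signal can pull the result down.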

2. Identify migration candidates

Subscribers on pool IPs below the pressure threshold are flagged as steering-eligible. The scheduler selects candidates using a load-aware ranking to avoid moving everyone at once.
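How the scheduler ranks candidates is not specified. One plausible reading of "load-aware ranking", sketched below, is to take subscribers on the worst-scoring IPs first, heaviest subscribers first, and cap the number of moves per cycle. The `Subscriber` type, the threshold of 70, and the `max_moves` cap are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscriber:
    sub_id: str
    pool_ip: str
    active_sessions: int


def pick_candidates(subscribers: list[Subscriber],
                    scores: dict[str, int],
                    pressure_threshold: int = 70,   # hypothetical value
                    max_moves: int = 2) -> list[Subscriber]:
    """Select a bounded batch of subscribers to steer in this cycle."""
    # Only subscribers whose current pool IP scores below the threshold.
    eligible = [s for s in subscribers
                if scores.get(s.pool_ip, 100) < pressure_threshold]
    # Worst IP first; within an IP, the heaviest subscriber first.
    eligible.sort(key=lambda s: (scores[s.pool_ip], -s.active_sessions))
    # The cap keeps one cycle from moving everyone at once.
    return eligible[:max_moves]


subs = [
    Subscriber("a", "49.238.60.122", 900),
    Subscriber("b", "49.238.60.122", 50),
    Subscriber("c", "49.238.60.120", 400),
    Subscriber("d", "49.238.60.123", 10),
]
scores = {"49.238.60.122": 34, "49.238.60.120": 61, "49.238.60.123": 96}
print([s.sub_id for s in pick_candidates(subs, scores)])  # ['a', 'b']
```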

3. Migrate with hysteresis

Eligible subscribers are remapped to a healthier pool IP and pinned there. Hysteresis prevents oscillation: a subscriber will not be moved again until the health delta across available IPs justifies another move.

Hysteresis prevents flapping. A naive load-balancer can thrash: it moves subscribers, the source IP recovers, and then it moves them back — in a loop. bngxdpd's steering uses a configurable hold-down period after each migration. A subscriber is pinned on their new IP until either the hold expires or the health differential across available IPs exceeds a stability threshold. The pool converges rather than oscillating.
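The pin-release rule described above can be sketched as a single predicate. The hold-down length, the stability delta, and the names `Pin` and `pin_released` are hypothetical; only the "hold expired or differential exceeds a threshold" logic is taken from the text.

```python
from dataclasses import dataclass

HOLD_DOWN_S = 300.0    # hypothetical hold-down period, in seconds
STABILITY_DELTA = 25   # hypothetical score differential that overrides the hold


@dataclass(frozen=True)
class Pin:
    pool_ip: str
    pinned_at: float   # timestamp of the last migration, in seconds


def pin_released(pin: Pin, now: float,
                 current_score: int, best_other_score: int,
                 hold_down_s: float = HOLD_DOWN_S,
                 stability_delta: int = STABILITY_DELTA) -> bool:
    """True when the subscriber may be considered for another move."""
    hold_expired = (now - pin.pinned_at) >= hold_down_s
    large_delta = (best_other_score - current_score) > stability_delta
    return hold_expired or large_delta


pin = Pin("49.238.60.114", pinned_at=1000.0)
print(pin_released(pin, now=1100.0, current_score=90, best_other_score=95))  # False
print(pin_released(pin, now=1100.0, current_score=40, best_other_score=95))  # True
```

A released pin only makes the subscriber eligible again; whether a move actually happens would still depend on the pool IP's score falling below the pressure threshold.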

Operator commands

View pool health and configure steering
    # View per-pool-IP health scores (read-only, safe any time)
    bngxdpctl cgnat reputation

    # View current steering configuration and pending migrations
    bngxdpctl cgnat steering

    # Enable enforcement (observe mode scores but does not migrate)
    bngxdpctl cgnat steering set enforce

    # Return to observe mode
    bngxdpctl cgnat steering set observe

Observe mode is always available — operators can watch scores build and verify the migration decisions are sensible before enabling enforcement. Enforcement can be toggled live without restart.

What this is — and what it is not

Scope: internal CGNAT-side health only. The health score reflects the CGNAT data plane's own performance metrics — allocation failures, block-limit hits, and drops as measured inside the BNG. It does not measure external or destination reputation: it does not know whether a website, CDN, or game network has blocklisted a pool IP, does not read CAPTCHA responses, and does not detect whether traffic to external destinations is succeeding or failing. The goal is to keep subscribers on well-performing pool IPs from a CGNAT resource perspective. Operators who want to isolate abusive subscribers from the shared pool to protect its external reputation should see the Abuser Auto-Isolation brief instead.

What it does

Scores each pool IP by its internal CGNAT resource health (alloc-fail rate, block-limit hits, drops). Migrates subscribers off IPs that are degrading under port-block pressure. Prevents pressure from compounding into widespread allocation failures.

What it does not do

Does not read external blocklist data. Does not detect when a third-party service has blocked a pool IP. Does not score based on traffic quality to external destinations, CAPTCHA rates, or any signal from outside the BNG.

Default-off. Operator-controlled.

Both the scoring and the migration capability are designed for incremental operator adoption:

Mode | What happens | How to enable
Off (default) | No scoring, no migration. Standard CGNAT assignment unchanged. | Default state on first enable of the feature block
Observe | Health scores computed and displayed via bngxdpctl cgnat reputation. Migration candidates identified and logged. No actual subscriber remapping. | bngxdpctl cgnat steering set observe
Enforce | Automatic migration of eligible subscribers from pressured/degraded IPs to healthy ones, with hysteresis. | bngxdpctl cgnat steering set enforce

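The three modes differ only in which stages run. The sketch below restates the table as a small lookup; `SteeringMode` and `actions_for` are hypothetical names used for illustration, not part of bngxdpd.

```python
from enum import Enum


class SteeringMode(Enum):
    OFF = "off"
    OBSERVE = "observe"
    ENFORCE = "enforce"


def actions_for(mode: SteeringMode) -> dict[str, bool]:
    """Which stages run in each mode, as described in the table above."""
    return {
        # Scores are computed in observe and enforce, never when off.
        "score_pool_ips": mode is not SteeringMode.OFF,
        # Candidates are identified and logged whenever scoring runs.
        "log_candidates": mode is not SteeringMode.OFF,
        # Subscribers are only remapped in enforce mode.
        "migrate": mode is SteeringMode.ENFORCE,
    }


for mode in SteeringMode:
    print(mode.value, actions_for(mode))
```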
No extra hardware, no external dependency. Self-healing steering runs entirely inside the bngxdpd daemon, reading per-pool-IP counters that the XDP data path already produces. The migration is a CGNAT assignment update — the same mechanism that assigns a subscriber to a pool IP on first connection. There is no separate orchestration service, no external database, and no additional licence.
The self-healing cycle
Score pool IPs (alloc-fail rate, block-limit hits, drops) → Detect pressure (score < threshold, flag migration candidates) → Migrate + pin (move to healthier IP, hold-down timer starts) → Stabilise (hysteresis prevents oscillation) → cycle repeats
Continuous: scores update on every measurement cycle.
The cycle runs continuously. Scoring is always-on in observe or enforce mode. Migration decisions are taken when a pool IP's score drops below the configurable pressure threshold.

CGNAT pools that rebalance themselves

Uneven pressure inside a CGNAT pool is a slow-building problem. It shows up as intermittent connection failures, support tickets about "certain apps not working," and alloc-fail counters that grow quietly in the background. By the time it's visible, subscribers have already had a degraded experience.

Self-healing steering addresses it before it compounds: pool IPs are scored continuously, pressure is detected while the score is dropping, and subscribers are moved to healthier IPs before allocation failures become widespread. The pool stays balanced without operator intervention.

Want to see the scoring in action? We'll show bngxdpctl cgnat reputation against a live pool, explain the scoring weights, and demonstrate observe-vs-enforce mode.

Honest framing: This is an operations brief; no throughput or price figures are claimed. The pool health scoring measures three internal CGNAT data-plane signals per pool IP: port-block allocation-failure rate, per-subscriber block-limit hit rate, and NAT-side packet drop rate. These are combined into a 0–100 health score. In enforce mode, subscribers on pool IPs whose score falls below a configurable pressure threshold are migrated to healthier pool IPs, with a hold-down (hysteresis) period after each move to prevent oscillation. The system does not measure external reputation, third-party blocklist status, destination-side traffic quality, CAPTCHA rates, or any signal from outside the BNG. It scores internal CGNAT resource health only. The feature is default-off. Observe mode (scoring and candidate identification without migration) is available separately from enforce mode. Both modes can be toggled live without daemon restart. Sample terminal outputs are representative of the format; counter values are illustrative. Related briefs: Per-Subscriber CGNAT X-Ray, Abuser Auto-Isolation, CGNAT Arena Engine, Full CGNAT.