Self-Healing CGNAT · pool health scoring · automatic migration · no flapping
CGNAT Operations Brief · Self-Healing Pool Steering
Pressured CGNAT pool IPs automatically shed subscribers to healthier pool IPs
A CGNAT pool has multiple public IP addresses. Over time, pressure distributes unevenly: one pool IP accumulates more subscribers, more sessions, more port-block demand — and subscribers behind it start hitting allocation failures and drops. bngxdpd continuously scores each pool IP's CGNAT health from live data-plane signals and automatically migrates subscribers off pressured IPs onto healthier ones — with hysteresis to prevent flapping. The pool rebalances itself. Default-off, operator-controlled.
subscribers move off degraded pool IPs before failures compound
Hysteresis
no flapping
subscribers pinned on new IP until health delta justifies another move
Default-off
observe → enforce
watch scores and predicted moves before enabling enforcement
CGNAT pools are not static. Under normal traffic churn, pressure concentrates. Self-healing steering detects concentration as it builds — and moves subscribers before they start seeing failures.
The problem: uneven pressure inside a CGNAT pool
A CGNAT pool has N public IP addresses, each with a fixed port budget. Subscribers are distributed across those IPs — but not uniformly, and not stably. Traffic patterns shift. Specific subscriber behaviours (high session counts, P2P, large connection fan-out) concentrate pressure on particular pool IPs. When a pool IP gets oversubscribed, subscribers behind it hit port-block allocation failures and packet drops.
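To make the "fixed port budget" concrete, here is a minimal arithmetic sketch. The usable port range, block size, and per-subscriber block cap below are illustrative assumptions, not documented bngxdpd defaults; the point is only that a handful of heavy subscribers can exhaust one IP's blocks while the rest of the pool sits idle.

```python
# All three constants are illustrative assumptions, not bngxdpd defaults.
USABLE_PORTS_PER_IP = 65536 - 1024   # assume well-known ports are skipped
PORT_BLOCK_SIZE = 256                # assumed ports per port block
MAX_BLOCKS_PER_SUBSCRIBER = 4        # assumed per-subscriber block ceiling


def blocks_per_ip() -> int:
    """Number of whole port blocks one public IP can hand out."""
    return USABLE_PORTS_PER_IP // PORT_BLOCK_SIZE


def block_demand(blocks_wanted: list[int]) -> int:
    """Total blocks requested, with each subscriber capped at the ceiling."""
    return sum(min(b, MAX_BLOCKS_PER_SUBSCRIBER) for b in blocks_wanted)


def is_oversubscribed(blocks_wanted: list[int]) -> bool:
    """True when demand exceeds the IP's block budget (alloc-fails follow)."""
    return block_demand(blocks_wanted) > blocks_per_ip()


# 60 light subscribers (1 block each) plus 50 heavy ones (4 blocks each)
demand = [1] * 60 + [4] * 50
print(blocks_per_ip(), block_demand(demand), is_oversubscribed(demand))
```

Under these assumed numbers the IP offers 252 blocks and the mix above asks for 260, so the last few block requests would fail even though the subscriber count looks modest.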
Pressure builds
One pool IP accumulates too many high-session subscribers
Port blocks fill up faster than they expire
Alloc-fail rate climbs — new connections refused
Drops start appearing on that pool IP
Subscribers affected
Connection failures for subscribers on the pressured IP
App connections refused silently (alloc-fail = no port mapping)
High-session apps (streaming, gaming, P2P) hit limits first
Support tickets arrive — but the cause is invisible without tooling
Self-healing response
Pressured subscribers migrate to healthy IPs
Health score drops below threshold → migration eligible
Subscribers pinned on new IP with hysteresis
Original IP pressure falls; healthy IP absorbs the load
The health score: what it measures
Each pool IP receives a 0–100 CGNAT health score, computed continuously from three internal data-plane signals. All three are measured natively by bngxdpd as part of normal CGNAT operation — no additional instrumentation.
| Signal | What it measures | Why it matters |
| --- | --- | --- |
| Alloc-fail rate | Port-block allocation failures per second on this pool IP: subscribers trying to open a new port block and failing | Direct evidence that subscribers behind this IP are being refused new connections |
| Block-limit hits | Rate at which subscribers on this pool IP hit their per-subscriber port-block ceiling | Indicates per-subscriber pressure: the IP may have enough total ports but subscribers are individually capped out |
| Packet drops | NAT-side drops on this pool IP: packets discarded because no port mapping was available | The hard consequence: packets that actually failed to transit CGNAT, directly impacting subscriber connectivity |
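One way the three signals above could be folded into a single 0–100 score is a weighted, saturating penalty. This is a sketch only: the brief does not publish the scoring formula, so the weights, the saturation rates, and the names `PoolIpSignals` and `health_score` are all hypothetical, and the output is not expected to reproduce the scores in the sample terminal output.

```python
from dataclasses import dataclass


@dataclass
class PoolIpSignals:
    alloc_fail_per_s: float   # port-block allocation failures per second
    block_hits_per_s: float   # per-subscriber block-limit hits per second
    drops_per_s: float        # NAT-side packet drops per second


# Hypothetical tuning: points deducted at full saturation (sums to 100)
WEIGHTS = {"alloc_fail": 40, "block_hits": 20, "drops": 40}
# Hypothetical rates at which each signal counts as fully saturated
SATURATION = {"alloc_fail": 10.0, "block_hits": 50.0, "drops": 10.0}


def health_score(s: PoolIpSignals) -> int:
    """Map the three rates to 0-100, where 100 means no pressure at all."""
    rates = {
        "alloc_fail": s.alloc_fail_per_s,
        "block_hits": s.block_hits_per_s,
        "drops": s.drops_per_s,
    }
    # Each signal contributes a penalty that grows linearly, then saturates.
    penalty = sum(
        WEIGHTS[k] * min(rates[k] / SATURATION[k], 1.0) for k in WEIGHTS
    )
    return max(0, min(100, round(100 - penalty)))


print(health_score(PoolIpSignals(0.0, 0, 0)))   # idle IP
print(health_score(PoolIpSignals(5.0, 0, 0)))   # alloc-fails at half saturation
```

Capping each signal's contribution keeps one runaway counter from hiding the other two, which is one reasonable design choice among several.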
Pool health per IP — bngxdpctl cgnat reputation
====== CGNAT Pool Health ==================
pool 0 (12 public IPs · 1,240 subscribers)
IP             score  status     alloc-fail/s  blk-hits/s  drops/s
49.238.60.114  97     healthy    0.0           0           0
49.238.60.115  94     healthy    0.1           0           0
49.238.60.116  91     healthy    0.0           1           0
49.238.60.117  88     healthy    0.2           2           0
49.238.60.120  61     pressured  3.4           18          1
49.238.60.122  34     degraded   9.1           47          8
△ steering eligible: 22 subscribers (enforce mode: would migrate)
49.238.60.123  96     healthy    0.0           0           0
===========================================
Pool health at a glance. Two IPs are under pressure (scores 61 and 34); 22 subscribers are eligible for migration. In observe mode this is advisory. In enforce mode, migration would begin automatically.
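The status column can be read as score bands. The sketch below hard-codes the scores from the sample output and applies two cut-offs; the values 70 and 40 are assumptions chosen only so that the sample rows land in the bands shown, since the brief says the pressure threshold is configurable but does not state its value.

```python
# Scores copied from the illustrative sample output above.
POOL_SCORES = {
    "49.238.60.114": 97,
    "49.238.60.115": 94,
    "49.238.60.116": 91,
    "49.238.60.117": 88,
    "49.238.60.120": 61,
    "49.238.60.122": 34,
    "49.238.60.123": 96,
}

PRESSURE_THRESHOLD = 70   # assumed: below this, an IP is "pressured"
DEGRADED_THRESHOLD = 40   # assumed: below this, an IP is "degraded"


def status(score: int) -> str:
    """Classify a 0-100 health score into the three bands in the sample."""
    if score < DEGRADED_THRESHOLD:
        return "degraded"
    if score < PRESSURE_THRESHOLD:
        return "pressured"
    return "healthy"


def ips_under_pressure(scores: dict[str, int]) -> list[str]:
    """IPs below the pressure threshold, worst score first."""
    flagged = [ip for ip, sc in scores.items() if sc < PRESSURE_THRESHOLD]
    return sorted(flagged, key=lambda ip: scores[ip])


for ip in ips_under_pressure(POOL_SCORES):
    print(ip, POOL_SCORES[ip], status(POOL_SCORES[ip]))
```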
How steering works
1. Score each pool IP continuously
The daemon aggregates per-pool-IP alloc-fail rate, block-limit-hit rate, and drop rate into a 0–100 health score on a rolling basis. No operator action needed.
2. Identify migration candidates
Subscribers on pool IPs whose health score is below the pressure threshold are flagged as steering-eligible. The scheduler selects candidates using a load-aware ranking to avoid moving everyone at once.
3. Migrate with hysteresis
Eligible subscribers are remapped to a healthier pool IP and pinned there. Hysteresis prevents oscillation: a subscriber will not be moved again until the health delta across available IPs justifies another move.
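Steps 2 and 3 can be sketched as a selection pass followed by a target choice. The brief does not describe the ranking, so this sketch assumes "load-aware" means heaviest session count first and assumes a per-pass cap on moves; `Subscriber`, `pick_candidates`, `pick_target`, and both default values are hypothetical names and numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subscriber:
    sub_id: str
    pool_ip: str
    sessions: int   # assumed load metric used for ranking


def pick_candidates(subs, scores, threshold=70, max_moves=2):
    """Subscribers on IPs scoring below threshold, heaviest first, capped."""
    eligible = [s for s in subs if scores[s.pool_ip] < threshold]
    eligible.sort(key=lambda s: s.sessions, reverse=True)
    return eligible[:max_moves]   # the cap avoids moving everyone at once


def pick_target(scores, threshold=70) -> Optional[str]:
    """Healthiest IP, or None if no IP is at or above the threshold."""
    ip, score = max(scores.items(), key=lambda kv: kv[1])
    return ip if score >= threshold else None


scores = {"192.0.2.1": 34, "192.0.2.2": 96}
subs = [
    Subscriber("a", "192.0.2.1", 500),
    Subscriber("b", "192.0.2.1", 50),
    Subscriber("c", "192.0.2.1", 900),
    Subscriber("d", "192.0.2.2", 1000),
]
moves = [(s.sub_id, pick_target(scores)) for s in pick_candidates(subs, scores)]
print(moves)
```

Moving the heaviest subscribers first relieves the source IP with the fewest remaps; a real scheduler would also need to check that the target has room, which this sketch omits.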
Hysteresis prevents flapping. A naive load-balancer can thrash: it moves subscribers, the source IP recovers, and then it moves them back — in a loop. bngxdpd's steering uses a configurable hold-down period after each migration. A subscriber is pinned on their new IP until either the hold expires or the health differential across available IPs exceeds a stability threshold. The pool converges rather than oscillating.
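The pinning rule described above reduces to a small predicate: a recently moved subscriber stays put unless the hold-down has expired or the score gap to the best alternative is large. The function name `pin_released` and the defaults (300 seconds, 30 points) are assumptions for illustration; the brief says the hold-down is configurable but gives no values.

```python
from typing import Optional


def pin_released(
    now: float,
    last_move_at: Optional[float],
    current_score: int,
    best_other_score: int,
    hold_down_s: float = 300.0,    # assumed hold-down period
    stability_delta: int = 30,     # assumed health-differential threshold
) -> bool:
    """Return True if the subscriber may be considered for another move."""
    if last_move_at is None:
        return True   # never migrated, so there is no pin to honour
    hold_expired = (now - last_move_at) >= hold_down_s
    delta_exceeded = (best_other_score - current_score) > stability_delta
    return hold_expired or delta_exceeded


# Moved 100 s ago; new IP scores 90, best alternative 95: stay pinned.
print(pin_released(now=100.0, last_move_at=0.0, current_score=90, best_other_score=95))
# Moved 100 s ago, but the new IP has collapsed to 40 while another sits at 95.
print(pin_released(now=100.0, last_move_at=0.0, current_score=40, best_other_score=95))
```

Releasing the pin only makes the subscriber eligible again; the pressure-threshold check still decides whether a move actually happens.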
Operator commands
View pool health and configure steering
# View per-pool-IP health scores (read-only, safe any time)
bngxdpctl cgnat reputation
# View current steering configuration and pending migrations
bngxdpctl cgnat steering
# Enable enforcement (observe mode scores but does not migrate)
bngxdpctl cgnat steering set enforce
# Return to observe mode
bngxdpctl cgnat steering set observe
Observe mode is always available — operators can watch scores build and verify the migration decisions are sensible before enabling enforcement. Enforcement can be toggled live without restart.
What this is — and what it is not
Scope: internal CGNAT-side health only. The health score reflects the CGNAT data plane's own performance metrics — allocation failures, block-limit hits, and drops as measured inside the BNG. It does not measure external or destination reputation: it does not know whether a website, CDN, or game network has blocklisted a pool IP, does not read CAPTCHA responses, and does not detect whether traffic to external destinations is succeeding or failing. The goal is to keep subscribers on well-performing pool IPs from a CGNAT resource perspective. Operators who want to isolate abusive subscribers from the shared pool to protect its external reputation should see the Abuser Auto-Isolation brief instead.
What it does
Scores each pool IP by its internal CGNAT resource health (alloc-fail rate, block-limit hits, drops). Migrates subscribers off IPs that are degrading under port-block pressure. Prevents pressure from compounding into widespread allocation failures.
What it does not do
Does not read external blocklist data. Does not detect when a third-party service has blocked a pool IP. Does not score based on traffic quality to external destinations, CAPTCHA rates, or any signal from outside the BNG.
Default-off. Operator-controlled.
Both the scoring and the migration capability are designed for incremental operator adoption:
| Mode | What happens | How to enable |
| --- | --- | --- |
| Off (default) | No scoring, no migration. Standard CGNAT assignment unchanged. | Default state on first enable of the feature block |
| Observe | Health scores computed and displayed via bngxdpctl cgnat reputation. Migration candidates identified and logged. No actual subscriber remapping. | bngxdpctl cgnat steering set observe |
| Enforce | Automatic migration of eligible subscribers from pressured/degraded IPs to healthy ones, with hysteresis. | bngxdpctl cgnat steering set enforce |
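The difference between the three modes can be sketched as a gate around one scoring pass: off does nothing, observe logs what it would do, and enforce also acts. The names `Mode` and `run_cycle`, the log format, and the threshold of 70 are hypothetical; this models only the behaviour described above, not the daemon's internals.

```python
from enum import Enum


class Mode(Enum):
    OFF = "off"
    OBSERVE = "observe"
    ENFORCE = "enforce"


def run_cycle(mode: Mode, scores: dict[str, int], threshold: int = 70):
    """One pass: return (log_lines, ips_to_drain) for the given mode."""
    if mode is Mode.OFF:
        return [], []   # no scoring, no migration
    flagged = sorted(ip for ip, sc in scores.items() if sc < threshold)
    log = [f"candidate source: {ip} score={scores[ip]}" for ip in flagged]
    # Only enforce mode turns the logged candidates into actual migrations.
    drain = flagged if mode is Mode.ENFORCE else []
    return log, drain


scores = {"192.0.2.1": 34, "192.0.2.2": 96}
for mode in Mode:
    print(mode.value, run_cycle(mode, scores))
```

Because observe and enforce share the same candidate list, what an operator sees logged in observe mode is what enforce mode would act on.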
No extra hardware, no external dependency. Self-healing steering runs entirely inside the bngxdpd daemon, reading per-pool-IP counters that the XDP data path already produces. The migration is a CGNAT assignment update — the same mechanism that assigns a subscriber to a pool IP on first connection. There is no separate orchestration service, no external database, and no additional licence.
The self-healing cycle
The cycle runs continuously. Scoring is always-on in observe or enforce mode. Migration decisions are taken when a pool IP's score drops below the configurable pressure threshold.
CGNAT pools that rebalance themselves
Uneven pressure inside a CGNAT pool is a slow-building problem. It shows up as intermittent connection failures, support tickets about "certain apps not working," and alloc-fail counters that grow quietly in the background. By the time it's visible, subscribers have already had a degraded experience.
Self-healing steering addresses it before it compounds: pool IPs are scored continuously, pressure is detected while the score is dropping, and subscribers are moved to healthier IPs before allocation failures become widespread. The pool stays balanced without operator intervention.
Want to see the scoring in action? We'll show bngxdpctl cgnat reputation against a live pool, explain the scoring weights, and demonstrate observe-vs-enforce mode.
Honest framing: This is an operations brief; no throughput or price figures are claimed. The pool health scoring measures three internal CGNAT data-plane signals per pool IP: port-block allocation-failure rate, per-subscriber block-limit hit rate, and NAT-side packet drop rate. These are combined into a 0–100 health score. In enforce mode, subscribers on pool IPs whose score falls below a configurable pressure threshold are migrated to healthier pool IPs, with a hold-down (hysteresis) period after each move to prevent oscillation. The system does not measure external reputation, third-party blocklist status, destination-side traffic quality, CAPTCHA rates, or any signal from outside the BNG. It scores internal CGNAT resource health only. The feature is default-off. Observe mode (scoring and candidate identification without migration) is available separately from enforce mode. Both modes can be toggled live without daemon restart. Sample terminal outputs are representative of the format; counter values are illustrative. Related briefs: Per-Subscriber CGNAT X-Ray, Abuser Auto-Isolation, CGNAT Arena Engine, Full CGNAT.