Autonomous QoE · per-subscriber learned baseline · anomaly detection · self-driving remediation

Engine Brief · Self-Driving Subscriber Experience

Autonomous QoE — the BNG learns each subscriber's normal, and fixes the ones that drift

A fixed threshold is wrong for everyone. A “5% loss is bad” rule reacts far too late on a fibre line that should never lose a packet, and cries wolf on a fixed-wireless link where 3% is a calm Tuesday and ordinary swings carry it past 5%. BNGSOFT Autonomous QoE throws the fixed threshold away: it learns each subscriber's own normal (loss, access and transit latency, experience score) as a rolling baseline, flags the subscribers that are worse than their own normal, and, with hysteresis, autonomously leans on the bulk flows of the ones that are actually degrading. No per-subscriber thresholds to tune. No dashboard to watch. Observe-first and default-off.

Flow Intelligence answered “is this subscriber suffering?” Autonomous QoE answers the next question by itself: “which subscribers just got worse than they normally are?” Then it leans on those, now, without an operator touching a thing.

Per-sub baseline

learned, not configured

each line's own normal
loss · RTT · QoE

Anomaly, not threshold

“worse than ITS normal”

z-score deviation catches
the quiet degradations

Self-driving

hysteresis · auto-decay

leans on the ones that drift,
relaxes when they recover

Observe-first

default-off · no data-plane risk

learns & recommends before
it ever changes a packet

1 · Why a fixed threshold is the wrong tool

Every QoE alert today is built on a number an operator typed in: loss > X%, latency > Y ms. That number is a compromise that fits no one. Set it tight and a noisy fixed-wireless cell pages you all night; set it loose and a fibre customer whose line quietly slid from 0.2% to 1.5% loss (a 7.5-fold degradation they can absolutely feel on a video call) sails straight under it. The threshold can't tell the difference between a line that's always a bit lossy and a line that just got worse, because it has no idea what “normal” means for that specific subscriber.

The gap: a subscriber's experience is only meaningful relative to their own history. A 30 ms access RTT is fine for one line and a red flag for another. Until the BNG knows each subscriber's own baseline, every threshold is simultaneously too loud and too quiet — and a human still has to watch the dashboard and decide who to help.

2 · The idea — learn the normal, act on the deviation

Autonomous QoE is a control loop that sits on top of the Flow Intelligence engine. Flow Intelligence already measures, per subscriber and passively in the XDP data plane, the real signals — download loss, access vs transit RTT, and a 0–100 experience score. Autonomous QoE turns those measurements into a self-driving loop: learn → detect → act → relax.

Illustrative loop. Learn each subscriber's own baseline. Detect when a sub is worse than its own normal, by z-score rather than a fixed number. Act by leaning harder on that sub's bulk flows (only after a sustained run, never on one noisy tick). Relax automatically the moment it recovers.

3 · The three pillars

LEARN

Each subscriber's own normal.

A rolling EWMA baseline per subscriber for loss%, access RTT, transit RTT and QoE score — plus the variance of each.
Learned entirely from the signals Flow Intelligence already measures — no new probes, no per-subscriber configuration.
Cold-start safe: a brand-new subscriber is judged against a fleet-wide baseline until it has enough of its own history, so a day-one line is not flagged merely for lacking a history of its own.
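A minimal sketch of the LEARN step, assuming the standard incremental recurrence for an exponentially weighted mean and variance (the brief does not say which form the daemon uses). The class name `Ewma`, the learning rate and the `WARMUP` sample count are hypothetical illustration values, not BNGSOFT defaults.

```python
from dataclasses import dataclass


@dataclass
class Ewma:
    """Exponentially weighted mean and variance of one metric for one subscriber."""
    alpha: float = 0.05  # hypothetical learning rate: small, so the baseline moves slowly
    mean: float = 0.0
    var: float = 0.0
    n: int = 0           # number of real samples absorbed so far

    def update(self, x: float) -> None:
        if self.n == 0:
            self.mean = x  # first sample seeds the baseline; variance starts at 0
        else:
            diff = x - self.mean
            incr = self.alpha * diff
            self.mean += incr
            # Incremental EWMA variance: decay the old spread, add this sample's share.
            self.var = (1.0 - self.alpha) * (self.var + diff * incr)
        self.n += 1


WARMUP = 30  # hypothetical: samples needed before a line is judged on its own history


def reference(own: Ewma, fleet: Ewma) -> Ewma:
    """Cold start: score a new line against the fleet baseline until it has warmed up."""
    return own if own.n >= WARMUP else fleet


# One loss% baseline per subscriber plus a shared fleet baseline, fed only with real samples.
fleet, sub = Ewma(), Ewma()
for loss_pct in (0.2, 0.25, 0.15, 0.2):
    sub.update(loss_pct)
    fleet.update(loss_pct)  # in practice the fleet baseline is fed by every subscriber
ref = reference(sub, fleet)  # still the fleet baseline: only 4 of WARMUP samples seen
```

A small learning rate is what makes the baseline "slow": a degradation lasting minutes stands out against it instead of being absorbed into it.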

DETECT

Worse than its normal — by z-score.

Anomaly = how many standard deviations a subscriber's current loss / RTT / QoE sits from its own learned baseline.
Catches the quiet degradations a fixed threshold misses — the fibre line that slid 0.2% → 1.5%, invisible to a 5% rule but obvious against its own history.
Each signal is judged only when it was actually measured, so a line with no latency sample this moment never skews its own baseline.
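The DETECT rule can be sketched as below. The metric keys, the per-metric standard-deviation floors (which keep a perfectly steady fibre line from dividing by zero) and the `(mean, variance)` baseline layout are assumptions for illustration; the brief specifies only the degradation-direction z-score, the worst-of-four combination and the skip-if-unmeasured rule.

```python
import math
from typing import Dict, Optional, Tuple

# +1: higher is worse (loss, RTT); -1: lower is worse (QoE score). Keys are hypothetical.
DIRECTION = {"loss_pct": +1, "access_rtt_ms": +1, "transit_rtt_ms": +1, "qoe": -1}
# Hypothetical floors on the standard deviation so a near-constant baseline can't explode z.
STD_FLOOR = {"loss_pct": 0.05, "access_rtt_ms": 1.0, "transit_rtt_ms": 1.0, "qoe": 1.0}


def degradation_z(metric: str, current: float, mean: float, var: float) -> float:
    """Positive, degradation-direction z-score; improvements score 0."""
    std = max(math.sqrt(var), STD_FLOOR[metric])
    return max(0.0, DIRECTION[metric] * (current - mean) / std)


def anomaly_score(sample: Dict[str, Optional[float]],
                  baseline: Dict[str, Tuple[float, float]]) -> float:
    """Worst z across the metrics that carried a real sample this interval (None = unmeasured)."""
    return max(
        (degradation_z(m, v, *baseline[m]) for m, v in sample.items() if v is not None),
        default=0.0,
    )


# The fibre line from the text: baseline loss 0.2% (stddev 0.1), now 1.5%; no access-RTT sample.
baseline = {"loss_pct": (0.2, 0.01), "access_rtt_ms": (8.0, 4.0),
            "transit_rtt_ms": (18.0, 9.0), "qoe": (97.0, 4.0)}
sample = {"loss_pct": 1.5, "access_rtt_ms": None, "transit_rtt_ms": 18.0, "qoe": 95.0}
score = anomaly_score(sample, baseline)  # about 13: loss dominates, access RTT is skipped
```

Against a 5% rule this line is silent; against its own history it sits roughly 13 standard deviations off normal.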

ACT & RELAX

Lean on the drifters — automatically.

A subscriber that stays anomalous for several consecutive intervals (hysteresis — never one noisy tick) earns a graduated push: its bulk/elephant flows are demoted harder so its interactive traffic recovers.
The push is scaled by how far off normal it is, and decays the moment it recovers — back to baseline, no human action.
It only ever raises aggressiveness above the operator baseline; it never drops below your floor, and it reverts on reload.
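The ACT & RELAX rules reduce to a small per-subscriber state machine. In this sketch the threshold `Z_ANOMALOUS`, the run length `SUSTAIN`, the scaling `BOOST_PER_Z` and the ceiling `BOOST_MAX` are hypothetical values; the brief fixes only the shape: no push before a sustained run, a push scaled by deviation and capped, withdrawal on recovery, and state that does not survive a reload.

```python
from dataclasses import dataclass

# Hypothetical parameters, for illustration only.
Z_ANOMALOUS = 3.0   # an interval with score above this counts as anomalous
SUSTAIN = 3         # consecutive anomalous intervals required before any push
BOOST_PER_Z = 0.1   # extra demotion per standard deviation beyond the threshold
BOOST_MAX = 1.0     # ceiling on the autonomous push


@dataclass
class SubState:
    streak: int = 0     # consecutive anomalous intervals so far
    boost: float = 0.0  # current autonomous demotion boost (always >= 0)


def step(state: SubState, z: float) -> float:
    """Advance one interval for one subscriber and return its boost."""
    state.streak = state.streak + 1 if z > Z_ANOMALOUS else 0  # one calm tick breaks the run
    if state.streak >= SUSTAIN:
        # Graduated: the further off normal, the harder the push, up to the ceiling.
        state.boost = min(BOOST_MAX, BOOST_PER_Z * (z - Z_ANOMALOUS))
    else:
        state.boost = 0.0  # not yet earned, or withdrawn the moment the streak breaks
    return state.boost


# Kept in memory only, so a reload starts every subscriber from a fresh SubState.
s = SubState()
boosts = [step(s, z) for z in (10, 10, 10, 1, 10)]  # wait, wait, push, withdrawn, wait again
```

Withdrawing on the first calm interval follows the brief's "the instant the streak breaks"; a gradual ramp-down would be an equally reasonable design choice.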

4 · What it helps

NOC / OPERATIONS

Stop tuning thresholds

No per-cohort “what loss% counts as bad here?” ever again.

The box learns the right normal for every line by itself — fibre, fixed-wireless, congested cell — and only surfaces the ones that genuinely drifted.

SUBSCRIBER EXPERIENCE

Caught before the call

A quiet degradation gets leaned on while it is still small.

The subscriber whose line just got worse than usual has their bulk flows demoted automatically, so the game or call stays smooth — often before they'd think to complain.

SELF-DRIVING

No human in the loop

Detect → act → relax runs on its own, every few seconds.

Hysteresis stops it flapping, the push decays on recovery, and everything stays layered on the operator baseline — autonomy with a floor you set.

Flow Intelligence measures and scores; Autonomous QoE decides and acts. Together, they mean the BNG doesn't just tell you a subscriber is suffering: it learns what “suffering” means for that specific line and quietly does something about it, at line rate, with no appliance and no per-subscriber threshold.

5 · How it works — under the hood

The whole loop is a userspace control layer in the daemon — it reads the per-subscriber signals Flow Intelligence already computes and writes a single per-subscriber demotion hint. It adds no new data-plane code and no new fast-path cost: the measurement was already happening; this is the intelligence on top.

Baseline & anomaly

For every subscriber the daemon keeps a slow exponentially-weighted moving average and variance of each metric. The anomaly score is the degradation-direction z-score, clamped at zero: (current − baseline) / stddev for loss and the two RTTs, where higher is worse, and (baseline − current) / stddev for QoE, where lower is worse. The subscriber's score is the worst of the four. A warmed-up subscriber is “anomalous” when that score, measured against its own history, exceeds a small number of standard deviations; a subscriber that has not yet learned enough history is scored against a fleet baseline instead, so fresh lines stay quiet. Each metric is fed only when it carried a real sample, so an unmeasured signal can never teach the baseline a false “normal.”
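Written out for one metric m of one subscriber, assuming the common incremental EWMA recurrence (the brief does not give the exact form). Here α is the learning rate, s_m is +1 for loss and the two RTTs and −1 for QoE, M_t is the set of metrics that carried a real sample in interval t (only those are updated or scored), n_m is the metric's sample count, k is the anomaly sensitivity and N_warm the warmup count; these symbol names are illustrative, not the product's. Scoring against the previous-interval baseline keeps a sample from diluting its own anomaly.

```latex
\begin{aligned}
\delta_{m,t} &= x_{m,t} - \mu_{m,t-1}, \qquad m \in M_t \\
\mu_{m,t} &= \mu_{m,t-1} + \alpha\,\delta_{m,t} \\
\sigma^2_{m,t} &= (1-\alpha)\left(\sigma^2_{m,t-1} + \alpha\,\delta_{m,t}^2\right) \\
z_{m,t} &= \max\!\left(0,\; s_m\,\frac{x_{m,t} - \hat\mu_{m,t-1}}{\hat\sigma_{m,t-1}}\right),
\qquad
(\hat\mu, \hat\sigma)_{m,t-1} =
\begin{cases}
(\mu_{m,t-1}, \sigma_{m,t-1}) & \text{if } n_m \ge N_{\text{warm}} \\
(\mu^{\text{fleet}}_{m,t-1}, \sigma^{\text{fleet}}_{m,t-1}) & \text{otherwise}
\end{cases} \\
Z_t &= \max_{m \in M_t} z_{m,t}, \qquad \text{anomalous} \iff Z_t > k
\end{aligned}
```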

Hysteresis & the autonomous push

Detection alone would flap. So a subscriber must be anomalous for several consecutive intervals before it earns a push, and the push is a graduated demotion boost scaled by how far off normal it is. That boost folds into the same per-subscriber remediation the loss engine already drives — one decision per subscriber, taking whichever signal says “lean harder” — so the two never fight. The instant the subscriber's streak breaks, the boost is withdrawn and the line decays back to its operator baseline.
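The “one decision per subscriber” rule can be sketched as a single function. It assumes, hypothetically, that the loss engine and the anomaly layer each express their push as a non-negative amount of extra demotion above the operator baseline, and it adds a hypothetical absolute cap `HINT_CEILING`; the brief states only that the stronger signal wins and that the result never drops below the operator baseline.

```python
HINT_CEILING = 2.0  # hypothetical absolute cap on the combined per-subscriber hint


def demotion_hint(operator_baseline: float, loss_engine_push: float, anomaly_push: float) -> float:
    """Fold both engines into one per-subscriber demotion hint."""
    # Take whichever engine says "lean harder"; adding them would double-count one problem.
    push = max(loss_engine_push, anomaly_push, 0.0)
    capped = min(operator_baseline + push, HINT_CEILING)
    # Never below the operator's floor, even if the floor itself sits above the cap.
    return max(operator_baseline, capped)
```

Because the two pushes are compared rather than summed, a line that trips both the loss engine and the anomaly detector gets the stronger push, not both, which is what keeps the engines from fighting or compounding.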

$ bngxdpctl fi baseline
  mode             : enforce
  warmed baselines : 1377
  anomalous (now)  : 14
  enforced (boost) : 2
  — most-anomalous —
  ppp967  z=62  txRTT 510ms (base 18)  QoE 0 (base 97)

One glance: the box has learned 1,377 baselines, 14 subscribers are worse than their own normal right now, and 2 have drifted long enough to be acted on. The top entry is a line whose transit latency jumped to roughly 28× its own baseline (510 ms against 18 ms) while its QoE score fell from 97 to 0.
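The three counters and the most-anomalous list can be read as simple queries over per-subscriber state. The `SubView` record, its fields, the extra subscriber names and the `warmup`/`z_anomalous` defaults are hypothetical; this illustrates what each line of the output means, not how bngxdpctl computes it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubView:      # hypothetical per-subscriber snapshot
    name: str
    samples: int    # real samples learned so far
    z: float        # current anomaly score
    boost: float    # current autonomous boost (0 unless enforced)


def summarize(subs: List[SubView], warmup: int = 30, z_anomalous: float = 3.0,
              top: int = 1) -> Dict[str, object]:
    anomalous = [s for s in subs if s.z > z_anomalous]
    return {
        "warmed baselines": sum(s.samples >= warmup for s in subs),  # own history is trusted
        "anomalous (now)": len(anomalous),                           # worse than normal this interval
        "enforced (boost)": sum(s.boost > 0 for s in subs),          # sustained long enough to act on
        "most-anomalous": [s.name for s in sorted(anomalous, key=lambda s: s.z, reverse=True)[:top]],
    }


subs = [SubView("ppp967", 500, 62.0, 0.7), SubView("ppp12", 400, 4.1, 0.0),
        SubView("ppp5", 10, 0.4, 0.0)]
summary = summarize(subs)
```

The gap between "anomalous (now)" and "enforced (boost)" is the hysteresis at work: ppp12 is worse than its normal this interval but has not yet stayed that way long enough to earn a push.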

6 · Safety & deployment

Default-OFF, observe-first, no new data-plane risk. The baseline layer is pure userspace intelligence on top of measurements that already exist — it adds nothing to the fast path. It runs in observe (learn & recommend, never touch a packet) until you explicitly turn on enforce, and even then it only ever raises aggressiveness above your operator baseline, with hysteresis and automatic decay. Turn it off and the data plane is exactly as it was.

Stage	What it does	Risk
off	nothing — no baselines kept	none
observe	learns every subscriber's baseline, flags anomalies, recommends — never changes a packet	none — read-only
enforce	sustained-anomalous subscribers get a graduated, decaying demotion boost on their bulk flows, layered on the operator baseline	bounded — hysteresis + ceiling + auto-revert
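The three stages amount to a gate around each control interval. A minimal sketch, in which the `Mode` enum, the `learn`/`recommend` callbacks and the in-memory hint table are hypothetical stand-ins for the daemon's internals:

```python
from enum import Enum
from typing import Callable, Dict, List


class Mode(Enum):
    OFF = "off"
    OBSERVE = "observe"
    ENFORCE = "enforce"


def run_interval(mode: Mode,
                 learn: Callable[[], None],
                 recommend: Callable[[], Dict[str, float]],
                 hints: Dict[str, float],
                 log: List[Dict[str, float]]) -> None:
    """One control interval, gated by deployment stage."""
    if mode is Mode.OFF:
        return                # off: no baselines are kept, nothing runs
    learn()                   # observe and enforce both learn every interval
    recs = recommend()        # per-subscriber boosts the loop would apply
    if mode is Mode.OBSERVE:
        log.append(recs)      # read-only: record the recommendation, touch no packet
    else:
        hints.clear()         # enforce: publish exactly this interval's boosts,
        hints.update(recs)    # so a recovered subscriber drops out of the table


def on_reload(hints: Dict[str, float]) -> None:
    hints.clear()             # autonomous boosts are not persisted: a reload reverts them
```

Keeping observe and enforce on the same code path up to the final write means the recommendations you read in observe are exactly what enforce would apply.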

Maturity (honest framing). The learned-baseline and anomaly layer is live and observing on production-representative hardware; the autonomous-enforce loop runs with hysteresis, a bounded boost ceiling and automatic decay, and reverts on reload. As with any closed loop, validate in observe and read the recommendations before enabling enforce on a given deployment. The tunable parameters (learning rate, anomaly sensitivity, sustained-interval count) ship with sensible defaults and can be adjusted per network.

Stop tuning thresholds. Let the BNG learn each subscriber's normal — and fix the ones that drift.

Autonomous QoE turns Flow Intelligence's measurements into a self-driving loop: it learns every subscriber's own baseline, detects who just got worse than they normally are, and leans on exactly those lines — with hysteresis, automatic decay, and a floor you set. No probes. No per-subscriber config. No dashboard to watch. Observe-first, default-off.

Turn on observe today, read who the box says is drifting, and enable enforce when the recommendations earn your trust.

Sources & honest framing: This is an engine brief, not a benchmark report. BNGSOFT Autonomous QoE builds on the BNGSOFT Flow Intelligence engine (passive per-flow loss and access/transit RTT measured in the XDP data plane). The baseline layer maintains per-subscriber exponentially-weighted moving averages and variances and scores deviation by z-score against each subscriber's own history, with a fleet-wide baseline for cold-start; anomaly is detected only on metrics that carried a real sample. The autonomous-enforce path requires a sustained run of anomalous intervals (hysteresis), applies a graduated demotion boost scaled by deviation that folds into the existing per-subscriber remediation, is bounded by a maximum and decays on recovery, and only ever raises aggressiveness above the operator baseline. It is observe-first and default-off; everything reverts on reload. Default parameters (learning rate, anomaly sensitivity, sustained-interval count, fleet warmup) are engineering defaults and are tuned per deployment. The example fi baseline snippet is illustrative of the output format, not a benchmark; no specific performance figures are claimed. Related per-topic briefs (Flow Intelligence, L4S/AQM, Interactive Flow Protection, QoS, NOC2) are available alongside this guide.