Vulnerability Report: net/sched/sch_pie.c

Summary

Integer overflow in pie_calculate_probability() caused by unvalidated alpha/beta parameters. An attacker who can configure the qdisc can push the drop probability to either extreme, resulting in AQM bypass or total packet loss.

  • File: net/sched/sch_pie.c (shared definitions in include/net/pie.h)
  • Function: pie_calculate_probability() (line 304)
  • Root cause: No bounds checking on TCA_PIE_ALPHA / TCA_PIE_BETA netlink attributes in pie_change() (lines 178–182), combined with integer overflow in the probability update arithmetic.
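  • Privilege required: CAP_NET_ADMIN in any network namespace. On systems that allow unprivileged user namespaces, an unprivileged user can obtain it with unshare(1) -Urn, since a new network namespace must be created together with a user namespace.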

Vulnerability Detail

Missing Input Validation (pie_change, lines 178–182)

if (tb[TCA_PIE_ALPHA])
    WRITE_ONCE(q->params.alpha, nla_get_u32(tb[TCA_PIE_ALPHA]));

if (tb[TCA_PIE_BETA])
    WRITE_ONCE(q->params.beta, nla_get_u32(tb[TCA_PIE_BETA]));

params->alpha and params->beta are u32. Any value in [0, U32_MAX] is accepted without validation. The code comment says “alpha and beta should be between 0 and 32” but this is never enforced.

Integer Overflow in pie_calculate_probability (lines 341–362)

alpha = ((u64)params->alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 4;
beta  = ((u64)params->beta  * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 4;

/* ... conditional right-shifts reduce alpha/beta further ... */

delta += alpha * (qdelay - params->target);   // u64 * u64 → u64, no overflow check
delta += beta  * (qdelay - qdelay_old);       // same

Step-by-step overflow with params->alpha = U32_MAX:

Step                             Expression                             Value
Per-tick scale                   MAX_PROB / PSCHED_TICKS_PER_SEC        (U64_MAX>>8) / 1e9 ≈ 72,057,594
Pre-shift alpha                  (u64)U32_MAX * 72057594                ≈ 3.1 × 10¹⁷
Post-shift alpha                 >> 4                                   ≈ 1.94 × 10¹⁶
prob < MAX_PROB/10 branch        additional >> 1                        ≈ 9.7 × 10¹⁵
Inner while loop (power ≤ 10⁶)   five >> 2 iterations                   ≈ 9.5 × 10¹²
Delay term, delay ≈ 250 ms       alpha * (qdelay - target),             overflows u64,
                                 qdelay - target ≈ 2.5×10⁸ ticks        wraps mod 2⁶⁴

U64_MAX ≈ 1.8 × 10¹⁹, so the product 9.5×10¹² × 2.5×10⁸ ≈ 2.4×10²¹ does not fit in a u64 and wraps modulo 2⁶⁴. The wrapped u64 is then implicitly converted to s64 by the delta += operation, so the value added to delta can have either sign and any magnitude.
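
The arithmetic in the table can be checked in userspace. The following is a minimal sketch, not kernel code: the MODEL_* constants, scaled_gain(), product_wraps() and wrapped_term() are hypothetical names introduced here, and the tick rate of 10⁹ per second is simply the figure used in the table above (the constant in a given kernel tree may differ). It relies on the GCC/Clang builtin __builtin_mul_overflow().

```c
#include <stdbool.h>
#include <stdint.h>

/* Figures copied from the report's table, not from a kernel header. */
#define MODEL_MAX_PROB      (UINT64_MAX >> 8)
#define MODEL_TICKS_PER_SEC 1000000000ULL	/* assumption: value used in the table */

/*
 * Scale a raw netlink gain as the report describes: multiply by the
 * per-tick factor, shift right by 4, then by 'extra_shift' more bits
 * for the low-probability branches (1 + 5 * 2 = 11 in the worst case).
 */
uint64_t scaled_gain(uint32_t raw, unsigned int extra_shift)
{
	uint64_t g = ((uint64_t)raw * (MODEL_MAX_PROB / MODEL_TICKS_PER_SEC)) >> 4;

	return g >> extra_shift;
}

/* True when gain * delay_diff does not fit in 64 bits. */
bool product_wraps(uint64_t gain, uint64_t delay_diff)
{
	uint64_t unused;

	return __builtin_mul_overflow(gain, delay_diff, &unused);
}

/*
 * The wrapped product reinterpreted as a signed 64-bit value, which is
 * what ends up being added to a signed delta. The unsigned-to-signed
 * conversion is modular on GCC and Clang.
 */
int64_t wrapped_term(uint64_t gain, uint64_t delay_diff)
{
	return (int64_t)(gain * delay_diff);
}
```

Under these assumptions, product_wraps(scaled_gain(UINT32_MAX, 11), 250000000ULL) reports an overflow even after all eleven right-shift bits, while the same call with a raw gain of 32 does not.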

Consequences

The overflowed delta is then applied at line 379:

vars->prob += delta;

The overflow/underflow guards (lines 382–395) clamp vars->prob to MAX_PROB (if prob wrapped around upward) or 0 (if it wrapped downward). Whether prob lands at 0 or MAX_PROB depends on which bits survive the u64 wraparound of the delta terms, so the drop probability is effectively under the attacker's control.
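
The guard behaviour described above can be modelled as follows. This is a simplified sketch under stated assumptions: apply_delta() and MODEL_MAX_PROB are hypothetical names, the function reproduces only the add-then-check pattern described in this report, and it leaves out any other adjustment the kernel applies to delta before the addition.

```c
#include <stdint.h>

#define MODEL_MAX_PROB (UINT64_MAX >> 8)	/* mirrors the report's MAX_PROB */

/*
 * Add a signed delta to an unsigned probability, then apply the
 * wraparound guards: a positive delta that makes the sum smaller is
 * treated as overflow, a non-positive delta that makes it larger is
 * treated as underflow.
 */
uint64_t apply_delta(uint64_t prob, int64_t delta)
{
	uint64_t oldprob = prob;

	prob += (uint64_t)delta;	/* modular add, may wrap either way */

	if (delta > 0) {
		if (prob < oldprob)	/* wrapped past UINT64_MAX */
			prob = MODEL_MAX_PROB;
	} else {
		if (prob > oldprob)	/* wrapped below zero */
			prob = 0;
	}
	return prob;
}
```

In this model a single bogus negative delta is enough to reset the probability: apply_delta(1000, -5000) yields 0, whereas a well-behaved update such as apply_delta(1000, 500) yields 1500.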

Attack scenario — AQM bypass (prob → 0):

  1. unshare -Urn to obtain a new user and network namespace (no privileges needed where unprivileged user namespaces are enabled).
  2. Attach a PIE qdisc with TCA_PIE_ALPHA = U32_MAX, TCA_PIE_DQ_RATE_ESTIMATOR = 1.
  3. Under high-latency conditions, each timer tick overflows delta downward → prob underflows → clamped to 0.
  4. PIE drops no packets regardless of congestion. The queue fills to sch->limit, then every subsequent packet is hard-dropped at qdisc_qlen >= sch->limit (overlimit path). This is ordinary tail-drop, but AQM is completely defeated.

Attack scenario — total packet black-hole (prob → MAX_PROB):

  1. Same setup; arrange overflow in the upward direction.
  2. Every subsequent call to pie_drop_early() returns true (prob ≥ MAX_PROB, and accu_prob >= (MAX_PROB/2)*17 immediately after a few packets).
  3. All new packets are dropped — effective DoS of any service bound to that interface.

Secondary Finding: Data Race in pie_dump_stats

pie_dump_stats() (line 500) reads q->vars.prob, q->vars.qdelay, and q->vars.avg_dq_rate without holding the qdisc root lock:

struct tc_pie_xstats st = {
    .prob  = q->vars.prob << BITS_PER_BYTE,   // no lock, no READ_ONCE
    .delay = ((u32)PSCHED_TICKS2NS(q->vars.qdelay)) / NSEC_PER_USEC,
    ...
};

pie_timer() (line 427) modifies these same fields while holding root_lock. On 32-bit kernels, 64-bit reads of prob/qdelay are non-atomic and can be torn, leaking a half-written kernel value into the netlink response sent to userspace.
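
To illustrate what a torn read looks like, the sketch below models a 64-bit field as two 32-bit words, the way a 32-bit CPU stores it. It is a deterministic userspace illustration only: struct split_u64, split_load() and torn_value() are hypothetical names, and the assumption that the writer updates the low word first is arbitrary.

```c
#include <stdint.h>

/* A 64-bit field as a 32-bit CPU stores it: two independent words. */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

/* Reader side: reassemble whatever the two words currently hold. */
uint64_t split_load(const struct split_u64 *s)
{
	return ((uint64_t)s->hi << 32) | s->lo;
}

/*
 * Value a reader observes if it runs after the writer has stored only
 * the low word of new_val over old_val.
 */
uint64_t torn_value(uint64_t old_val, uint64_t new_val)
{
	struct split_u64 s = {
		.lo = (uint32_t)old_val,
		.hi = (uint32_t)(old_val >> 32),
	};

	s.lo = (uint32_t)new_val;	/* writer is interrupted here */
	return split_load(&s);
}
```

With old_val = 0x00000000FFFFFFFF and new_val = 0x0000000100000000, the reader sees 0, a value that was never written to the field.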


Root Cause

pie_change()          – accepts any u32 for alpha/beta (no validation)
    ↓
pie_calculate_probability() – u64 × u64 multiplication overflows
    ↓
vars->prob              – clamped to MAX_PROB or 0 (arbitrary from attacker's perspective)
    ↓
pie_drop_early()        – drop probability is 0 (bypass) or MAX_PROB (blackhole)

Fix

1. Clamp alpha and beta in pie_change() to the documented range [0, 32]:

if (tb[TCA_PIE_ALPHA])
    WRITE_ONCE(q->params.alpha,
               min(nla_get_u32(tb[TCA_PIE_ALPHA]), 32U));

if (tb[TCA_PIE_BETA])
    WRITE_ONCE(q->params.beta,
               min(nla_get_u32(tb[TCA_PIE_BETA]), 32U));

2. Add READ_ONCE() guards in pie_dump_stats() to match the pattern already used in pie_dump(), and consider taking the root lock (or using a seqlock).
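
For item 1, silently clamping is one option; rejecting out-of-range input is another, and it makes the misconfiguration visible to the caller. The userspace sketch below contrasts the two. The names clamp_gain(), validate_gain() and MODEL_GAIN_MAX are hypothetical, and the bound of 32 is taken from the code comment quoted earlier; this is not a proposed kernel patch.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_GAIN_MAX 32U	/* upper bound quoted from the code comment */

/* Clamping variant: same effect as the min(..., 32U) shown in the fix. */
uint32_t clamp_gain(uint32_t requested)
{
	return requested < MODEL_GAIN_MAX ? requested : MODEL_GAIN_MAX;
}

/*
 * Rejecting variant: leaves *out untouched and returns -EINVAL when the
 * requested gain is out of range, 0 on success.
 */
int validate_gain(uint32_t requested, uint32_t *out)
{
	if (requested > MODEL_GAIN_MAX)
		return -EINVAL;

	*out = requested;
	return 0;
}
```

Either variant keeps the gain small enough that the multiplication examined in this report stays far below U64_MAX; the rejecting variant additionally leaves the previous configuration in place when the request is invalid.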