Signed Integer Overflow in bd_holder_disk.refcnt Leading to Use-After-Free

File

block/holder.c

Severity

Critical — Use-After-Free / Premature Object Destruction

Vulnerability Description

struct bd_holder_disk (line 8) uses a plain int for its reference counter:

struct bd_holder_disk {
    struct list_head    list;
    struct kobject      *holder_dir;
    int                 refcnt;        // ← signed 32-bit, no overflow protection
};

In bd_link_disk_holder(), every time the same (bdev, disk) pair is re-registered the counter is incremented without bounds checking (line 91):

holder = bd_find_holder_disk(bdev, disk);
if (holder) {
    kobject_put(bdev->bd_holder_dir);
    holder->refcnt++;        // ← no overflow check
    goto out_unlock;
}

In bd_unlink_disk_holder(), cleanup fires when the counter reaches zero (line 147):

if (!WARN_ON_ONCE(holder == NULL) && !--holder->refcnt) {
    del_symlink(disk->slave_dir, bdev_kobj(bdev));
    del_symlink(holder->holder_dir, &disk_to_dev(disk)->kobj);
    kobject_put(holder->holder_dir);   // ← releases bd_holder_dir kobject
    list_del_init(&holder->list);
    kfree(holder);                     // ← frees the holder struct
}

Root Cause

refcnt is an int (signed 32-bit) rather than a refcount_t. Signed integer overflow is undefined behavior per the C standard (C11 §6.5 ¶5), and the kernel provides refcount_t, with saturation semantics, specifically to prevent this class of bug.
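
The wrap itself can be illustrated in user space. The sketch below is not kernel code: to_int32(), wrapping_inc(), and wrap_selfcheck() are hypothetical helpers introduced only for this illustration. It assumes the usual two's-complement behavior and performs the addition on uint32_t, where wraparound is well defined, so the demonstration does not itself rely on undefined behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as int32_t using only
 * well-defined operations (no out-of-range signed conversion). */
static int32_t to_int32(uint32_t u)
{
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    /* u is in [2^31, 2^32 - 1]; map it onto [INT32_MIN, -1]. */
    return (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* Hypothetical helper: increment a signed 32-bit counter the way a
 * two's-complement machine does, wrapping instead of trapping. */
static int32_t wrapping_inc(int32_t v)
{
    return to_int32((uint32_t)((uint32_t)v + 1u));
}

/* Small self-check showing the values discussed in this report. */
static void wrap_selfcheck(void)
{
    assert(wrapping_inc(1) == 2);                 /* ordinary increment */
    assert(wrapping_inc(INT32_MAX) == INT32_MIN); /* the wrap at INT_MAX */
}
```

Calling wrap_selfcheck() from a test program exercises both cases; a saturating counter would instead refuse to move past its maximum.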

Exploit Scenario

  1. An attacker with CAP_SYS_ADMIN uses device-mapper (drivers/md/dm.c) or MD RAID (drivers/md/md.c), both of which call bd_link_disk_holder(), to repeatedly reload a device mapping against the same (bdev, disk) pair until refcnt reaches INT_MAX (2 147 483 647).

  2. On the next call, refcnt++ causes signed integer overflow from INT_MAX → INT_MIN (undefined behavior; two’s-complement wrap in practice). The holder is not cleaned up; the caller believes there are still outstanding holders.

  3. From this point the counter no longer tracks the true number of holders. Each further link keeps advancing the wrapped value, and after 2^32 increments in total the counter is back where it started. A racing or subsequent bd_unlink_disk_holder() call can therefore make !--holder->refcnt evaluate true on an unlink that is not the true last one, causing:

    • Premature kobject_put(holder->holder_dir) — drops the reference to bdev->bd_holder_dir while other code still holds a logical reference to it. If this is the last reference, the kobject is freed → use-after-free on the still-live block_device.
    • Premature kfree(holder) — any in-flight code touching holder->holder_dir or holder->list after this point is a use-after-free.
  4. Because list_del_init() unlinks the node from the holder list before it is freed, later list traversals in bd_find_holder_disk() do not reach the stale node. The kobject reference counts remain unbalanced, however, which can lead to kernel heap corruption exploitable for privilege escalation.
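
The arithmetic behind the scenario can be checked without a kernel. The following is a minimal user-space model, assuming two's-complement wrap and assuming that the first link creates the holder with refcnt = 1 while every later link increments it. The names refcnt_after_links(), unlink_frees_holder(), and scenario_selfcheck() are hypothetical and exist only in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as int32_t using only
 * well-defined operations. */
static int32_t to_int32(uint32_t u)
{
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* Model: value of the wrapped 32-bit counter after `links` successful
 * link calls (links >= 1). The count is simply links modulo 2^32. */
static int32_t refcnt_after_links(uint64_t links)
{
    return to_int32((uint32_t)links);
}

/* Model of the `!--holder->refcnt` test: true if one unlink would take
 * the counter to zero and therefore free the holder. */
static bool unlink_frees_holder(int32_t refcnt)
{
    return to_int32((uint32_t)((uint32_t)refcnt - 1u)) == 0;
}

static void scenario_selfcheck(void)
{
    /* INT_MAX links: counter sits at INT_MAX. One more link wraps it. */
    assert(refcnt_after_links((uint64_t)INT32_MAX) == INT32_MAX);
    assert(refcnt_after_links((uint64_t)INT32_MAX + 1) == INT32_MIN);

    /* Right after the wrap, a single unlink does not reach zero. */
    assert(!unlink_frees_holder(INT32_MIN));

    /* After 2^32 + 1 links the counter reads 1 again, so one unlink
     * frees the holder while 2^32 logical holders remain. */
    assert(refcnt_after_links((1ULL << 32) + 1) == 1);
    assert(unlink_frees_holder(refcnt_after_links((1ULL << 32) + 1)));
}
```

Under these assumptions, the premature free needs the counter to travel the full 32-bit range, which is why the number of link calls matters more than the sign of the wrapped value.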

Affected Callers

File                            Function
drivers/md/dm.c                 bind_mdev_to_target()
drivers/md/md.c                 md_add_new_disk()
drivers/md/bcache/super.c       bcache_device_attach()
drivers/block/drbd/drbd_nl.c    drbd_adm_attach()

All are reachable via ioctl on /dev/mapper/*, /dev/md*, or /dev/bcache* by a caller that holds CAP_SYS_ADMIN (a common container escape primitive).

Proof-of-Concept (Logic)

// Repeatedly reload the DM table with the same slave device.
// Each reload calls bd_link_disk_holder(slave_bdev, dm_disk).
// The first load creates the holder with refcnt = 1; 2^32 further loads
// wrap the 32-bit counter all the way around to 1 again.
for (long long i = 0; i < (1LL << 32) + 1; i++)
    ioctl(dm_fd, DM_TABLE_LOAD, &table);   // triggers bd_link_disk_holder

// A single unlink now takes refcnt to 0 and triggers premature kfree
ioctl(dm_fd, DM_DEV_REMOVE, &dev);        // triggers bd_unlink_disk_holder
// → kobject_put fires with live users → UAF

Fix

Replace int refcnt with refcount_t refcnt (from <linux/refcount.h>) and use refcount_inc() / refcount_dec_and_test(). refcount_t saturates at REFCOUNT_SATURATED on overflow and emits a warning, preventing the counter from wrapping:

#include <linux/refcount.h>

struct bd_holder_disk {
    struct list_head    list;
    struct kobject      *holder_dir;
    refcount_t          refcnt;        // overflow-safe
};

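Update the initialization, increment, and decrement sites accordingly. The initial value must be set with refcount_set(), because refcount_inc() on a counter that is still zero is treated as a bug and triggers a warning:

// bd_link_disk_holder, where a newly allocated holder gets its first reference:
refcount_set(&holder->refcnt, 1);

// bd_link_disk_holder, when the holder already exists:
refcount_inc(&holder->refcnt);

// bd_unlink_disk_holder (the existing NULL check is kept):
if (!WARN_ON_ONCE(holder == NULL) && refcount_dec_and_test(&holder->refcnt)) { ... }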
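
To show why saturation closes the hole, here is a simplified, single-threaded user-space model of the behavior described above. It is a sketch, not the kernel's implementation: struct model_refcount and the model_* functions are hypothetical names, atomics and warnings are omitted, and the sentinel INT_MIN / 2 is assumed here to mirror REFCOUNT_SATURATED.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed sentinel for this model; a saturated counter stays pinned here. */
#define MODEL_SATURATED (INT_MIN / 2)

struct model_refcount { int refs; };

static void model_set(struct model_refcount *r, int n)
{
    r->refs = n;
}

/* Increment, but pin the counter instead of wrapping. A zero or negative
 * value means the object is already released or already saturated. */
static void model_inc(struct model_refcount *r)
{
    if (r->refs <= 0 || r->refs == INT_MAX) {
        r->refs = MODEL_SATURATED;
        return;
    }
    r->refs++;
}

/* Decrement and report whether this was the last reference. A saturated
 * counter never reports true, so the object is leaked rather than freed. */
static bool model_dec_and_test(struct model_refcount *r)
{
    if (r->refs == 1) {
        r->refs = 0;
        return true;
    }
    if (r->refs <= 0) {
        r->refs = MODEL_SATURATED;
        return false;
    }
    r->refs--;
    return false;
}

static void model_selfcheck(void)
{
    struct model_refcount r;

    model_set(&r, INT_MAX);
    model_inc(&r);                       /* would wrap with a plain int */
    assert(r.refs == MODEL_SATURATED);
    assert(!model_dec_and_test(&r));     /* never reaches zero again */
    assert(r.refs == MODEL_SATURATED);
}
```

The design trade-off is deliberate: once saturated, the holder is never freed, which turns a potential use-after-free into a bounded memory leak.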

References

  • CWE-190: Integer Overflow or Wraparound
  • CWE-416: Use After Free
  • include/linux/refcount.h — kernel overflow-safe reference counter API
  • Similar fix pattern: commit a2b0b2b (“mm: use refcount_t for page->_refcount”)