CVE Candidate: Missing max_entries Validation in BPF Map-in-Map Allows Security Policy Bypass and Spectre Mitigation Undermining
Summary
bpf_map_meta_equal() in kernel/bpf/map_in_map.c validates a candidate inner map against its prototype when a user calls bpf_map_update_elem on an ARRAY_OF_MAPS or HASH_OF_MAPS outer map. The function checks map_type, key_size, value_size, map_flags, and the BTF record, but omits max_entries. A local user who holds CAP_BPF, or who has an open fd to the outer map, can therefore insert an inner map whose max_entries differs arbitrarily from the prototype, violating the invariants the BPF verifier relied on when it analysed the program at load time.
The sample samples/bpf/test_map_in_map.bpf.c is the canonical exerciser of this path and makes the impact concrete.
Affected Files
| File | Location | Role |
|---|---|---|
| kernel/bpf/map_in_map.c | bpf_map_meta_equal(), line 83 | Missing check |
| kernel/bpf/map_in_map.c | bpf_map_fd_get_ptr(), line 94 | Calls the flawed comparator |
| samples/bpf/test_map_in_map.bpf.c | all | BPF program whose verifier assumptions are violated |
Root Cause
/* kernel/bpf/map_in_map.c – prototype is built correctly … */
inner_map_meta->max_entries = inner_map->max_entries; // line 40 – saved
inner_array_meta->index_mask = inner_array->index_mask; // line 69 – saved
/* … but equality check ignores max_entries entirely: */
bool bpf_map_meta_equal(const struct bpf_map *meta0,
                        const struct bpf_map *meta1)
{
    return meta0->map_type == meta1->map_type &&
           meta0->key_size == meta1->key_size &&
           meta0->value_size == meta1->value_size &&
           meta0->map_flags == meta1->map_flags &&
           btf_record_equal(meta0->record, meta1->record);
    /* ← max_entries NEVER compared */
}
bpf_map_fd_get_ptr() calls inner_map_meta->ops->map_meta_equal(inner_map_meta, inner_map) and accepts the candidate if it returns true. Because max_entries is not part of that test, any array/hash map with matching type/key/value can be swapped in.
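The effect of the missing comparison can be modelled in user space. The sketch below is not kernel code: struct map_meta, meta_equal_model() and candidate_accepted() are hypothetical stand-ins that mirror only the four scalar fields compared by the function quoted above (the btf_record_equal() term is left out). It also assumes that the map type's own map_meta_equal callback adds no further checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct bpf_map fields discussed here. */
struct map_meta {
    uint32_t map_type;
    uint32_t key_size;
    uint32_t value_size;
    uint32_t map_flags;
    uint32_t max_entries;
};

/* Same shape as the quoted comparator: max_entries is never read. */
bool meta_equal_model(const struct map_meta *a, const struct map_meta *b)
{
    return a->map_type == b->map_type &&
           a->key_size == b->key_size &&
           a->value_size == b->value_size &&
           a->map_flags == b->map_flags;
}

/* Builds a 65536-entry prototype and a candidate that differs only in
 * max_entries, then reports whether the model comparator accepts it. */
bool candidate_accepted(uint32_t candidate_max_entries)
{
    const struct map_meta proto = {
        .map_type = 2, /* illustrative type id */
        .key_size = 4, .value_size = 4, .map_flags = 0,
        .max_entries = 65536,
    };
    struct map_meta cand = proto;

    cand.max_entries = candidate_max_entries;
    return meta_equal_model(&proto, &cand);
}

/* Usage: both an undersized and an oversized candidate pass the model. */
void check_flawed_model(void)
{
    assert(candidate_accepted(1));
    assert(candidate_accepted(131072));
}
```

In this model the comparator cannot tell a one-entry candidate from a 65536-entry prototype, which is the behaviour the report attributes to the kernel function.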
Impact
1. BPF Verifier Invariant Violation → Spectre v1 Mitigation Bypass
The prototype’s max_entries and index_mask are recorded in inner_map_meta (stored in outer_map->inner_map_meta). For the inlined lookup path (array_map_gen_lookup, arraymap.c:220), the bounds-check instruction and the masking instruction are emitted once, when the verifier rewrites the lookup call at program load time, using the prototype’s values:
*insn++ = BPF_JMP_IMM(BPF_JGE, ret, map->max_entries, 4); // prototype max
*insn++ = BPF_ALU32_IMM(BPF_AND, ret, array->index_mask); // prototype mask
If a substituted inner map has a larger max_entries (e.g. 2 × MAX_NR_PORTS = 131072) its index_mask is 0x1FFFF rather than the prototype’s 0xFFFF. After substitution the runtime pointer arithmetic uses the actual (larger) index_mask of the swapped-in map (arraymap.c:175):
return array->value + (u64)array->elem_size * (index & array->index_mask);
Spectre v1 protection relies on the AND mask matching the conditional branch. With mismatched masks the speculative window is wider than the verifier intended, re-opening the Spectre gadget that the index masking (applied when bypass_spec_v1 is false) was supposed to close.
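The mask values quoted above are consistent with rounding max_entries up to a power of two and subtracting one. The sketch below is a user-space model of that arithmetic, not the kernel's code: index_mask_model() and masked_index() are hypothetical names, and the power-of-two derivation is an assumption inferred from the 0xFFFF and 0x1FFFF figures in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Model: round max_entries up to a power of two in 64-bit space,
 * then subtract one. Assumes max_entries >= 1. */
uint32_t index_mask_model(uint32_t max_entries)
{
    uint64_t pow2 = 1;

    while (pow2 < max_entries)
        pow2 <<= 1;
    return (uint32_t)(pow2 - 1);
}

/* Model of the AND step: clamp an index with a given mask. */
uint32_t masked_index(uint32_t index, uint32_t mask)
{
    return index & mask;
}

/* Usage: the same index is clamped differently by the two masks. */
void check_mask_model(void)
{
    uint32_t proto_mask = index_mask_model(65536);   /* prototype size */
    uint32_t big_mask = index_mask_model(131072);    /* 2 x MAX_NR_PORTS */

    assert(proto_mask == 0xFFFF);
    assert(big_mask == 0x1FFFF);
    /* An index just past the prototype's range wraps under the
     * prototype mask but passes through under the larger mask. */
    assert(masked_index(0x10005, proto_mask) == 0x5);
    assert(masked_index(0x10005, big_mask) == 0x10005);
}
```

The last two assertions show the mismatch the section describes: an index that the prototype's mask would fold back into range is left untouched by the larger map's mask.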
2. Silent Security-Policy Bypass
test_map_in_map.bpf.c maps TCP ports to policy values. The outer map a_of_port_a (type ARRAY_OF_MAPS) has max_entries = MAX_NR_PORTS = 65536; its prototype inner map inner_a also has max_entries = 65536.
Attack:
- Open the fd for a_of_port_a.
- Create a new BPF_MAP_TYPE_ARRAY map with matching key_size=4, value_size=4, but max_entries=1.
- Call bpf(BPF_MAP_UPDATE_ELEM) to insert it at any port index. bpf_map_meta_equal accepts it.
- The running BPF program calls bpf_map_lookup_elem(inner_map, &port_key). The actual inner map’s array_map_lookup_elem checks index >= array->map.max_entries (i.e. port_key >= 1): every port except 0 returns NULL. do_reg_lookup returns -ENOENT; the reg_result_h entry is written as “not found”.
Any firewall or auditing decision downstream of reg_result_h now sees a false negative for every port except 0 — an attacker-controlled silent bypass.
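The lookup behaviour in the last attack step can be modelled in user space. The sketch below is an illustration under stated assumptions, not the kernel or sample source: struct array_model, array_lookup_model() and reg_lookup_model() are hypothetical names that reproduce only the bounds check and the -ENOENT fallback described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of an array map holding max_entries ints. */
struct array_model {
    uint32_t max_entries;
    uint32_t index_mask;
    int *values;
};

/* Model of the bounds check: keys at or past max_entries yield NULL. */
int *array_lookup_model(const struct array_model *m, uint32_t key)
{
    if (key >= m->max_entries)
        return NULL;
    return &m->values[key & m->index_mask];
}

/* Model of the lookup wrapper: a failed lookup becomes -ENOENT. */
int reg_lookup_model(const struct array_model *inner, uint32_t port)
{
    int *value = array_lookup_model(inner, port);

    return value ? *value : -ENOENT;
}

/* Usage: a one-entry inner map answers only for port 0. */
void check_undersized_map(void)
{
    int slot = 7; /* arbitrary policy value stored at index 0 */
    struct array_model tiny = {
        .max_entries = 1, .index_mask = 0, .values = &slot,
    };

    assert(reg_lookup_model(&tiny, 0) == 7);
    assert(reg_lookup_model(&tiny, 80) == -ENOENT);
}
```

With max_entries = 1, every key other than 0 takes the NULL branch, so the caller cannot distinguish "no policy recorded" from "map was swapped for a smaller one".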
3. Out-of-Bounds Read (larger inner map)
Conversely, a substituted inner map with max_entries much larger than the prototype causes the BPF program to successfully look up indices the prototype never covered. The returned pointer points into the swapped-in map’s value array beyond the bounds assumed by the verifier, leaking uninitialised or attacker-controlled kernel heap data through the result maps (reg_result_h, inline_result_h).
Severity
Critical — local privilege escalation / information disclosure / security-policy bypass.
| Attribute | Value |
|---|---|
| Attack vector | Local |
| Privileges required | CAP_BPF (or open fd to the outer map) |
| User interaction | None |
| Confidentiality | High (kernel heap leak) |
| Integrity | High (verifier invariant broken, Spectre mitigation undermined) |
| Availability | None |
Reproduction Sketch
// Assumes libbpf: #include <bpf/bpf.h>
// 1. Load the test_map_in_map BPF program; the outer map fd is exposed via bpffs
int outer_fd = bpf_obj_get("/sys/fs/bpf/a_of_port_a");
// 2. Create undersized inner map (same type/key/value, wrong max_entries)
int inner_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY,
                              NULL, sizeof(__u32), sizeof(int),
                              1, // max_entries = 1, prototype expects 65536
                              NULL);
// 3. Insert at port 80; bpf_map_meta_equal() accepts it
__u32 key = 80;
bpf_map_update_elem(outer_fd, &key, &inner_fd, BPF_ANY);
// 4. Connect to dead:beef::80; the BPF probe fires and the inner lookup returns
//    -ENOENT for port 80, silently bypassing any policy recorded there.
Fix
Add max_entries to the equality check:
bool bpf_map_meta_equal(const struct bpf_map *meta0,
                        const struct bpf_map *meta1)
{
    return meta0->map_type == meta1->map_type &&
           meta0->key_size == meta1->key_size &&
           meta0->value_size == meta1->value_size &&
           meta0->map_flags == meta1->map_flags &&
           meta0->max_entries == meta1->max_entries && /* ← add this */
           btf_record_equal(meta0->record, meta1->record);
}
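A regression check for the proposed change could look roughly like the following user-space model. It is a sketch under assumptions rather than a kernel selftest: struct meta_fields and meta_equal_fixed_model() are hypothetical names, and the BTF record comparison is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the compared struct bpf_map fields. */
struct meta_fields {
    uint32_t map_type;
    uint32_t key_size;
    uint32_t value_size;
    uint32_t map_flags;
    uint32_t max_entries;
};

/* Model of the comparator with the proposed max_entries term added. */
bool meta_equal_fixed_model(const struct meta_fields *a,
                            const struct meta_fields *b)
{
    return a->map_type == b->map_type &&
           a->key_size == b->key_size &&
           a->value_size == b->value_size &&
           a->map_flags == b->map_flags &&
           a->max_entries == b->max_entries;
}

/* Usage: identical metadata passes, a size mismatch is rejected. */
void check_fixed_model(void)
{
    const struct meta_fields proto = { 2, 4, 4, 0, 65536 };
    struct meta_fields same = proto;
    struct meta_fields tiny = proto;

    tiny.max_entries = 1;
    assert(meta_equal_fixed_model(&proto, &same));
    assert(!meta_equal_fixed_model(&proto, &tiny));
}
```

The only behavioural difference from the unpatched comparison is the rejection of candidates whose max_entries differs from the prototype; all other fields are treated as before.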
References
- kernel/bpf/map_in_map.c: bpf_map_meta_equal() (line 83), bpf_map_fd_get_ptr() (line 94), bpf_map_meta_alloc() (line 10)
- kernel/bpf/arraymap.c: array_map_gen_lookup() (line 220), array_map_lookup_elem() (line 170)
- samples/bpf/test_map_in_map.bpf.c: affected BPF program
- Related prior art: CVE-2021-20268 (BPF map reference count UAF), CVE-2022-2905 (BPF array OOB read)