Go to file
Daniel Borkmann 511804ab70 bpf: Fix overrunning reservations in ringbuf
[ Upstream commit cfa1a2329a ]

The BPF ring buffer internally is implemented as a power-of-2 sized circular
buffer, with two logical and ever-increasing counters: consumer_pos is the
consumer counter to show which logical position the consumer consumed the
data, and producer_pos which is the producer counter denoting the amount of
data reserved by all producers.

Each time a record is reserved, the producer that "owns" the record will
successfully advance producer counter. In user space each time a record is
read, the consumer of the data advanced the consumer counter once it finished
processing. Both counters are stored in separate pages so that from user
space, the producer counter is read-only and the consumer counter is read-write.

One aspect that simplifies and thus speeds up the implementation of both
producers and consumers is how the data area is mapped twice contiguously
back-to-back in the virtual memory, allowing to not take any special measures
for samples that have to wrap around at the end of the circular buffer data
area, because the next page after the last data page would be first data page
again, and thus the sample will still appear completely contiguous in virtual
memory.

Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header for
book-keeping the length and offset, and is inaccessible to the BPF program.
Helpers like bpf_ringbuf_reserve() return `(void *)hdr + BPF_RINGBUF_HDR_SZ`
for the BPF program to use. Bing-Jhong and Muhammad reported that it is however
possible to make a second allocated memory chunk overlapping with the first
chunk and as a result, the BPF program is now able to edit first chunk's
header.

For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with size
of 0x4000. Next, the consumer_pos is modified to 0x3000 /before/ a call to
bpf_ringbuf_reserve() is made. This will allocate a chunk A, which is in
[0x0,0x3008], and the BPF program is able to edit [0x8,0x3008]. Now, lets
allocate a chunk B with size 0x3000. This will succeed because consumer_pos
was edited ahead of time to pass the `new_prod_pos - cons_pos > rb->mask`
check. Chunk B will be in range [0x3008,0x6010], and the BPF program is able
to edit [0x3010,0x6010]. Due to the ring buffer memory layout mentioned
earlier, the ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data
pages. This means that chunk B at [0x4000,0x4008] is chunk A's header.
bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header's pg_off to then
locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once chunk
B modified chunk A's header, then bpf_ringbuf_commit() refers to the wrong
page and could cause a crash.

Fix it by calculating the oldest pending_pos and check whether the range
from the oldest outstanding record to the newest would span beyond the ring
buffer size. If that is the case, then reject the request. We've tested with
the ring buffer benchmark in BPF selftests (./benchs/run_bench_ringbufs.sh)
before/after the fix and while it seems a bit slower on some benchmarks, it
is still not significantly enough to matter.

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg>
Co-developed-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240621140828.18238-1-daniel@iogearbox.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05 09:33:47 +02:00
arch s390/pci: Add missing virt_to_phys() for directed DIBV 2024-07-05 09:33:46 +02:00
block block/ioctl: prefer different overflow check 2024-06-27 13:49:01 +02:00
certs certs: Reference revocation list for all keyrings 2023-08-17 20:12:41 +00:00
crypto crypto: ecrdsa - Fix module auto-load on add_key 2024-06-16 13:47:39 +02:00
Documentation dt-bindings: i2c: google,cros-ec-i2c-tunnel: correct path to i2c-controller schema 2024-06-27 13:49:13 +02:00
drivers mlxsw: spectrum_buffers: Fix memory corruptions on Spectrum-4 systems 2024-07-05 09:33:46 +02:00
fs ocfs2: update inode fsync transaction id in ocfs2_unlink and ocfs2_link 2024-06-27 13:49:14 +02:00
include workqueue: Increase worker desc's length to 32 2024-07-05 09:33:45 +02:00
init smp: Provide 'setup_max_cpus' definition on UP too 2024-06-16 13:47:49 +02:00
io_uring io_uring/rsrc: fix incorrect assignment of iter->nr_segs in io_import_fixed 2024-06-27 13:49:10 +02:00
ipc Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
kernel bpf: Fix overrunning reservations in ringbuf 2024-07-05 09:33:47 +02:00
lib lib/test_hmm.c: handle src_pfns and dst_pfns allocation failure 2024-06-12 11:12:08 +02:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-27 13:49:13 +02:00
net openvswitch: get related ct labels from its master if it is not confirmed 2024-07-05 09:33:46 +02:00
rust rust: kernel: require Send for Module implementations 2024-05-17 12:01:56 +02:00
samples work around gcc bugs with 'asm goto' with outputs 2024-02-23 09:24:47 +01:00
scripts locking/atomic: scripts: fix ${atomic}_sub_and_test() kerneldoc 2024-06-27 13:49:11 +02:00
security ima: Fix use-after-free on a dentry's dname.name 2024-06-21 14:38:48 +02:00
sound ASoC: fsl-asoc-card: set priv->pdev before using it 2024-07-05 09:33:46 +02:00
tools selftests: mptcp: userspace_pm: fixed subtest names 2024-07-05 09:33:44 +02:00
usr initramfs: Encode dependency on KBUILD_BUILD_TIMESTAMP 2023-06-06 17:54:49 +09:00
virt KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin() 2024-06-27 13:49:11 +02:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: rpm-pkg: rename binkernel.spec to kernel.spec 2023-07-25 00:59:33 +09:00
.mailmap 20 hotfixes. 12 are cc:stable and the remainder address post-6.5 issues 2023-10-24 09:52:16 -10:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS USB: Remove Wireless USB and UWB documentation 2023-08-09 14:17:32 +02:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig
MAINTAINERS pwm: Rename pwm_apply_state() to pwm_apply_might_sleep() 2024-06-12 11:12:24 +02:00
Makefile Linux 6.6.36 2024-06-27 13:49:15 +02:00
README

Linux kernel

There are several guides for kernel developers and users. These guides can be rendered in a number of formats, like HTML and PDF. Please read Documentation/admin-guide/README.rst first.

In order to build the documentation, use make htmldocs or make pdfdocs. The formatted documentation can also be read online at:

https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory, several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about the problems which may result by upgrading your kernel.