mirror of
git://git.yoctoproject.org/linux-yocto.git
synced 2025-10-23 07:23:12 +02:00
![]() [ Upstream commit 6d78b4473cdb08b74662355a9e8510bde09c511e ]
Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can
cause various locking issues; see the following stack trace (edited for
style) as one example:
...
[10.011566] do_raw_spin_lock.cold
[10.011570] try_to_wake_up (5) double-acquiring the same
[10.011575] kick_pool rq_lock, causing a hardlockup
[10.011579] __queue_work
[10.011582] queue_work_on
[10.011585] kernfs_notify
[10.011589] cgroup_file_notify
[10.011593] try_charge_memcg (4) memcg accounting raises an
[10.011597] obj_cgroup_charge_pages MEMCG_MAX event
[10.011599] obj_cgroup_charge_account
[10.011600] __memcg_slab_post_alloc_hook
[10.011603] __kmalloc_node_noprof
...
[10.011611] bpf_map_kmalloc_node
[10.011612] __bpf_async_init
[10.011615] bpf_timer_init (3) BPF calls bpf_timer_init()
[10.011617] bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
[10.011619] bpf__sched_ext_ops_runnable
[10.011620] enqueue_task_scx (2) BPF runs with rq_lock held
[10.011622] enqueue_task
[10.011626] ttwu_do_activate
[10.011629] sched_ttwu_pending (1) grabs rq_lock
...
The above was reproduced on bpf-next (b338cf849ec8) by modifying
./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
ops.runnable(), and hacking the memcg accounting code a bit to make
a bpf_timer_init() call more likely to raise an MEMCG_MAX event.
We have also run into other similar variants (both internally and on
bpf-next), including double-acquiring cgroup_file_kn_lock, the same
worker_pool::lock, etc.
As suggested by Shakeel, fix this by using __GFP_HIGH instead of
GFP_ATOMIC in __bpf_async_init(), so that e.g. if try_charge_memcg()
raises an MEMCG_MAX event, we call __memcg_memory_event() with
@allow_spinning=false and avoid calling cgroup_file_notify() there.
Depends on mm patch
"memcg: skip cgroup_file_notify if spinning is not allowed":
https://lore.kernel.org/bpf/20250905201606.66198-1-shakeel.butt@linux.dev/
v0 approach s/bpf_map_kmalloc_node/bpf_mem_alloc/
https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
v1 approach:
https://lore.kernel.org/bpf/20250905234547.862249-1-yepeilin@google.com/
Fixes:
|
||
---|---|---|
.. | ||
preload | ||
arena.c | ||
arraymap.c | ||
bloom_filter.c | ||
bpf_cgrp_storage.c | ||
bpf_inode_storage.c | ||
bpf_iter.c | ||
bpf_local_storage.c | ||
bpf_lru_list.c | ||
bpf_lru_list.h | ||
bpf_lsm.c | ||
bpf_struct_ops.c | ||
bpf_task_storage.c | ||
btf_iter.c | ||
btf_relocate.c | ||
btf.c | ||
cgroup_iter.c | ||
cgroup.c | ||
core.c | ||
cpumap.c | ||
cpumask.c | ||
crypto.c | ||
devmap.c | ||
disasm.c | ||
disasm.h | ||
dispatcher.c | ||
hashtab.c | ||
helpers.c | ||
inode.c | ||
Kconfig | ||
link_iter.c | ||
local_storage.c | ||
log.c | ||
lpm_trie.c | ||
Makefile | ||
map_in_map.c | ||
map_in_map.h | ||
map_iter.c | ||
memalloc.c | ||
mmap_unlock_work.h | ||
mprog.c | ||
net_namespace.c | ||
offload.c | ||
percpu_freelist.c | ||
percpu_freelist.h | ||
prog_iter.c | ||
queue_stack_maps.c | ||
relo_core.c | ||
reuseport_array.c | ||
ringbuf.c | ||
stackmap.c | ||
syscall.c | ||
sysfs_btf.c | ||
task_iter.c | ||
tcx.c | ||
tnum.c | ||
token.c | ||
trampoline.c | ||
verifier.c |