linux-yocto/kernel/bpf
Peilin Ye cd1fd26bb1 bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()
[ Upstream commit 6d78b4473cdb08b74662355a9e8510bde09c511e ]

Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can
cause various locking issues; see the following stack trace (edited for
style) as one example:

...
 [10.011566]  do_raw_spin_lock.cold
 [10.011570]  try_to_wake_up             (5) double-acquiring the same
 [10.011575]  kick_pool                      rq_lock, causing a hardlockup
 [10.011579]  __queue_work
 [10.011582]  queue_work_on
 [10.011585]  kernfs_notify
 [10.011589]  cgroup_file_notify
 [10.011593]  try_charge_memcg           (4) memcg accounting raises an
 [10.011597]  obj_cgroup_charge_pages        MEMCG_MAX event
 [10.011599]  obj_cgroup_charge_account
 [10.011600]  __memcg_slab_post_alloc_hook
 [10.011603]  __kmalloc_node_noprof
...
 [10.011611]  bpf_map_kmalloc_node
 [10.011612]  __bpf_async_init
 [10.011615]  bpf_timer_init             (3) BPF calls bpf_timer_init()
 [10.011617]  bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
 [10.011619]  bpf__sched_ext_ops_runnable
 [10.011620]  enqueue_task_scx           (2) BPF runs with rq_lock held
 [10.011622]  enqueue_task
 [10.011626]  ttwu_do_activate
 [10.011629]  sched_ttwu_pending         (1) grabs rq_lock
...

The above was reproduced on bpf-next (b338cf849ec8) by modifying
./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
ops.runnable(), and hacking the memcg accounting code a bit to make
a bpf_timer_init() call more likely to raise an MEMCG_MAX event.

We have also run into other similar variants (both internally and on
bpf-next), including double-acquiring cgroup_file_kn_lock, the same
worker_pool::lock, etc.

As suggested by Shakeel, fix this by using __GFP_HIGH instead of
GFP_ATOMIC in __bpf_async_init(), so that e.g. if try_charge_memcg()
raises an MEMCG_MAX event, we call __memcg_memory_event() with
@allow_spinning=false and avoid calling cgroup_file_notify() there.
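
The effect of the flag change can be sketched in a small userspace model. This is an illustration only, not the kernel code: all `SK_`/`sk_` names and bit values below are hypothetical. It assumes that memcg derives @allow_spinning from whether any reclaim bit is set in the gfp mask, and that GFP_ATOMIC is __GFP_HIGH plus the kswapd reclaim bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real gfp_t bits differ. */
#define SK_GFP_HIGH           0x1u /* stands in for __GFP_HIGH */
#define SK_GFP_DIRECT_RECLAIM 0x2u /* stands in for __GFP_DIRECT_RECLAIM */
#define SK_GFP_KSWAPD_RECLAIM 0x4u /* stands in for __GFP_KSWAPD_RECLAIM */
#define SK_GFP_RECLAIM (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)
/* Assumption: GFP_ATOMIC == __GFP_HIGH | __GFP_KSWAPD_RECLAIM. */
#define SK_GFP_ATOMIC (SK_GFP_HIGH | SK_GFP_KSWAPD_RECLAIM)

/* Number of times the modeled cgroup_file_notify() path was taken. */
static int sk_notify_calls;

/* Assumed predicate: spinning is allowed only if a reclaim bit is set. */
static bool sk_allow_spinning(unsigned int gfp)
{
	return (gfp & SK_GFP_RECLAIM) != 0;
}

/* Models raising a MEMCG_MAX event: notify only when spinning is allowed. */
static void sk_memory_event(bool allow_spinning)
{
	if (!allow_spinning)
		return; /* skip the notify path that may take locks */
	sk_notify_calls++;
}

/* Models a charge that hits the memcg limit with the given gfp mask. */
static void sk_charge_over_limit(unsigned int gfp)
{
	sk_memory_event(sk_allow_spinning(gfp));
}

/* Small self-check of the two masks discussed in the commit message. */
static void sk_demo(void)
{
	assert(sk_allow_spinning(SK_GFP_ATOMIC));
	assert(!sk_allow_spinning(SK_GFP_HIGH));
}
```

In this model, a charge with `SK_GFP_ATOMIC` reaches the notify path, while the same charge with `SK_GFP_HIGH` does not, which is the behavior the fix relies on.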

Depends on mm patch
"memcg: skip cgroup_file_notify if spinning is not allowed":
https://lore.kernel.org/bpf/20250905201606.66198-1-shakeel.butt@linux.dev/

v0 approach (s/bpf_map_kmalloc_node/bpf_mem_alloc/):
https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
v1 approach:
https://lore.kernel.org/bpf/20250905234547.862249-1-yepeilin@google.com/

Fixes: b00628b1c7 ("bpf: Introduce bpf timers.")
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/20250909095222.2121438-1-yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
preload bpf/preload: Don't select USERMODE_DRIVER 2025-08-15 12:13:48 +02:00
arena.c bpf: Fix softlockup in arena_map_free on 64k page kernel 2025-02-27 04:30:19 -08:00
arraymap.c
bloom_filter.c
bpf_cgrp_storage.c bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage 2025-05-02 07:59:16 +02:00
bpf_inode_storage.c
bpf_iter.c
bpf_local_storage.c bpf: bpf_local_storage: Always use bpf_mem_alloc in PREEMPT_RT 2025-02-08 09:57:29 +01:00
bpf_lru_list.c bpf: Adjust free target to avoid global starvation of LRU map 2025-07-17 18:37:22 +02:00
bpf_lru_list.h bpf: Adjust free target to avoid global starvation of LRU map 2025-07-17 18:37:22 +02:00
bpf_lsm.c
bpf_struct_ops.c bpf: Pass the same orig_call value to trampoline functions 2025-06-27 11:11:31 +01:00
bpf_task_storage.c
btf_iter.c
btf_relocate.c
btf.c bpf: Use proper type to calculate bpf_raw_tp_null_args.mask index 2025-06-27 11:11:33 +01:00
cgroup_iter.c
cgroup.c bpf: Allow pre-ordering for bpf cgroup progs 2025-05-29 11:02:17 +02:00
core.c bpf: Allow fall back to interpreter for programs with stack size <= 512 2025-09-19 16:35:44 +02:00
cpumap.c
cpumask.c
crypto.c bpf: Fix out-of-bounds dynptr write in bpf_crypto_crypt 2025-09-19 16:35:44 +02:00
devmap.c
disasm.c
disasm.h
dispatcher.c
hashtab.c bpf: fix possible endless loop in BPF map iteration 2025-05-29 11:02:01 +02:00
helpers.c bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init() 2025-09-19 16:35:44 +02:00
inode.c
Kconfig
link_iter.c
local_storage.c
log.c
lpm_trie.c
Makefile
map_in_map.c
map_in_map.h
map_iter.c
memalloc.c
mmap_unlock_work.h
mprog.c
net_namespace.c
offload.c
percpu_freelist.c
percpu_freelist.h
prog_iter.c
queue_stack_maps.c
relo_core.c
reuseport_array.c
ringbuf.c bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic 2025-02-27 04:30:18 -08:00
stackmap.c
syscall.c bpf: Move bpf map owner out of common struct 2025-09-09 18:58:01 +02:00
sysfs_btf.c
task_iter.c
tcx.c
tnum.c
token.c
trampoline.c
verifier.c bpf: Make reg_not_null() true for CONST_PTR_TO_MAP 2025-08-20 18:30:39 +02:00