linux-yocto/kernel
Chen Ridong 993049c9b1 cgroup: split cgroup_destroy_wq into 3 workqueues
[ Upstream commit 79f919a89c9d06816dbdbbd168fa41d27411a7f9 ]

A hung task can occur during LTP cgroup testing [1] when repeatedly
mounting/unmounting the perf_event and net_prio controllers with
systemd.unified_cgroup_hierarchy=1. The hang manifests in
cgroup_lock_and_drain_offline() during root destruction.

Related cases:
cgroup_fj_function_perf_event cgroup_fj_function.sh perf_event
cgroup_fj_function_net_prio cgroup_fj_function.sh net_prio

Call Trace:
	cgroup_lock_and_drain_offline+0x14c/0x1e8
	cgroup_destroy_root+0x3c/0x2c0
	css_free_rwork_fn+0x248/0x338
	process_one_work+0x16c/0x3b8
	worker_thread+0x22c/0x3b0
	kthread+0xec/0x100
	ret_from_fork+0x10/0x20

Root Cause:

CPU0                            CPU1
mount perf_event                umount net_prio
cgroup1_get_tree                cgroup_kill_sb
rebind_subsystems               // root destruction enqueues
                                // cgroup_destroy_wq
// kill all perf_event css
                                // one perf_event css A is dying
                                // css A offline enqueues cgroup_destroy_wq
                                // root destruction will be executed first
                                css_free_rwork_fn
                                cgroup_destroy_root
                                cgroup_lock_and_drain_offline
                                // some perf descendants are dying
                                // cgroup_destroy_wq max_active = 1
                                // waiting for css A to die

Problem scenario:
1. CPU0 mounts perf_event (rebind_subsystems)
2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
3. The offline work for a dying perf_event css (A) is queued on
   cgroup_destroy_wq after the root destruction work
4. Root destruction waits for offline completion, but offline work is
   blocked behind root destruction in cgroup_destroy_wq (max_active=1)

Solution:
Split cgroup_destroy_wq into three dedicated workqueues:
cgroup_offline_wq – Handles CSS offline operations
cgroup_release_wq – Manages resource release
cgroup_free_wq – Performs final memory deallocation

This separation ensures that root destruction, which runs from the css free
path, no longer blocks the offline work it is waiting for.
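The deadlock and its fix can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: `struct work`, `run_queues()` and `scenario()` are invented for illustration. Each queue is modeled as a strict FIFO in which only the oldest pending item may run, approximating a workqueue with max_active = 1. The queue indices (0 = offline, 1 = release, 2 = free) are also an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a queued work item (not a kernel structure). */
struct work {
	int queue;     /* index of the queue the item sits on */
	int waits_for; /* index of an item that must finish first, or -1 */
	bool done;
};

/*
 * Run every queue as if it had max_active = 1: only the oldest pending
 * item on each queue may execute. Items are listed in enqueue order.
 * Returns true if all items complete, false if the queues stall.
 */
bool run_queues(struct work *items, size_t n, int nr_queues)
{
	bool progress = true;

	assert(nr_queues > 0);
	while (progress) {
		progress = false;
		for (int q = 0; q < nr_queues; q++) {
			for (size_t i = 0; i < n; i++) {
				int dep;

				if (items[i].queue != q || items[i].done)
					continue;
				/* items[i] is the head of queue q */
				dep = items[i].waits_for;
				assert(dep < (int)n);
				if (dep < 0 || items[dep].done) {
					items[i].done = true;
					progress = true;
				}
				break; /* later items on q must wait their turn */
			}
		}
	}

	for (size_t i = 0; i < n; i++)
		if (!items[i].done)
			return false;
	return true;
}

/*
 * Item 0: root destruction (cgroup_destroy_root() via css_free_rwork_fn),
 *         which waits for css A to finish going offline.
 * Item 1: offline work for the dying perf_event css A, queued later.
 * split == false: both share one queue (the old cgroup_destroy_wq).
 * split == true:  offline work uses queue 0 (cgroup_offline_wq) and the
 *                 free path uses queue 2 (cgroup_free_wq).
 */
bool scenario(bool split)
{
	struct work items[2] = {
		{ .queue = split ? 2 : 0, .waits_for = 1,  .done = false },
		{ .queue = 0,             .waits_for = -1, .done = false },
	};

	return run_queues(items, 2, split ? 3 : 1);
}
```

With a single queue, `scenario(false)` returns false: root destruction sits at the head of the queue waiting on css A's offline work, which is queued behind it and can never start. With the split, `scenario(true)` returns true because the offline item runs on its own queue and completes first.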

[1] https://github.com/linux-test-project/ltp/blob/master/runtest/controllers
Fixes: 334c3679ec ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
Reported-by: Gao Yingjie <gaoyingjie@uniontech.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-25 10:58:50 +02:00
bpf bpf: Fix oob access in cgroup local storage 2025-09-09 18:54:11 +02:00
cgroup cgroup: split cgroup_destroy_wq into 3 workqueues 2025-09-25 10:58:50 +02:00
configs
debug kdb: Do not assume write() callback available 2025-02-21 13:50:09 +01:00
dma dma/pool: Ensure DMA_DIRECT_REMAP allocations are decrypted 2025-09-04 15:26:30 +02:00
entry entry: Respect changes to system call number by trace_sys_enter() 2024-04-03 15:19:44 +02:00
events perf/core: Prevent VMA split of buffer mappings 2025-08-15 12:05:11 +02:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:51:50 +01:00
gcov gcov: add support for GCC 14 2024-06-27 13:46:22 +02:00
irq genirq: Make handle_enforce_irqctx() unconditionally available 2025-02-21 13:48:57 +01:00
kcsan kcsan: test: Initialize dummy variable 2025-08-15 12:04:59 +02:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-11-20 11:52:10 +01:00
locking locking/lockdep: Decrease nr_unused_locks if lock unused in zap_class() 2025-04-25 10:43:41 +02:00
module module: Prevent silent truncation of module name in delete_module(2) 2025-08-28 16:26:01 +02:00
power PM: sleep: console: Fix the black screen issue 2025-08-28 16:25:53 +02:00
printk printk: Fix signed integer overflow when defining LOG_BUF_LEN_MAX 2025-02-21 13:49:30 +01:00
rcu rcu: Protect ->defer_qs_iw_pending from data race 2025-08-28 16:25:55 +02:00
sched cpufreq/sched: Explicitly synchronize limits_changed flag handling 2025-09-09 18:54:18 +02:00
time hrtimers: Unconditionally update target CPU base after offline timer migration 2025-09-19 16:29:59 +02:00
trace tracing: Silence warning when chunk allocation fails in trace_pid_write 2025-09-19 16:29:56 +02:00
.gitignore
acct.c acct: block access to kernel internal filesystems 2025-03-07 16:56:39 +01:00
async.c async: Introduce async_schedule_dev_nocall() 2024-01-31 16:17:00 -08:00
audit_fsnotify.c
audit_tree.c
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-28 17:07:08 +00:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-05 20:12:47 +00:00
audit.h
auditfilter.c ima: Avoid blocking in RCU read-side critical section 2024-07-11 12:47:16 +02:00
auditsc.c audit,io_uring: io_uring openat triggers audit reference count underflow 2023-10-25 12:03:04 +02:00
backtracetest.c
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-05-02 16:29:32 +02:00
capability.c
cfi.c
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-06 12:10:40 +02:00
configs.c
context_tracking.c context_tracking: Fix noinstr vs KASAN 2023-03-10 09:33:45 +01:00
cpu_pm.c
cpu.c hrtimers: Handle CPU state correctly on hotplug 2025-01-23 17:17:15 +01:00
crash_core.c
crash_dump.c
cred.c cred: switch to using atomic_long_t 2023-12-20 17:00:20 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c perf: Fix sample vs do_exit() 2025-06-27 11:07:41 +01:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 13:55:39 +01:00
fork.c mm: drop the assumption that VM_SHARED always implies writable 2025-08-28 16:26:12 +02:00
freezer.c
gen_kheaders.sh kheaders: Ignore silly-rename files 2025-01-23 17:17:11 +01:00
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c jump_label: Fix static_key_slow_dec() yet again 2024-10-17 15:21:29 +02:00
kallsyms_internal.h kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[] 2023-10-25 12:03:16 +02:00
kallsyms.c kallsyms: Add helper kallsyms_on_each_match_symbol() 2023-10-25 12:03:16 +02:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: mark in_softirq_really() as __always_inline 2025-01-09 13:30:05 +01:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-19 16:21:08 +02:00
kexec_elf.c kexec: initialize ELF lowest address to ULONG_MAX 2025-04-10 14:33:36 +02:00
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 16:00:55 +02:00
kexec_internal.h
kexec.c kernel: kexec: copy user-array safely 2023-11-28 17:06:57 +00:00
kheaders.c kheaders: Use array declaration instead of char 2023-05-11 23:03:02 +09:00
kmod.c
kprobes.c kprobes: Fix to check symbol prefixes correctly 2024-08-14 13:52:54 +02:00
ksysfs.c
kthread.c kthread: unpark only parked kthread 2024-10-17 15:22:28 +02:00
latencytop.c
Makefile kernel/numa.c: Move logging out of numa.h 2024-06-12 11:03:16 +02:00
module_signature.c
notifier.c
nsproxy.c
numa.c kernel/numa.c: Move logging out of numa.h 2024-06-12 11:03:16 +02:00
padata.c padata: do not leak refcount in reorder_work 2025-06-04 14:40:20 +02:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 13:04:54 +02:00
params.c module: ensure that kobject_put() is safe for module type kobjects 2025-05-18 08:21:23 +02:00
pid_namespace.c pid: Replace struct pid 1-element array with flex-array 2024-08-29 17:30:18 +02:00
pid.c pid: add pidfd_prepare() 2025-06-04 14:40:25 +02:00
profile.c profiling: remove profile=sleep support 2024-08-14 13:52:50 +02:00
ptrace.c
range.c
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 17:07:13 +00:00
regset.c
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-11 23:03:03 +09:00
resource_kunit.c
resource.c resource: fix region_intersects() vs add_memory_driver_managed() 2024-10-17 15:21:55 +02:00
rseq.c rseq: Fix segfault on registration when rseq_cs is non-zero 2025-07-17 18:32:15 +02:00
scftorture.c scftorture: Forgive memory-allocation failure if KASAN 2023-09-23 11:11:00 +02:00
scs.c
seccomp.c
signal.c signal: restore the override_rlimit logic 2024-11-14 13:15:18 +01:00
smp.c smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu() 2024-09-12 11:10:24 +02:00
smpboot.c
smpboot.h
softirq.c lockdep: Fix wait context check on softirq for PREEMPT_RT 2025-06-04 14:40:03 +02:00
stackleak.c
stacktrace.c
static_call_inline.c x86/static-call: provide a way to do very early static-call updates 2024-12-19 18:08:58 +01:00
static_call.c
stop_machine.c
sys_ni.c syscalls: fix compat_sys_io_pgetevents_time64 usage 2024-07-05 09:31:59 +02:00
sys.c hrtimer: Use and report correct timerslack values for realtime tasks 2025-03-28 21:58:48 +01:00
sysctl-test.c
sysctl.c
task_work.c task_work: Introduce task_work_cancel() again 2024-08-03 08:49:34 +02:00
taskstats.c
torture.c torture: Fix hang during kthread shutdown phase 2023-03-10 09:34:07 +01:00
tracepoint.c
tsacct.c
ucount.c ucount: fix atomic_long_inc_below() argument type 2025-08-15 12:05:05 +02:00
uid16.c
uid16.h
umh.c freezer,umh: Fix call_usermode_helper_exec() vs SIGKILL 2023-02-22 12:59:50 +01:00
up.c
user_namespace.c
user-return-notifier.c
user.c
usermode_driver.c
utsname_sysctl.c
utsname.c
watch_queue.c watch_queue: fix pipe accounting mismatch 2025-04-10 14:33:30 +02:00
watchdog_hld.c watchdog/perf: properly initialize the turbo mode timestamp and rearm counter 2024-08-03 08:49:42 +02:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 17:07:09 +00:00
workqueue_internal.h
workqueue.c workqueue: Improve scalability of workqueue watchdog touch 2024-09-12 11:10:27 +02:00