linux-yocto/kernel
Chen Ridong ded4d207a3 cgroup: split cgroup_destroy_wq into 3 workqueues
[ Upstream commit 79f919a89c9d06816dbdbbd168fa41d27411a7f9 ]

A hung task can occur during [1] LTP cgroup testing when repeatedly
mounting/unmounting perf_event and net_prio controllers with
systemd.unified_cgroup_hierarchy=1. The hang manifests in
cgroup_lock_and_drain_offline() during root destruction.

Related case:
cgroup_fj_function_perf_event cgroup_fj_function.sh perf_event
cgroup_fj_function_net_prio cgroup_fj_function.sh net_prio

Call Trace:
	cgroup_lock_and_drain_offline+0x14c/0x1e8
	cgroup_destroy_root+0x3c/0x2c0
	css_free_rwork_fn+0x248/0x338
	process_one_work+0x16c/0x3b8
	worker_thread+0x22c/0x3b0
	kthread+0xec/0x100
	ret_from_fork+0x10/0x20

Root Cause:

CPU0                            CPU1
mount perf_event                umount net_prio
cgroup1_get_tree                cgroup_kill_sb
rebind_subsystems               // root destruction enqueues
				// cgroup_destroy_wq
// kill all perf_event css
                                // one perf_event css A is dying
                                // css A offline enqueues cgroup_destroy_wq
                                // root destruction will be executed first
                                css_free_rwork_fn
                                cgroup_destroy_root
                                cgroup_lock_and_drain_offline
                                // some perf descendants are dying
                                // cgroup_destroy_wq max_active = 1
                                // waiting for css A to die

Problem scenario:
1. CPU0 mounts perf_event (rebind_subsystems)
2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
3. A dying perf_event CSS gets queued for offline after root destruction
4. Root destruction waits for offline completion, but offline work is
   blocked behind root destruction in cgroup_destroy_wq (max_active=1)

Solution:
Split cgroup_destroy_wq into three dedicated workqueues:
cgroup_offline_wq – Handles CSS offline operations
cgroup_release_wq – Manages resource release
cgroup_free_wq – Performs final memory deallocation

This separation eliminates blocking in the CSS free path while waiting for
offline operations to complete.

[1] https://github.com/linux-test-project/ltp/blob/master/runtest/controllers
Fixes: 334c3679ec ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
Reported-by: Gao Yingjie <gaoyingjie@uniontech.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Suggested-by: Teju Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-25 11:13:42 +02:00
..
bpf bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init() 2025-09-19 16:35:44 +02:00
cgroup cgroup: split cgroup_destroy_wq into 3 workqueues 2025-09-25 11:13:42 +02:00
configs tinyconfig: remove unnecessary 'is not set' for choice blocks 2024-09-01 20:34:38 +09:00
debug move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
dma dma-debug: fix physical address calculation for struct dma_debug_entry 2025-09-19 16:35:42 +02:00
entry treewide: context_tracking: Rename CONTEXT_* into CT_STATE_* 2024-07-29 07:33:10 +05:30
events perf/core: Prevent VMA split of buffer mappings 2025-08-15 12:14:09 +02:00
futex futex: Pass in task to futex_queue() 2025-03-22 12:54:14 -07:00
gcov gcov: add support for GCC 14 2024-06-15 10:43:06 -07:00
irq genirq/irq_sim: Initialize work context pointers properly 2025-07-10 16:05:07 +02:00
kcsan kcsan: test: Initialize dummy variable 2025-08-15 12:13:46 +02:00
livepatch livepatch: Replace snprintf() with sysfs_emit() 2024-07-02 16:56:18 +02:00
locking locking/lockdep: Decrease nr_unused_locks if lock unused in zap_class() 2025-04-20 10:15:45 +02:00
module module: Prevent silent truncation of module name in delete_module(2) 2025-08-20 18:30:46 +02:00
power PM: sleep: console: Fix the black screen issue 2025-08-20 18:30:25 +02:00
printk printk: nbcon: Allow reacquire during panic 2025-08-20 18:30:47 +02:00
rcu rcu: Fix racy re-initialization of irq_work causing hangs 2025-08-20 18:30:58 +02:00
sched sched: Fix sched_numa_find_nth_cpu() if mask offline 2025-09-09 18:58:16 +02:00
time hrtimers: Unconditionally update target CPU base after offline timer migration 2025-09-19 16:35:47 +02:00
trace tracing: Silence warning when chunk allocation fails in trace_pid_write 2025-09-19 16:35:44 +02:00
.gitignore
acct.c acct: block access to kernel internal filesystems 2025-02-27 04:30:23 -08:00
async.c async: Use a dedicated unbound workqueue with raised min_active 2024-02-09 11:13:59 -10:00
audit_fsnotify.c
audit_tree.c fsnotify: create a wrapper fsnotify_find_inode_mark() 2024-04-04 16:24:16 +02:00
audit_watch.c fsnotify: create a wrapper fsnotify_find_inode_mark() 2024-04-04 16:24:16 +02:00
audit.c audit: Make use of str_enabled_disabled() helper 2024-09-03 16:35:16 -04:00
audit.h audit,module: restore audit logging in load failure case 2025-08-15 12:13:31 +02:00
auditfilter.c audit: use task_tgid_nr() instead of task_pid_nr() 2024-08-28 16:48:28 -04:00
auditsc.c audit,module: restore audit logging in load failure case 2025-08-15 12:13:31 +02:00
backtracetest.c backtracetest: add MODULE_DESCRIPTION() 2024-06-24 22:24:55 -07:00
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-04-29 08:29:29 -07:00
capability.c
cfi.c
compat.c
configs.c
context_tracking.c context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching 2024-08-15 21:30:43 +05:30
cpu_pm.c
cpu.c watchdog/hardlockup/perf: Fix perf_event memory leak 2025-04-10 14:39:11 +02:00
crash_core.c Document/kexec: generalize crash hotplug description 2024-09-01 20:43:37 -07:00
crash_reserve.c crash: fix crash memory reserve exceed system memory bug 2024-09-01 20:43:30 -07:00
cred.c cred: Use KMEM_CACHE() instead of kmem_cache_create() 2024-02-23 17:33:31 -05:00
delayacct.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
dma.c
elfcorehdr.c crash: remove dependency of FA_DUMP on CRASH_DUMP 2024-02-23 17:48:22 -08:00
exec_domain.c
exit.c perf: Fix sample vs do_exit() 2025-06-27 11:11:45 +01:00
exit.h
extable.c
fail_function.c
fork.c kernel/fork: only call untrack_pfn_clear() on VMAs duplicated for fork() 2025-05-29 11:03:14 +02:00
freezer.c sched,freezer: Remove unnecessary warning in __thaw_task 2025-07-24 08:56:37 +02:00
gen_kheaders.sh kheaders: Ignore silly-rename files 2025-01-23 17:22:55 +01:00
groups.c
hung_task.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
iomem.c
irq_work.c
jump_label.c jump_label: Fix static_key_slow_dec() yet again 2024-09-10 11:57:27 +02:00
kallsyms_internal.h kallsyms: get rid of code for absolute kallsyms 2024-07-20 16:33:21 +09:00
kallsyms_selftest.c kallsyms: Match symbols exactly with CONFIG_LTO_CLANG 2024-08-15 09:33:35 -07:00
kallsyms_selftest.h
kallsyms.c kallsyms: Match symbols exactly with CONFIG_LTO_CLANG 2024-08-15 09:33:35 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.kexec crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32 2024-11-14 22:43:48 -08:00
Kconfig.locks
Kconfig.preempt sched_ext: Build fix on !CONFIG_STACKTRACE[_SUPPORT] 2024-08-01 07:08:01 -10:00
kcov.c kcov: mark in_softirq_really() as __always_inline 2025-01-09 13:33:49 +01:00
kexec_core.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
kexec_elf.c kexec: initialize ELF lowest address to ULONG_MAX 2025-04-10 14:39:24 +02:00
kexec_file.c kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y 2024-09-01 17:59:01 -07:00
kexec_internal.h kexec: use atomic_try_cmpxchg_acquire() in kexec_trylock() 2024-09-01 20:43:23 -07:00
kexec.c crash: add a new kexec flag for hotplug support 2024-04-23 14:59:01 +10:00
kheaders.c
kprobes.c kprobes: Fix to check symbol prefixes correctly 2024-08-05 14:04:03 +09:00
ksyms_common.c
ksysfs.c profiling: remove prof_cpu_mask 2024-07-29 10:45:54 -07:00
kthread.c kthread: unpark only parked kthread 2024-10-09 12:47:19 -07:00
latencytop.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
Makefile mm: move kernel/numa.c to mm/ 2024-09-03 21:15:26 -07:00
module_signature.c
notifier.c
nsproxy.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
padata.c padata: do not leak refcount in reorder_work 2025-05-29 11:03:19 +02:00
panic.c objtool, panic: Disable SMAP in __stack_chk_fail() 2025-05-02 07:59:19 +02:00
params.c module: ensure that kobject_put() is safe for module type kobjects 2025-05-18 08:24:54 +02:00
pid_namespace.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
pid_sysctl.h sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
pid.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
profile.c profiling: remove profile=sleep support 2024-08-04 13:36:28 -07:00
ptrace.c ptrace_attach: shift send(SIGSTOP) into ptrace_set_stopped() 2024-02-22 15:38:52 -08:00
range.c
reboot.c Flush console log from kernel_power_off() 2025-04-20 10:15:12 +02:00
regset.c regset: use kvzalloc() for regset_get_alloc() 2024-04-25 21:07:03 -07:00
relay.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
resource_kunit.c resource, kunit: fix user-after-free in resource_test_region_intersects() 2024-10-09 12:47:19 -07:00
resource.c resource: fix false warning in __request_region() 2025-08-01 09:48:44 +01:00
rseq.c rseq: Fix segfault on registration when rseq_cs is non-zero 2025-07-17 18:37:24 +02:00
scftorture.c scftorture: Make torture_type static 2024-05-30 15:31:51 -07:00
scs.c
seccomp.c seccomp: passthrough uretprobe systemcall without filtering 2025-02-17 10:05:12 +01:00
signal.c pidfs: improve multi-threaded exec and premature thread-group leader exit polling 2025-05-29 11:02:09 +02:00
smp.c smp: print only local CPU info when sched_clock goes backward 2024-08-15 00:06:48 +05:30
smpboot.c
smpboot.h
softirq.c lockdep: Fix wait context check on softirq for PREEMPT_RT 2025-05-29 11:02:08 +02:00
stackleak.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
stacktrace.c stacktrace: fix kernel-doc typo 2023-12-29 12:22:29 -08:00
static_call_inline.c x86/static-call: provide a way to do very early static-call updates 2024-12-19 18:13:23 +01:00
static_call.c
stop_machine.c sched/core: Fix migrate_swap() vs. hotplug 2025-07-17 18:37:03 +02:00
sys_ni.c Probes updates for v6.11: 2024-07-18 12:19:20 -07:00
sys.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
sysctl-test.c sysctl: Add module description to sysctl-testing 2024-06-03 15:20:37 +02:00
sysctl.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
task_work.c sched/core: Disable page allocation in task_tick_mm_cid() 2024-10-11 10:49:32 +02:00
taskstats.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
torture.c torture: Add MODULE_DESCRIPTION() 2024-05-30 15:31:38 -07:00
tracepoint.c tracepoint: Support iterating tracepoints in a loading module 2024-09-25 23:23:44 +09:00
tsacct.c tsacct: replace strncpy() with strscpy() 2024-07-12 16:39:53 -07:00
ucount.c ucount: fix atomic_long_inc_below() argument type 2025-08-15 12:13:59 +02:00
uid16.c
uid16.h
umh.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
up.c
user_namespace.c user_namespace: use kmemdup_array() instead of kmemdup() for multiple allocation 2024-09-09 16:47:42 -07:00
user-return-notifier.c
user.c uidgid: make sure we fit into one cacheline 2024-09-12 12:16:09 +02:00
usermode_driver.c
utsname_sysctl.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
utsname.c
vhost_task.c vhost_task: fix vhost_task_create() documentation 2025-05-29 11:01:59 +02:00
vmcore_info.c mm: support only one page_type per page 2024-09-03 21:15:43 -07:00
watch_queue.c watch_queue: fix pipe accounting mismatch 2025-04-10 14:39:10 +02:00
watchdog_buddy.c
watchdog_perf.c watchdog/hardlockup/perf: Fix perf_event memory leak 2025-04-10 14:39:11 +02:00
watchdog.c watchdog: fix watchdog may detect false positive of softlockup 2025-06-27 11:11:22 +01:00
workqueue_internal.h
workqueue.c workqueue: Initialize wq_isolated_cpumask in workqueue_init_early() 2025-06-27 11:11:42 +01:00