Go to file
Yu Kuai 56ac639d6f blk-cgroup: fix possible deadlock while configuring policy
[ Upstream commit 5d726c4dbe ]

Following deadlock can be triggered easily by lockdep:

WARNING: possible circular locking dependency detected
6.17.0-rc3-00124-ga12c2658ced0 #1665 Not tainted
------------------------------------------------------
check/1334 is trying to acquire lock:
ff1100011d9d0678 (&q->sysfs_lock){+.+.}-{4:4}, at: blk_unregister_queue+0x53/0x180

but task is already holding lock:
ff1100011d9d00e0 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: del_gendisk+0xba/0x110

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&q->q_usage_counter(queue)#3){++++}-{0:0}:
       blk_queue_enter+0x40b/0x470
       blkg_conf_prep+0x7b/0x3c0
       tg_set_limit+0x10a/0x3e0
       cgroup_file_write+0xc6/0x420
       kernfs_fop_write_iter+0x189/0x280
       vfs_write+0x256/0x490
       ksys_write+0x83/0x190
       __x64_sys_write+0x21/0x30
       x64_sys_call+0x4608/0x4630
       do_syscall_64+0xdb/0x6b0
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #1 (&q->rq_qos_mutex){+.+.}-{4:4}:
       __mutex_lock+0xd8/0xf50
       mutex_lock_nested+0x2b/0x40
       wbt_init+0x17e/0x280
       wbt_enable_default+0xe9/0x140
       blk_register_queue+0x1da/0x2e0
       __add_disk+0x38c/0x5d0
       add_disk_fwnode+0x89/0x250
       device_add_disk+0x18/0x30
       virtblk_probe+0x13a3/0x1800
       virtio_dev_probe+0x389/0x610
       really_probe+0x136/0x620
       __driver_probe_device+0xb3/0x230
       driver_probe_device+0x2f/0xe0
       __driver_attach+0x158/0x250
       bus_for_each_dev+0xa9/0x130
       driver_attach+0x26/0x40
       bus_add_driver+0x178/0x3d0
       driver_register+0x7d/0x1c0
       __register_virtio_driver+0x2c/0x60
       virtio_blk_init+0x6f/0xe0
       do_one_initcall+0x94/0x540
       kernel_init_freeable+0x56a/0x7b0
       kernel_init+0x2b/0x270
       ret_from_fork+0x268/0x4c0
       ret_from_fork_asm+0x1a/0x30

-> #0 (&q->sysfs_lock){+.+.}-{4:4}:
       __lock_acquire+0x1835/0x2940
       lock_acquire+0xf9/0x450
       __mutex_lock+0xd8/0xf50
       mutex_lock_nested+0x2b/0x40
       blk_unregister_queue+0x53/0x180
       __del_gendisk+0x226/0x690
       del_gendisk+0xba/0x110
       sd_remove+0x49/0xb0 [sd_mod]
       device_remove+0x87/0xb0
       device_release_driver_internal+0x11e/0x230
       device_release_driver+0x1a/0x30
       bus_remove_device+0x14d/0x220
       device_del+0x1e1/0x5a0
       __scsi_remove_device+0x1ff/0x2f0
       scsi_remove_device+0x37/0x60
       sdev_store_delete+0x77/0x100
       dev_attr_store+0x1f/0x40
       sysfs_kf_write+0x65/0x90
       kernfs_fop_write_iter+0x189/0x280
       vfs_write+0x256/0x490
       ksys_write+0x83/0x190
       __x64_sys_write+0x21/0x30
       x64_sys_call+0x4608/0x4630
       do_syscall_64+0xdb/0x6b0
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

other info that might help us debug this:

Chain exists of:
  &q->sysfs_lock --> &q->rq_qos_mutex --> &q->q_usage_counter(queue)#3

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&q->q_usage_counter(queue)#3);
                               lock(&q->rq_qos_mutex);
                               lock(&q->q_usage_counter(queue)#3);
  lock(&q->sysfs_lock);

Root cause is that queue_usage_counter is grabbed with rq_qos_mutex
held in blkg_conf_prep(), while queue should be freezed before
rq_qos_mutex from other context.

The blk_queue_enter() from blkg_conf_prep() is used to protect against
policy deactivation, which is already protected with blkcg_mutex, hence
convert blk_queue_enter() to blkcg_mutex to fix this problem. Meanwhile,
consider that blkcg_mutex is held after queue is freezed from policy
deactivation, also convert blkg_alloc() to use GFP_NOIO.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-13 15:34:07 -05:00
arch ARM: tegra: transformer-20: fix audio-codec interrupt 2025-11-13 15:34:05 -05:00
block blk-cgroup: fix possible deadlock while configuring policy 2025-11-13 15:34:07 -05:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto crypto: essiv - Check ssize for decryption and in-place encryption 2025-10-19 16:33:42 +02:00
Documentation dpll: spec: add missing module-name and clock-id to pin-get reply 2025-11-13 15:33:59 -05:00
drivers clocksource/drivers/timer-rtl-otto: Do not interfere with interrupts 2025-11-13 15:34:07 -05:00
fs smb: client: fix potential cfid UAF in smb2_query_info_compound 2025-11-13 15:33:56 -05:00
include bpf: Do not limit bpf_cgroup_from_id to current's namespace 2025-11-13 15:34:06 -05:00
init init: handle bootloader identifier in kernel parameters 2025-10-19 16:33:50 +02:00
io_uring io_uring/zctx: check chained notif contexts 2025-11-13 15:34:03 -05:00
ipc ipc: fix to protect IPCS lookups using RCU 2025-06-27 11:11:22 +01:00
kernel futex: Don't leak robust_list pointer on exec race 2025-11-13 15:34:06 -05:00
lib kunit: test_dev_action: Correctly cast 'priv' pointer to long* 2025-11-13 15:33:57 -05:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
mm mm: prevent poison consumption when splitting THP 2025-10-29 14:08:57 +01:00
net Bluetooth: hci_core: Fix tracking of periodic advertisement 2025-11-13 15:33:58 -05:00
rust mm/ksm: fix flag-dropping behavior in ksm_madvise 2025-10-23 16:20:47 +02:00
samples ftrace/samples: Fix function size computation 2025-09-19 16:35:44 +02:00
scripts gcc-plugins: Remove TODO_verify_il for GCC >= 16 2025-10-06 11:17:52 +02:00
security KEYS: trusted_tpm1: Compare HMAC values in constant time 2025-10-19 16:33:51 +02:00
sound ASoC: fsl_sai: Fix sync error in consumer mode 2025-11-13 15:33:59 -05:00
tools selftests/bpf: Fix selftest verifier_arena_large failure 2025-11-13 15:34:06 -05:00
usr kbuild: hdrcheck: fix cross build with clang 2025-03-13 13:02:18 +01:00
virt KVM: Allow CPU to reschedule while setting per-page memory attributes 2025-07-17 18:37:08 +02:00
.clang-format clang-format: Update with v6.11-rc1's for_each macro list 2024-08-02 13:20:31 +02:00
.clippy.toml rust: give Clippy the minimum supported Rust version 2025-08-01 09:48:44 +01:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore Add Jeff Kirsher to .get_maintainer.ignore 2024-03-08 11:36:54 +00:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore rust: introduce .clippy.toml 2025-03-13 13:01:42 +01:00
.mailmap mailmap: add entry for Thorsten Blum 2024-11-07 14:14:59 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING
CREDITS MAINTAINERS: Remove self from DSA entry 2024-11-03 12:52:38 -08:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: Update Alexey Makhalov's email address 2025-05-22 14:29:46 +02:00
Makefile Linux 6.12.57 2025-11-02 22:15:23 +09:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel

There are several guides for kernel developers and users. These guides can be rendered in a number of formats, like HTML and PDF. Please read Documentation/admin-guide/README.rst first.

In order to build the documentation, use make htmldocs or make pdfdocs. The formatted documentation can also be read online at:

https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory, several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about the problems which may result by upgrading your kernel.