Go to file
Danilo Krummrich f744201c61 rust: devres: fix race in Devres::drop()
In Devres::drop() we first remove the devres action and then drop the
wrapped device resource.

The design goal is to give the owner of a Devres object control over when
the device resource is dropped, but limit the overall scope to the
corresponding device being bound to a driver.

However, there's a race that was introduced with commit 8ff656643d
("rust: devres: remove action in `Devres::drop`"), but also has been
(partially) present from the initial version on.

In Devres::drop(), the devres action is removed successfully and
subsequently the destructor of the wrapped device resource runs.
However, there is no guarantee that the destructor of the wrapped device
resource completes before the driver core is done unbinding the
corresponding device.

If in Devres::drop(), the devres action can't be removed, it means that
the devres callback has been executed already, or is still running
concurrently. In case of the latter, either Devres::drop() wins revoking
the Revocable or the devres callback wins revoking the Revocable. If
Devres::drop() wins, we (again) have no guarantee that the destructor of
the wrapped device resource completes before the driver core is done
unbinding the corresponding device.

CPU0					CPU1
------------------------------------------------------------------------
Devres::drop() {			Devres::devres_callback() {
   self.data.revoke() {			   this.data.revoke() {
      is_available.swap() == true
					      is_available.swap == false
					   }
					}

					// [...]
					// device fully unbound
      drop_in_place() {
         // release device resource
      }
   }
}

Depending on the specific device resource, this can potentially lead to
user-after-free bugs.

In order to fix this, implement the following logic.

In the devres callback, we're always good when we get to revoke the
device resource ourselves, i.e. Revocable::revoke() returns true.

If Revocable::revoke() returns false, it means that Devres::drop(),
concurrently, already drops the device resource and we have to wait for
Devres::drop() to signal that it finished dropping the device resource.

Note that if we hit the case where we need to wait for the completion of
Devres::drop() in the devres callback, it means that we're actually
racing with a concurrent Devres::drop() call, which already started
revoking the device resource for us. This is rather unlikely and means
that the concurrent Devres::drop() already started doing our work and we
just need to wait for it to complete it for us. Hence, there should not
be any additional overhead from that.

(Actually, for now it's even better if Devres::drop() does the work for
us, since it can bypass the synchronize_rcu() call implied by
Revocable::revoke(), but this goes away anyways once I get to implement
the split devres callback approach, which allows us to first flip the
atomics of all registered Devres objects of a certain device, execute a
single synchronize_rcu() and then drop all revocable objects.)

In Devres::drop() we try to revoke the device resource. If that is *not*
successful, it means that the devres callback already did and we're good.

Otherwise, we try to remove the devres action, which, if successful,
means that we're good, since the device resource has just been revoked
by us *before* we removed the devres action successfully.

If the devres action could not be removed, it means that the devres
callback must be running concurrently, hence we signal that the device
resource has been revoked by us, using the completion.

This makes it safe to drop a Devres object from any task and at any point
of time, which is one of the design goals.

Fixes: 76c01ded72 ("rust: add devres abstraction")
Reported-by: Alice Ryhl <aliceryhl@google.com>
Closes: https://lore.kernel.org/lkml/aD64YNuqbPPZHAa5@google.com/
Reviewed-by: Benno Lossin <lossin@kernel.org>
Link: https://lore.kernel.org/r/20250612121817.1621-4-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-06-13 23:47:53 +02:00
arch The delayed from_timer() API cleanup: 2025-06-08 11:33:00 -07:00
block treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto EFI updates for v6.16 2025-05-30 12:42:57 -07:00
Documentation fourteen smb3 client fixes, most smbdirect related 2025-06-08 10:20:21 -07:00
drivers The delayed from_timer() API cleanup: 2025-06-08 11:33:00 -07:00
fs The delayed from_timer() API cleanup: 2025-06-08 11:33:00 -07:00
include The delayed from_timer() API cleanup: 2025-06-08 11:33:00 -07:00
init Rust changes for v6.16 2025-06-04 21:18:37 -07:00
io_uring io_uring-6.16-20250606 2025-06-06 13:09:03 -07:00
ipc - The 3 patch series "hung_task: extend blocking task stacktrace dump to 2025-05-31 19:12:53 -07:00
kernel The delayed from_timer() API cleanup: 2025-06-08 11:33:00 -07:00
lib 13 hotfixes. 6 are cc:stable and the remainder address post-6.15 issues 2025-06-06 21:45:45 -07:00
LICENSES LICENSES: add CC0-1.0 license text 2025-05-21 14:54:17 +02:00
mm treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
net treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
rust rust: devres: fix race in Devres::drop() 2025-06-13 23:47:53 +02:00
samples - The 3 patch series "hung_task: extend blocking task stacktrace dump to 2025-05-31 19:12:53 -07:00
scripts Kbuild updates for v6.16 2025-06-07 10:05:35 -07:00
security require gcc-8 and binutils-2.30 2025-05-31 08:16:52 -07:00
sound treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
tools turbostat v2025.06.08 2025-06-08 11:44:41 -07:00
usr usr/include: openrisc: don't HDRTEST bpf_perf_event.h 2025-05-12 15:03:17 +09:00
virt Merge branch 'kvm-lockdep-common' into HEAD 2025-05-28 06:29:17 -04:00
.clang-format Linux 6.15-rc5 2025-05-06 16:39:25 +10:00
.clippy.toml rust: clean Rust 1.88.0's warning about clippy::disallowed_macros configuration 2025-05-07 00:11:47 +02:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes
.gitignore .gitignore: ignore Python compiled bytecode 2025-04-24 10:12:46 -06:00
.mailmap fourteen smb3 client fixes, most smbdirect related 2025-06-08 10:20:21 -07:00
.pylintrc docs: add a .pylintrc file with sys path for docs scripts 2025-04-09 12:10:33 -06:00
.rustfmt.toml
COPYING
CREDITS Update Christoph's Email address and make it consistent 2025-05-12 23:50:31 -07:00
Kbuild drm: ensure drm headers are self-contained and pass kernel-doc 2025-02-12 10:44:43 +02:00
Kconfig io_uring: Rename KConfig to Kconfig 2025-02-19 14:53:27 -07:00
MAINTAINERS fourteen smb3 client fixes, most smbdirect related 2025-06-08 10:20:21 -07:00
Makefile Linux 6.16-rc1 2025-06-08 13:44:43 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel

There are several guides for kernel developers and users. These guides can be rendered in a number of formats, like HTML and PDF. Please read Documentation/admin-guide/README.rst first.

In order to build the documentation, use make htmldocs or make pdfdocs. The formatted documentation can also be read online at:

https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory, several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about the problems which may result by upgrading your kernel.