-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQigAKCRCRxhvAZXjc
ojqsAQCI9BOJV953LWPr9fmmFVz78wA4kTxkFFDAxwH/0iYtmQD8DORngPfJERI1
FwTY/gk2N744vUpbbcSM/KOnM3ss8wE=
=B3rD
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.18-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs writeback updates from Christian Brauner:
"This contains work adressing lockups reported by users when a systemd
unit reading lots of files from a filesystem mounted with the lazytime
mount option exits.
With the lazytime mount option enabled we can be switching many dirty
inodes on cgroup exit to the parent cgroup. The numbers observed in
practice when systemd slice of a large cron job exits can easily reach
hundreds of thousands or millions.
The logic in inode_do_switch_wbs() which sorts the inode into
appropriate place in b_dirty list of the target wb however has linear
complexity in the number of dirty inodes thus overall time complexity
of switching all the inodes is quadratic leading to workers being
pegged for hours consuming 100% of the CPU and switching inodes to the
parent wb.
Simple reproducer of the issue:
FILES=10000
# Filesystem mounted with lazytime mount option
MNT=/mnt/
echo "Creating files and switching timestamps"
for (( j = 0; j < 50; j ++ )); do
mkdir $MNT/dir$j
for (( i = 0; i < $FILES; i++ )); do
echo "foo" >$MNT/dir$j/file$i
done
touch -a -t 202501010000 $MNT/dir$j/file*
done
wait
echo "Syncing and flushing"
sync
echo 3 >/proc/sys/vm/drop_caches
echo "Reading all files from a cgroup"
mkdir /sys/fs/cgroup/unified/mycg1 || exit
echo $$ >/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit
for (( j = 0; j < 50; j ++ )); do
cat /mnt/dir$j/file* >/dev/null &
done
wait
echo "Switching wbs"
# Now rmdir the cgroup after the script exits
This can be solved by:
- Avoiding contention on the wb->list_lock when switching inodes by
running a single work item per wb and managing a queue of items
switching to the wb
- Allowing rescheduling when switching inodes over to a different
cgroup to avoid softlockups
- Maintaining the b_dirty list ordering instead of sorting it"
* tag 'vfs-6.18-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
writeback: Add tracepoint to track pending inode switches
writeback: Avoid excessively long inode switching times
writeback: Avoid softlockup when switching many inodes
writeback: Avoid contention on wb->list_lock when switching inodes
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQgQAKCRCRxhvAZXjc
oiFXAQCpbLvkWbld9wLgxUBhq+q+kw5NvGxzpvqIhXwJB9F9YAEA44/Wevln4xGx
+kRUbP+xlRQqenIYs2dLzVHzAwAdfQ4=
=EO4Y
-----END PGP SIGNATURE-----
Merge tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull namespace updates from Christian Brauner:
"This contains a larger set of changes around the generic namespace
infrastructure of the kernel.
Each specific namespace type (net, cgroup, mnt, ...) embedds a struct
ns_common which carries the reference count of the namespace and so
on.
We open-coded and cargo-culted so many quirks for each namespace type
that it just wasn't scalable anymore. So given there's a bunch of new
changes coming in that area I've started cleaning all of this up.
The core change is to make it possible to correctly initialize every
namespace uniformly and derive the correct initialization settings
from the type of the namespace such as namespace operations, namespace
type and so on. This leaves the new ns_common_init() function with a
single parameter which is the specific namespace type which derives
the correct parameters statically. This also means the compiler will
yell as soon as someone does something remotely fishy.
The ns_common_init() addition also allows us to remove ns_alloc_inum()
and drops any special-casing of the initial network namespace in the
network namespace initialization code that Linus complained about.
Another part is reworking the reference counting. The reference
counting was open-coded and copy-pasted for each namespace type even
though they all followed the same rules. This also removes all open
accesses to the reference count and makes it private and only uses a
very small set of dedicated helpers to manipulate them just like we do
for e.g., files.
In addition this generalizes the mount namespace iteration
infrastructure introduced a few cycles ago. As reminder, the vfs makes
it possible to iterate sequentially and bidirectionally through all
mount namespaces on the system or all mount namespaces that the caller
holds privilege over. This allow userspace to iterate over all mounts
in all mount namespaces using the listmount() and statmount() system
call.
Each mount namespace has a unique identifier for the lifetime of the
systems that is exposed to userspace. The network namespace also has a
unique identifier working exactly the same way. This extends the
concept to all other namespace types.
The new nstree type makes it possible to lookup namespaces purely by
their identifier and to walk the namespace list sequentially and
bidirectionally for all namespace types, allowing userspace to iterate
through all namespaces. Looking up namespaces in the namespace tree
works completely locklessly.
This also means we can move the mount namespace onto the generic
infrastructure and remove a bunch of code and members from struct
mnt_namespace itself.
There's a bunch of stuff coming on top of this in the future but for
now this uses the generic namespace tree to extend a concept
introduced first for pidfs a few cycles ago. For a while now we have
supported pidfs file handles for pidfds. This has proven to be very
useful.
This extends the concept to cover namespaces as well. It is possible
to encode and decode namespace file handles using the common
name_to_handle_at() and open_by_handle_at() apis.
As with pidfs file handles, namespace file handles are exhaustive,
meaning it is not required to actually hold a reference to nsfs in
able to decode aka open_by_handle_at() a namespace file handle.
Instead the FD_NSFS_ROOT constant can be passed which will let the
kernel grab a reference to the root of nsfs internally and thus decode
the file handle.
Namespaces file descriptors can already be derived from pidfds which
means they aren't subject to overmount protection bugs. IOW, it's
irrelevant if the caller would not have access to an appropriate
/proc/<pid>/ns/ directory as they could always just derive the
namespace based on a pidfd already.
It has the same advantage as pidfds. It's possible to reliably and for
the lifetime of the system refer to a namespace without pinning any
resources and to compare them trivially.
Permission checking is kept simple. If the caller is located in the
namespace the file handle refers to they are able to open it otherwise
they must hold privilege over the owning namespace of the relevant
namespace.
The namespace file handle layout is exposed as uapi and has a stable
and extensible format. For now it simply contains the namespace
identifier, the namespace type, and the inode number. The stable
format means that userspace may construct its own namespace file
handles without going through name_to_handle_at() as they are already
allowed for pidfs and cgroup file handles"
* tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits)
ns: drop assert
ns: move ns type into struct ns_common
nstree: make struct ns_tree private
ns: add ns_debug()
ns: simplify ns_common_init() further
cgroup: add missing ns_common include
ns: use inode initializer for initial namespaces
selftests/namespaces: verify initial namespace inode numbers
ns: rename to __ns_ref
nsfs: port to ns_ref_*() helpers
net: port to ns_ref_*() helpers
uts: port to ns_ref_*() helpers
ipv4: use check_net()
net: use check_net()
net-sysfs: use check_net()
user: port to ns_ref_*() helpers
time: port to ns_ref_*() helpers
pid: port to ns_ref_*() helpers
ipc: port to ns_ref_*() helpers
cgroup: port to ns_ref_*() helpers
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQeQAKCRCRxhvAZXjc
ohhnAQCrtEWukO2GOLdoZdB9nuijIhkE0kNnq3OoMnD/pWE9hQEA3hjJts0o2QQX
LcU62eEUg2airLRMSessPxhoMaJnVgo=
=JTvC
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.18-rc1.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull afs updates from Christian Brauner:
"This contains the change to enable afs to support RENAME_NOREPLACE and
RENAME_EXCHANGE if the server supports it"
* tag 'vfs-6.18-rc1.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
afs: Add support for RENAME_NOREPLACE and RENAME_EXCHANGE
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZgMQAKCRCRxhvAZXjc
ornXAP954dZjz+OJw6lJLCf0j9TXJOczGHvK3oW5ZD9KnqtTdwEA7p1A6WMOKJyl
8VtTgCS0yNt8QlznUnsSDfVm0jXVGAY=
=tUXG
-----END PGP SIGNATURE-----
Merge tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull copy_process updates from Christian Brauner:
"This contains the changes to enable support for clone3() on nios2
which apparently is still a thing.
The more exciting part of this is that it cleans up the inconsistency
in how the 64-bit flag argument is passed from copy_process() into the
various other copy_*() helpers"
[ Fixed up rv ltl_monitor 32-bit support as per Sasha Levin in the merge ]
* tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
nios2: implement architecture-specific portion of sys_clone3
arch: copy_thread: pass clone_flags as u64
copy_process: pass clone_flags as u64 across calltree
copy_sighand: Handle architectures where sizeof(unsigned long) < sizeof(u64)
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQYgAKCRCRxhvAZXjc
olgGAQDWr4sD7kUt8TxifdAXsQNgyGG8qOUkb/BHHSqJ/5mKvAEAlTwJ+81tgNKT
hYYdPyvWdbgW6CnWeiQLi0JjpFvUPQU=
=uHwG
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs workqueue updates from Christian Brauner:
"This contains various workqueue changes affecting the filesystem
layer.
Currently if a user enqueue a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies
to schedule_work() that is using system_wq and queue_work(), that
makes use again of WORK_CPU_UNBOUND.
This replaces the use of system_wq and system_unbound_wq. system_wq is
a per-CPU workqueue which isn't very obvious from the name and
system_unbound_wq is to be used when locality is not required.
So this renames system_wq to system_percpu_wq, and system_unbound_wq
to system_dfl_wq.
This also adds a new WQ_PERCPU flag to allow the fs subsystem users to
explicitly request the use of per-CPU behavior. Both WQ_UNBOUND and
WQ_PERCPU flags coexist for one release cycle to allow callers to
transition their calls. WQ_UNBOUND will be removed in a next release
cycle"
* tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: WQ_PERCPU added to alloc_workqueue users
fs: replace use of system_wq with system_percpu_wq
fs: replace use of system_unbound_wq with system_dfl_wq
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQXAAKCRCRxhvAZXjc
ov62AQChMqYusZ9YfhcoH+ijaMjFTcysxvNJ9VOmgE5SUQCstAEAl5ogJG7Pybnc
rdZN+mCDxR4I47jGmudFh/9jIZcldgk=
=gyOG
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.18-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs rust updates from Christian Brauner:
"This contains a few minor vfs rust changes:
- Add the pid namespace Rust wrappers to the correct MAINTAINERS
entry
- Use to_result() in the Rust file error handling code
- Update imports for fs and pid_namespce Rust wrappers"
* tag 'vfs-6.18-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
rust: file: use to_result for error handling
pid: add Rust files to MAINTAINERS
rust: fs: update ARef and AlwaysRefCounted imports from sync::aref
rust: pid_namespace: update AlwaysRefCounted imports from sync::aref
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQVAAKCRCRxhvAZXjc
oqs7AP9zh3RdSi/M4LooS+eqKmrQensr2UzUmvXkjCPpld7rVgD/TmSYnRCuSFFE
B72t7OrjDRf3S7uvV0CXEwx7s+B2Uwk=
=xQC4
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.18-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfs updates from Christian Brauner:
"This just contains a few changes to pid_nr_ns() to make it more robust
and cleans up or improves a few users that ab- or misuse it currently"
* tag 'vfs-6.18-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
pid: change task_state() to use task_ppid_nr_ns()
pid: change bacct_add_tsk() to use task_ppid_nr_ns()
pid: make __task_pid_nr_ns(ns => NULL) safe for zombie callers
pid: Add a judgment for ns null in pid_nr_ns
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQTQAKCRCRxhvAZXjc
osmAAQCtiJC1AJi+LE+uHNKjlTCm5c6X7Oo0YPgqJ9tNlCoVpwEA2Rmu/r/WA0U1
MJPRbdHgBWXh1kmcdtE735tmfgrREgE=
=tL5r
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.18-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs iomap updates from Christian Brauner:
"This contains minor fixes and cleanups to the iomap code.
Nothing really stands out here"
* tag 'vfs-6.18-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: error out on file IO when there is no inline_data buffer
iomap: trace iomap_zero_iter zeroing activities
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQQgAKCRCRxhvAZXjc
oud9AQD5IG4sNnzCjsvcTDpQkbX5eZW+LFIiAiiN+nztZ+OcRQEAvC2N7YovfqM3
TWpVoNDKvEPdtDc9ttFMUKqBZYvxvgE=
=sEaL
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode updates from Christian Brauner:
"This contains a series I originally wrote and that Eric brought over
the finish line. It moves out the i_crypt_info and i_verity_info
pointers out of 'struct inode' and into the fs-specific part of the
inode.
So now the few filesytems that actually make use of this pay the price
in their own private inode storage instead of forcing it upon every
user of struct inode.
The pointer for the crypt and verity info is simply found by storing
an offset to its address in struct fsverity_operations and struct
fscrypt_operations. This shrinks struct inode by 16 bytes.
I hope to move a lot more out of it in the future so that struct inode
becomes really just about very core stuff that we need, much like
struct dentry and struct file, instead of the dumping ground it has
become over the years.
On top of this are a various changes associated with the ongoing inode
lifetime handling rework that multiple people are pushing forward:
- Stop accessing inode->i_count directly in f2fs and gfs2. They
simply should use the __iget() and iput() helpers
- Make the i_state flags an enum
- Rework the iput() logic
Currently, if we are the last iput, and we have the I_DIRTY_TIME
bit set, we will grab a reference on the inode again and then mark
it dirty and then redo the put. This is to make sure we delay the
time update for as long as possible
We can rework this logic to simply dec i_count if it is not 1, and
if it is do the time update while still holding the i_count
reference
Then we can replace the atomic_dec_and_lock with locking the
->i_lock and doing atomic_dec_and_test, since we did the
atomic_add_unless above
- Add an icount_read() helper and convert everyone that accesses
inode->i_count directly for this purpose to use the helper
- Expand dump_inode() to dump more information about an inode helping
in debugging
- Add some might_sleep() annotations to iput() and associated
helpers"
* tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: add might_sleep() annotation to iput() and more
fs: expand dump_inode()
inode: fix whitespace issues
fs: add an icount_read helper
fs: rework iput logic
fs: make the i_state flags an enum
fs: stop accessing ->i_count directly in f2fs and gfs2
fsverity: check IS_VERITY() in fsverity_cleanup_inode()
fs: remove inode::i_verity_info
btrfs: move verity info pointer to fs-specific part of inode
f2fs: move verity info pointer to fs-specific part of inode
ext4: move verity info pointer to fs-specific part of inode
fsverity: add support for info in fs-specific part of inode
fs: remove inode::i_crypt_info
ceph: move crypt info pointer to fs-specific part of inode
ubifs: move crypt info pointer to fs-specific part of inode
f2fs: move crypt info pointer to fs-specific part of inode
ext4: move crypt info pointer to fs-specific part of inode
fscrypt: add support for info in fs-specific part of inode
fscrypt: replace raw loads of info pointer with helper function
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQOwAKCRCRxhvAZXjc
oth/AQDvlOo+23/f2djgDGS8akjkBYVLW14OYzC0q5cbEnnGgAEAycHL50pp3n1o
3jMYlCByuv507vpCsDupo7QcJapmQAk=
=/NwE
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.18-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
"This contains some work around mount api handling:
- Output the warning message for mnt_too_revealing() triggered during
fsmount() to the fscontext log. This makes it possible for the
mount tool to output appropriate warnings on the command line.
For example, with the newest fsopen()-based mount(8) from
util-linux, the error messages now look like:
# mount -t proc proc /tmp
mount: /tmp: fsmount() failed: VFS: Mount too revealing.
dmesg(1) may have more information after failed mount system call.
- Do not consume fscontext log entries when returning -EMSGSIZE
Userspace generally expects APIs that return -EMSGSIZE to allow for
them to adjust their buffer size and retry the operation.
However, the fscontext log would previously clear the message even
in the -EMSGSIZE case.
Given that it is very cheap for us to check whether the buffer is
too small before we remove the message from the ring buffer, let's
just do that instead.
- Drop an unused argument from do_remount()"
* tag 'vfs-6.18-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: fs/namespace.c: remove ms_flags argument from do_remount
selftests/filesystems: add basic fscontext log tests
fscontext: do not consume log entries when returning -EMSGSIZE
vfs: output mount_too_revealing() errors to fscontext
docs/vfs: Remove mentions to the old mount API helpers
fscontext: add custom-prefix log helpers
fs: Remove mount_bdev
fs: Remove mount_nodev
When calling in listmount() mnt_ns_release() may be passed a NULL
pointer. Handle that case gracefully.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQMQAKCRCRxhvAZXjc
omNLAQCgrwzd9sa1JTlixweu3OAxQlSEbLuMpEv7Ztm+B7Wz0AD9HtwPC44Kev03
GbMcB2DCFLC4evqYECj6IG7NBmoKsAs=
=1ICf
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Add "initramfs_options" parameter to set initramfs mount options.
This allows to add specific mount options to the rootfs to e.g.,
limit the memory size
- Add RWF_NOSIGNAL flag for pwritev2()
Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
signal from being raised when writing on disconnected pipes or
sockets. The flag is handled directly by the pipe filesystem and
converted to the existing MSG_NOSIGNAL flag for sockets
- Allow to pass pid namespace as procfs mount option
Ever since the introduction of pid namespaces, procfs has had very
implicit behaviour surrounding them (the pidns used by a procfs
mount is auto-selected based on the mounting process's active
pidns, and the pidns itself is basically hidden once the mount has
been constructed)
This implicit behaviour has historically meant that userspace was
required to do some special dances in order to configure the pidns
of a procfs mount as desired. Examples include:
* In order to bypass the mnt_too_revealing() check, Kubernetes
creates a procfs mount from an empty pidns so that user
namespaced containers can be nested (without this, the nested
containers would fail to mount procfs)
But this requires forking off a helper process because you cannot
just one-shot this using mount(2)
* Container runtimes in general need to fork into a container
before configuring its mounts, which can lead to security issues
in the case of shared-pidns containers (a privileged process in
the pidns can interact with your container runtime process)
While SUID_DUMP_DISABLE and user namespaces make this less of an
issue, the strict need for this due to a minor uAPI wart is kind
of unfortunate
Things would be much easier if there was a way for userspace to
just specify the pidns they want. So this pull request contains
changes to implement a new "pidns" argument which can be set
using fsconfig(2):
fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
or classic mount(2) / mount(8):
// mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
Cleanups:
- Remove the last references to EXPORT_OP_ASYNC_LOCK
- Make file_remove_privs_flags() static
- Remove redundant __GFP_NOWARN when GFP_NOWAIT is used
- Use try_cmpxchg() in start_dir_add()
- Use try_cmpxchg() in sb_init_done_wq()
- Replace offsetof() with struct_size() in ioctl_file_dedupe_range()
- Remove vfs_ioctl() export
- Replace rwlock() with spinlock in epoll code as rwlock causes
priority inversion on preempt rt kernels
- Make ns_entries in fs/proc/namespaces const
- Use a switch() statement() in init_special_inode() just like we do
in may_open()
- Use struct_size() in dir_add() in the initramfs code
- Use str_plural() in rd_load_image()
- Replace strcpy() with strscpy() in find_link()
- Rename generic_delete_inode() to inode_just_drop() and
generic_drop_inode() to inode_generic_drop()
- Remove unused arguments from fcntl_{g,s}et_rw_hint()
Fixes:
- Document @name parameter for name_contains_dotdot() helper
- Fix spelling mistake
- Always return zero from replace_fd() instead of the file descriptor
number
- Limit the size for copy_file_range() in compat mode to prevent a
signed overflow
- Fix debugfs mount options not being applied
- Verify the inode mode when loading it from disk in minixfs
- Verify the inode mode when loading it from disk in cramfs
- Don't trigger automounts with RESOLVE_NO_XDEV
If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
through automounts, but could still trigger them
- Add FL_RECLAIM flag to show_fl_flags() macro so it appears in
tracepoints
- Fix unused variable warning in rd_load_image() on s390
- Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD
- Use ns_capable_noaudit() when determining net sysctl permissions
- Don't call path_put() under namespace semaphore in listmount() and
statmount()"
* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
fcntl: trim arguments
listmount: don't call path_put() under namespace semaphore
statmount: don't call path_put() under namespace semaphore
pid: use ns_capable_noaudit() when determining net sysctl permissions
fs: rename generic_delete_inode() and generic_drop_inode()
init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
initramfs: Replace strcpy() with strscpy() in find_link()
initrd: Use str_plural() in rd_load_image()
initramfs: Use struct_size() helper to improve dir_add()
initrd: Fix unused variable warning in rd_load_image() on s390
fs: use the switch statement in init_special_inode()
fs/proc/namespaces: make ns_entries const
filelock: add FL_RECLAIM to show_fl_flags() macro
eventpoll: Replace rwlock with spinlock
selftests/proc: add tests for new pidns APIs
procfs: add "pidns" mount option
pidns: move is-ancestor logic to helper
openat2: don't trigger automounts with RESOLVE_NO_XDEV
namei: move cross-device check to __traverse_mounts
namei: remove LOOKUP_NO_XDEV check from handle_mounts
...
There's a silly problem with the CC_HAS_ASM_GOTO_OUTPUT test: even with
a working compiler it will fail on some architectures simply because it
uses the mnemonic "jmp" for testing the inline asm.
And as reported by Geert, not all architectures use that mnemonic, so
the test fails spuriously on such platforms (including arm and riscv,
but also several other architectures).
This issue avoided any obvious test failures because the build still
works thanks to falling back on the old non-asm-goto code, which just
generates worse code.
Just use an empty asm statement instead.
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Fixes: e2ffa15b9b ("kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17")
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just one fix to the module freeing function that was declared __weak
when it should not have been. Thanks to Petr Pavlu for spotting this.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuNNh8scc2k/wOAE+9OeQG+StrGQFAmjFWbMACgkQ9OeQG+St
rGR9tQ/9FJF4N6gk/00P9V+KjQOXS9YuwWxa8gWiuSxU1F2A8sQu4WRkXbOFhSsP
RvXQW8QnczK4mHSYUr+UxFmmLQG4j3II5gZfpEMpdcjnBuYggWtO9h6msBj4A9Mg
eF/lA8bcyxDtd0GDgz3JTqc04Rb1gQ8TJW7qWdm1Y/zj9zwF6kir+cky5pb09s7c
eBFNBViNvzNP6c8da0Aj7CI8KYMO746MgIjrk87NVTJa/cFXd/L//0XqTa6JQg+x
RN3Di9NpyyQ+Yz3nEhDE8cf/dDaSQl5WaOcw6jdGebi00O96P3i9lCTtLyl1Um1S
tOE66hBpE9jX9IiAdaEOo9WQWGh/2R7j2svGHC0FnAwovK1afrIeUcLYM1VzwqxP
0lbo3j9MdHWYo0piIMm4SA13IdpT1xAVGDqpmeBjGKkL3MmmycxkOLOgx/3K3mK0
HivC7iE4cy2lXbc6g3BbO8BL7c3K9NADpwWSOzjP130cmHh0oBKe4mgMyGAEtFW5
5+8wO/8e5EoaDWe8ph+YGVw3dzszgn38TmRiA6D6+Pf8N+tn5ysU7p9fwLuMTXb8
06nV0qOfeq1CJ1KZgy4l8973q3TYb6WBTzEOmAigakRMiIW9qw74Aqo2FRZ35SE8
SNaE+lyMlYVdlc+qAa7oPGy300tQ9WnYfUEFUFLzMfkHcDfkKLw=
=fjZ9
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM fix from Russell King:
"Just one fix to the module freeing function that was declared __weak
when it should not have been. Thanks to Petr Pavlu for spotting this"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9458/1: module: Ensure the override of module_arch_freeing_init()
- Fix buffer overflow in osnoise_cpu_write()
The allocated buffer to read user space did not add a nul terminating byte
after copying from user the string. It then reads the string, and if user
space did not add a nul byte, the read will continue beyond the string.
Add a nul terminating byte after reading the string.
- Fix missing check for lockdown on tracing
There's a path from kprobe events or uprobe events that can update the
tracing system even if lockdown on tracing is activate. Add a check in the
dynamic event path.
- Add a recursion check for the function graph return path
Now that fprobes can hook to the function graph tracer and call different
code between the entry and the exit, the exit code may now call functions
that are not called in entry. This means that the exit handler can possibly
trigger recursion that is not caught and cause the system to crash.
Add the same recursion checks in the function exit handler as exists in the
entry handler path.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaNkbyxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qh6dAQDTFqLRb01RzaZuF/xG9A7UqNz9abq5
fQVwu1RG9xXnnAD/X9PfKfnqLhK/M2EJZT17PJ+nUlFqFoVL6lLJyrDLSw4=
=4j5J
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix buffer overflow in osnoise_cpu_write()
The allocated buffer to read user space did not add a nul terminating
byte after copying from user the string. It then reads the string,
and if user space did not add a nul byte, the read will continue
beyond the string.
Add a nul terminating byte after reading the string.
- Fix missing check for lockdown on tracing
There's a path from kprobe events or uprobe events that can update
the tracing system even if lockdown on tracing is activate. Add a
check in the dynamic event path.
- Add a recursion check for the function graph return path
Now that fprobes can hook to the function graph tracer and call
different code between the entry and the exit, the exit code may now
call functions that are not called in entry. This means that the exit
handler can possibly trigger recursion that is not caught and cause
the system to crash.
Add the same recursion checks in the function exit handler as exists
in the entry handler path.
* tag 'trace-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: fgraph: Protect return handler from recursion loop
tracing: dynevent: Add a missing lockdown check on dynevent
tracing/osnoise: Fix slab-out-of-bounds in _parse_integer_limit()
A few final driver specific fixes that have been sitting in -next for a
bit, the OMAP issue is likely to come up very infrequently since mixed
configuration SPI buses are rare and the Cadence issue is specific to
SoCFPGA systems.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmjZIHkACgkQJNaLcl1U
h9CAXQf/dTgcvdcuVlGX+Rs5NPo9/9M/n1B6sIH3TJCULU1fwht2XYu4ghUfVVmp
lSkHL+NYbXSUJ9lv+ZPJUjU9mlPOAjtFHX5X8I0jaiXeC8/23LQl79TuTvCfqpOF
MyoxJFEcA/nZiwn/IYybl8VMonoem5aJdXncDg70gCvQwWzxeNGAYW0NXdVk8gJz
YqC9ANI4wCyBcwokFJ9eBoz79XGPM2zDNPnYQnpW8lbJ3z0ec8l9IeAloS90bSb9
CY5RgXmFdALFMcrAYsIQl4WlZocb+NQ3S6KN2hHVUJW6A+YlW0H4YREMSmp1/Okg
boN54WdWHYmMFFF8Mrpi6w2ac1VCLQ==
=BUvX
-----END PGP SIGNATURE-----
Merge tag 'spi-fix-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few final driver specific fixes that have been sitting in -next for
a bit.
The OMAP issue is likely to come up very infrequently since mixed
configuration SPI buses are rare and the Cadence issue is specific to
SoCFPGA systems"
* tag 'spi-fix-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: omap2-mcspi: drive SPI_CLK on transfer_setup()
spi: cadence-qspi: defer runtime support on socfpga if reset bit is enabled
aren't considered necessary for -stable kernels. 6 of these fixes are for
MM.
All singletons, please see the changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaNjJPAAKCRDdBJ7gKXxA
jhQiAP0TkBGRBu/IbHLDh39SHANINiM6y+FvOAPTR+Jyp9maSwD/RZlLZ0TnEeoL
htD9hxxVCRYxW5LrrQLxagrTnnOC6gA=
=B6P4
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2025-09-27-22-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"7 hotfixes. 4 are cc:stable and the remainder address post-6.16 issues
or aren't considered necessary for -stable kernels. 6 of these fixes
are for MM.
All singletons, please see the changelogs for details"
* tag 'mm-hotfixes-stable-2025-09-27-22-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
include/linux/pgtable.h: convert arch_enter_lazy_mmu_mode() and friends to static inlines
mm/damon/sysfs: do not ignore callback's return value in damon_sysfs_damon_call()
mailmap: add entry for Bence Csókás
fs/proc/task_mmu: check p->vec_buf for NULL
kmsan: fix out-of-bounds access to shadow memory
mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
mm/hugetlb: fix folio is still mapped when deleted
While applying the patch for commit ede965fd55 ("i2c: rtl9300: remove
broken SMBus Quick operation support"), a conflict was incorrectly solved
by adding the I2C_FUNC_SMBUS_I2C_BLOCK feature flag. But the code to handle
I2C_SMBUS_I2C_BLOCK_DATA requests will be added by a separate commit.
Fixes: ede965fd55 ("i2c: rtl9300: remove broken SMBus Quick operation support")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Add a MAINTAINERS entry for the SpacemiT K1 I2C driver and its DT binding.
Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
I volunteered as maintainer of the DesignWare I2C driver, so update my
entry from reviewer to maintainer.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
The email address bounced. I couldn't find a newer one in recent git history,
so delete this email entry.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
- Fix a buffer overflow in actions_parse()
The "trigger_c" variable did not account for the nul byte when
determining its size.
- Fix a compare that had the values reversed
actions_destroy() is to reallocate when len is greater than the current size,
but the compare was testing if size is greater than the new length.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaNfbFhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qmozAPwKEWRb1gbDIeeva7r6HVRc1jKO3EfK
qT72fgqfKlwUawD/fM3mlW1+n25ZHMX+1e8eQV1CP5VOldgdQEHFzDEz0gI=
=i8D5
-----END PGP SIGNATURE-----
Merge tag 'trace-tools-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull rtla tool fixes from Steven Rostedt:
- Fix a buffer overflow in actions_parse()
The "trigger_c" variable did not account for the nul byte when
determining its size
- Fix a compare that had the values reversed
actions_destroy() is supposed to reallocate when len is greater than
the current size, but the compare was testing if size is greater than
the new length
* tag 'trace-tools-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rtla/actions: Fix condition for buffer reallocation
rtla: Fix buffer overflow in actions_parse
function_graph_enter_regs() prevents itself from recursion by
ftrace_test_recursion_trylock(), but __ftrace_return_to_handler(),
which is called at the exit, does not prevent such recursion.
Therefore, while it can prevent recursive calls from
fgraph_ops::entryfunc(), it is not able to prevent recursive calls
to fgraph from fgraph_ops::retfunc(), resulting in a recursive loop.
This can lead an unexpected recursion bug reported by Menglong.
is_endbr() is called in __ftrace_return_to_handler -> fprobe_return
-> kprobe_multi_link_exit_handler -> is_endbr.
To fix this issue, acquire ftrace_test_recursion_trylock() in the
__ftrace_return_to_handler() after unwind the shadow stack to mark
this section must prevent recursive call of fgraph inside user-defined
fgraph_ops::retfunc().
This is essentially a fix to commit 4346ba1604 ("fprobe: Rewrite
fprobe on function-graph tracer"), because before that fgraph was
only used from the function graph tracer. Fprobe allowed user to run
any callbacks from fgraph after that commit.
Reported-by: Menglong Dong <menglong8.dong@gmail.com>
Closes: https://lore.kernel.org/all/20250918120939.1706585-1-dongml2@chinatelecom.cn/
Fixes: 4346ba1604 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/175852292275.307379.9040117316112640553.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Menglong Dong <menglong8.dong@gmail.com>
Acked-by: Menglong Dong <menglong8.dong@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The condition to check if the actions buffer needs to be resized was
incorrect. The check `self->size >= self->len` would evaluate to
true on almost every call to `actions_new()`, causing the buffer to
be reallocated unnecessarily each time an action was added.
Fix the condition to `self->len >= self.size`, ensuring
that the buffer is only resized when it is actually full.
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250915181101.52513-1-wander@redhat.com
Fixes: 6ea082b171 ("rtla/timerlat: Add action on threshold feature")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Two small RISC-V fixes for v6.17-rc8:
- A race-free implementation of pudp_huge_get_and_clear() (based on the x86
code)
- A MAINTAINERS update to my E-mail address
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmjWyl4ACgkQx4+xDQu9
Kkt2UA/7BS/T/S307K/4bwFgZ9v5aketYCe5Qy0SLaIl6SpY66TY6BNy2USFQtVz
m2hWFrgg97tlnnW+a3alh3yc9DvNjPNKU8PgLowuzdV9d4y/Dky9Jy+x+Ixxy59H
i8kXANT/N8FuzrqCSsmi8qaB6rIowsr2z1esbyNGChIWy/ACg2nZISlpfR0wSQLd
XXcWmfyv3t3gt1bqqfZ6iXwL75cbT31Iz+kwsHRdyoQjiXNfHSW3kCBrDVyXNZkV
mnVutOhIfHBhXDrx+ceN+XBDC5Hnyr80TFyNM0gEflb2VKYXUpTNhSUmsK8Pxlnp
RkIdG/+Pw11z0anU6NxP10gOdSlm6r3jAvT2eh5mJPTGOydOtXVeGQMuJrgPQiTa
e00cuFnYTbdmqB/N5PPFhlGTowRaKThuEGKq+YfGCt6VBe22xdaNAv7tVdEIRJNs
W+J6YrcRgICwrrXQLR0FkYuvi0/X7BeZHfrMayzvnHxb/l8QQinMAJ8GiYCIUJpg
rzvepFq/TwjBA8u0jrsUYQyZYAja89r3V7dJCuLlY03PvM1k3jjbxbWZ/pFNlw6O
nBViy7gx1nAoTLy4zzjJjMcChwMWU7al2QlIdlgZ2GFLxEloksrBNLYmt1j1PI/N
qhddXe4CoL25Chqv2ayMi/0SEJyjF496qMOmgQbFtAWt5a0EFb8=
=/91y
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-v6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
- A race-free implementation of pudp_huge_get_and_clear() (based on the
x86 code)
- A MAINTAINERS update to my E-mail address
* tag 'riscv-for-linus-v6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
MAINTAINERS: Update Paul Walmsley's E-mail address
riscv: Use an atomic xchg in pudp_huge_get_and_clear()
of certain boot command line options, and re-enable CONFIG_PTDUMP
on i386 that was mistakenly turned off in the Kconfig.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmjWnKoRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1glpBAAwBCQL3tRDGDgitQJxB84dZZ69wT9Zl40
P/fjdeLg0CD3g0EOBLWeU52nkEWVXETPONv0iRNsiaxhEiQP6ALJXn4G1/8opADX
Z1HyFKa5ISVn+kAtEZ7DHFW8iSRPw3BsI3jnYcUH9NDlb1Yp4tYqkimXO21+tkvF
48POy6yn5tONCUUA2sRyiL8EoWJzS0IJle8ENg0fxniXAaoE3/xgf8UUYUTknIuH
HwNUA2rKME7HwpmT0mcmMvGpKLzNCEkatN9PoH/voCtbyK5Au7moEE70OvNbf/B+
W7UibqOe7oSuT6ZPv7BfIpY6QMl1k/fc5LetWZc8oUfxUPb0ZZq4/g1HFi7NNW8p
a1D1M3dmECqqxYzUV5NYQdm7gI4jxjPGGEhxOohQpdcDdHao5ciZPzsBfqGRNQuC
/RkMDPRcxgLArA7B6cI9aWunc5eyE6zDMkT2s07K6R/JlAiM4Z0StumnDS3nLVEG
5DT2gjICGTB1qOp+5XIp3O2q8tON2ou3AG/PybC8GlQ/XMwbLQ2FFY78j4uk144T
43TXs/H0ELcletr9Cc6gTGXnKmLYCyOymb8+qhgnlzaXxZWfJvkSG/CvKFMZ4lJB
KhufsTT79UyRuLDALn9VhGCLXLCp91fv4zYSvVvifdAt/+dYB0tpDYNt5B0EudyR
oPvxKOcoeLs=
=qhdq
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix a CPU topology code regression that caused the mishandling of
certain boot command line options, and re-enable CONFIG_PTDUMP on i386
that was mistakenly turned off in the Kconfig"
* tag 'x86-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology: Implement topology_is_core_online() to address SMT regression
x86/Kconfig: Reenable PTDUMP on i386
leaving the dl_server stuck, and a dl_server throttling
bug causing lag to fair tasks.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmjWmxcRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hj+w/9FQlzprbY072g9nBTri+tl1g37UShmbPA
bogKP0xenl7l1Xfrk/aEIWYEIQ9XrAvaeVS8g0g/4Qp+j1q5+mNVpgAZ1ZUUrNHn
l/csP36Otx8kF00hKMlir/doPKO91lNv+mNpsEmp4nLHtgbLfktxtSAWVwZNwsq8
jBAHUJ9JxZA8gngObZnNBuSWM30LNwfB0T5fPNw5ryBjfEJaDClXo/ZsUfyoppbo
DoTh1Kcjsz8w2dRT1yY88I6xEq6RMDkwCwpzQ2TA0ff7HgQxqC2uuKZmVbNJAQ6t
kjJR8+A0G9QH0zK2D3vGljW1fwBtmb3j7YYTHEfkhUXL739rR4PpZ5koasD8TGUd
w9t41JIzpxJrApUWaiAliuKNGJXQW58kpHuryQkqB+RfhesPUxM3bIMQSm0fTBpM
fp3rAfMY9vYmJ6JtEVnKSSo2iGcFcN70VNnwJV/ZCthoegCQER4QvmAxbq6GDQx/
ZtMZqLMiDvuLPQ/aRYbEFw9FbV9SnZ6GEDqcdlT6DOsA3ldpFek4GqcCAoKTuVd4
lUupjdRIhroC6jyjigKNtHJ4iwqkE5UedXvxn8igpPCA2uyRl/rjviraFt2OhE/x
Of+2QK4iRVnz8sa2t8phKep524hLcH35S/i3zpR/QLA9vFTFBFtwLskFRvpTVLCj
Bn6UeCxq08U=
=9ZgA
-----END PGP SIGNATURE-----
Merge tag 'sched-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Fix two dl_server regressions: a race that can end up leaving the
dl_server stuck, and a dl_server throttling bug causing lag to fair
tasks"
* tag 'sched-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix dl_server behaviour
sched/deadline: Fix dl_server getting stuck
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmjWp/UACgkQiiy9cAdy
T1HfOAv/XC7dm8W2D7gEmfH9esKyyQKf3lWl08xSaJXeGBUCtyl4hiFwoE/hFQS2
JeicMU8jiUxIYiaqH+eAhqvWvzuAuNgxXZmk0kMQVmGNt/ifZd9xlvN/D6UkEw7T
olpoiT4+fZLjpi3VWfQQT43LAYywIJYoG0Upn4injfxjjeRpj/7vxMjXSfTLrUpg
01IaXnS8j1lXl1o44P4V4k3iv8tF0YjcBOTTZP3AAnmKX9sprovV+JDaawpTZccr
JLlvTcciWtoaRoo6xPks9A2eqwPR1/Bgni9GO8yQAboSRB1573CNBeXpsJ6dy8Df
V+bHSPVV7NgoVwt1re/9pfJ7K8ylKrBr6625SKjkWfNYttlIlUfk6NI7qMW91gaP
sjGrBMJe14ypek8wRKMIarPffxULFVeCk7Gsz1E+LEqBqUH1ggiVWrEETyUuY7FS
m8leCHCc5SYmuRAOQC7uB6Hj6UAUk08SpvQX4796p0GKZMy2/4AActTWZKlIZg1I
ek3YPYat
=cYD6
-----END PGP SIGNATURE-----
Merge tag 'v6.17rc7-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix unlink bug
- Fix potential out of bounds access in processing compound requests
* tag 'v6.17rc7-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix wrong index reference in smb2_compound_op()
smb: client: handle unlink(2) of files open by different clients
- mediatek: Make sure MT8195 AUDIO power domain isn't left powered-on
-----BEGIN PGP SIGNATURE-----
iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAmjWicoXHHVsZi5oYW5z
c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjCme8A/+LBPtC7wv2bIxJP3EDZM6DkNg
naIxZI/m4xoSK2RtSGu5LHf4ggr+DImJHixKQARBlLoDJxAY7zlfgJziNPSL3yp5
mw4CQpTAb0GjJQmEonj1r3Am7foLEUARuG3YR5Uk5r5f+jEWHkNqu0kmTiz17I12
Ueed89yYnMPE/ex0TY40Z0XS4t5esypD//bSnJ1zyAPg1g/58e1vWLVHNPvVxaZv
lbI30ADMQkJ7ef4ibMl3VAgm6LDyJQUOqFxlKGTFfJdpLC7pKlXq8bGTFB24NKSH
1e1CRlsn2Mi4dTcj8zqj+IhhM4uqDAjEezIB26PahKFl/yrW8tlvLp4m14nbFSaK
uj+sQQw9R1PvcuLvxlFVrxF/J180kyeT0ReLcEN1C97EcVs8EZpcde/g3NAa22Rm
9WpM8NSupyMHqGPhgTA1YVQh6cSt5g19b3FQYTlMJp8G4kO2nJoqZYwl6OiHAMqc
YdXC7qOchMBTk2G9uIxKrKKss/M3/a8X9qUVxjw61M6P/PEzqlNZGICzv2wKvzF6
L+MLzoj7MqG4H3HXx6iQtszRFEbLpa0uzt6sD+SGIaG6EAm4IxNoovV1BGYyRDFZ
g4asVAa0Qz9VUkH8xalyQ7HPCUeg6EQ2153FzScd849XZFzb0tj8+uquhPg58bB6
y+u/byT1ZQDXXyW7EJE=
=c/ev
-----END PGP SIGNATURE-----
Merge tag 'pmdomain-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fix from Ulf Hansson:
- mediatek: Make sure MT8195 AUDIO power domain isn't left powered-on
* tag 'pmdomain-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: mediatek: set default off flag for MT8195 AUDIO power domain
Fixes and New HW Supoort
- amd/pmc: Use 8042 quirk for Stellaris Slim Gen6 AMD
- dell: Set USTT mode according to BIOS after reboot
- dell-lis3lv02d: Add Latitude E6530
- lg-laptop: Fix setting the fan mode
The following is an automated shortlog grouped by driver:
amd/pmc:
- Add Stellaris Slim Gen6 AMD to spurious 8042 quirks list
dell-lis3lv02d:
- Add Latitude E6530
dell:
- Set USTT mode according to BIOS after reboot
lg-laptop:
- Fix WMAB call in fan_mode_store()
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCaNaJCgAKCRBZrE9hU+XO
MdkhAQDsxiGH0G0bCEzmluHOyBJCGYAU9N2FXr2q4JO2ykzFCAD/e/62TuD6lQuP
75aQV+h2eBn7oOOG576ROTYZ2g8M1gc=
=U7jd
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v6.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
"Fixes and New HW Supoort
- amd/pmc: Use 8042 quirk for Stellaris Slim Gen6 AMD
- dell: Set USTT mode according to BIOS after reboot
- dell-lis3lv02d: Add Latitude E6530
- lg-laptop: Fix setting the fan mode"
* tag 'platform-drivers-x86-v6.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: lg-laptop: Fix WMAB call in fan_mode_store()
platform/x86: dell-lis3lv02d: Add Latitude E6530
platform/x86/dell: Set USTT mode according to BIOS after reboot
platform/x86/amd/pmc: Add Stellaris Slim Gen6 AMD to spurious 8042 quirks list
- allow looking up GPIOs by the secondary firmware node too
- fix memory leak in gpio-regmap
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmjWZBgACgkQEacuoBRx
13KzWQ//b0TTLgektBbPsGbPKNR8Jm0ma0Def+mT5Z6mZz5L2QjuJkbCKpiNyU/u
+yDnCvvn23baUHXnsvFOB7/s4Hm7fv97EhXcsNF6x5Nr10MYQIiqqQo7IdvjvaT3
HKHEfrbr6Ei2j7tDoRURi8tyCGVchCRla8pZVItIvuyN+qMfcV5IxnObRi1g4FTu
GM+fxYI30dwSnYqEDovqpYulb+oVEkIztkWUL6tiL5n7dQ9slUuo3QZQAfcYECln
g23UwrK/vR3r0EBbV5Y7EFCMxnQ97dNiOiMZ7JsjsPkp7CI43borF8O5NQcdiPp9
wNtpqUSwEHv+X6jj8J9U1+EVCwftJRTwZ7XBqpIjJh+8yilwbrEOLmfz/9T1COJo
pk93D2WQckeeodX4akBSZQs9Yc7bjTxlbhyLoR+iVer7kjk8qkHciy08W65BigtQ
Ka1VIlAmANpfNxXyoVyWi20Gk6WGA7js7kwhYJJagaRfbV37d5Lcs9f4hcCh0P5m
O+juFb3JChpw4sSTn2K3ML8gOElo16aRjYd4f0GquQSrSoazYkUS05wVuYHncifO
jWFptOOT43cP8mfT3aC6OOprgimPgXlDmkaQYVuUp9+cFHHML/t912S6+GvL4mGg
qe7WmwVL4gVxQHNXJdz92NqNOtwQkvRgXqgYah/UDr7ROYl0F48=
=tnUa
-----END PGP SIGNATURE-----
Merge tag 'gpio-fixes-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- allow looking up GPIOs by the secondary firmware node too
- fix memory leak in gpio-regmap
* tag 'gpio-fixes-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: regmap: fix memory leak of gpio_regmap structure
gpiolib: Extend software-node support to support secondary software-nodes
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmjWKd4QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgphViEADJF/LndvCUkcNFagHcmAHoJWXajelT/c2w
yUfiP5kH3nyjC+eXUNT819QyXNxgr7jhJWtOSnWXGcX28aHEx1BuCQSbCW7L0VxR
DtEfipBeSDt/ixMMokgV0fG/je9D8s4xHLUqFW3tni91jK73G9JxXZoQzj1lQ5t8
qNDGH4Z5BZvX/ClxWXZf1tIPWy1Bbjx8YncJQCDxg9C1fOJXlP9NuWBk7iL5svPV
trgIExix3YoyjOk9d5/P6604wbzTffF8CGoPJEC08LZxxjXkob/ipsd6+Wv1aCyF
3RNIX5bsoN/u0uabyfh5imYxGOkesqqK96sTz+pOExTALNtwKerdmTV790tq73yG
EeNDAHkRve5xBAgHwRlbU9sH6mQypzhiR7DaLXe/INKp6rUOMOhH3JgYwycrzvnC
bDgI0kbs1IrEk/rr1yGupu0Fqav30yWlgQ13vVu3rwp2cGabTnmoOPl34siWjEc9
XL1Q0ftsBtXPxOomKYIDatBCiN8i33/KZ5/IGhDH6qO2or0ydzjK1yEeuDcIDClg
HCfGCnh5Rs11c4iiMjSBSwmWjwwbciZ4XV5XM9VqcqRDeKg0XHb/jaRYYM3UGu37
nTUmJlLN/S90S9tYTd9b7iUlpq5PKZV3TvSW1QeZVxelTgRw7MIwyWGz4qH6XUc3
Im0RjM/ZiA==
=AKR9
-----END PGP SIGNATURE-----
Merge tag 'block-6.17-20250925' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
"A regression fix for this series where an attempt to silence an EOD
error got messed up a bit, and then a change of git trees for the
block and io_uring trees.
Switching the git trees to kernel.org now, as I've just about had it
trying to battle AI bots that bring the box to its knees, continually.
At least I don't have to maintain the kernel.org side"
* tag 'block-6.17-20250925' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
MAINTAINERS: update io_uring and block tree git trees
block: fix EOD return for device with nr_sectors == 0
fbcon:
- fix OOB access in font allocation
- fix integer overflow in font handling
amdgpu:
- Backlight fix
- DC preblend fix
- DCN 3.5 fix
- Cleanup output_tf_change
xe:
- Don't expose sysfs attributes not applicable for VFs
- Fix build with CONFIG_MODULES=n
- Don't copy pinned kernel bos twice on suspend
i915:
- Set O_LARGEFILE in __create_shmem()
- Guard reg_val against a INVALID_TRANSCODER [ddi]
ast:
- sleeps causing cpu stall fix
panthor:
- scheduler race condition fix
gma500:
- NULL ptr deref in hdmi teardown fix
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmjWFIEACgkQDHTzWXnE
hr60DQ//RMHlSkyfNdfVXeAmNxAp9ruzOS9mduDLoMYxgniSyB6o3vt5s3KYFP4B
eusj2Lceg+9AU5J50iXUrhyQjFwE9gFWevjlEKlL5Rr5AxnHGrIvVQGD26C2PiT8
JgMzKYNezYslpMLxyP0DuQxTzT6E88P1H7PUFF41KpoR1ZqcW2hGRSkKhSKv9aea
OcqORMMTRtOy3m3mCKfSGUa1TmiPpxKAT6WQP9id3W8vIMHUElHy7K9Qx1tbycp/
wlT7J66yN8xsVh5Z/8SUTRQXJJ05iBxm37zWvAiUD7DN+RrDgQDVNk65G9Gb3lMC
FfuHARi79Ng2fUl/O99pqsEiOjgqtVcx0gL3pPtahnly8efS5huNu8s8IAcbepRv
uBjrppewxLSM0YdJu5Dw2QSP9YsGsM7IF9bMk/362JndoVWp+T8zY1+snFqD4RRv
P4LHpfmaf2uqtxeort8oiYKrXjNHBbcxsfnH/jWLMtgx2gW6Et8JruaOtesrNJ2G
NVa02RWMsDMGaazrg2+4aH+NjjxxgrR1blZKttupsLtyYqftmW9l0OQ6zfg36/d5
njoEUzgHrYBBHbRM3IR4OhQ5f54lWnBl2+MoeR6c1dI9A7QVFUUKjcwFA/TxI2tr
kNZNoEI/rMOLo6b4yYBh+0l76XEyrxQIy9iAGZrog+pEECEs9GQ=
=iIm3
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2025-09-26' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes, some fbcon font handling fixes, then amdgpu/xe/i915 with
a few, and a few misc fixes for other drivers. Seems about right for
this stage, and I don't know of anything outstanding.
fbcon:
- fix OOB access in font allocation
- fix integer overflow in font handling
amdgpu:
- Backlight fix
- DC preblend fix
- DCN 3.5 fix
- Cleanup output_tf_change
xe:
- Don't expose sysfs attributes not applicable for VFs
- Fix build with CONFIG_MODULES=n
- Don't copy pinned kernel bos twice on suspend
i915:
- Set O_LARGEFILE in __create_shmem()
- Guard reg_val against a INVALID_TRANSCODER [ddi]
ast:
- sleeps causing cpu stall fix
panthor:
- scheduler race condition fix
gma500:
- NULL ptr deref in hdmi teardown fix"
* tag 'drm-fixes-2025-09-26' of https://gitlab.freedesktop.org/drm/kernel:
drm/panthor: Defer scheduler entitiy destruction to queue release
drm/amd/display: remove output_tf_change flag
drm/amd/display: Init DCN35 clocks from pre-os HW values
drm/amd/display: Use mpc.preblend flag to indicate preblend
drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume
fbcon: Fix OOB access in font allocation
drm/i915/ddi: Guard reg_val against a INVALID_TRANSCODER
drm/i915: set O_LARGEFILE in __create_shmem()
drm/xe: Don't copy pinned kernel bos twice on suspend
drm/xe: Fix build with CONFIG_MODULES=n
drm/xe/vf: Don't expose sysfs attributes not applicable for VFs
fbcon: fix integer overflow in fbcon_do_set_font
drm/gma500: Fix null dereference in hdmi teardown
drm/ast: Use msleep instead of mdelay for edid read
In smb2_compound_op(), the loop that processes each command's response
uses wrong indices when accessing response bufferes.
This incorrect indexing leads to improper handling of command results.
Also, if incorrectly computed index is greather than or equal to
MAX_COMPOUND, it can cause out-of-bounds accesses.
Fixes: 3681c74d34 ("smb: client: handle lack of EA support in smb2_query_path_info()") # 6.14
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Remove superfluous argument from fcntl_{get/set}_rw_hint.
No functional change.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Massage listmount() and make sure we don't call path_put() under the
namespace semaphore. If we put the last reference we're fscked.
Fixes: b4c2bea8ce ("add listmount(2) syscall")
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Christian Brauner <brauner@kernel.org>
Massage statmount() and make sure we don't call path_put() under the
namespace semaphore. If we put the last reference we're fscked.
Fixes: 46eae99ef7 ("add statmount(2) syscall")
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Christian Brauner <brauner@kernel.org>
Commit 20d72b00ca ("netfs: Fix the request's work item to not
require a ref") modified netfs_alloc_request() to initialize the
reference counter to 2 instead of 1. The rationale was that the
requet's "work" would release the second reference after completion
(via netfs_{read,write}_collection_worker()). That works most of the
time if all goes well.
However, it leaks this additional reference if the request is released
before the I/O operation has been submitted: the error code path only
decrements the reference counter once and the work item will never be
queued because there will never be a completion.
This has caused outages of our whole server cluster today because
tasks were blocked in netfs_wait_for_outstanding_io(), leading to
deadlocks in Ceph (another bug that I will address soon in another
patch). This was caused by a netfs_pgpriv2_begin_copy_to_cache() call
which failed in fscache_begin_write_operation(). The leaked
netfs_io_request was never completed, leaving `netfs_inode.io_count`
with a positive value forever.
All of this is super-fragile code. Finding out which code paths will
lead to an eventual completion and which do not is hard to see:
- Some functions like netfs_create_write_req() allocate a request, but
will never submit any I/O.
- netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read()
and then netfs_put_request(); however, netfs_unbuffered_read() can
also fail early before submitting the I/O request, therefore another
netfs_put_request() call must be added there.
A rule of thumb is that functions that return a `netfs_io_request` do
not submit I/O, and all of their callers must be checked.
For my taste, the whole netfs code needs an overhaul to make reference
counting easier to understand and less fragile & obscure. But to fix
this bug here and now and produce a patch that is adequate for a
stable backport, I tried a minimal approach that quickly frees the
request object upon early failure.
I decided against adding a second netfs_put_request() each time
because that would cause code duplication which obscures the code
further. Instead, I added the function netfs_put_failed_request()
which frees such a failed request synchronously under the assumption
that the reference count is exactly 2 (as initially set by
netfs_alloc_request() and never touched), verified by a
WARN_ON_ONCE(). It then deinitializes the request object (without
going through the "cleanup_work" indirection) and frees the allocation
(with RCU protection to protect against concurrent access by
netfs_requests_seq_start()).
All code paths that fail early have been changed to call
netfs_put_failed_request() instead of netfs_put_request().
Additionally, I have added a netfs_put_request() call to
netfs_unbuffered_read() as explained above because the
netfs_put_failed_request() approach does not work there.
Fixes: 20d72b00ca ("netfs: Fix the request's work item to not require a ref")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
OOB and overflow fixes for fbcon, and a race condition fix for panthor.
-----BEGIN PGP SIGNATURE-----
iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaNUoIgAKCRAnX84Zoj2+
dm6kAXwKfrCrV/IsZ5Ns7rmHWJwfG0LeMc3ag+7hgJijTjHh46NfOLUMSYxUHWEt
WsK8xB8Bf0HJrloDICxTtt4ynTXGqwXC7Po8bMIYM+ct7Z7L5wvTb8RKTpaREwGs
a33fThctDg==
=5YeQ
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-fixes-2025-09-25' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A CPU stall fix for ast, a NULL pointer dereference fix for gma500, an
OOB and overflow fixes for fbcon, and a race condition fix for panthor.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250925-smilodon-of-luxurious-genius-4ebee7@penduick
commit c519c3c0a1 ("mm/kasan: avoid lazy MMU mode hazards") introduced
the use of arch_enter_lazy_mmu_mode(), which results in the compiler
complaining about "statement has no effect", when
__HAVE_ARCH_LAZY_MMU_MODE is not defined in include/linux/pgtable.h
The exact warning/error is:
In file included from ./include/linux/kasan.h:37,
from mm/kasan/shadow.c:14:
mm/kasan/shadow.c: In function kasan_populate_vmalloc_pte:
./include/linux/pgtable.h:247:41: error: statement with no effect [-Werror=unused-value]
247 | #define arch_enter_lazy_mmu_mode() (LAZY_MMU_DEFAULT)
| ^
mm/kasan/shadow.c:322:9: note: in expansion of macro arch_enter_lazy_mmu_mode> 322 | arch_enter_lazy_mmu_mode();
| ^~~~~~~~~~~~~~~~~~~~~~~~
switching these "functions" to static inlines fixes this up.
Fixes: c519c3c0a1 ("mm/kasan: avoid lazy MMU mode hazards")
Reported-by: Balbir Singh <balbirs@nvidia.com>
Closes: https://lkml.kernel.org/r/20250912235515.367061-1-balbirs@nvidia.com
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The callback return value is ignored in damon_sysfs_damon_call(), which
means that it is not possible to detect invalid user input when writing
commands such as 'commit' to
/sys/kernel/mm/damon/admin/kdamonds/<K>/state. Fix it.
Link: https://lkml.kernel.org/r/20250920132546.5822-1-akinobu.mita@gmail.com
Fixes: f64539dcdb ("mm/damon/sysfs: use damon_call() for update_schemes_stats")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>