From b55eb6eb2a7427428c59b293a0900131fc849595 Mon Sep 17 00:00:00 2001 From: Mike Yuan Date: Wed, 4 Jun 2025 15:03:42 +0000 Subject: [PATCH 1/8] pidfs: never refuse ppid == 0 in PIDFD_GET_INFO In systemd we spotted an issue after switching to ioctl(PIDFD_GET_INFO) for obtaining pid number the pidfd refers to, that for processes with a parent from outer pidns PIDFD_GET_INFO unexpectedly yields -ESRCH [1]. It turned out that there's an arbitrary check blocking this, which is not really sensible given getppid() happily returns 0 for such processes. Just drop the spurious check and userspace ought to handle ppid == 0 properly everywhere. [1] https://github.com/systemd/systemd/issues/37715 Fixes: cdda1f26e74b ("pidfd: add ioctl to retrieve pid info") Signed-off-by: Mike Yuan Link: https://lore.kernel.org/20250604150238.42664-1-me@yhndnzj.com Cc: Christian Brauner Cc: Luca Boccassi Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner --- fs/pidfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/pidfs.c b/fs/pidfs.c index c1f0a067be40..69919be1c9d8 100644 --- a/fs/pidfs.c +++ b/fs/pidfs.c @@ -366,7 +366,7 @@ static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg) kinfo.pid = task_pid_vnr(task); kinfo.mask |= PIDFD_INFO_PID; - if (kinfo.pid == 0 || kinfo.tgid == 0 || (kinfo.ppid == 0 && kinfo.pid != 1)) + if (kinfo.pid == 0 || kinfo.tgid == 0) return -ESRCH; copy_out: From 714d02b41939d2720379e11ef25227aec4e5bec9 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 5 Jun 2025 12:15:30 +0200 Subject: [PATCH 2/8] ovl: fix regression caused by lookup helpers API changes The lookup helpers API was changed by merge of vfs-6.16-rc1.async.dir to pass a non-const qstr pointer argument to lookup_one*() helpers. All of the callers of this API were changed to pass a pointer to temp copy of qstr, except overlays that was passing a const pointer to dentry->d_name that was changed to pass a non-const copy instead when doing a lookup in lower layer which is not the fs of said dentry. This wrong use of the API caused a regression in fstest overlay/012. Fix the regression by making a non-const copy of dentry->d_name prior to calling the lookup API, but the API should be fixed to not allow this class of bugs. Cc: NeilBrown Fixes: 5741909697a3 ("VFS: improve interface for lookup_one functions") Fixes: 390e34bc1490 ("VFS: change lookup_one_common and lookup_noperm_common to take a qstr") Signed-off-by: Amir Goldstein Link: https://lore.kernel.org/20250605101530.2336320-1-amir73il@gmail.com Signed-off-by: Christian Brauner --- fs/overlayfs/namei.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c index bf722daf19a9..f75c49ceb460 100644 --- a/fs/overlayfs/namei.c +++ b/fs/overlayfs/namei.c @@ -1371,7 +1371,7 @@ out: bool ovl_lower_positive(struct dentry *dentry) { struct ovl_entry *poe = OVL_E(dentry->d_parent); - struct qstr *name = &dentry->d_name; + const struct qstr *name = &dentry->d_name; const struct cred *old_cred; unsigned int i; bool positive = false; @@ -1394,9 +1394,15 @@ bool ovl_lower_positive(struct dentry *dentry) struct dentry *this; struct ovl_path *parentpath = &ovl_lowerstack(poe)[i]; + /* + * We need to make a non-const copy of dentry->d_name, + * because lookup_one_positive_unlocked() will hash name + * with parentpath base, which is on another (lower fs). + */ this = lookup_one_positive_unlocked( mnt_idmap(parentpath->layer->mnt), - name, parentpath->dentry); + &QSTR_LEN(name->name, name->len), + parentpath->dentry); if (IS_ERR(this)) { switch (PTR_ERR(this)) { case -ENOENT: From 800d0b9b6a8b1b354637b4194cc167ad1ce2bdd3 Mon Sep 17 00:00:00 2001 From: Stephen Smalley Date: Thu, 5 Jun 2025 12:51:16 -0400 Subject: [PATCH 3/8] fs/xattr.c: fix simple_xattr_list() commit 8b0ba61df5a1 ("fs/xattr.c: fix simple_xattr_list to always include security.* xattrs") failed to reset err after the call to security_inode_listsecurity(), which returns the length of the returned xattr name. This results in simple_xattr_list() incorrectly returning this length even if a POSIX acl is also set on the inode. Reported-by: Collin Funk Closes: https://lore.kernel.org/selinux/8734ceal7q.fsf@gmail.com/ Reported-by: Paul Eggert Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2369561 Fixes: 8b0ba61df5a1 ("fs/xattr.c: fix simple_xattr_list to always include security.* xattrs") Signed-off-by: Stephen Smalley Link: https://lore.kernel.org/20250605165116.2063-1-stephen.smalley.work@gmail.com Signed-off-by: Christian Brauner --- fs/xattr.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/xattr.c b/fs/xattr.c index 8ec5b0204bfd..600ae97969cf 100644 --- a/fs/xattr.c +++ b/fs/xattr.c @@ -1479,6 +1479,7 @@ ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs, buffer += err; } remaining_size -= err; + err = 0; read_lock(&xattrs->lock); for (rbp = rb_first(&xattrs->rb_root); rbp; rbp = rb_next(rbp)) { From 6bdd3a01fe4627ad7a562ba38eb759eba715b671 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 10 Jun 2025 16:00:20 +0200 Subject: [PATCH 4/8] fs: add missing values to TRACE_IOCB_STRINGS Make sure all values are covered. Signed-off-by: Christoph Hellwig Link: https://lore.kernel.org/20250610140020.2227932-1-hch@lst.de Signed-off-by: Christian Brauner --- include/linux/fs.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 96c7925a6551..d27c402f1162 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -399,7 +399,9 @@ struct readahead_control; { IOCB_WAITQ, "WAITQ" }, \ { IOCB_NOIO, "NOIO" }, \ { IOCB_ALLOC_CACHE, "ALLOC_CACHE" }, \ - { IOCB_DIO_CALLER_COMP, "CALLER_COMP" } + { IOCB_DIO_CALLER_COMP, "CALLER_COMP" }, \ + { IOCB_AIO_RW, "AIO_RW" }, \ + { IOCB_HAS_METADATA, "AIO_HAS_METADATA" } struct kiocb { struct file *ki_filp; From ad5a0351064c8f744416df3a49156d2c61af5bc0 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 10 Jun 2025 11:04:04 +1000 Subject: [PATCH 5/8] VFS: change try_lookup_noperm() to skip revalidation The recent change from using d_hash_and_lookup() to using try_lookup_noperm() inadvertently introduce a d_revalidate() call when the lookup was successful. Steven French reports that this resulted in worse than halving of performance in some cases. Prior to the offending patch the only caller of try_lookup_noperm() was autofs which does not need the d_revalidate(). So it is safe to remove the d_revalidate() call providing we stop using try_lookup_noperm() to implement lookup_noperm(). The "try_" in the name is strongly suggestive that the caller isn't expecting much effort, so it seems reasonable to avoid the effort of d_revalidate(). Fixes: 06c567403ae5 ("Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS") Reported-by: Steve French Link: https://lore.kernel.org/all/CAH2r5mu5SfBrdc2CFHwzft8=n9koPMk+Jzwpy-oUMx-wCRCesQ@mail.gmail.com/ Signed-off-by: NeilBrown Link: https://lore.kernel.org/174951744454.608730.18354002683881684261@noble.neil.brown.name Tested-by: Steve French Signed-off-by: Christian Brauner --- fs/namei.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 4bb889fc980b..f761cafaeaad 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2917,7 +2917,8 @@ static int lookup_one_common(struct mnt_idmap *idmap, * @base: base directory to lookup from * * Look up a dentry by name in the dcache, returning NULL if it does not - * currently exist. The function does not try to create a dentry. + * currently exist. The function does not try to create a dentry and if one + * is found it doesn't try to revalidate it. * * Note that this routine is purely a helper for filesystem usage and should * not be called by generic code. It does no permission checking. @@ -2933,7 +2934,7 @@ struct dentry *try_lookup_noperm(struct qstr *name, struct dentry *base) if (err) return ERR_PTR(err); - return lookup_dcache(name, base, 0); + return d_lookup(base, name); } EXPORT_SYMBOL(try_lookup_noperm); @@ -3057,14 +3058,22 @@ EXPORT_SYMBOL(lookup_one_positive_unlocked); * Note that this routine is purely a helper for filesystem usage and should * not be called by generic code. It does no permission checking. * - * Unlike lookup_noperm, it should be called without the parent + * Unlike lookup_noperm(), it should be called without the parent * i_rwsem held, and will take the i_rwsem itself if necessary. + * + * Unlike try_lookup_noperm() it *does* revalidate the dentry if it already + * existed. */ struct dentry *lookup_noperm_unlocked(struct qstr *name, struct dentry *base) { struct dentry *ret; + int err; - ret = try_lookup_noperm(name, base); + err = lookup_noperm_common(name, base); + if (err) + return ERR_PTR(err); + + ret = lookup_dcache(name, base, 0); if (!ret) ret = lookup_slow(name, base, 0); return ret; From 527c88d8390d6c0358dea4d71696795c05328925 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 12 Jun 2025 09:22:45 +0200 Subject: [PATCH 6/8] ovl: fix debug print in case of mkdir error We want to print the name in case of mkdir failure and now we will get a cryptic (efault) as name. Fixes: c54b386969a5 ("VFS: Change vfs_mkdir() to return the dentry.") Signed-off-by: Amir Goldstein Link: https://lore.kernel.org/20250612072245.2825938-1-amir73il@gmail.com Signed-off-by: Christian Brauner --- fs/overlayfs/overlayfs.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index 8baaba0a3fe5..497323128e5f 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -246,9 +246,11 @@ static inline struct dentry *ovl_do_mkdir(struct ovl_fs *ofs, struct dentry *dentry, umode_t mode) { - dentry = vfs_mkdir(ovl_upper_mnt_idmap(ofs), dir, dentry, mode); - pr_debug("mkdir(%pd2, 0%o) = %i\n", dentry, mode, PTR_ERR_OR_ZERO(dentry)); - return dentry; + struct dentry *ret; + + ret = vfs_mkdir(ovl_upper_mnt_idmap(ofs), dir, dentry, mode); + pr_debug("mkdir(%pd2, 0%o) = %i\n", dentry, mode, PTR_ERR_OR_ZERO(ret)); + return ret; } static inline int ovl_do_mknod(struct ovl_fs *ofs, From 0b9d62a47149083d581d8b2abb04124b6175cb29 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Wed, 11 Jun 2025 09:40:44 -0700 Subject: [PATCH 7/8] fs: unlock the superblock during iterate_supers_type This function takes super_lock in shared mode, so it should release the same lock. Cc: stable@vger.kernel.org # v6.16-rc1 Fixes: af7551cf13cf7f ("super: remove pointless s_root checks") Signed-off-by: "Darrick J. Wong" Link: https://lore.kernel.org/20250611164044.GF6138@frogsfrogsfrogs Reviewed-by: Jan Kara Signed-off-by: Christian Brauner --- fs/super.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/super.c b/fs/super.c index 21799e213fd7..80418ca8e215 100644 --- a/fs/super.c +++ b/fs/super.c @@ -964,8 +964,10 @@ void iterate_supers_type(struct file_system_type *type, spin_unlock(&sb_lock); locked = super_lock_shared(sb); - if (locked) + if (locked) { f(sb, arg); + super_unlock_shared(sb); + } spin_lock(&sb_lock); if (p) From dd2d6b7f6f519d078a866a36a625b0297d81c5bc Mon Sep 17 00:00:00 2001 From: Luis Henriques Date: Fri, 13 Jun 2025 11:11:11 +0100 Subject: [PATCH 8/8] fs: drop assert in file_seek_cur_needs_f_lock The assert in function file_seek_cur_needs_f_lock() can be triggered very easily because there are many users of vfs_llseek() (such as overlayfs) that do their custom locking around llseek instead of relying on fdget_pos(). Just drop the overzealous assertion. Fixes: da06e3c51794 ("fs: don't needlessly acquire f_lock") Suggested-by: Jan Kara Suggested-by: Mateusz Guzik Signed-off-by: Luis Henriques Link: https://lore.kernel.org/20250613101111.17716-1-luis@igalia.com Signed-off-by: Christian Brauner --- fs/file.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/file.c b/fs/file.c index 3a3146664cf3..b6db031545e6 100644 --- a/fs/file.c +++ b/fs/file.c @@ -1198,8 +1198,12 @@ bool file_seek_cur_needs_f_lock(struct file *file) if (!(file->f_mode & FMODE_ATOMIC_POS) && !file->f_op->iterate_shared) return false; - VFS_WARN_ON_ONCE((file_count(file) > 1) && - !mutex_is_locked(&file->f_pos_lock)); + /* + * Note that we are not guaranteed to be called after fdget_pos() on + * this file obj, in which case the caller is expected to provide the + * appropriate locking. + */ + return true; }