mirror of git://git.yoctoproject.org/linux-yocto.git
synced 2025-10-22 15:03:53 +02:00
f97f100227
81492 Commits
73697928c8
fs: relax assertions on failure to encode file handles
commit
5111b14836
btrfs: adjust subpage bit start based on sectorsize
[ Upstream commit
65b98a7e65
cifs: prevent NULL pointer dereference in UTF16 conversion
commit 70bccd9855dae56942f2b18a08ba137bb54093a0 upstream.
There can be a NULL pointer dereference bug here. NULL is passed to
__cifs_sfu_make_node without checks, which passes it unchecked to
cifs_strndup_to_utf16, which in turn passes it to
cifs_local_to_utf16_bytes, where '*from' is dereferenced, causing a crash.
This patch adds a check for a NULL 'src' in cifs_strndup_to_utf16 and
returns NULL early to prevent dereferencing the NULL pointer.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Makar Semyonov <m.semenov@tssltd.ru>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
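A minimal sketch of the added guard, with the signature paraphrased from fs/smb/client/cifs_unicode.c (the elided body is unchanged):

```c
__le16 *cifs_strndup_to_utf16(const char *src, const int maxlen,
			      int *utf16_len, const struct nls_table *codepage,
			      bool remap)
{
	if (!src)	/* the added guard: bail before conversion */
		return NULL;
	/* ... existing length calculation, allocation and conversion ... */
}
```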
0b0d6ad8a7
proc: fix missing pde_set_flags() for net proc files
commit 2ce3d282bd5050fca8577defeff08ada0d55d062 upstream.
To avoid potential UAF issues during module removal races, we use
pde_set_flags() to save proc_ops flags in PDE itself before
proc_register(), and then use pde_has_proc_*() helpers instead of directly
dereferencing pde->proc_ops->*.
However, the pde_set_flags() call was missing when creating net-related
proc files. This omission caused incorrect behavior in which FMODE_LSEEK
was being cleared inappropriately in proc_reg_open() for net proc files.
Lars reported it in this link [1].
Fix this by ensuring pde_set_flags() is called when registering the proc
entry, and add a NULL check for proc_ops in pde_set_flags().
[wangzijie1@honor.com: stash pde->proc_ops in a local const variable, per Christian]
Link: https://lkml.kernel.org/r/20250821105806.1453833-1-wangzijie1@honor.com
Link: https://lkml.kernel.org/r/20250818123102.959595-1-wangzijie1@honor.com
Link: https://lore.kernel.org/all/20250815195616.64497967@chagall.paradoxon.rec/ [1]
Fixes:
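A hedged sketch of the described change — ->proc_ops stashed in a local const variable plus the NULL check — with flag names assumed from fs/proc/internal.h; the exact patch may differ:

```c
static void pde_set_flags(struct proc_dir_entry *pde)
{
	const struct proc_ops *proc_ops = pde->proc_ops;	/* stashed once */

	if (!proc_ops)		/* net entries can reach here without ops */
		return;
	if (proc_ops->proc_flags & PROC_ENTRY_PERMANENT)
		pde->flags |= PROC_ENTRY_PERMANENT;
	if (proc_ops->proc_read_iter)
		pde->flags |= PROC_ENTRY_proc_read_iter;
	/* ... remaining capability bits (compat ioctl, lseek, ...) ... */
}
```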
42c415c53a
ocfs2: prevent release journal inode after journal shutdown
commit f46e8ef8bb7b452584f2e75337b619ac51a7cadf upstream.
Before calling ocfs2_delete_osb(), ocfs2_journal_shutdown() has already
been executed in ocfs2_dismount_volume(), so osb->journal must be NULL.
Therefore, the following calltrace will inevitably fail when it reaches
jbd2_journal_release_jbd_inode().
ocfs2_dismount_volume()->
  ocfs2_delete_osb()->
    ocfs2_free_slot_info()->
      __ocfs2_free_slot_info()->
        evict()->
          ocfs2_evict_inode()->
            ocfs2_clear_inode()->
              jbd2_journal_release_jbd_inode(osb->journal->j_journal,
Adding osb->journal checks will prevent null-ptr-deref during the above
execution path.
Link: https://lkml.kernel.org/r/tencent_357489BEAEE4AED74CBD67D246DBD2C4C606@qq.com
Fixes:
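Illustratively, the added check in ocfs2_clear_inode() might take this shape (a sketch, not the verbatim patch):

```c
/* Skip the jbd2 release once the journal has been shut down and
 * osb->journal reset to NULL by ocfs2_journal_shutdown(). */
if (osb->journal)
	jbd2_journal_release_jbd_inode(osb->journal->j_journal,
				       &oi->ip_jinode);
```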
1edc2feb9c
fs: writeback: fix use-after-free in __mark_inode_dirty()
[ Upstream commit d02d2c98d25793902f65803ab853b592c7a96b29 ]
A use-after-free issue occurred when __mark_inode_dirty() got a
bdi_writeback that was in the process of switching.
CPU: 1 PID: 562 Comm: systemd-random- Not tainted 6.6.56-gb4403bd46a8e #1
...
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __mark_inode_dirty+0x124/0x418
lr : __mark_inode_dirty+0x118/0x418
sp : ffffffc08c9dbbc0
...
Call trace:
__mark_inode_dirty+0x124/0x418
generic_update_time+0x4c/0x60
file_modified+0xcc/0xd0
ext4_buffered_write_iter+0x58/0x124
ext4_file_write_iter+0x54/0x704
vfs_write+0x1c0/0x308
ksys_write+0x74/0x10c
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x48/0x114
el0_svc_common.constprop.0+0xc0/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x40/0xe4
el0t_64_sync_handler+0x120/0x12c
el0t_64_sync+0x194/0x198
Root cause is:
systemd-random-seed                    kworker
----------------------------------------------------------------------
___mark_inode_dirty                    inode_switch_wbs_work_fn
spin_lock(&inode->i_lock);
inode_attach_wb
locked_inode_to_wb_and_lock_list
  get inode->i_wb
  spin_unlock(&inode->i_lock);
  spin_lock(&wb->list_lock)
  spin_lock(&inode->i_lock)
  inode_io_list_move_locked
  spin_unlock(&wb->list_lock)
  spin_unlock(&inode->i_lock)
                                       spin_lock(&old_wb->list_lock)
                                       inode_do_switch_wbs
                                       spin_lock(&inode->i_lock)
                                       inode->i_wb = new_wb
                                       spin_unlock(&inode->i_lock)
                                       spin_unlock(&old_wb->list_lock)
                                       wb_put_many(old_wb, nr_switched)
                                       cgwb_release
                                       old wb released
wb_wakeup_delayed() accesses wb,
triggering the use-after-free
Fix this race condition by holding the inode spinlock until
wb_wakeup_delayed() has finished.
Signed-off-by: Jiufei Xue <jiufei.xue@samsung.com>
Link: https://lore.kernel.org/20250728100715.3863241-1-jiufei.xue@samsung.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
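A sketch of the fix's shape inside __mark_inode_dirty(), per the description above (wakeup_bdi and wb are locals of that function; simplified):

```c
/* The unlock now happens after the wakeup, so a concurrent
 * inode_do_switch_wbs()/cgwb_release() cannot free the wb while
 * wb_wakeup_delayed() is still using it. */
if (wakeup_bdi &&
    (wb->bdi->capabilities & BDI_CAP_WRITEBACK))
	wb_wakeup_delayed(wb);
spin_unlock(&inode->i_lock);	/* previously dropped before the wakeup */
```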
9b1e6a7152
btrfs: avoid load/store tearing races when checking if an inode was logged
[ Upstream commit 986bf6ed44dff7fbae7b43a0882757ee7f5ba21b ]
At inode_logged() we do a couple of lockless checks for ->logged_trans, and
these are generally safe except the second one in case we get a load or
store tearing due to a concurrent call updating ->logged_trans (either at
btrfs_log_inode() or later at inode_logged()).
In the first case it's safe to compare to the current transaction ID since
once ->logged_trans is set to the current transaction, we never set it to
a lower value.
In the second case, where we check if it's greater than zero, we are prone
to load/store tearing races, since we can have a concurrent task updating
to the current transaction ID with store tearing for example, instead of
updating with a single 64 bits write, to update with two 32 bits writes or
four 16 bits writes. In that case the reading side at inode_logged() could
see a positive value that does not match the current transaction and then
return a false negative.
Fix this by doing the second check while holding the inode's spinlock, add
some comments about it too. Also add the data_race() annotation to the
first check to avoid any reports from KCSAN (or similar tools) and comment
about it.
Fixes:
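A sketch of the two checks as described, with field names from struct btrfs_inode (illustrative, not the verbatim patch):

```c
/* First lockless check: safe, but annotated so KCSAN stays quiet. */
if (data_race(inode->logged_trans) == trans->transid)
	return 1;

/* Second check moves under the inode's spinlock so a torn store of
 * the 64-bit ->logged_trans can't be observed as a bogus value. */
bool logged;

spin_lock(&inode->lock);
logged = inode->logged_trans > 0;
spin_unlock(&inode->lock);
```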
564ce572c5
btrfs: fix race between setting last_dir_index_offset and inode logging
[ Upstream commit 59a0dd4ab98970086fd096281b1606c506ff2698 ]
At inode_logged() if we find that the inode was not logged before we
update its ->last_dir_index_offset to (u64)-1 with the goal that the
next directory log operation will see the (u64)-1 and then figure out
it must check what was the index of the last logged dir index key and
update ->last_dir_index_offset to that key's offset (this is done in
update_last_dir_index_offset()).
This however has a possibility for a time window where a race can happen
and lead to directory logging skipping dir index keys that should be
logged. The race happens like this:
1) Task A calls inode_logged(), sees ->logged_trans as 0 and then checks
that the inode item was logged before, but before it sets the inode's
->last_dir_index_offset to (u64)-1...
2) Task B is at btrfs_log_inode() which calls inode_logged() early, and
that has set ->last_dir_index_offset to (u64)-1;
3) Task B then enters log_directory_changes() which calls
update_last_dir_index_offset(). There it sees ->last_dir_index_offset
is (u64)-1 and that the inode was logged before (ctx->logged_before is
true), and so it searches for the last logged dir index key in the log
tree and it finds that it has an offset (index) value of N, so it sets
->last_dir_index_offset to N, so that we can skip index keys that are
less than or equal to N (later at process_dir_items_leaf());
4) Task A now sets ->last_dir_index_offset to (u64)-1, undoing the update
that task B just did;
5) Task B will now skip every index key when it enters
process_dir_items_leaf(), since ->last_dir_index_offset is (u64)-1.
Fix this by making inode_logged() not touch ->last_dir_index_offset and
initializing it to 0 when an inode is loaded (at btrfs_alloc_inode()) and
then having update_last_dir_index_offset() treat a value of 0 as meaning
we must check the log tree and update with the index of the last logged
index key. This is fine since the minimum possible value for
->last_dir_index_offset is 1 (BTRFS_DIR_START_INDEX - 1 = 2 - 1 = 1).
This also simplifies the management of ->last_dir_index_offset and now
all accesses to it are done under the inode's log_mutex.
Fixes:
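A sketch of the new convention inside update_last_dir_index_offset(), as described above (illustrative):

```c
/* 0 now means "unknown": set once at btrfs_alloc_inode() time and
 * resolved here, always under the inode's log_mutex. */
if (inode->last_dir_index_offset != 0)
	return 0;	/* already resolved; real offsets start at 1 */
/* ... otherwise search the log tree for the last logged dir index
 * key and store its offset ... */
```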
b9cd542763
btrfs: fix race between logging inode and checking if it was logged before
[ Upstream commit ef07b74e1be56f9eafda6aadebb9ebba0743c9f0 ]
There's a race between checking if an inode was logged before and logging
an inode that can cause us to mark an inode as not logged just after it
was logged by a concurrent task:
1) We have inode X which was not logged before, neither in the current
transaction nor in a past transaction since the inode was loaded into
memory, so its ->logged_trans value is 0;
2) We are at transaction N;
3) Task A calls inode_logged() against inode X, sees that ->logged_trans
is 0 and there is a log tree and so it proceeds to search in the log
tree for an inode item for inode X. It doesn't see any, but before
it sets ->logged_trans to N - 1...
4) Task B calls btrfs_log_inode() against inode X, logs the inode and
sets ->logged_trans to N;
5) Task A now sets ->logged_trans to N - 1;
6) At this point anyone calling inode_logged() gets 0 (inode not logged)
since ->logged_trans is greater than 0 and less than N, but our inode
was really logged. As a consequence operations like rename, unlink and
link that happen afterwards in the current transaction end up not
updating the log when they should.
Fix this by ensuring inode_logged() only updates ->logged_trans in case
the inode item is not found in the log tree if after taking the inode's
lock (spinlock struct btrfs_inode::lock) the ->logged_trans value is still
zero, since the inode lock is what protects setting ->logged_trans at
btrfs_log_inode().
Fixes:
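A sketch of the locked re-check described above (illustrative):

```c
/* Record "not logged in a past transaction" only if nobody set
 * ->logged_trans while we were searching the log tree. */
spin_lock(&inode->lock);
if (inode->logged_trans == 0)
	inode->logged_trans = trans->transid - 1;
spin_unlock(&inode->lock);
```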
e358d4b622
xfs: do not propagate ENODATA disk errors into xattr code
commit ae668cd567a6a7622bc813ee0bb61c42bed61ba7 upstream.
ENODATA (aka ENOATTR) has a very specific meaning in the xfs xattr code;
namely, that the requested attribute name could not be found.
However, a medium error from disk may also return ENODATA. At best, this
medium error may escape to userspace as "attribute not found" when in
fact it's an IO (disk) error.
At worst, we may oops in xfs_attr_leaf_get() when we do:

    error = xfs_attr_leaf_hasname(args, &bp);

    if (error == -ENOATTR) {
        xfs_trans_brelse(args->trans, bp);
        return error;
    }

because an ENODATA/ENOATTR error from disk leaves us with a null bp,
and the xfs_trans_brelse will then null-deref it.
As discussed on the list, we really need to modify the lower level IO
functions to trap all disk errors and ensure that we don't let unique
errors like this leak up into higher xfs functions - many like this
should be remapped to EIO.
However, this patch directly addresses a reported bug in the xattr code,
and should be safe to backport to stable kernels. A larger-scope patch to
handle more unique errors at lower levels can follow later.
(Note, prior to
b0165b6ead
smb3 client: fix return code mapping of remap_file_range
commit 0e08fa789d39aa01923e3ba144bd808291895c3c upstream.
We were returning -EOPNOTSUPP for various remap_file_range cases, but for
some of these the copy_file_range_syscall() requires -EINVAL to be
returned (e.g. where source and target file ranges overlap when source
and target are the same file).
This fixes xfstest generic/157, which was expecting EINVAL for that (and
also e.g. for when the src offset is beyond the end of file).
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
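A sketch of the same-file overlap case described above; the variable names are illustrative:

```c
/* Same-file overlap must map to -EINVAL (what the copy_file_range()
 * path expects), not -EOPNOTSUPP. */
if (src_inode == target_inode &&
    src_off + len > dest_off && dest_off + len > src_off)
	return -EINVAL;
```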
3fc11ff13f
fs/smb: Fix inconsistent refcnt update
commit ab529e6ca1f67bcf31f3ea80c72bffde2e9e053e upstream.
A possible inconsistent update of refcount was identified in
`smb2_compound_op`. Such an inconsistent update could lead to possible
resource leaks.
Why it is a possible bug:
1. In the comment section of the function, it clearly states that the
   reference to `cfile` should be dropped after calling this function.
2. Every control flow path would check and drop the reference to
   `cfile`, except the patched one.
3. Existing callers would not handle the refcount update of `cfile` if
   -ENOMEM is returned.
To fix the bug, an extra goto label "out" is added, to make sure that the
cleanup logic is always respected. As the problem is caused by the
allocation failure of `vars`, the cleanup logic between label "finished"
and "out" can be safely ignored. According to the definition of function
`is_replayable_error`, the error code of "-ENOMEM" is not recoverable.
Therefore, the replay logic also gets ignored.
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
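A sketch of the control-flow change described (allocation flags and surrounding code paraphrased):

```c
vars = kzalloc(sizeof(*vars), GFP_KERNEL);	/* flags illustrative */
if (vars == NULL) {
	rc = -ENOMEM;
	goto out;	/* previously returned directly, leaking cfile */
}
/* ... compound request logic; the cleanup between "finished" and
 * "out" is skipped for this path ... */
out:
	if (cfile)
		cifsFileInfo_put(cfile);
	return rc;
```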
d7f5e35e70
efivarfs: Fix slab-out-of-bounds in efivarfs_d_compare
[ Upstream commit a6358f8cf64850f3f27857b8ed8c1b08cfc4685c ]
Observed on kernel 6.6 (present on master as well):
BUG: KASAN: slab-out-of-bounds in memcmp+0x98/0xd0
Call trace:
kasan_check_range+0xe8/0x190
__asan_loadN+0x1c/0x28
memcmp+0x98/0xd0
efivarfs_d_compare+0x68/0xd8
__d_lookup_rcu_op_compare+0x178/0x218
__d_lookup_rcu+0x1f8/0x228
d_alloc_parallel+0x150/0x648
lookup_open.isra.0+0x5f0/0x8d0
open_last_lookups+0x264/0x828
path_openat+0x130/0x3f8
do_filp_open+0x114/0x248
do_sys_openat2+0x340/0x3c0
__arm64_sys_openat+0x120/0x1a0
If dentry->d_name.len < EFI_VARIABLE_GUID_LEN, 'guid' can become
negative, leading to an out-of-bounds access. The issue can be triggered
by parallel lookups using an invalid filename:
T1                              T2
lookup_open
  ->lookup
    simple_lookup
      d_add
      // invalid dentry is added to hash list
                                lookup_open
                                  d_alloc_parallel
                                    __d_lookup_rcu
                                      __d_lookup_rcu_op_compare
                                        hlist_bl_for_each_entry_rcu
                                        // invalid dentry can be retrieved
                                          ->d_compare
                                            efivarfs_d_compare
                                            // oob
Fix it by checking 'guid' before the comparison.
Fixes:
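A sketch of the guard described above (boundary condition assumed, not verbatim):

```c
/* 'guid' is the offset where the GUID suffix starts; for names shorter
 * than a GUID it goes negative and memcmp() walks out of bounds. */
int guid = len - EFI_VARIABLE_GUID_LEN;

if (guid <= 0)		/* too short to hold a GUID suffix */
	return 1;	/* no match */
```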
c32e3c71aa
NFS: Fix a race when updating an existing write
commit 76d2e3890fb169168c73f2e4f8375c7cc24a765e upstream.
After nfs_lock_and_join_requests() tests for whether the request is
still attached to the mapping, nothing prevents a call to
nfs_inode_remove_request() from succeeding until we actually lock the
page group.
The reason is that whoever called nfs_inode_remove_request() doesn't
necessarily have a lock on the page group head.
So in order to avoid races, let's take the page group lock earlier in
nfs_lock_and_join_requests(), and hold it across the removal of the
request in nfs_inode_remove_request().
Reported-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Joe Quanaim <jdq@meta.com>
Tested-by: Andrew Steffen <aksteffen@meta.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes:
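A heavily hedged sketch of the new ordering; nfs_page_group_lock() is real, while the lookup helper and the attachment re-check are paraphrased:

```c
head = nfs_folio_find_head_request(folio);	/* find the group head */
nfs_page_group_lock(head);			/* now taken before the re-check */
if (head->wb_folio != folio) {			/* request no longer attached */
	nfs_page_group_unlock(head);
	goto retry;
}
/* ... join subrequests; nfs_inode_remove_request() now runs under
 * the same group lock, closing the race window ... */
```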
ee7c9f8481
nfs: fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests
commit
c9e7de284d
smb: client: fix race with concurrent opens in rename(2)
[ Upstream commit d84291fc7453df7881a970716f8256273aca5747 ]
Besides sending the rename request to the server, the rename process also
involves closing any deferred close, waiting for outstanding I/O to
complete, and marking all existing open handles as deleted to prevent
them from deferring closes, which increases the race window for potential
concurrent opens on the target file.
Fix this by unhashing the dentry in advance to prevent any concurrent
opens on the target.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
c3848c3dc8
smb: client: fix race with concurrent opens in unlink(2)
[ Upstream commit 0af1561b2d60bab2a2b00720a5c7b292ecc549ec ]
According to some logs reported by customers, the CIFS client might end
up reporting unlinked files as existing in stat(2) due to concurrent
opens racing with unlink(2).
Besides sending the removal request to the server, the unlink process
could involve closing any deferred close as well as marking all existing
open handles as deleted to prevent them from deferring closes, which
increases the race window for potential concurrent opens.
Fix this by unhashing the dentry in cifs_unlink() to prevent any
subsequent opens. Any open attempts, while we're still unlinking, will
block on the parent's i_rwsem.
Reported-by: Jay Shin <jaeshin@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
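Both this and the preceding rename fix hinge on unhashing the dentry before talking to the server; a sketch (d_drop() is the unhashing primitive):

```c
/* In cifs_unlink() (and analogously for rename): unhash first, so a
 * concurrent open cannot find this dentry and must go through lookup,
 * serializing on the parent's i_rwsem. */
d_drop(dentry);
/* ... then send the removal request to the server ... */
```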
4664302159
alloc_fdtable(): change calling conventions.
[ Upstream commit
901f62efd6
f2fs: fix to avoid out-of-boundary access in dnode page
[ Upstream commit
1efdc57d42
f2fs: fix to call clear_page_private_reference in .{release,invalid}_folio
[ Upstream commit
0a98c1d614
ext4: preserve SB_I_VERSION on remount
[ Upstream commit f2326fd14a224e4cccbab89e14c52279ff79b7ec ]
IMA testing revealed that after an ext4 remount, file accesses triggered
full measurements even without modifications, instead of being skipped as
expected when i_version is unchanged. Debugging showed `SB_I_VERSION` was
cleared in reconfigure_super() during remount due to commit
3100116850
use uniform permission checks for all mount propagation changes
[ Upstream commit cffd0441872e7f6b1fce5e78fb1c99187a291330 ]
do_change_type() and do_set_group() are operating on different
aspects of the same thing - propagation graph. The latter
asks for mounts involved to be mounted in namespace(s) the caller
has CAP_SYS_ADMIN for. The former is a mess - originally it
didn't even check that mount *is* mounted. That got fixed,
but the resulting check turns out to be too strict for userland -
in effect, we check that mount is in our namespace, having already
checked that we have CAP_SYS_ADMIN there.
What we really need (in both cases) is
* only touch mounts that are mounted. That's a must-have
constraint - data corruption happens if it gets violated.
* don't allow messing with a namespace unless you already
have enough permissions to do so (i.e. CAP_SYS_ADMIN in its userns).
That's an equivalent of what do_set_group() does; let's extract that
into a helper (may_change_propagation()) and use it in both
do_set_group() and do_change_type().
Fixes:
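A hedged sketch of the extracted helper (simplified; the mounted-ness test is paraphrased):

```c
static int may_change_propagation(const struct mount *m)
{
	struct mnt_namespace *ns = m->mnt_ns;

	/* it must be mounted in some namespace */
	if (!ns)
		return -EINVAL;
	/* and the caller must be admin in the userns of that namespace */
	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
		return -EPERM;
	return 0;
}
```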
c58c6b532b
fs/buffer: fix use-after-free when call bh_read() helper
[ Upstream commit 7375f22495e7cd1c5b3b5af9dcc4f6dffe34ce49 ]
There's an issue as follows:
BUG: KASAN: stack-out-of-bounds in end_buffer_read_sync+0xe3/0x110
Read of size 8 at addr ffffc9000168f7f8 by task swapper/3/0
CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.16.0-862.14.0.6.x86_64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<IRQ>
dump_stack_lvl+0x55/0x70
print_address_description.constprop.0+0x2c/0x390
print_report+0xb4/0x270
kasan_report+0xb8/0xf0
end_buffer_read_sync+0xe3/0x110
end_bio_bh_io_sync+0x56/0x80
blk_update_request+0x30a/0x720
scsi_end_request+0x51/0x2b0
scsi_io_completion+0xe3/0x480
? scsi_device_unbusy+0x11e/0x160
blk_complete_reqs+0x7b/0x90
handle_softirqs+0xef/0x370
irq_exit_rcu+0xa5/0xd0
sysvec_apic_timer_interrupt+0x6e/0x90
</IRQ>
The above issue happens during an ntfs3 filesystem mount; it may happen
as follows:
mount                                   IRQ
ntfs_fill_super
  read_cache_page
    do_read_cache_folio
      filemap_read_folio
        mpage_read_folio
          do_mpage_readpage
            ntfs_get_block_vbo
              bh_read
                submit_bh
                wait_on_buffer(bh);
                                        blk_complete_reqs
                                          scsi_io_completion
                                            scsi_end_request
                                              blk_update_request
                                                end_bio_bh_io_sync
                                                  end_buffer_read_sync
                                                    __end_buffer_read_notouch
                                                      unlock_buffer
                wait_on_buffer(bh); --> returns to caller
                                                      put_bh
                                                      --> trigger stack-out-of-bounds
In the mpage_read_folio() function, the stack variable 'map_bh' is
passed to ntfs_get_block_vbo(). Once unlock_buffer() unlocks and
wait_on_buffer() returns to continue processing, the stack variable
is likely to be reclaimed. Consequently, during the end_buffer_read_sync()
process, calling put_bh() may result in stack overrun.
If the bh is not allocated on the stack, it belongs to a folio. Freeing
a buffer head which belongs to a folio is done by drop_buffers() which
will fail to free buffers which are still locked. So it is safe to call
put_bh() before __end_buffer_read_notouch().
Fixes:
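A sketch of the reordering, assuming the fix is the two-line reorder the message describes:

```c
void end_buffer_read_sync(struct buffer_head *bh, int uptodate)
{
	/* Drop our reference first: once unlock_buffer() (at the end of
	 * __end_buffer_read_notouch()) runs, a waiter with an on-stack
	 * bh may return and the bh can cease to exist. */
	put_bh(bh);
	__end_buffer_read_notouch(bh, uptodate);
}
```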
524e90e58a
smb: server: split ksmbd_rdma_stop_listening() out of ksmbd_rdma_destroy()
[ Upstream commit bac7b996d42e458a94578f4227795a0d4deef6fa ]
We can't call destroy_workqueue(smb_direct_wq); before stop_sessions()!
Otherwise already existing connections try to use smb_direct_wq as
a NULL pointer.
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes:
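The shutdown ordering implied by the split, as an illustrative sketch:

```c
ksmbd_rdma_stop_listening();  /* stop accepting new RDMA connections */
stop_sessions();              /* tear down existing connections */
ksmbd_rdma_destroy();         /* only now: destroy_workqueue(smb_direct_wq) */
```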
86c21c80c9
squashfs: fix memory leak in squashfs_fill_super
commit b64700d41bdc4e9f82f1346c15a3678ebb91a89c upstream.
If sb_min_blocksize returns 0, squashfs_fill_super exits without freeing
allocated memory (sb->s_fs_info).
Fix this by moving the call to sb_min_blocksize to before memory is
allocated.
Link: https://lkml.kernel.org/r/20250811223740.110392-1-phillip@squashfs.org.uk
Fixes:
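A sketch of the reordering (error-return style simplified relative to the fs_context code):

```c
/* Probe the device block size before any allocation, so the error
 * path has nothing to leak. */
int devblksize = sb_min_blocksize(sb, SQUASHFS_DEVBLK_SIZE);

if (!devblksize)
	return -EINVAL;

sb->s_fs_info = kzalloc(sizeof(struct squashfs_sb_info), GFP_KERNEL);
if (!sb->s_fs_info)
	return -ENOMEM;
```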
ba86db5b77
mm: update memfd seal write check to include F_SEAL_WRITE
[ Upstream commit
74c68b295b
btrfs: populate otime when logging an inode item
[ Upstream commit
3372691721
btrfs: send: use fallocate for hole punching with send stream v2
[ Upstream commit
5a924f7ed6
xfs: fully decouple XFS_IBULK* flags from XFS_IWALK* flags
[ Upstream commit d2845519b0723c5d5a0266cbf410495f9b8fd65c ]
Fix up xfs_inumbers to not pass the XFS_IBULK* flags into the flags
argument of xfs_inobt_walk(), which expects the XFS_IWALK* flags.
Currently passing the wrong flags works for non-debug builds because
the only XFS_IWALK* flag has the same encoding as the corresponding
XFS_IBULK* flag, but in debug builds it can trigger an assert that no
incorrect flag is passed. Instead just extract the relevant flag.
Fixes:
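A sketch of the extraction, paraphrased from the xfs_inumbers() call site:

```c
/* Translate the one relevant XFS_IBULK* flag instead of passing the
 * bulk flags straight through. */
unsigned int iwalk_flags = 0;

if (breq->flags & XFS_IBULK_SAME_AG)
	iwalk_flags |= XFS_IWALK_SAME_AG;

error = xfs_inobt_walk(breq->mp, NULL, breq->startino, iwalk_flags,
		       xfs_inumbers_walk, breq->icount, &ic);
```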
4290e34fb8
btrfs: abort transaction on unexpected eb generation at btrfs_copy_root()
[ Upstream commit
7cda0fdde5
btrfs: qgroup: fix race between quota disable and quota rescan ioctl
[ Upstream commit
4588a94a2d
cifs: reset iface weights when we cannot find a candidate
[ Upstream commit
3c88aa59dd
fscrypt: Don't use problematic non-inline crypto engines
[ Upstream commit
feb78a8dd8
btrfs: fix qgroup reservation leak on failure to allocate ordered extent
[ Upstream commit
e98dc1909f
f2fs: fix to do sanity check on ino and xnid
[ Upstream commit
26cb9aad94
jbd2: prevent softlockup in jbd2_log_do_checkpoint()
commit 9d98cf4632258720f18265a058e62fde120c0151 upstream.
Both jbd2_log_do_checkpoint() and jbd2_journal_shrink_checkpoint_list()
periodically release j_list_lock after processing a batch of buffers to
avoid long hold times on the j_list_lock. However, since both functions
contend for j_list_lock, the combined time spent waiting and processing
can be significant.
jbd2_journal_shrink_checkpoint_list() explicitly calls cond_resched()
when need_resched() is true to avoid softlockups during prolonged
operations. But jbd2_log_do_checkpoint() only exits its loop when
need_resched() is true, relying on potentially sleeping functions like
__flush_batch() or wait_on_buffer() to trigger rescheduling. If those
functions do not sleep, the kernel may hit a softlockup.
watchdog: BUG: soft lockup - CPU#3 stuck for 156s! [kworker/u129:2:373]
CPU: 3 PID: 373 Comm: kworker/u129:2 Kdump: loaded Not tainted 6.6.0+ #10
Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.27 06/13/2017
Workqueue: writeback wb_workfn (flush-7:2)
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : native_queued_spin_lock_slowpath+0x358/0x418
lr : jbd2_log_do_checkpoint+0x31c/0x438 [jbd2]
Call trace:
native_queued_spin_lock_slowpath+0x358/0x418
jbd2_log_do_checkpoint+0x31c/0x438 [jbd2]
__jbd2_log_wait_for_space+0xfc/0x2f8 [jbd2]
add_transaction_credits+0x3bc/0x418 [jbd2]
start_this_handle+0xf8/0x560 [jbd2]
jbd2__journal_start+0x118/0x228 [jbd2]
__ext4_journal_start_sb+0x110/0x188 [ext4]
ext4_do_writepages+0x3dc/0x740 [ext4]
ext4_writepages+0xa4/0x190 [ext4]
do_writepages+0x94/0x228
__writeback_single_inode+0x48/0x318
writeback_sb_inodes+0x204/0x590
__writeback_inodes_wb+0x54/0xf8
wb_writeback+0x2cc/0x3d8
wb_do_writeback+0x2e0/0x2f8
wb_workfn+0x80/0x2a8
process_one_work+0x178/0x3e8
worker_thread+0x234/0x3b8
kthread+0xf0/0x108
ret_from_fork+0x10/0x20
So explicitly call cond_resched() in jbd2_log_do_checkpoint() to avoid
the softlockup.
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://patch.msgid.link/20250812063752.912130-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
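One plausible shape for the explicit reschedule point (illustrative; the actual patch may structure the loop differently):

```c
/* Inside the checkpoint loop: yield instead of relying on
 * __flush_batch()/wait_on_buffer() to sleep. */
if (need_resched()) {
	spin_unlock(&journal->j_list_lock);
	cond_resched();
	spin_lock(&journal->j_list_lock);
}
```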
5412c47f3c
ext4: fix hole length calculation overflow in non-extent inodes
commit 02c7f7219ac0e2277b3379a3a0e9841ef464b6d4 upstream.
In a filesystem with a block size larger than 4KB, the hole length
calculation for a non-extent inode in ext4_ind_map_blocks() can easily
exceed INT_MAX. Then it could return a zero-length hole and trigger the
following warning and an infinite loop in the iomap infrastructure.
------------[ cut here ]------------
WARNING: CPU: 3 PID: 434101 at fs/iomap/iter.c:34 iomap_iter_done+0x148/0x190
CPU: 3 UID: 0 PID: 434101 Comm: fsstress Not tainted 6.16.0-rc7+ #128 PREEMPT(voluntary)
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : iomap_iter_done+0x148/0x190
lr : iomap_iter+0x174/0x230
sp : ffff8000880af740
x29: ffff8000880af740 x28: ffff0000db8e6840 x27: 0000000000000000
x26: 0000000000000000 x25: ffff8000880af830 x24: 0000004000000000
x23: 0000000000000002 x22: 000001bfdbfa8000 x21: ffffa6a41c002e48
x20: 0000000000000001 x19: ffff8000880af808 x18: 0000000000000000
x17: 0000000000000000 x16: ffffa6a495ee6cd0 x15: 0000000000000000
x14: 00000000000003d4 x13: 00000000fa83b2da x12: 0000b236fc95f18c
x11: ffffa6a4978b9c08 x10: 0000000000001da0 x9 : ffffa6a41c1a2a44
x8 : ffff8000880af5c8 x7 : 0000000001000000 x6 : 0000000000000000
x5 : 0000000000000004 x4 : 000001bfdbfa8000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000004004030000 x0 : 0000000000000000
Call trace:
iomap_iter_done+0x148/0x190 (P)
iomap_iter+0x174/0x230
iomap_fiemap+0x154/0x1d8
ext4_fiemap+0x110/0x140 [ext4]
do_vfs_ioctl+0x4b8/0xbc0
__arm64_sys_ioctl+0x8c/0x120
invoke_syscall+0x6c/0x100
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x38/0x120
el0t_64_sync_handler+0x10c/0x138
el0t_64_sync+0x198/0x1a0
---[ end trace 0000000000000000 ]---
Cc: stable@kernel.org
Fixes:
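An illustrative guard; 'hole_len' is a stand-in name for the value computed in ext4_ind_map_blocks():

```c
/* Cap the computed hole length so the (int-sized) map length can't
 * wrap and report a bogus zero-length hole. */
map->m_len = min_t(u64, hole_len, INT_MAX);
```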
512e126d02
ext4: use kmalloc_array() for array space allocation
commit 76dba1fe277f6befd6ef650e1946f626c547387a upstream.
Replace kmalloc(size * sizeof) with kmalloc_array() for safer memory
allocation and overflow prevention.
Cc: stable@kernel.org
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Link: https://patch.msgid.link/20250811125816.570142-1-liaoyuanhong@vivo.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
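The shape of the conversion, with illustrative names:

```c
/* before: the multiplication can silently overflow */
arr = kmalloc(n * sizeof(*arr), GFP_KERNEL);

/* after: kmalloc_array() returns NULL if n * sizeof(*arr) overflows */
arr = kmalloc_array(n, sizeof(*arr), GFP_KERNEL);
```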
80c7d2cb45
ext4: don't try to clear the orphan_present feature when the block device is r/o
commit c5e104a91e7b6fa12c1dc2d8bf84abb7ef9b89ad upstream.
When the file system is frozen in preparation for taking an LVM
snapshot, the journal is checkpointed and if the orphan_file feature
is enabled, and the orphan file is empty, we clear the orphan_present
feature flag. But if there are pending inodes that need to be removed
the orphan_present feature flag can't be cleared.
The problem comes if the block device is read-only. In that case, we
can't process the orphan inode list, so it is skipped in
ext4_orphan_cleanup(). But then in ext4_mark_recovery_complete(),
this results in the ext4 error "Orphan file not empty on read-only fs"
firing and the file system mount is aborted.
Fix this by clearing the needs_recovery flag if the block device is
read-only. We do this after the call to ext4_load_and_init_journal()
since there are some error checks that need to be done in case the
journal needs to be replayed and the block device is read-only, or if
the block device containing the external journal is read-only, etc.
Cc: stable@kernel.org
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1108271
Cc: stable@vger.kernel.org
Fixes:
7fad321554
ext4: fix reserved gdt blocks handling in fsmap
commit 3ffbdd1f1165f1b2d6a94d1b1aabef57120deaf7 upstream.
In some cases like small FSes with no meta_bg and where the resize
doesn't need extra gdt blocks as it can fit in the current one,
s_reserved_gdt_blocks is set as 0, which causes fsmap to emit a 0
length entry, which is incorrect.
$ mkfs.ext4 -b 65536 -O bigalloc /dev/sda 5G
$ mount /dev/sda /mnt/scratch
$ xfs_io -c "fsmap -d" /mnt/scartch
0: 253:48 [0..127]: static fs metadata 128
1: 253:48 [128..255]: special 102:1 128
2: 253:48 [256..255]: special 102:2 0 <---- 0 len entry
3: 253:48 [256..383]: special 102:3 128
Fix this by adding a check for this case.
Cc: stable@kernel.org
Fixes:
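A sketch of the added check (illustrative):

```c
/* Only emit the reserved-GDT record when there is anything to report;
 * s_reserved_gdt_blocks can legitimately be 0 on small filesystems. */
if (le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks)) {
	/* ... emit the "special" fsmap entry as before ... */
}
```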
30d16d088b
ext4: fix fsmap end of range reporting with bigalloc
commit bae76c035bf0852844151e68098c9b7cd63ef238 upstream.
With bigalloc enabled, the logic to report the last extent has a bug,
since we try to use cluster units instead of block units. This can cause
an issue where extra incorrect entries might be returned back to the
user. This was flagged by generic/365 with 64k bs and -O bigalloc.
** Details of issue **
The issue was noticed on a 5G 64k blocksize FS with -O bigalloc, which
has only 1 bg.
$ xfs_io -c "fsmap -d" /mnt/scratch
0: 253:48 [0..127]: static fs metadata 128          /* sb */
1: 253:48 [128..255]: special 102:1 128             /* gdt */
3: 253:48 [256..383]: special 102:3 128             /* block bitmap */
4: 253:48 [384..2303]: unknown 1920                 /* flex bg empty space */
5: 253:48 [2304..2431]: special 102:4 128           /* inode bitmap */
6: 253:48 [2432..4351]: unknown 1920                /* flex bg empty space */
7: 253:48 [4352..6911]: inodes 2560
8: 253:48 [6912..538623]: unknown 531712
9: 253:48 [538624..10485759]: free space
98296036aa
ext4: check fast symlink for ea_inode correctly
commit b4cc4a4077268522e3d0d34de4b2dc144e2330fa upstream.
The check for a fast symlink in the presence of only an
external xattr inode is incorrect. If a fast symlink does
not have an xattr block (i_file_acl == 0), but does have
an external xattr inode that increases inode i_blocks, then
the check for a fast symlink will incorrectly fail and
__ext4_iget()->ext4_ind_check_inode() will report the inode
is corrupt when it "validates" i_data[] on the next read:
# ln -s foo /mnt/tmp/bar
# setfattr -h -n trusted.test \
-v "$(yes | head -n 4000)" /mnt/tmp/bar
# umount /mnt/tmp
# mount /mnt/tmp
# ls -l /mnt/tmp
ls: cannot access '/mnt/tmp/bar': Structure needs cleaning
total 4
? l?????????? ? ? ? ? ? bar
# dmesg | tail -1
EXT4-fs error (device dm-8): __ext4_iget:5098:
inode #24578: block 7303014: comm ls: invalid block
(note that "block 7303014" = 0x6f6f66 = "foo" in LE order).
ext4_inode_is_fast_symlink() should check the superblock
EXT4_FEATURE_INCOMPAT_EA_INODE feature flag, not the inode
EXT4_EA_INODE_FL, since the latter is only set on the xattr
inode itself, and not on the inode that uses this xattr.
Cc: stable@vger.kernel.org
Fixes:
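A sketch of the corrected condition in ext4_inode_is_fast_symlink(), with the surrounding logic paraphrased:

```c
/* Gate on the filesystem-wide feature, not the per-inode EA_INODE
 * flag, which is only set on the xattr inode itself. */
if (!ext4_has_feature_ea_inode(inode->i_sb)) {
	int ea_blocks = EXT4_I(inode)->i_file_acl ?
		EXT4_CLUSTER_SIZE(inode->i_sb) >> 9 : 0;

	return S_ISLNK(inode->i_mode) && inode->i_blocks - ea_blocks == 0;
}
```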
832da84046
ksmbd: extend the connection limiting mechanism to support IPv6
commit c0d41112f1a5828c194b59cca953114bc3776ef2 upstream.
Update the connection tracking logic to handle both IPv4 and IPv6
address families.
Cc: stable@vger.kernel.org
Fixes:
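An illustrative, hypothetical helper showing the family-aware comparison such connection tracking needs once IPv6 peers are possible:

```c
static bool ksmbd_peer_addr_equal(const struct sockaddr_storage *a,
				  const struct sockaddr_storage *b)
{
	if (a->ss_family != b->ss_family)
		return false;
	if (a->ss_family == AF_INET6)
		return ipv6_addr_equal(&((const struct sockaddr_in6 *)a)->sin6_addr,
				       &((const struct sockaddr_in6 *)b)->sin6_addr);
	return ((const struct sockaddr_in *)a)->sin_addr.s_addr ==
	       ((const struct sockaddr_in *)b)->sin_addr.s_addr;
}
```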
fcb1f77b8e
btrfs: do not allow relocation of partially dropped subvolumes
commit 4289b494ac553e74e86fed1c66b2bf9530bc1082 upstream.
[BUG]
There is an internal report that balance triggered transaction abort,
with the following call trace:
item 85 key (594509824 169 0) itemoff 12599 itemsize 33
extent refs 1 gen 197740 flags 2
ref#0: tree block backref root 7
item 86 key (594558976 169 0) itemoff 12566 itemsize 33
extent refs 1 gen 197522 flags 2
ref#0: tree block backref root 7
...
BTRFS error (device loop0): extent item not found for insert, bytenr 594526208 num_bytes 16384 parent 449921024 root_objectid 934 owner 1 offset 0
BTRFS error (device loop0): failed to run delayed ref for logical 594526208 num_bytes 16384 type 182 action 1 ref_mod 1: -117
------------[ cut here ]------------
BTRFS: Transaction aborted (error -117)
WARNING: CPU: 1 PID: 6963 at ../fs/btrfs/extent-tree.c:2168 btrfs_run_delayed_refs+0xfa/0x110 [btrfs]
And btrfs check doesn't report anything wrong related to the extent
tree.
[CAUSE]
The cause is a little complex, firstly the extent tree indeed doesn't
have the backref for 594526208.
The extent tree only have the following two backrefs around that bytenr
on-disk:
item 65 key (594509824 METADATA_ITEM 0) itemoff 13880 itemsize 33
refs 1 gen 197740 flags TREE_BLOCK
tree block skinny level 0
(176 0x7) tree block backref root CSUM_TREE
item 66 key (594558976 METADATA_ITEM 0) itemoff 13847 itemsize 33
refs 1 gen 197522 flags TREE_BLOCK
tree block skinny level 0
(176 0x7) tree block backref root CSUM_TREE
But such a missing backref item is not a corruption on disk, as the
offending delayed ref belongs to subvolume 934, and that subvolume is
being dropped:
item 0 key (934 ROOT_ITEM 198229) itemoff 15844 itemsize 439
generation 198229 root_dirid 256 bytenr 10741039104 byte_limit 0 bytes_used 345571328
last_snapshot 198229 flags 0x1000000000001(RDONLY) refs 0
drop_progress key (206324 EXTENT_DATA 2711650304) drop_level 2
level 2 generation_v2 198229
And that offending tree block 594526208 is inside the dropped range of
that subvolume. That explains why there is no backref item for that
bytenr and why btrfs check is not reporting anything wrong.
But this also shows another problem, as btrfs will do all the orphan
subvolume cleanup at a read-write mount.
So a half-dropped subvolume should not exist after an RW mount, and
balance itself is also exclusive to subvolume cleanup, meaning we
shouldn't hit a half-dropped subvolume during relocation.
The root cause is, there is no orphan item for this subvolume.
In fact there are 5 subvolumes from around 2021 that have the same
problem.
It looks like the original reporter had some older kernels running,
which caused those zombie subvolumes.
Thankfully upstream commit
d548030f92
btrfs: fix log tree replay failure due to file with 0 links and extents
commit
184fd94b13
btrfs: zoned: do not remove unwritten non-data block group
commit
940baded5e
btrfs: abort transaction during log replay if walk_log_tree() failed
commit
e50baade7b
btrfs: zoned: use filesystem size not disk size for reclaim decision
commit
8e448bf5dc
ext4: fix largest free orders lists corruption on mb_optimize_scan switch
commit
2199da0bb3
ext4: fix zombie groups in average fragment size lists
commit