commit 8e7d178d06 upstream.
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable@vger.kernel.org
Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6bb919397 upstream.
Repeated connections from clients with the same IP address may exhaust
the max connections and prevent other normal client connections.
This patch limit repeated connections from clients with the same IP.
Reported-by: tianshuo han <hantianshuo233@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b445309ee upstream.
SMB3.1.1 POSIX mounts require sockets to be created with NFS reparse
points.
Cc: linux-cifs@vger.kernel.org
Cc: Ralph Boehme <slow@samba.org>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-by: Matthew Richardson <m.richardson@ed.ac.uk>
Closes: https://marc.info/?i=1124e7cd-6a46-40a6-9f44-b7664a66654b@ed.ac.uk
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a967e758f8 upstream.
SMB3.1.1 POSIX mounts require symlinks to be created natively with
IO_REPARSE_TAG_SYMLINK reparse point.
Cc: linux-cifs@vger.kernel.org
Cc: Ralph Boehme <slow@samba.org>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-by: Matthew Richardson <m.richardson@ed.ac.uk>
Closes: https://marc.info/?i=1124e7cd-6a46-40a6-9f44-b7664a66654b@ed.ac.uk
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 59b33fab4c upstream.
After commit 5c70eb5c59 ("net: better track kernel sockets lifetime"),
kernel sockets now use net_passive reference counting. However, commit
95d2b9f693 ("Revert "smb: client: fix TCP timers deadlock after rmmod"")
restored the manual socket refcount manipulation without adapting to this
new mechanism, causing a memory leak.
The issue can be reproduced by[1]:
1. Creating a network namespace
2. Mounting and Unmounting CIFS within the namespace
3. Deleting the namespace
Some memory leaks may appear after a period of time following step 3.
unreferenced object 0xffff9951419f6b00 (size 256):
comm "ip", pid 447, jiffies 4294692389 (age 14.730s)
hex dump (first 32 bytes):
1b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 80 77 c2 44 51 99 ff ff .........w.DQ...
backtrace:
__kmem_cache_alloc_node+0x30e/0x3d0
__kmalloc+0x52/0x120
net_alloc_generic+0x1d/0x30
copy_net_ns+0x86/0x200
create_new_namespaces+0x117/0x300
unshare_nsproxy_namespaces+0x60/0xa0
ksys_unshare+0x148/0x360
__x64_sys_unshare+0x12/0x20
do_syscall_64+0x59/0x110
entry_SYSCALL_64_after_hwframe+0x78/0xe2
...
unreferenced object 0xffff9951442e7500 (size 32):
comm "mount.cifs", pid 475, jiffies 4294693782 (age 13.343s)
hex dump (first 32 bytes):
40 c5 38 46 51 99 ff ff 18 01 96 42 51 99 ff ff @.8FQ......BQ...
01 00 00 00 6f 00 c5 07 6f 00 d8 07 00 00 00 00 ....o...o.......
backtrace:
__kmem_cache_alloc_node+0x30e/0x3d0
kmalloc_trace+0x2a/0x90
ref_tracker_alloc+0x8e/0x1d0
sk_alloc+0x18c/0x1c0
inet_create+0xf1/0x370
__sock_create+0xd7/0x1e0
generic_ip_connect+0x1d4/0x5a0 [cifs]
cifs_get_tcp_session+0x5d0/0x8a0 [cifs]
cifs_mount_get_session+0x47/0x1b0 [cifs]
dfs_mount_share+0xfa/0xa10 [cifs]
cifs_mount+0x68/0x2b0 [cifs]
cifs_smb3_do_mount+0x10b/0x760 [cifs]
smb3_get_tree+0x112/0x2e0 [cifs]
vfs_get_tree+0x29/0xf0
path_mount+0x2d4/0xa00
__se_sys_mount+0x165/0x1d0
Root cause:
When creating kernel sockets, sk_alloc() calls net_passive_inc() for
sockets with sk_net_refcnt=0. The CIFS code manually converts kernel
sockets to user sockets by setting sk_net_refcnt=1, but doesn't call
the corresponding net_passive_dec(). This creates an imbalance in the
net_passive counter, which prevents the network namespace from being
destroyed when its last user reference is dropped. As a result, the
entire namespace and all its associated resources remain allocated.
Timeline of patches leading to this issue:
- commit ef7134c7fc ("smb: client: Fix use-after-free of network
namespace.") in v6.12 fixed the original netns UAF by manually
managing socket refcounts
- commit e9f2517a3e ("smb: client: fix TCP timers deadlock after
rmmod") in v6.13 attempted to use kernel sockets but introduced
TCP timer issues
- commit 5c70eb5c59 ("net: better track kernel sockets lifetime")
in v6.14-rc5 introduced the net_passive mechanism with
sk_net_refcnt_upgrade() for proper socket conversion
- commit 95d2b9f693 ("Revert "smb: client: fix TCP timers deadlock
after rmmod"") in v6.15-rc3 reverted to manual refcount management
without adapting to the new net_passive changes
Fix this by using sk_net_refcnt_upgrade() which properly handles the
net_passive counter when converting kernel sockets to user sockets.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220343 [1]
Fixes: 95d2b9f693 ("Revert "smb: client: fix TCP timers deadlock after rmmod"")
Cc: stable@vger.kernel.org
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f8ff9486f upstream.
If STATX_BASIC_STATS flags are not given as an argument to vfs_getattr,
It can not get ctime and mtime in kstat.
This causes a problem showing mtime and ctime outdated from cifs.ko.
File: /xfstest.test/foo
Size: 4096 Blocks: 8 IO Block: 1048576 regular file
Device: 0,65 Inode: 2033391 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Context: system_u:object_r:cifs_t:s0
Access: 2025-07-23 22:15:30.136051900 +0100
Modify: 1970-01-01 01:00:00.000000000 +0100
Change: 1970-01-01 01:00:00.000000000 +0100
Birth: 2025-07-23 22:15:30.136051900 +0100
Cc: stable@vger.kernel.org
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44a3059c4c upstream.
If client send multiple session setup requests to ksmbd,
Preauh_HashValue race condition could happen.
There is no need to free sess->Preauh_HashValue at session setup phase.
It can be freed together with session at connection termination phase.
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-27661
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b493ab6f3 upstream.
If client send two session setups with krb5 authenticate to ksmbd,
null pointer dereference error in generate_encryptionkey could happen.
sess->Preauth_HashValue is set to NULL if session is valid.
So this patch skip generate encryption key if session is valid.
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-27654
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e5a73150776f18547ee685c9f6bfafe549714899 upstream.
If two calls to nfsd_open_local_fh() race and both successfully call
nfsd_file_acquire_local(), they will both get an extra reference to the
net to accompany the file reference stored in *pnf.
One of them will fail to store (using xchg()) the file reference in
*pnf and will drop that reference but WON'T drop the accompanying
reference to the net. This leak means that when the nfs server is shut
down it will hang in nfsd_shutdown_net() waiting for
&nn->nfsd_net_free_done.
This patch adds the missing nfsd_net_put().
Reported-by: Mike Snitzer <snitzer@kernel.org>
Fixes: e6f7e1487a ("nfs_localio: simplify interface to nfsd for getting nfsd_file")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neil@brown.name>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f9a348e0de19226fc3c7e81de7677d3fa2c4b2d8 upstream.
Clients will typically precede a DELEGRETURN for a delegation with
delegated timestamp with a SETATTR to set the timestamps on the server
to match what the client has.
knfsd implements this by using the nfsd_setattr() infrastructure, which
will set ATTR_CTIME on any update that goes to notify_change(). This is
problematic as it means that the client will get a spurious ctime
update when updating the atime.
POSIX unfortunately doesn't phrase it succinctly, but updating the atime
due to reads should not update the ctime. In this case, the client is
sending a SETATTR to update the atime on the server to match its latest
value. The ctime should not be advanced in this case as that would
incorrectly indicate a change to the inode.
Fix this by not implicitly setting ATTR_CTIME when ATTR_DELEG is set in
__nfsd_setattr(). The decoder for FATTR4_WORD2_TIME_DELEG_MODIFY already
sets ATTR_CTIME, so this is sufficient to make it skip setting the ctime
on atime-only updates.
Fixes: 7e13f4f8d2 ("nfsd: handle delegated timestamps in SETATTR")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 03537826f7 ]
This matches the timeout for tcp connections.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9 ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24eff17887 ]
Calling enqueue_reassembly() and wake_up_interruptible(&info->wait_reassembly_queue)
or put_receive_buffer() means the response/data_transfer pointer might
get re-used by another thread, which means these should be
the last operations before calling return.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9 ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bdd7afc6dc ]
We should call put_receive_buffer() before waking up the callers.
For the internal error case of response->type being unexpected,
we now also call smbd_disconnect_rdma_connection() instead
of not waking up the callers at all.
Note that the SMBD_TRANSFER_DATA case still has problems,
which will be addressed in the next commit in order to make
it easier to review this one.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9 ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 047682c370 ]
In case of failures either ib_dma_map_single() might not be called yet
or ib_dma_unmap_single() was already called.
We should make sure put_receive_buffer() only calls
ib_dma_unmap_single() if needed.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9 ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24b6afc36d ]
There's no need to maintain two lists, we can just
have a single list of receive buffers, which are free to use.
It just added unneeded complexity and resulted in
ib_dma_unmap_single() not being called from recv_done()
for empty keepalive packets.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9 ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a6c015b7ac ]
Calling enqueue_reassembly() and wake_up_interruptible(&t->wait_reassembly_queue)
or put_receive_buffer() means the recvmsg/data_transfer pointer might
get re-used by another thread, which means these should be
the last operations before calling return.
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cfe76fdbb9 ]
We should call put_recvmsg() before smb_direct_disconnect_rdma_connection()
in order to call it before waking up the callers.
In all error cases we should call smb_direct_disconnect_rdma_connection()
in order to avoid stale connections.
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit afb4108c92 ]
In case of failures either ib_dma_map_single() might not be called yet
or ib_dma_unmap_single() was already called.
We should make sure put_recvmsg() only calls ib_dma_unmap_single() if needed.
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 01027a62b5 ]
There's no need to maintain two lists, we can just
have a single list of receive buffers, which are free to use.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fdd015de76 ]
In order for the wait in nfs_uuid_put() to be safe, it is necessary to
ensure that nfs_uuid_add_file() doesn't add a new entry once the
nfs_uuid->net has been NULLed out.
Also fix up the wake_up_var_locked() / wait_var_event_spinlock() to both
use the nfs_uuid address, since nfl, and &nfl->uuid could be used elsewhere.
Acked-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/all/175262893035.2234665.1735173020338594784@noble.neil.brown.name/
Fixes: 21fb440346 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e144d53cf2 ]
If the struct nfs_file_localio is closed, its list entry will be empty,
but the nfs_uuid->files list might still contain other entries.
Acked-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Fixes: 21fb440346 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 99765233ab ]
If the NFS client is doing writeback from a workqueue context, avoid using
__GFP_NORETRY for allocations if the task has set PF_MEMALLOC_NOIO or
PF_MEMALLOC_NOFS. The combination of these flags makes memory allocation
failures much more likely.
We've seen those allocation failures show up when the loopback driver is
doing writeback from a workqueue to a file on NFS, where memory allocation
failure results in errors or corruption within the loopback device's
filesystem.
Suggested-by: Trond Myklebust <trondmy@kernel.org>
Fixes: 0bae835b63 ("NFS: Avoid writeback threads getting stuck in mempool_alloc()")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/f83ac1155a4bc670f2663959a7a068571e06afd9.1752111622.git.bcodding@redhat.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9acb237def ]
Currently, when the server supports NFS4.1 security labels then
security.selinux label in included twice. Instead, only add it
when the server doesn't possess security label support.
Fixes: 243fea1346 ("NFSv4.2: fix listxattr to return selinux security label")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Link: https://lore.kernel.org/r/20250722205641.79394-1-okorniev@redhat.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ef93a685e0 ]
The function needs to check the minimal filehandle length before it can
access the embedded filehandle.
Reported-by: zhangjian <zhangjian496@huawei.com>
Fixes: 20fa190272 ("nfs: add export operations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f06bedfa62 ]
When an applications get killed (SIGTERM/SIGINT) while pNFS client performs a connection
to DS, client ends in an infinite loop of connect-disconnect. This
source of the issue, it that flexfilelayoutdev#nfs4_ff_layout_prepare_ds gets an error
on nfs4_pnfs_ds_connect with status ERESTARTSYS, which is set by rpc_signal_task, but
the error is treated as transient, thus retried.
The issue is reproducible with Ctrl+C the following script(there should be ~1000 files in
a directory, client should must not have any connections to DSes):
```
echo 3 > /proc/sys/vm/drop_caches
for i in *
do
head -1 $i
done
```
The change aims to propagate the nfs4_ff_layout_prepare_ds error state
to the caller that can decide whatever this is a retryable error or not.
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Link: https://lore.kernel.org/r/20250627071751.189663-1-tigran.mkrtchyan@desy.de
Fixes: 260f32adb8 ("pNFS/flexfiles: Check the result of nfs4_pnfs_ds_connect")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f2d42a17b ]
Test: androbench by default setting, use 64GB sdcard.
the random write speed:
without this patch 3.5MB/s
with this patch 7MB/s
After patch "11a347fb6cef", the random write speed decreased significantly.
the .write_iter() interface had been modified, and check the differences
with generic_file_write_iter(), when calling generic_write_sync() and
exfat_file_write_iter() to call vfs_fsync_range(), the fdatasync flag is
wrong, and make not use the fdatasync mode, and make random write speed
decreased. So use generic_write_sync() instead of vfs_fsync_range().
Fixes: 11a347fb6c ("exfat: change to get file size from DataLength")
Signed-off-by: Zhengxu Zhang <zhengxu.zhang@unisoc.com>
Acked-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1005a3ca28 ]
w/ "mode=lfs" mount option, generic/299 will cause system panic as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2835!
Call Trace:
<TASK>
f2fs_allocate_data_block+0x6f4/0xc50
f2fs_map_blocks+0x970/0x1550
f2fs_iomap_begin+0xb2/0x1e0
iomap_iter+0x1d6/0x430
__iomap_dio_rw+0x208/0x9a0
f2fs_file_write_iter+0x6b3/0xfa0
aio_write+0x15d/0x2e0
io_submit_one+0x55e/0xab0
__x64_sys_io_submit+0xa5/0x230
do_syscall_64+0x84/0x2f0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0010:new_curseg+0x70f/0x720
The root cause of we run out-of-space is: in f2fs_map_blocks(), f2fs may
trigger foreground gc only if it allocates any physical block, it will be
a little bit later when there is multiple threads writing data w/
aio/dio/bufio method in parallel, since we always use OPU in lfs mode, so
f2fs_map_blocks() does block allocations aggressively.
In order to fix this issue, let's give a chance to trigger foreground
gc in prior to block allocation in f2fs_map_blocks().
Fixes: 36abef4e79 ("f2fs: introduce mode=lfs mount option")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e194e140ab ]
In lfs mode, dirty data needs OPU, we'd better calculate lower_p and
upper_p w/ them during has_not_enough_free_secs(), otherwise we may
encounter out-of-space issue due to we missed to reclaim enough
free section w/ foreground gc.
Fixes: 36abef4e79 ("f2fs: introduce mode=lfs mount option")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5661998536 ]
- touch /mnt/f2fs/012345678901234567890123456789012345678901234567890123
- truncate -s $((1024*1024*1024)) \
/mnt/f2fs/012345678901234567890123456789012345678901234567890123
- touch /mnt/f2fs/file
- truncate -s $((1024*1024*1024)) /mnt/f2fs/file
- mkfs.f2fs /mnt/f2fs/012345678901234567890123456789012345678901234567890123 \
-c /mnt/f2fs/file
- mount /mnt/f2fs/012345678901234567890123456789012345678901234567890123 \
/mnt/f2fs/loop
[16937.192225] F2FS-fs (loop0): Mount Device [ 0]: /mnt/f2fs/012345678901234567890123456789012345678901234567890123\xff\x01, 511, 0 - 3ffff
[16937.192268] F2FS-fs (loop0): Failed to find devices
If device path length equals to MAX_PATH_LEN, sbi->devs.path[] may
not end up w/ null character due to path array is fully filled, So
accidently, fields locate after path[] may be treated as part of
device path, result in parsing wrong device path.
struct f2fs_dev_info {
...
char path[MAX_PATH_LEN];
...
};
Let's add one byte space for sbi->devs.path[] to store null
character of device path string.
Fixes: 3c62be17d4 ("f2fs: support multiple devices")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a96d1d73c ]
This patch adds missing upper boundary check while setting
gc_valid_thresh_ratio via sysfs.
Fixes: e791d00bd0 ("f2fs: add valid block ratio not to do excessive GC for one time GC")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 154467f4ad ]
KMSAN reported a use of uninitialized value in `__is_extent_mergeable()`
and `__is_back_mergeable()` via the read extent tree path.
The root cause is that `get_read_extent_info()` only initializes three
fields (`fofs`, `blk`, `len`) of `struct extent_info`, leaving the
remaining fields uninitialized. This leads to undefined behavior
when those fields are accessed later, especially during
extent merging.
Fix it by zero-initializing the `extent_info` struct before population.
Reported-by: syzbot+b8c1d60e95df65e827d4@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b8c1d60e95df65e827d4
Fixes: 94afd6d6e5 ("f2fs: extent cache: support unaligned extent")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Abinash Singh <abinashsinghlalotra@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 39868685c2 ]
The decompress_io_ctx may be released asynchronously after
I/O completion. If this file is deleted immediately after read,
and the kworker of processing post_read_wq has not been executed yet
due to high workloads, It is possible that the inode(f2fs_inode_info)
is evicted and freed before it is used f2fs_free_dic.
The UAF case as below:
Thread A Thread B
- f2fs_decompress_end_io
- f2fs_put_dic
- queue_work
add free_dic work to post_read_wq
- do_unlink
- iput
- evict
- call_rcu
This file is deleted after read.
Thread C kworker to process post_read_wq
- rcu_do_batch
- f2fs_free_inode
- kmem_cache_free
inode is freed by rcu
- process_scheduled_works
- f2fs_late_free_dic
- f2fs_free_dic
- f2fs_release_decomp_mem
read (dic->inode)->i_compress_algorithm
This patch store compress_algorithm and sbi in dic to avoid inode UAF.
In addition, the previous solution is deprecated in [1] may cause system hang.
[1] https://lore.kernel.org/all/c36ab955-c8db-4a8b-a9d0-f07b5f426c3f@kernel.org
Cc: Daeho Jeong <daehojeong@google.com>
Fixes: bff139b49d ("f2fs: handle decompress only post processing in softirq")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Baocong Liu <baocong.liu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8e2a9b6564 ]
No logic changes, just cleanup and prepare for fixing the UAF issue
in f2fs_free_dic.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Baocong Liu <baocong.liu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 39868685c2 ("f2fs: compress: fix UAF of f2fs_inode_info in f2fs_free_dic")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 90d5c9ba3e ]
=============================
[ BUG: Invalid wait context ]
6.13.0-rc1 #84 Tainted: G O
-----------------------------
cat/56160 is trying to lock:
ffff888105c86648 (&cprc->stat_lock){+.+.}-{3:3}, at: update_general_status+0x32a/0x8c0 [f2fs]
other info that might help us debug this:
context-{5:5}
2 locks held by cat/56160:
#0: ffff88810a002a98 (&p->lock){+.+.}-{4:4}, at: seq_read_iter+0x56/0x4c0
#1: ffffffffa0462638 (f2fs_stat_lock){....}-{2:2}, at: stat_show+0x29/0x1020 [f2fs]
stack backtrace:
CPU: 0 UID: 0 PID: 56160 Comm: cat Tainted: G O 6.13.0-rc1 #84
Tainted: [O]=OOT_MODULE
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
<TASK>
dump_stack_lvl+0x88/0xd0
dump_stack+0x14/0x20
__lock_acquire+0x8d4/0xbb0
lock_acquire+0xd6/0x300
_raw_spin_lock+0x38/0x50
update_general_status+0x32a/0x8c0 [f2fs]
stat_show+0x50/0x1020 [f2fs]
seq_read_iter+0x116/0x4c0
seq_read+0xfa/0x130
full_proxy_read+0x66/0x90
vfs_read+0xc4/0x350
ksys_read+0x74/0xf0
__x64_sys_read+0x1d/0x20
x64_sys_call+0x17d9/0x1b80
do_syscall_64+0x68/0x130
entry_SYSCALL_64_after_hwframe+0x67/0x6f
RIP: 0033:0x7f2ca53147e2
- seq_read
- stat_show
- raw_spin_lock_irqsave(&f2fs_stat_lock, flags)
: f2fs_stat_lock is raw_spinlock_t type variable
- update_general_status
- spin_lock(&sbi->cprc_info.stat_lock);
: stat_lock is spinlock_t type variable
The root cause is the lock order is incorrect [1], we should not acquire
spinlock_t lock after raw_spinlock_t lock, as if CONFIG_PREEMPT_LOCK is
on, spinlock_t is implemented based on rtmutex, which can sleep after
holding the lock.
To fix this issue, let's use change f2fs_stat_lock lock type from
raw_spinlock_t to spinlock_t, it's safe due to:
- we don't need to use raw version of spinlock as the path is not
performance sensitive.
- we don't need to use irqsave version of spinlock as it won't be
used in irq context.
Quoted from [1]:
"Extend lockdep to validate lock wait-type context.
The current wait-types are:
LD_WAIT_FREE, /* wait free, rcu etc.. */
LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */
LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */
Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).
This ensures that there is no attempt to acquire mutexes while holding
spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In
other words, its a more fancy might_sleep()."
[1] https://lore.kernel.org/all/20200321113242.427089655@linutronix.de
Fixes: 98237fcda4 ("f2fs: use spin_lock to avoid hang")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8142daf8a5 ]
one_time mode is only for background GC. So, we need to set it back to
false when foreground GC is enforced.
Fixes: 9748c2ddea ("f2fs: do FG_GC when GC boosting is required for zoned devices")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 97103dcec2 ]
The sizeof(void *) is the incorrect argument in the kmalloc_array call, it
best to fix this by using sizeof(*cache_folios) instead.
Fortunately the sizes of void* and folio* happen to be the same, so this
has not shown up as a run time issue.
[akpm@linux-foundation.org: fix build]
Link: https://lkml.kernel.org/r/20250708142604.1891156-1-colin.i.king@gmail.com
Fixes: 2e227ff5e2 ("squashfs: add optional full compressed block caching")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Chanho Min <chanho.min@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c9e3fb050e ]
Remove an access to page->mapping and a few calls to the old page-based
APIs. This doesn't support large folios, but it's still a nice
improvement.
Link: https://lkml.kernel.org/r/20250612143903.2849289-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 97103dcec2 ("squashfs: fix incorrect argument to sizeof in kmalloc_array call")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 856db37592 ]
In dbAllocCtl(), read_metapage() increases the reference count of the
metapage. However, when dp->tree.budmin < 0, the function returns -EIO
without calling release_metapage() to decrease the reference count,
leading to a memory leak.
Add release_metapage(mp) before the error return to properly manage
the metapage reference count and prevent the leak.
Fixes: a5f5e4698f ("jfs: fix shift-out-of-bounds in dbSplit")
Signed-off-by: Zheng Yu <zheng.yu@northwestern.edu>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be77ab6b9f ]
Zero-length AV pairs should be considered as valid target infos.
Don't skip the next AV pairs that follow them.
Cc: linux-cifs@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Fixes: 0e8ae9b953 ("smb: client: parse av pair type 4 in CHALLENGE_MESSAGE")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2138e89cb0 ]
The do_k_string() and do_c_string() functions do essentially the same
thing which is they add a string and a comma onto the end of an existing
string. At the end, the caller will overwrite the last comma with a
newline. Later, in orangefs_kernel_debug_init(), we add a newline to
the string.
The change to do_k_string() is just cosmetic. I moved the "- 1" to
the other side of the comparison and made it "+ 1". This has no
effect on runtime, I just wanted the functions to match each other
and the rest of the file.
However in do_c_string(), I removed the "- 2" which allows us to print
two extra characters. I noticed this issue while reviewing the code
and I doubt affects anything in real life. My guess is that this was
double counting the comma and the newline. The "+ 1" accounts for
the newline, and the caller will delete the final comma which ensures
there is enough space for the newline.
Removing the "- 2" lets us print 2 more characters, but mainly it makes
the code more consistent and understandable for reviewers.
Fixes: 44f4641073 ("orangefs: clean up debugfs globals")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 91b8ca8b26 ]
Currently we clear BH_New bit in case of error and also in the standard
ext4_write_end() handler (in block_commit_write()). However
ext4_journalled_write_end() misses this clearing and thus we are leaving
stale BH_New bits behind. Generally ext4_block_write_begin() clears
these bits before any harm can be done but in case blocksize < pagesize
and we hit some error when processing a page with these stale bits,
we'll try to zero buffers with these stale BH_New bits and jbd2 will
complain (as buffers were not prepared for writing in this transaction).
Fix the problem by clearing BH_New bits in ext4_journalled_write_end()
and WARN if ext4_block_write_begin() sees stale BH_New bits.
Reported-by: Baolin Liu <liubaolin12138@163.com>
Reported-by: Zhi Long <longzhi@sangfor.com.cn>
Fixes: 3910b513fc ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250709084831.23876-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>