commit 7f9ee5fc97 upstream.
Fix a memory leak in bpf_prog_test_run_xdp() where the context buffer
allocated by bpf_ctx_init() is not freed when the function returns early
due to a data size check.
On the failing path:
	ctx = bpf_ctx_init(...);
	if (kattr->test.data_size_in - meta_sz < ETH_HLEN)
		return -EINVAL;
The early return bypasses the cleanup label that kfree()s ctx, leading to a
leak detectable by kmemleak under fuzzing. Change the return to jump to the
existing free_ctx label.
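In outline, the fix turns the early return into a jump to the existing
cleanup label (a simplified sketch, not the exact diff; the ret variable
name is illustrative):
	ctx = bpf_ctx_init(...);
	if (IS_ERR(ctx))
		return PTR_ERR(ctx);
	...
	if (kattr->test.data_size_in - meta_sz < ETH_HLEN) {
		ret = -EINVAL;
		goto free_ctx;
	}
	...
free_ctx:
	kfree(ctx);
	return ret;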
Fixes: fe9544ed1a ("bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN")
Reported-by: BPF Runtime Fuzzer (BRF)
Signed-off-by: Shardul Bankar <shardulsb08@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20251014120037.1981316-1-shardulsb08@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0e1677654259a2f3ccf728de1edde922a3c4ba57 ]
Race conditions were found in sg_proc_debug_helper(). They were observed
on a system using an IBM LTO-9 SAS Tape Drive (ULTRIUM-TD9) while
monitoring /proc/scsi/sg/debug every second: a very large elapsed time
would sometimes appear. This is caused by two distinct races.
We reproduced the issue with an IBM ULTRIUM-HH9 tape drive on an x86_64
architecture. A patched kernel was built, and the race condition could
not be observed anymore after the application of this patch. A
reproducer C program utilising the scsi_debug module was also built by
Changhui Zhong and can be viewed here:
https://github.com/MichaelRabek/linux-tests/blob/master/drivers/scsi/sg/sg_race_trigger.c
The first race happens between the reading of hp->duration in
sg_proc_debug_helper() and request completion in sg_rq_end_io(). The
hp->duration member variable may hold either of two types of
information:
#1 - The start time of the request. This value is present while
the request is not yet finished.
#2 - The total execution time of the request (end_time - start_time).
If sg_proc_debug_helper() executes *after* the value of hp->duration has
changed from #1 to #2, but *before* srp->done is set to 1 in
sg_rq_end_io(), a fresh timestamp is taken in the else branch and
hp->duration (now a type #2 value) is subtracted from it, which cannot
yield a valid elapsed time.
To fix this issue, the value of hp->duration must change under the
protection of the sfp->rq_list_lock in sg_rq_end_io(). Since
sg_proc_debug_helper() takes this read lock, the change to srp->done and
srp->header.duration will happen atomically from the perspective of
sg_proc_debug_helper() and the race condition is thus eliminated.
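A simplified sketch of the resulting ordering in sg_rq_end_io() (the
exact duration computation is elided):
	write_lock_irqsave(&sfp->rq_list_lock, iflags);
	srp->header.duration = ...;	/* switch from type #1 to type #2 */
	...
	srp->done = 1;
	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);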
The second race condition happens between sg_proc_debug_helper() and
sg_new_write(). Even though hp->duration is set to the current time
stamp in sg_add_request() under the write lock's protection, it gets
overwritten by a call to get_sg_io_hdr(), which calls copy_from_user()
to copy struct sg_io_hdr from userspace into kernel space. hp->duration
is set to the start time again in sg_common_write(). If
sg_proc_debug_helper() is called between these two calls, an arbitrary
value set by userspace (usually zero) is used to compute the elapsed
time.
To fix this issue, hp->duration must be set to the current timestamp
again after get_sg_io_hdr() returns successfully. A small race window
still exists between get_sg_io_hdr() and setting hp->duration, but this
window is only a few instructions wide and does not result in observable
issues in practice, as confirmed by testing.
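In sg_new_write() the second fix then boils down to (a sketch, assuming
the same jiffies_to_msecs(jiffies) form used elsewhere in sg for start
timestamps):
	/* after get_sg_io_hdr() has copied struct sg_io_hdr from userspace */
	hp->duration = jiffies_to_msecs(jiffies);	/* restore the start time (#1) */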
Additionally, we fix the format specifier from %d to %u for printing
unsigned int values in sg_proc_debug_helper().
Signed-off-by: Michal Rábek <mrabek@redhat.com>
Suggested-by: Tomas Henzl <thenzl@redhat.com>
Tested-by: Changhui Zhong <czhong@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Link: https://patch.msgid.link/20251212160900.64924-1-mrabek@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 90ed688792a6b7012b3e8a2f858bc3fe7454d0eb ]
The driver does a cache sync during runtime resume, setting all writable
registers. Not all writable registers have cache defaults, resulting in
the error message:
fsl-sai 30c30000.sai: using zero-initialized flat cache, this may cause
unexpected behavior
Fix this by adding the missing writable register defaults.
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://patch.msgid.link/20251216102246.676181-1-alexander.stein@ew.tq-group.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e2cb8ef0372665854fca6fa7b30b20dd35acffeb ]
Add a DMI quirk for the Honor MagicBook X16 2025 laptop
fixing the issue where the internal microphone was
not detected.
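The quirk is a regular dmi_system_id table entry; a hedged sketch (both
match strings below are placeholders, the real ones come from the patch):
	{
		.matches = {
			DMI_MATCH(DMI_BOARD_VENDOR, "placeholder-vendor"),
			DMI_MATCH(DMI_BOARD_NAME, "placeholder-board-name"),
		},
	},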
Signed-off-by: Andrew Elantsev <elantsew.andrew@gmail.com>
Link: https://patch.msgid.link/20251210203800.142822-1-elantsew.andrew@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d5602236f5db19e8b337a2cd87a90ace5ea776d ]
syzbot is still reporting
unregister_netdevice: waiting for vcan0 to become free. Usage count = 2
even after commit 93a27b5891 ("can: j1939: add missing calls in
NETDEV_UNREGISTER notification handler") was added. A debug printk() patch
found that j1939_session_activate() can succeed even after
j1939_cancel_active_session() from j1939_netdev_notify(NETDEV_UNREGISTER)
has completed.
Since j1939_cancel_active_session() is processed with the session list lock
held, checking ndev->reg_state in j1939_session_activate() with the session
list lock held can reliably close the race window.
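A minimal sketch of the added check, assuming it sits where
j1939_session_activate() already holds the session list lock (error code
and label names illustrative):
	/* Refuse to activate a session once the netdev has started
	 * unregistering, so activation cannot race with
	 * j1939_cancel_active_session(). */
	if (session->priv->ndev->reg_state != NETREG_REGISTERED) {
		ret = -ENETDOWN;
		goto out_unlock;
	}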
Reported-by: syzbot <syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/b9653191-d479-4c8b-8536-1326d028db5c@I-love.SAKURA.ne.jp
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit efc4c35b741af973de90f6826bf35d3b3ac36bf1 ]
Fix inconsistent error handling of the sscanf() return value. Implicit
boolean conversion is used instead of an explicit return value check.
The code checks if (!sscanf(...)), which is incorrect because:
1. sscanf returns the number of successfully parsed items
2. On success, it returns 1 (one item parsed)
3. On failure, it returns 0 or EOF
4. The check 'if (!sscanf(...))' only catches a return value of 0, so a
failure that returns EOF (a negative value) slips through as success
All occurrences of sscanf() now use an explicit return value check.
With this change, -EINVAL is returned when parsing fails (sscanf()
returns 0 or EOF), and execution continues when parsing succeeds
(returns 1).
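A minimal example of the pattern used after this change (buffer and
variable names illustrative):
	unsigned int val;

	if (sscanf(buf, "%u", &val) != 1)
		return -EINVAL;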
Signed-off-by: Sumeet Pawnikar <sumeet4linux@gmail.com>
[ rjw: Subject and changelog edits ]
Link: https://patch.msgid.link/20251207151549.202452-1-sumeet4linux@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7bda1910c4bccd4b8d4726620bb3d6bbfb62286e ]
The device becomes visible to userspace via device_register() even
before it is fully initialized by idr_init(). If userspace or another
thread tries to register a zone immediately after device_register(),
control_type_valid() will fail because the control_type is not yet in
the list. The IDR is not yet initialized either, so this race condition
causes zone registration to fail.
Move idr_init() and the list addition before device_register() to fix
the race condition.
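In outline (a simplified sketch; the list and member names are
illustrative and error unwinding is elided):
	/* Make the control_type fully usable before it becomes visible. */
	idr_init(&control_type->idr);
	list_add(&control_type->node, &control_type_list);

	result = device_register(&control_type->dev);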
Signed-off-by: Sumeet Pawnikar <sumeet4linux@gmail.com>
[ rjw: Subject adjustment, empty line added ]
Link: https://patch.msgid.link/20251205190216.5032-1-sumeet4linux@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec69daabe45256f98ac86c651b8ad1b2574489a7 ]
syzbot is reporting
unregister_netdevice: waiting for sit0 to become free. Usage count = 2
problem. A debug printk() patch found that a refcount is obtained at
xdp_convert_md_to_buff() from bpf_prog_test_run_xdp().
According to commit ec94670fcb ("bpf: Support specifying ingress via
xdp_md context in BPF_PROG_TEST_RUN"), the refcount obtained by
xdp_convert_md_to_buff() will be released by xdp_convert_buff_to_md().
Therefore, we can consider that the error handling path introduced by
commit 1c19499825 ("bpf: introduce frags support to
bpf_prog_test_run_xdp()") forgot to call xdp_convert_buff_to_md().
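The shape of the fix is to call the converse helper before bailing out of
the error path that frags support added (a simplified sketch; label names
illustrative):
	ret = xdp_convert_md_to_buff(ctx, &xdp);
	if (ret)
		goto free_data;
	...
	if (ret) {					/* frags setup failed */
		xdp_convert_buff_to_md(&xdp, ctx);	/* was missing: releases the refcount */
		goto free_data;
	}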
Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Fixes: 1c19499825 ("bpf: introduce frags support to bpf_prog_test_run_xdp()")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/af090e53-9d9b-4412-8acb-957733b3975c@I-love.SAKURA.ne.jp
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e558cca217790286e799a8baacd1610bda31b261 ]
The xdp_frame structure takes up part of the XDP frame headroom,
limiting the size of the metadata. However, in bpf_test_run, we don't
take this into account, which makes it possible for userspace to supply
a metadata size that is too large (taking up the entire headroom).
If userspace supplies such a large metadata size in live packet mode,
the xdp_update_frame_from_buff() call in xdp_test_run_init_page() will
fail, after which packet transmission proceeds with an
uninitialised frame structure, leading to the usual Bad Stuff.
The commit in the Fixes tag fixed a related bug where the second check
in xdp_update_frame_from_buff() could fail, but did not add any
additional constraints on the metadata size. Complete the fix by adding
an additional check on the metadata size. Reorder the checks slightly to
make the logic clearer and add a comment.
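Roughly, the added constraint looks like this (a sketch; the exact
expression in the patch may differ):
	/* struct xdp_frame lives at the start of the headroom, so the
	 * metadata must fit in what remains of it. */
	if (meta_sz > XDP_PACKET_HEADROOM - sizeof(struct xdp_frame))
		return -EINVAL;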
Link: https://lore.kernel.org/r/fa2be179-bad7-4ee3-8668-4903d1853461@hust.edu.cn
Fixes: b6f1f780b3 ("bpf, test_run: Fix packet size check for live packet mode")
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260105114747.1358750-1-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe9544ed1a ]
To test bpf_xdp_pull_data(), an xdp packet containing fragments as well
as free linear data area after xdp->data_end needs to be created.
However, bpf_prog_test_run_xdp() always fills the linear area with
data_in before creating fragments, leaving no space to pull data. This
patch will allow users to specify the linear data size through
ctx->data_end.
Currently, ctx_in->data_end must match data_size_in and will not be the
final ctx->data_end seen by xdp programs. This is because ctx->data_end
is populated according to the xdp_buff passed to test_run. The linear
data area available in an xdp_buff, max_linear_sz, is always filled up
before copying data_in into fragments.
This patch will allow users to specify the size of data that goes into
the linear area. When ctx_in->data_end is different from data_size_in,
only ctx_in->data_end bytes of data will be put into the linear area when
creating the xdp_buff.
While ctx_in->data_end will be allowed to be different from data_size_in,
it cannot be larger than data_size_in, as there would be no data to
copy from user space. If it is larger than the maximum linear data area
size, the layout suggested by the user will not be honored. Data beyond
max_linear_sz bytes will still be copied into fragments.
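From userspace this can be exercised roughly as follows (a hedged libbpf
sketch; prog_fd and the sizes are placeholders):
	char pkt[512] = {0};				/* packet payload; contents elided */
	struct xdp_md ctx = { .data_end = 64 };		/* 64 bytes go into the linear area */
	LIBBPF_OPTS(bpf_test_run_opts, opts,
		    .data_in = pkt,
		    .data_size_in = sizeof(pkt),	/* the rest lands in fragments */
		    .ctx_in = &ctx,
		    .ctx_size_in = sizeof(ctx));
	int err = bpf_prog_test_run_opts(prog_fd, &opts);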
Finally, since it is possible for a NIC to produce an xdp_buff with an
empty linear data area, allow it when calling bpf_test_init() from
bpf_prog_test_run_xdp() so that XDP kfuncs can be tested with such an
xdp_buff. This is done by moving the lower-bound check to the callers;
most of them already perform it, except bpf_prog_test_run_skb(). The
change also fixes a bug that allowed passing an xdp_buff with
data < ETH_HLEN, which can happen when ctx is used and the metadata is
at least ETH_HLEN.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250922233356.3356453-7-ameryhung@gmail.com
Stable-dep-of: e558cca21779 ("bpf, test_run: Subtract size of xdp_frame from allowed metadata size")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7eb83bff02 ]
Change the variable naming in bpf_prog_test_run_xdp() to make the
overall logic less confusing. As different modes were added to the
function over time, some variables became overloaded, making the code
hard to understand and error-prone to change.
Replace "size" with "linear_sz" where it refers to the size of metadata
and data. If "size" refers to input data size, use test.data_size_in
directly.
Replace "max_data_sz" with "max_linear_sz" to better reflect the fact
that it is the maximum size of metadata and data (i.e., linear_sz). Also,
xdp_rxq.frags_size is always PAGE_SIZE, so just set it directly instead
of subtracting headroom and tailroom and adding them back.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250922233356.3356453-6-ameryhung@gmail.com
Stable-dep-of: e558cca21779 ("bpf, test_run: Subtract size of xdp_frame from allowed metadata size")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4fc012daf9 ]
The bpf selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow failed on
arm64 with 64KB page:
xdp_adjust_tail/xdp_adjust_frags_tail_grow:FAIL
In bpf_prog_test_run_xdp(), the xdp->frame_sz is set to 4K, but later on
when constructing frags, with 64K page size, the frag data_len could
be more than 4K. This will cause problems in bpf_xdp_frags_increase_tail().
To fix the failure, xdp->frame_sz is set to PAGE_SIZE so the kernel can
handle different page sizes properly. With this kernel change, the user
space and bpf prog need adjustment. Currently, the MAX_SKB_FRAGS default
value is 17, so for a 4K page the maximum packet size will be less than
68K. To test 64K pages, a maximum packet size bigger than 68K is desired,
so two different functions are implemented for the
xdp_adjust_frags_tail_grow subtest, using different data input/output
sizes depending on the page size.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035032.2207498-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: e558cca21779 ("bpf, test_run: Subtract size of xdp_frame from allowed metadata size")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c6c209ceb87f64a6ceebe61761951dcbbf4a0baa ]
I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881.
None of these RFCs have an NFS status code that match the numeric
value "11".
Based on the meaning of the EAGAIN errno, I presume the use of this
status in NFSD means NFS4ERR_DELAY. So replace the one usage of
nfserr_eagain, and remove it from NFSD's NFS status conversion
tables.
As far as I can tell, NFSERR_EAGAIN has existed since the pre-git
era, but was not actually used by any code until commit f4e44b3933
("NFSD: delay unmount source's export after inter-server copy
completed."), at which time it become possible for NFSD to return
a status code of 11 (which is not valid NFS protocol).
Fixes: f4e44b3933 ("NFSD: delay unmount source's export after inter-server copy completed.")
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4806ded4c1 ]
Common nfs_stat_to_errno() is used by both fs/nfs/nfs2xdr.c and
fs/nfs/nfs3xdr.c
Will also be used by fs/nfsd/localio.c
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Stable-dep-of: c6c209ceb87f ("NFSD: Remove NFSERR_EAGAIN")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2857bd59feb63fcf40fe4baf55401baea6b4feb4 ]
Writing to v4_end_grace can race with server shutdown and result in
memory being accessed after it was freed - reclaim_str_hashtbl in
particular.
We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
held while client_tracking_op->init() is called and that can wait for
an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
deadlock.
nfsd4_end_grace() is also called by the laundromat work queue and this
doesn't require locking as server shutdown will stop the work and wait
for it before freeing anything that nfsd4_end_grace() might access.
However, we must be sure that writing to v4_end_grace doesn't restart
the work item after shutdown has already waited for it. For this we
add a new flag protected with nn->client_lock. It is set only while it
is safe to make client tracking calls, and v4_end_grace only schedules
work while the flag is set with the spinlock held.
So this patch adds an nfsd_net field, "client_tracking_active", which is
set as described. Another field, "grace_end_forced", is set when
v4_end_grace is written. After this is set, and provided
client_tracking_active is set, the laundromat is scheduled.
This "grace_end_forced" field bypasses other checks for whether the
grace period has finished.
This resolves a race which can result in use-after-free.
Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/T/#t
Fixes: 7f5ef2e900 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neil@brown.name>
Tested-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 830988b6cf197e6dcffdfe2008c5738e6c6c3c0f ]
If ac97_add_adapter() fails, put_device() is the correct way to drop
the device reference. kfree() is not required.
Add kfree() if idr_alloc() fails and in ac97_adapter_release() to do
the cleanup.
Found by code review.
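The resulting pattern in outline (a hedged sketch; struct and member
names follow sound/ac97/bus.c but are abbreviated):
	static void ac97_adapter_release(struct device *dev)
	{
		struct ac97_controller *ac97_ctrl =
			container_of(dev, struct ac97_controller, adap);

		kfree(ac97_ctrl);	/* freeing now happens in the release callback */
	}

	...
	ret = ac97_add_adapter(ac97_ctrl);
	if (ret) {
		put_device(&ac97_ctrl->adap);	/* not kfree(): release() does it */
		return ERR_PTR(ret);
	}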
Fixes: 74426fbff6 ("ALSA: ac97: add an ac97 bus")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
Link: https://patch.msgid.link/20251219162845.657525-1-lihaoxiang@isrc.iscas.ac.cn
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f5548c318d ]
Currently, scan_get_next_rmap_item() walks every page address in a VMA to
locate mergeable pages. This becomes highly inefficient when scanning
large virtual memory areas that contain mostly unmapped regions, causing
ksmd to use a large amount of cpu without deduplicating many pages.
This patch replaces the per-address lookup with a range walk using
walk_page_range(). The range walker allows KSM to skip over entire
unmapped holes in a VMA, avoiding unnecessary lookups. This problem was
previously discussed in [1].
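In outline, the new scan path looks like this (a simplified sketch;
callback and private-data names are illustrative, and the real patch also
handles the KSM-specific locking and checks):
	static int ksm_pmd_entry(pmd_t *pmdp, unsigned long addr,
				 unsigned long next, struct mm_walk *walk)
	{
		/* Called only for populated PMD ranges; unmapped holes in
		 * the VMA are skipped by the walker instead of being
		 * visited page by page. */
		return 0;
	}

	static const struct mm_walk_ops ksm_walk_ops = {
		.pmd_entry = ksm_pmd_entry,
	};

	walk_page_range(mm, start, end, &ksm_walk_ops, &private);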
Consider the following test program which creates a 32 TiB mapping in the
virtual address space but only populates a single page:
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* 32 TiB */
const size_t size = 32ul * 1024 * 1024 * 1024 * 1024;

int main() {
	char *area = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0);
	if (area == MAP_FAILED) {
		perror("mmap() failed\n");
		return -1;
	}
	/* Populate a single page such that we get an anon_vma. */
	*area = 0;
	/* Enable KSM. */
	madvise(area, size, MADV_MERGEABLE);
	pause();
	return 0;
}
$ ./ksm-sparse &
$ echo 1 > /sys/kernel/mm/ksm/run
Without this patch ksmd uses 100% of the cpu for a long time (more than 1
hour on my test machine) scanning all of the 32 TiB virtual address space
that contains only one mapped page. This makes ksmd essentially deadlocked,
unable to deduplicate anything of value. With this patch ksmd walks
only the one mapped page and skips the rest of the 32 TiB virtual address
space, making the scan fast and using little cpu.
Link: https://lkml.kernel.org/r/20251023035841.41406-1-pedrodemargomes@gmail.com
Link: https://lkml.kernel.org/r/20251022153059.22763-1-pedrodemargomes@gmail.com
Link: https://lore.kernel.org/linux-mm/423de7a3-1c62-4e72-8e79-19a6413e420c@redhat.com/ [1]
Fixes: 31dbd01f31 ("ksm: Kernel SamePage Merging")
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: craftfever <craftfever@airmail.cc>
Closes: https://lkml.kernel.org/r/020cf8de6e773bb78ba7614ef250129f11a63781@murena.io
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ replace pmdp_get_lockless with pmd_read_atomic and pmdp_get with
READ_ONCE(*pmdp) ]
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e07cda5f23 ]
Let's add walk_page_range_vma(), which is similar to walk_page_vma(),
however, is only interested in a subset of the VMA range.
To be used in KSM code to stop using follow_page() next.
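For reference, the helper's shape mirrors walk_page_range() but takes the
VMA plus a sub-range:
	int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start,
				unsigned long end, const struct mm_walk_ops *ops,
				void *private);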
Link: https://lkml.kernel.org/r/20221021101141.84170-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: f5548c318d ("ksm: use range-walk function to jump over holes in scan_get_next_rmap_item")
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c92510f5e3f82ba11c95991824a41e59a9c5ed81 ]
arp_create() is the only dev_hard_header() caller
that assumes skb->head is unchanged across the call.
A recent commit broke this assumption.
Initialize the @arp pointer after the dev_hard_header() call.
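A simplified sketch of arp_create() after the change (only the relevant
lines):
	skb_reserve(skb, hlen);
	skb_reset_network_header(skb);
	skb_put(skb, arp_hdr_len(dev));
	...
	if (dev_hard_header(skb, dev, ptype, dest_hw, src_hw, skb->len) < 0)
		goto out;
	/* Only take the ARP header pointer now: dev_hard_header() may have
	 * reallocated skb->head. */
	arp = arp_hdr(skb);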
Fixes: db5b4e39c4e6 ("ip6_gre: make ip6gre_header() robust")
Reported-by: syzbot+58b44a770a1585795351@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260107212250.384552-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b5bdabb5449b652122e43f507f73789041d4abe ]
The max buffer size of ENETC RX BD is 0xFFFF bytes, so if the PAGE_SIZE
is greater than 128K, ENETC_RXB_DMA_SIZE and ENETC_RXB_DMA_SIZE_XDP will
be greater than 0xFFFF, thus causing a build warning.
This will not cause any practical issues because ENETC is currently only
used on the ARM64 platform, and the max PAGE_SIZE is 64K. So this patch
is only for fixing the build warning that occurs when compiling ENETC
drivers for other platforms.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601050637.kHEKKOG7-lkp@intel.com/
Fixes: e59bc32df2 ("net: enetc: correct the value of ENETC_RXB_TRUESIZE")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260107091204.1980222-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit afa27621a28af317523e0836dad430bec551eb54 ]
When asynchronously writing to the device registers, if usb_submit_urb()
fails, the code fails to release the resources allocated up to that point.
Fixes: 323b34963d ("drivers: net: usb: pegasus: fix control urb submission")
Signed-off-by: Petko Manolov <petkan@nucleusys.com>
Link: https://patch.msgid.link/20260106084821.3746677-1-petko.manolov@konsulko.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c7fabe4ad9219866c203164a214c474c95b36bf2 ]
For years I wondered why the Apple Cinema Display driver would not
just work for me. Turns out the hidraw driver instantly takes it
over. Fix by adding appledisplay VID/PIDs to hid_have_special_driver.
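The entries take the usual hid_have_special_driver form (a sketch; the
product IDs below are placeholders, the real ones match the appledisplay
driver's device table):
	{ HID_USB_DEVICE(USB_VENDOR_ID_APPLE, 0x9218) },	/* placeholder PID */
	{ HID_USB_DEVICE(USB_VENDOR_ID_APPLE, 0x921c) },	/* placeholder PID */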
Fixes: 069e8a65cd ("Driver for Apple Cinema Display")
Signed-off-by: René Rebe <rene@exactco.de>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 238e03d0466239410b72294b79494e43d4fabe77 ]
When skb_segment_list() is called during packet forwarding, it handles
packets that were aggregated by the GRO engine.
Historically, the segmentation logic in skb_segment_list assumes that
individual segments are split from a parent SKB and may need to carry
their own socket memory accounting. Accordingly, the code transfers
truesize from the parent to the newly created segments.
Prior to commit ed4cccef64 ("gro: fix ownership transfer"), this
truesize subtraction in skb_segment_list() was valid because fragments
still carried a reference to the original socket.
However, commit ed4cccef64 ("gro: fix ownership transfer") changed
this behavior by ensuring that fraglist entries are explicitly
orphaned (skb->sk = NULL) to prevent illegal orphaning later in the
stack. This change meant that the entire socket memory charge remained
with the head SKB, but the corresponding accounting logic in
skb_segment_list() was never updated.
As a result, the current code unconditionally adds each fragment's
truesize to delta_truesize and subtracts it from the parent SKB. Since
the fragments are no longer charged to the socket, this subtraction
results in an effective under-count of memory when the head is freed.
This causes sk_wmem_alloc to remain non-zero, preventing socket
destruction and leading to a persistent memory leak.
The leak can be observed via KMEMLEAK when tearing down the networking
environment:
unreferenced object 0xffff8881e6eb9100 (size 2048):
comm "ping", pid 6720, jiffies 4295492526
backtrace:
kmem_cache_alloc_noprof+0x5c6/0x800
sk_prot_alloc+0x5b/0x220
sk_alloc+0x35/0xa00
inet6_create.part.0+0x303/0x10d0
__sock_create+0x248/0x640
__sys_socket+0x11b/0x1d0
Since skb_segment_list() is exclusively used for SKB_GSO_FRAGLIST
packets constructed by GRO, the truesize adjustment is removed.
The call to skb_release_head_state() must be preserved. As documented in
commit cf673ed0e0 ("net: fix fraglist segmentation reference count
leak"), it is still required to correctly drop references to SKB
extensions that may be overwritten during __copy_skb_header().
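A simplified sketch of the resulting per-segment handling in
skb_segment_list() (only the parts relevant here):
	do {
		nskb = list_skb;
		list_skb = list_skb->next;
		...
		/* Still required: drops references to skb extensions that
		 * __copy_skb_header() would otherwise overwrite. */
		skb_release_head_state(nskb);
		__copy_skb_header(nskb, skb);
		...
	} while (list_skb);

	/* No truesize transfer: GRO already orphaned the fraglist skbs, so
	 * the whole socket charge stays with the head skb. */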
Fixes: ed4cccef64 ("gro: fix ownership transfer")
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260104213101.352887-1-mheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ffeafa65b2b26df2f5b5a6118d3174f17bd12ec5 ]
Fix the max number of bits passed to find_first_zero_bit() in
bnxt_alloc_agg_idx(). We were incorrectly passing the number of
long words. find_first_zero_bit() may fail to find a zero bit and
cause a wrong ID to be used. If the wrong ID is already in use, this
can cause data corruption. Sometimes an error like this can also be
seen:
bnxt_en 0000:83:00.0 enp131s0np0: TPA end agg_buf 2 != expected agg_bufs 1
Fix it by passing the correct number of bits MAX_TPA_P5. Use
DECLARE_BITMAP() to more cleanly define the bitmap. Add a sanity
check to warn if a bit cannot be found and reset the ring [MChan].
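In outline (a sketch, not the exact diff):
	/* in the TPA ID map structure */
	DECLARE_BITMAP(agg_idx_bmap, MAX_TPA_P5);

	/* lookup: pass the number of bits, not the number of longs */
	idx = find_first_zero_bit(map->agg_idx_bmap, MAX_TPA_P5);
	if (idx >= MAX_TPA_P5) {
		/* no free ID: warn and schedule a ring reset instead of
		 * reusing an ID that may already be in use */
	}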
Fixes: ec4d8e7cf0 ("bnxt_en: Add TPA ID mapping logic for 57500 chips.")
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Srijit Bose <srijit.bose@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251231083625.3911652-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fea2993aec ]
Move the reset helpers, subsequent patches will need some
of them on the Tx path.
While at it rename bnxt_sched_reset(), on more recent chips
it schedules a queue reset, instead of a fuller reset.
Link: https://lore.kernel.org/r/20230720010440.1967136-2-kuba@kernel.org
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: ffeafa65b2b2 ("bnxt_en: Fix potential data corruption with HW GRO/LRO")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 92e6e0a87f6860a4710f9494f8c704d498ae60f8 ]
Commit 1f52d7b622 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
allocated memory for pp_qlt in ipc_mux_init() but did not free it in
ipc_mux_deinit(). This results in a memory leak when the driver is
unloaded.
Free the allocated memory in ipc_mux_deinit() to fix the leak.
Fixes: 1f52d7b622 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
Co-developed-by: Jianhao Xu <jianhao.xu@seu.edu.cn>
Signed-off-by: Jianhao Xu <jianhao.xu@seu.edu.cn>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20251230071853.1062223-1-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 144297e2a24e3e54aee1180ec21120ea38822b97 ]
Dumping module EEPROM on newer modules is supported through the netlink
interface only.
Querying with old userspace ethtool (or other tools, such as 'lshw')
which still uses the ioctl interface results in an error message that
could flood dmesg (in addition to the expected error return value).
The original message was added under the assumption that the driver
should be able to handle all module types, but now that such flows are
easily triggered from userspace, it doesn't serve its purpose.
Change the log level of the print in mlx5_query_module_eeprom() to
debug.
Fixes: bb64143eee ("net/mlx5e: Add ethtool support for dump module EEPROM")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20251225132717.358820-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 02d1e1a3f9239cdb3ecf2c6d365fb959d1bf39df ]
Directly incrementing the TSO features incurs a side effect: it will also
directly clear the flags in NETIF_F_ALL_FOR_ALL on the master device,
which can cause issues such as the inability to enable the nocache copy
feature on the bonding driver.
The fix is to include NETIF_F_ALL_FOR_ALL in the update mask, thereby
preventing it from being cleared.
Fixes: b0ce3508b2 ("bonding: allow TSO being set on bonding master")
Signed-off-by: Di Zhu <zhud@hygon.cn>
Link: https://patch.msgid.link/20251224012224.56185-1-zhud@hygon.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c0856c225b39b1def6c9a6bc56faca79550da13 ]
When the ping program uses an IPPROTO_ICMP socket to send ICMP_ECHO
messages, ICMP_MIB_OUTMSGS is counted twice.
ping_v4_sendmsg
  ping_v4_push_pending_frames
    ip_push_pending_frames
      ip_finish_skb
        __ip_make_skb
          icmp_out_count(net, icmp_type); // first count
  icmp_out_count(sock_net(sk), user_icmph.type); // second count
However, when the ping program uses an IPPROTO_RAW socket,
ICMP_MIB_OUTMSGS is counted correctly only once.
Therefore, the first count should be removed.
Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: yuan.gao <yuan.gao@ucloud.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251224063145.3615282-1-yuan.gao@ucloud.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 34f3ff52cb9fa7dbf04f5c734fcc4cb6ed5d1a95 ]
Commit 15faa1f67a ("lan966x: Fix crash when adding interface under a lag")
fixed a similar issue in the lan966x driver caused by a NULL pointer dereference.
The ocelot_set_aggr_pgids() function in the ocelot driver has similar logic
and is susceptible to the same crash.
This issue specifically affects the ocelot_vsc7514.c frontend, which leaves
unused ports as NULL pointers. The felix_vsc9959.c frontend is unaffected as
it uses the DSA framework which registers all ports.
Fix this by checking if the port pointer is valid before accessing it.
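The guard in the port loop of ocelot_set_aggr_pgids() in outline (sketch):
	for (port = 0; port < ocelot->num_phys_ports; port++) {
		struct ocelot_port *ocelot_port = ocelot->ports[port];

		/* vsc7514 leaves unused ports as NULL */
		if (!ocelot_port)
			continue;
		...
	}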
Fixes: 528d3f190c ("net: mscc: ocelot: drop the use of the "lags" array")
Signed-off-by: Jerry Wu <w.7erry@foxmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/tencent_75EF812B305E26B0869C673DD1160866C90A@qq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3128df6be147768fe536986fbb85db1d37806a9f ]
When using an 802.1ad bridge with vlan_tunnel, the C-VLAN tag is
incorrectly stripped from frames during egress processing.
br_handle_egress_vlan_tunnel() uses skb_vlan_pop() to remove the S-VLAN
from hwaccel before VXLAN encapsulation. However, skb_vlan_pop() also
moves any "next" VLAN from the payload into hwaccel:
	/* move next vlan tag to hw accel tag */
	__skb_vlan_pop(skb, &vlan_tci);
	__vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
For QinQ frames where the C-VLAN sits in the payload, this moves it to
hwaccel where it gets lost during VXLAN encapsulation.
Fix by calling __vlan_hwaccel_clear_tag() directly, which clears only
the hwaccel S-VLAN and leaves the payload untouched.
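The replacement in br_handle_egress_vlan_tunnel() in outline:
	/* Clear only the hwaccel S-VLAN; a C-VLAN sitting in the payload of
	 * a QinQ frame stays where it is and survives VXLAN encapsulation. */
	__vlan_hwaccel_clear_tag(skb);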
This path is only taken when vlan_tunnel is enabled and tunnel_info
is configured, so 802.1Q bridges are unaffected.
Tested with 802.1ad bridge + VXLAN vlan_tunnel, verified C-VLAN
preserved in VXLAN payload via tcpdump.
Fixes: 11538d039a ("bridge: vlan dst_metadata hooks in ingress and egress paths")
Signed-off-by: Alexandre Knecht <knecht.alexandre@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20251228020057.2788865-1-knecht.alexandre@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a428e0da1248c353557970848994f35fd3f005e2 ]
devlink_alloc() may return NULL on allocation failure, but
prestera_devlink_alloc() unconditionally calls devlink_priv() on
the returned pointer.
This leads to a NULL pointer dereference if devlink allocation fails.
Add a check for a NULL devlink pointer and return NULL early to avoid
the crash.
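A minimal sketch of the added guard in prestera_devlink_alloc() (argument
list elided):
	dl = devlink_alloc(...);
	if (!dl)
		return NULL;

	return devlink_priv(dl);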
Fixes: 34dd1710f5 ("net: marvell: prestera: Add basic devlink support")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Acked-by: Elad Nachman <enachman@marvell.com>
Link: https://patch.msgid.link/20251230052124.897012-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7811ba452402d58628e68faedf38745b3d485e3c ]
Currently last_gc is being updated every time a new connection is
tracked, which means it is updated even if a GC wasn't performed.
With a sufficiently high packet rate, it is possible to always bypass
the GC, causing the list to grow infinitely.
Update the last_gc value only when a GC has been actually performed.
Fixes: d265929930 ("netfilter: nf_conncount: reduce unnecessary GC")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d077e8119ddbb4fca67540f1a52453631a47f221 ]
In nf_tables_newrule(), if nft_use_inc() fails, the function jumps to
the err_release_rule label without freeing the allocated flow, leading
to a memory leak.
Fix this by adding a new label err_destroy_flow and jumping to it when
nft_use_inc() fails. This ensures that the flow is properly released
in this error case.
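In outline (a sketch; surrounding error labels abbreviated):
	if (!nft_use_inc(&chain->use)) {
		err = -EMFILE;
		goto err_destroy_flow;
	}
	...
err_destroy_flow:
	if (flow)
		nft_flow_rule_destroy(flow);
err_release_rule:
	...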
Fixes: 1689f25924 ("netfilter: nf_tables: report use refcount overflow")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 36a3200575642846a96436d503d46544533bb943 ]
During nft_synproxy evaluation, the nf_synproxy_info struct is read while
it can be modified concurrently by an update operation. As the
nf_synproxy_info struct fits in 32 bits, use READ_ONCE/WRITE_ONCE
annotations.
Fixes: ee394f96ad ("netfilter: nft_synproxy: add synproxy stateful object support")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c63749a7ddc59ac6ec0b05abfa0a21af9f2c1d38 ]
Add missing 'clocks' property to LAN8740Ai PHY node, to allow the PHY driver
to manage LAN8740Ai CLKIN reference clock supply. This fixes sporadic link
bouncing caused by interruptions on the PHY reference clock, by letting the
PHY driver manage the reference clock and ensure there are no interruptions.
This follows the matching PHY driver recommendation described in commit
bedd8d78ab ("net: phy: smsc: LAN8710/20: add phy refclk in support")
Fixes: 8d6712695b ("arm64: dts: imx8mp: Add support for DH electronics i.MX8M Plus DHCOM and PDK2")
Signed-off-by: Marek Vasut <marek.vasut@mailbox.org>
Tested-by: Christoph Niedermaier <cniedermaier@dh-electronics.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e6a4eedd49ce27c16a80506c66a04707e0ee0116 ]
RTC interrupt level should be set to "LOW". This was revealed by the
introduction of commit:
f181987ef4 ("rtc: m41t80: use IRQ flags obtained from fwnode")
which changed the way IRQ type is obtained.
Fixes: 56c27310c1 ("ARM: dts: imx: Add Advantech BA-16 Qseven module")
Signed-off-by: Ian Ray <ian.ray@gehealthcare.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca643894a37a25713029b36cfe7d1bae515cac08 ]
For the SD card, the spec requires that a power reset operation keep the
SD card supply voltage below 0.5V for more than 1ms; otherwise, when the
supply is later brought back to 3.3V, the card can't support SD3.0 mode
again.
To meet this requirement on the imx8qm-mek board, add a 4.8ms delay
between SD power off and power on.
Fixes: 307fd14d4b ("arm64: dts: imx: add imx8qm mek support")
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 278712d20bc8ec29d1ad6ef9bdae9000ef2c220c ]
This reverts commit ab2068a6fb.
When probing an expander-attached SATA device, libsas/libata will issue a
hard reset in sas_probe_sata() -> ata_sas_async_probe(). A broadcast event
will then be received after the disk probe fails, and the reverted commit
causes the probe to be re-executed on the disk, so a faulty disk may get
into an indefinite probe loop.
Therefore, revert this commit, although it can fix some temporary disk
probe failures.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251202065627.140361-1-yangxingui@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b4bb6daf4ac4d4560044ecdd81e93aa2f6acbb06 ]
When a W-LUN resume fails, its parent devices in the SCSI hierarchy,
including the scsi_target, may be runtime suspended. Subsequently, the
error handler in ufshcd_recover_pm_error() fails to set the W-LUN device
back to active because the parent target is not active. This results in
the following errors:
google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ...
ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000
ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5
...
ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active
Address this by:
1. Ensuring the W-LUN's parent scsi_target is runtime resumed before
attempting to set the W-LUN to active within
ufshcd_recover_pm_error().
2. Explicitly checking for power.runtime_error on the HBA and W-LUN
devices before calling pm_runtime_set_active() to clear the error
state.
3. Adding pm_runtime_get_sync(hba->dev) in
ufshcd_err_handling_prepare() to ensure the HBA itself is active
during error recovery, even if a child device resume failed.
These changes ensure the device power states are managed correctly
during error recovery.
Signed-off-by: Brian Kao <powenkao@google.com>
Tested-by: Brian Kao <powenkao@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251112063214.1195761-1-powenkao@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a1237c203f1757480dc2f3b930608ee00072d3cc ]
This was reported by the KUnit tests in the later patches.
See MS-ERREF 2.3.1 STATUS_NO_DATA_DETECTED. Keep it consistent with the
value in the documentation.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>