From 308f9c47db3f94d1869f6af62f08e1938418d318 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:10:50 +0200 Subject: [PATCH 01/74] mm: page_poison: always declare __kernel_map_pages() function commit 8f14a96386b2676a1ccdd9d2f1732fbd7248fa98 upstream. The __kernel_map_pages() function is mainly used for CONFIG_DEBUG_PAGEALLOC, but has a number of architecture specific definitions that may also be used in other configurations, as well as a global fallback definition for architectures that do not support DEBUG_PAGEALLOC. When the option is disabled, any definitions without the prototype cause a warning: mm/page_poison.c:102:6: error: no previous prototype for '__kernel_map_pages' [-Werror=missing-prototypes] The function is a trivial nop here, so just declare it anyway to avoid the warning. Link: https://lkml.kernel.org/r/20230517131102.934196-3-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton Cc: Salvatore Bonaccorso Signed-off-by: Greg Kroah-Hartman --- include/linux/mm.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 97641304c455..cf0cc4a64887 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3318,13 +3318,12 @@ static inline bool debug_pagealloc_enabled_static(void) return static_branch_unlikely(&_debug_pagealloc_enabled); } -#ifdef CONFIG_DEBUG_PAGEALLOC /* * To support DEBUG_PAGEALLOC architecture must ensure that * __kernel_map_pages() never fails */ extern void __kernel_map_pages(struct page *page, int numpages, int enable); - +#ifdef CONFIG_DEBUG_PAGEALLOC static inline void debug_pagealloc_map_pages(struct page *page, int numpages) { if (debug_pagealloc_enabled_static()) From bc3c99d1fda46c9a0c18ba2bb05de716d0dfdcbc Mon Sep 17 00:00:00 2001 From: Thomas Fourier Date: Wed, 7 Jan 2026 10:01:36 +0100 Subject: [PATCH 02/74] atm: Fix dma_free_coherent() size commit 4d984b0574ff708e66152763fbfdef24ea40933f upstream. The size of the buffer is not the same when alloc'd with dma_alloc_coherent() in he_init_tpdrq() and freed. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: Signed-off-by: Thomas Fourier Link: https://patch.msgid.link/20260107090141.80900-2-fourier.thomas@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/atm/he.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/atm/he.c b/drivers/atm/he.c index ad91cc6a34fc..92a041d5387b 100644 --- a/drivers/atm/he.c +++ b/drivers/atm/he.c @@ -1587,7 +1587,8 @@ he_stop(struct he_dev *he_dev) he_dev->tbrq_base, he_dev->tbrq_phys); if (he_dev->tpdrq_base) - dma_free_coherent(&he_dev->pci_dev->dev, CONFIG_TBRQ_SIZE * sizeof(struct he_tbrq), + dma_free_coherent(&he_dev->pci_dev->dev, + CONFIG_TPDRQ_SIZE * sizeof(struct he_tpdrq), he_dev->tpdrq_base, he_dev->tpdrq_phys); dma_pool_destroy(he_dev->tpd_pool); From 606872c8e8bf96066730f6a2317502c5633c37f1 Mon Sep 17 00:00:00 2001 From: Thomas Fourier Date: Tue, 6 Jan 2026 10:47:21 +0100 Subject: [PATCH 03/74] net: 3com: 3c59x: fix possible null dereference in vortex_probe1() commit a4e305ed60f7c41bbf9aabc16dd75267194e0de3 upstream. pdev can be null and free_ring: can be called in 1297 with a null pdev. Fixes: 55c82617c3e8 ("3c59x: convert to generic DMA API") Cc: Signed-off-by: Thomas Fourier Link: https://patch.msgid.link/20260106094731.25819-2-fourier.thomas@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/3com/3c59x.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/3com/3c59x.c b/drivers/net/ethernet/3com/3c59x.c index 082388bb6169..4a843c2ce111 100644 --- a/drivers/net/ethernet/3com/3c59x.c +++ b/drivers/net/ethernet/3com/3c59x.c @@ -1473,7 +1473,7 @@ static int vortex_probe1(struct device *gendev, void __iomem *ioaddr, int irq, return 0; free_ring: - dma_free_coherent(&pdev->dev, + dma_free_coherent(gendev, sizeof(struct boom_rx_desc) * RX_RING_SIZE + sizeof(struct boom_tx_desc) * TX_RING_SIZE, vp->rx_ring, vp->rx_ring_dma); From c7f0207db68d5a1b4af23acbef1a8e8ddc431ebb Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Thu, 11 Dec 2025 15:06:26 +0000 Subject: [PATCH 04/74] btrfs: always detect conflicting inodes when logging inode refs commit 7ba0b6461bc4edb3005ea6e00cdae189bcf908a5 upstream. After rename exchanging (either with the rename exchange operation or regular renames in multiple non-atomic steps) two inodes and at least one of them is a directory, we can end up with a log tree that contains only of the inodes and after a power failure that can result in an attempt to delete the other inode when it should not because it was not deleted before the power failure. In some case that delete attempt fails when the target inode is a directory that contains a subvolume inside it, since the log replay code is not prepared to deal with directory entries that point to root items (only inode items). 1) We have directories "dir1" (inode A) and "dir2" (inode B) under the same parent directory; 2) We have a file (inode C) under directory "dir1" (inode A); 3) We have a subvolume inside directory "dir2" (inode B); 4) All these inodes were persisted in a past transaction and we are currently at transaction N; 5) We rename the file (inode C), so at btrfs_log_new_name() we update inode C's last_unlink_trans to N; 6) We get a rename exchange for "dir1" (inode A) and "dir2" (inode B), so after the exchange "dir1" is inode B and "dir2" is inode A. During the rename exchange we call btrfs_log_new_name() for inodes A and B, but because they are directories, we don't update their last_unlink_trans to N; 7) An fsync against the file (inode C) is done, and because its inode has a last_unlink_trans with a value of N we log its parent directory (inode A) (through btrfs_log_all_parents(), called from btrfs_log_inode_parent()). 8) So we end up with inode B not logged, which now has the old name of inode A. At copy_inode_items_to_log(), when logging inode A, we did not check if we had any conflicting inode to log because inode A has a generation lower than the current transaction (created in a past transaction); 9) After a power failure, when replaying the log tree, since we find that inode A has a new name that conflicts with the name of inode B in the fs tree, we attempt to delete inode B... this is wrong since that directory was never deleted before the power failure, and because there is a subvolume inside that directory, attempting to delete it will fail since replay_dir_deletes() and btrfs_unlink_inode() are not prepared to deal with dir items that point to roots instead of inodes. When that happens the mount fails and we get a stack trace like the following: [87.2314] BTRFS info (device dm-0): start tree-log replay [87.2318] BTRFS critical (device dm-0): failed to delete reference to subvol, root 5 inode 256 parent 259 [87.2332] ------------[ cut here ]------------ [87.2338] BTRFS: Transaction aborted (error -2) [87.2346] WARNING: CPU: 1 PID: 638968 at fs/btrfs/inode.c:4345 __btrfs_unlink_inode+0x416/0x440 [btrfs] [87.2368] Modules linked in: btrfs loop dm_thin_pool (...) [87.2470] CPU: 1 UID: 0 PID: 638968 Comm: mount Tainted: G W 6.18.0-rc7-btrfs-next-218+ #2 PREEMPT(full) [87.2489] Tainted: [W]=WARN [87.2494] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [87.2514] RIP: 0010:__btrfs_unlink_inode+0x416/0x440 [btrfs] [87.2538] Code: c0 89 04 24 (...) [87.2568] RSP: 0018:ffffc0e741f4b9b8 EFLAGS: 00010286 [87.2574] RAX: 0000000000000000 RBX: ffff9d3ec8a6cf60 RCX: 0000000000000000 [87.2582] RDX: 0000000000000002 RSI: ffffffff84ab45a1 RDI: 00000000ffffffff [87.2591] RBP: ffff9d3ec8a6ef20 R08: 0000000000000000 R09: ffffc0e741f4b840 [87.2599] R10: ffff9d45dc1fffa8 R11: 0000000000000003 R12: ffff9d3ee26d77e0 [87.2608] R13: ffffc0e741f4ba98 R14: ffff9d4458040800 R15: ffff9d44b6b7ca10 [87.2618] FS: 00007f7b9603a840(0000) GS:ffff9d4658982000(0000) knlGS:0000000000000000 [87.2629] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [87.2637] CR2: 00007ffc9ec33b98 CR3: 000000011273e003 CR4: 0000000000370ef0 [87.2648] Call Trace: [87.2651] [87.2654] btrfs_unlink_inode+0x15/0x40 [btrfs] [87.2661] unlink_inode_for_log_replay+0x27/0xf0 [btrfs] [87.2669] check_item_in_log+0x1ea/0x2c0 [btrfs] [87.2676] replay_dir_deletes+0x16b/0x380 [btrfs] [87.2684] fixup_inode_link_count+0x34b/0x370 [btrfs] [87.2696] fixup_inode_link_counts+0x41/0x160 [btrfs] [87.2703] btrfs_recover_log_trees+0x1ff/0x7c0 [btrfs] [87.2711] ? __pfx_replay_one_buffer+0x10/0x10 [btrfs] [87.2719] open_ctree+0x10bb/0x15f0 [btrfs] [87.2726] btrfs_get_tree.cold+0xb/0x16c [btrfs] [87.2734] ? fscontext_read+0x15c/0x180 [87.2740] ? rw_verify_area+0x50/0x180 [87.2746] vfs_get_tree+0x25/0xd0 [87.2750] vfs_cmd_create+0x59/0xe0 [87.2755] __do_sys_fsconfig+0x4f6/0x6b0 [87.2760] do_syscall_64+0x50/0x1220 [87.2764] entry_SYSCALL_64_after_hwframe+0x76/0x7e [87.2770] RIP: 0033:0x7f7b9625f4aa [87.2775] Code: 73 01 c3 48 (...) [87.2803] RSP: 002b:00007ffc9ec35b08 EFLAGS: 00000246 ORIG_RAX: 00000000000001af [87.2817] RAX: ffffffffffffffda RBX: 0000558bfa91ac20 RCX: 00007f7b9625f4aa [87.2829] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 [87.2842] RBP: 0000558bfa91b120 R08: 0000000000000000 R09: 0000000000000000 [87.2854] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [87.2864] R13: 00007f7b963f1580 R14: 00007f7b963f326c R15: 00007f7b963d8a23 [87.2877] [87.2882] ---[ end trace 0000000000000000 ]--- [87.2891] BTRFS: error (device dm-0 state A) in __btrfs_unlink_inode:4345: errno=-2 No such entry [87.2904] BTRFS: error (device dm-0 state EAO) in do_abort_log_replay:191: errno=-2 No such entry [87.2915] BTRFS critical (device dm-0 state EAO): log tree (for root 5) leaf currently being processed (slot 7 key (258 12 257)): [87.2929] BTRFS info (device dm-0 state EAO): leaf 30736384 gen 10 total ptrs 7 free space 15712 owner 18446744073709551610 [87.2929] BTRFS info (device dm-0 state EAO): refs 3 lock_owner 0 current 638968 [87.2929] item 0 key (257 INODE_ITEM 0) itemoff 16123 itemsize 160 [87.2929] inode generation 9 transid 10 size 0 nbytes 0 [87.2929] block group 0 mode 40755 links 1 uid 0 gid 0 [87.2929] rdev 0 sequence 7 flags 0x0 [87.2929] atime 1765464494.678070921 [87.2929] ctime 1765464494.686606513 [87.2929] mtime 1765464494.686606513 [87.2929] otime 1765464494.678070921 [87.2929] item 1 key (257 INODE_REF 256) itemoff 16109 itemsize 14 [87.2929] index 4 name_len 4 [87.2929] item 2 key (257 DIR_LOG_INDEX 2) itemoff 16101 itemsize 8 [87.2929] dir log end 2 [87.2929] item 3 key (257 DIR_LOG_INDEX 3) itemoff 16093 itemsize 8 [87.2929] dir log end 18446744073709551615 [87.2930] item 4 key (257 DIR_INDEX 3) itemoff 16060 itemsize 33 [87.2930] location key (258 1 0) type 1 [87.2930] transid 10 data_len 0 name_len 3 [87.2930] item 5 key (258 INODE_ITEM 0) itemoff 15900 itemsize 160 [87.2930] inode generation 9 transid 10 size 0 nbytes 0 [87.2930] block group 0 mode 100644 links 1 uid 0 gid 0 [87.2930] rdev 0 sequence 2 flags 0x0 [87.2930] atime 1765464494.678456467 [87.2930] ctime 1765464494.686606513 [87.2930] mtime 1765464494.678456467 [87.2930] otime 1765464494.678456467 [87.2930] item 6 key (258 INODE_REF 257) itemoff 15887 itemsize 13 [87.2930] index 3 name_len 3 [87.2930] BTRFS critical (device dm-0 state EAO): log replay failed in unlink_inode_for_log_replay:1045 for root 5, stage 3, with error -2: failed to unlink inode 256 parent dir 259 name subvol root 5 [87.2963] BTRFS: error (device dm-0 state EAO) in btrfs_recover_log_trees:7743: errno=-2 No such entry [87.2981] BTRFS: error (device dm-0 state EAO) in btrfs_replay_log:2083: errno=-2 No such entry (Failed to recover log tr So fix this by changing copy_inode_items_to_log() to always detect if there are conflicting inodes for the ref/extref of the inode being logged even if the inode was created in a past transaction. A test case for fstests will follow soon. CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/tree-log.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 0d2acf292af7..ceb2a9556fe7 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -5940,10 +5940,8 @@ again: * and no keys greater than that, so bail out. */ break; - } else if ((min_key->type == BTRFS_INODE_REF_KEY || - min_key->type == BTRFS_INODE_EXTREF_KEY) && - (inode->generation == trans->transid || - ctx->logging_conflict_inodes)) { + } else if (min_key->type == BTRFS_INODE_REF_KEY || + min_key->type == BTRFS_INODE_EXTREF_KEY) { u64 other_ino = 0; u64 other_parent = 0; From 3be4d2f6213b43dced80b7e0be7bf246279ba242 Mon Sep 17 00:00:00 2001 From: Alexander Usyskin Date: Mon, 15 Dec 2025 12:59:15 +0200 Subject: [PATCH 05/74] mei: me: add nova lake point S DID commit 420f423defcf6d0af2263d38da870ca4a20c0990 upstream. Add Nova Lake S device id. Cc: stable Co-developed-by: Tomas Winkler Signed-off-by: Tomas Winkler Signed-off-by: Alexander Usyskin Link: https://patch.msgid.link/20251215105915.1672659-1-alexander.usyskin@intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/misc/mei/hw-me-regs.h | 2 ++ drivers/misc/mei/pci-me.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/misc/mei/hw-me-regs.h b/drivers/misc/mei/hw-me-regs.h index fcfc6c7e6dc8..1992784faa09 100644 --- a/drivers/misc/mei/hw-me-regs.h +++ b/drivers/misc/mei/hw-me-regs.h @@ -122,6 +122,8 @@ #define MEI_DEV_ID_WCL_P 0x4D70 /* Wildcat Lake P */ +#define MEI_DEV_ID_NVL_S 0x6E68 /* Nova Lake Point S */ + /* * MEI HW Section */ diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c index 98fae8d0f2eb..ecb146ecc23c 100644 --- a/drivers/misc/mei/pci-me.c +++ b/drivers/misc/mei/pci-me.c @@ -129,6 +129,8 @@ static const struct pci_device_id mei_me_pci_tbl[] = { {MEI_PCI_DEVICE(MEI_DEV_ID_WCL_P, MEI_ME_PCH15_CFG)}, + {MEI_PCI_DEVICE(MEI_DEV_ID_NVL_S, MEI_ME_PCH15_CFG)}, + /* required last entry */ {0, } }; From 185002b7b9d68382318d072e698ed6fe5d3f0645 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 6 Jan 2026 21:20:23 -0800 Subject: [PATCH 06/74] lib/crypto: aes: Fix missing MMU protection for AES S-box commit 74d74bb78aeccc9edc10db216d6be121cf7ec176 upstream. __cacheline_aligned puts the data in the ".data..cacheline_aligned" section, which isn't marked read-only i.e. it doesn't receive MMU protection. Replace it with ____cacheline_aligned which does the right thing and just aligns the data while keeping it in ".rodata". Fixes: b5e0b032b6c3 ("crypto: aes - add generic time invariant AES cipher") Cc: stable@vger.kernel.org Reported-by: Qingfang Deng Closes: https://lore.kernel.org/r/20260105074712.498-1-dqfext@gmail.com/ Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20260107052023.174620-1-ebiggers@kernel.org Signed-off-by: Eric Biggers Signed-off-by: Greg Kroah-Hartman --- lib/crypto/aes.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/crypto/aes.c b/lib/crypto/aes.c index 827fe89922ff..59a1d964497d 100644 --- a/lib/crypto/aes.c +++ b/lib/crypto/aes.c @@ -12,7 +12,7 @@ * Emit the sbox as volatile const to prevent the compiler from doing * constant folding on sbox references involving fixed indexes. */ -static volatile const u8 __cacheline_aligned aes_sbox[] = { +static volatile const u8 ____cacheline_aligned aes_sbox[] = { 0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5, 0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76, 0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0, @@ -47,7 +47,7 @@ static volatile const u8 __cacheline_aligned aes_sbox[] = { 0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16, }; -static volatile const u8 __cacheline_aligned aes_inv_sbox[] = { +static volatile const u8 ____cacheline_aligned aes_inv_sbox[] = { 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38, 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb, 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87, From 51d2e5d6491447258cb39ff1deb93df15d3c23cb Mon Sep 17 00:00:00 2001 From: Alexander Sverdlin Date: Tue, 18 Nov 2025 09:35:48 +0100 Subject: [PATCH 07/74] counter: interrupt-cnt: Drop IRQF_NO_THREAD flag commit 23f9485510c338476b9735d516c1d4aacb810d46 upstream. An IRQ handler can either be IRQF_NO_THREAD or acquire spinlock_t, as CONFIG_PROVE_RAW_LOCK_NESTING warns: ============================= [ BUG: Invalid wait context ] 6.18.0-rc1+git... #1 ----------------------------- some-user-space-process/1251 is trying to lock: (&counter->events_list_lock){....}-{3:3}, at: counter_push_event [counter] other info that might help us debug this: context-{2:2} no locks held by some-user-space-process/.... stack backtrace: CPU: 0 UID: 0 PID: 1251 Comm: some-user-space-process 6.18.0-rc1+git... #1 PREEMPT Call trace: show_stack (C) dump_stack_lvl dump_stack __lock_acquire lock_acquire _raw_spin_lock_irqsave counter_push_event [counter] interrupt_cnt_isr [interrupt_cnt] __handle_irq_event_percpu handle_irq_event handle_simple_irq handle_irq_desc generic_handle_domain_irq gpio_irq_handler handle_irq_desc generic_handle_domain_irq gic_handle_irq call_on_irq_stack do_interrupt_handler el0_interrupt __el0_irq_handler_common el0t_64_irq_handler el0t_64_irq ... and Sebastian correctly points out. Remove IRQF_NO_THREAD as an alternative to switching to raw_spinlock_t, because the latter would limit all potential nested locks to raw_spinlock_t only. Cc: Sebastian Andrzej Siewior Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20251117151314.xwLAZrWY@linutronix.de/ Fixes: a55ebd47f21f ("counter: add IRQ or GPIO based counter") Signed-off-by: Alexander Sverdlin Reviewed-by: Sebastian Andrzej Siewior Reviewed-by: Oleksij Rempel Link: https://lore.kernel.org/r/20251118083603.778626-1-alexander.sverdlin@siemens.com Signed-off-by: William Breathitt Gray Signed-off-by: Greg Kroah-Hartman --- drivers/counter/interrupt-cnt.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/counter/interrupt-cnt.c b/drivers/counter/interrupt-cnt.c index bc762ba87a19..2a8259f0c6be 100644 --- a/drivers/counter/interrupt-cnt.c +++ b/drivers/counter/interrupt-cnt.c @@ -229,8 +229,7 @@ static int interrupt_cnt_probe(struct platform_device *pdev) irq_set_status_flags(priv->irq, IRQ_NOAUTOEN); ret = devm_request_irq(dev, priv->irq, interrupt_cnt_isr, - IRQF_TRIGGER_RISING | IRQF_NO_THREAD, - dev_name(dev), counter); + IRQF_TRIGGER_RISING, dev_name(dev), counter); if (ret) return ret; From 69e026b08367fcdf5de8fd54673d021391380038 Mon Sep 17 00:00:00 2001 From: Miaoqian Lin Date: Thu, 11 Dec 2025 16:33:44 +0400 Subject: [PATCH 08/74] drm/pl111: Fix error handling in pl111_amba_probe commit 0ddd3bb4b14c9102c0267b3fd916c81fe5ab89c1 upstream. Jump to the existing dev_put label when devm_request_irq() fails so drm_dev_put() and of_reserved_mem_device_release() run instead of returning early and leaking resources. Found via static analysis and code review. Fixes: bed41005e617 ("drm/pl111: Initial drm/kms driver for pl111") Cc: stable@vger.kernel.org Signed-off-by: Miaoqian Lin Reviewed-by: Javier Martinez Canillas Signed-off-by: Linus Walleij Link: https://patch.msgid.link/20251211123345.2392065-1-linmq006@gmail.com Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/pl111/pl111_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/pl111/pl111_drv.c b/drivers/gpu/drm/pl111/pl111_drv.c index eb25eedb5ee0..f85e45e51698 100644 --- a/drivers/gpu/drm/pl111/pl111_drv.c +++ b/drivers/gpu/drm/pl111/pl111_drv.c @@ -297,7 +297,7 @@ static int pl111_amba_probe(struct amba_device *amba_dev, variant->name, priv); if (ret != 0) { dev_err(dev, "%s failed irq %d\n", __func__, ret); - return ret; + goto dev_put; } ret = pl111_modeset_init(drm); From 0f2fb0791a6b85b8c3ea8f36167740a394905df7 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Tue, 6 Jan 2026 10:00:11 +0100 Subject: [PATCH 09/74] gpio: rockchip: mark the GPIO controller as sleeping commit 20cf2aed89ac6d78a0122e31c875228e15247194 upstream. The GPIO controller is configured as non-sleeping but it uses generic pinctrl helpers which use a mutex for synchronization. This can cause the following lockdep splat with shared GPIOs enabled on boards which have multiple devices using the same GPIO: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 12, name: kworker/u16:0 preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 6 locks held by kworker/u16:0/12: #0: ffff0001f0018d48 ((wq_completion)events_unbound#2){+.+.}-{0:0}, at: process_one_work+0x18c/0x604 #1: ffff8000842dbdf0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1b4/0x604 #2: ffff0001f18498f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x38/0x1b0 #3: ffff0001f75f1e90 (&gdev->srcu){.+.?}-{0:0}, at: gpiod_direction_output_raw_commit+0x0/0x360 #4: ffff0001f46e3db8 (&shared_desc->spinlock){....}-{3:3}, at: gpio_shared_proxy_direction_output+0xd0/0x144 [gpio_shared_proxy] #5: ffff0001f180ee90 (&gdev->srcu){.+.?}-{0:0}, at: gpiod_direction_output_raw_commit+0x0/0x360 irq event stamp: 81450 hardirqs last enabled at (81449): [] _raw_spin_unlock_irqrestore+0x74/0x78 hardirqs last disabled at (81450): [] _raw_spin_lock_irqsave+0x84/0x88 softirqs last enabled at (79616): [] __alloc_skb+0x17c/0x1e8 softirqs last disabled at (79614): [] __alloc_skb+0x17c/0x1e8 CPU: 2 UID: 0 PID: 12 Comm: kworker/u16:0 Not tainted 6.19.0-rc4-next-20260105+ #11975 PREEMPT Hardware name: Hardkernel ODROID-M1 (DT) Workqueue: events_unbound deferred_probe_work_func Call trace: show_stack+0x18/0x24 (C) dump_stack_lvl+0x90/0xd0 dump_stack+0x18/0x24 __might_resched+0x144/0x248 __might_sleep+0x48/0x98 __mutex_lock+0x5c/0x894 mutex_lock_nested+0x24/0x30 pinctrl_get_device_gpio_range+0x44/0x128 pinctrl_gpio_direction+0x3c/0xe0 pinctrl_gpio_direction_output+0x14/0x20 rockchip_gpio_direction_output+0xb8/0x19c gpiochip_direction_output+0x38/0x94 gpiod_direction_output_raw_commit+0x1d8/0x360 gpiod_direction_output_nonotify+0x7c/0x230 gpiod_direction_output+0x34/0xf8 gpio_shared_proxy_direction_output+0xec/0x144 [gpio_shared_proxy] gpiochip_direction_output+0x38/0x94 gpiod_direction_output_raw_commit+0x1d8/0x360 gpiod_direction_output_nonotify+0x7c/0x230 gpiod_configure_flags+0xbc/0x480 gpiod_find_and_request+0x1a0/0x574 gpiod_get_index+0x58/0x84 devm_gpiod_get_index+0x20/0xb4 devm_gpiod_get_optional+0x18/0x30 rockchip_pcie_probe+0x98/0x380 platform_probe+0x5c/0xac really_probe+0xbc/0x298 Fixes: 936ee2675eee ("gpio/rockchip: add driver for rockchip gpio") Cc: stable@vger.kernel.org Reported-by: Marek Szyprowski Closes: https://lore.kernel.org/all/d035fc29-3b03-4cd6-b8ec-001f93540bc6@samsung.com/ Acked-by: Heiko Stuebner Link: https://lore.kernel.org/r/20260106090011.21603-1-bartosz.golaszewski@oss.qualcomm.com Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman --- drivers/gpio/gpio-rockchip.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpio/gpio-rockchip.c b/drivers/gpio/gpio-rockchip.c index 3c1e303aaca8..3c1abb02f562 100644 --- a/drivers/gpio/gpio-rockchip.c +++ b/drivers/gpio/gpio-rockchip.c @@ -584,6 +584,7 @@ static int rockchip_gpiolib_register(struct rockchip_pin_bank *bank) gc->ngpio = bank->nr_pins; gc->label = bank->name; gc->parent = bank->dev; + gc->can_sleep = true; ret = gpiochip_add_data(gc, bank); if (ret) { From 442ceac0393185e9982323f6682a52a53e8462b1 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 8 Jan 2026 10:19:27 +0000 Subject: [PATCH 10/74] wifi: avoid kernel-infoleak from struct iw_point commit 21cbf883d073abbfe09e3924466aa5e0449e7261 upstream. struct iw_point has a 32bit hole on 64bit arches. struct iw_point { void __user *pointer; /* Pointer to the data (in user space) */ __u16 length; /* number of fields or size in bytes */ __u16 flags; /* Optional params */ }; Make sure to zero the structure to avoid disclosing 32bits of kernel data to user space. Fixes: 87de87d5e47f ("wext: Dispatch and handle compat ioctls entirely in net/wireless/wext.c") Reported-by: syzbot+bfc7323743ca6dbcc3d3@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/695f83f3.050a0220.1c677c.0392.GAE@google.com/T/#u Signed-off-by: Eric Dumazet Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260108101927.857582-1-edumazet@google.com Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman --- net/wireless/wext-core.c | 4 ++++ net/wireless/wext-priv.c | 4 ++++ 2 files changed, 8 insertions(+) diff --git a/net/wireless/wext-core.c b/net/wireless/wext-core.c index 8a4b85f96a13..f2a332516455 100644 --- a/net/wireless/wext-core.c +++ b/net/wireless/wext-core.c @@ -1084,6 +1084,10 @@ static int compat_standard_call(struct net_device *dev, return ioctl_standard_call(dev, iwr, cmd, info, handler); iwp_compat = (struct compat_iw_point *) &iwr->u.data; + + /* struct iw_point has a 32bit hole on 64bit arches. */ + memset(&iwp, 0, sizeof(iwp)); + iwp.pointer = compat_ptr(iwp_compat->pointer); iwp.length = iwp_compat->length; iwp.flags = iwp_compat->flags; diff --git a/net/wireless/wext-priv.c b/net/wireless/wext-priv.c index 674d426a9d24..37d1147019c2 100644 --- a/net/wireless/wext-priv.c +++ b/net/wireless/wext-priv.c @@ -228,6 +228,10 @@ int compat_private_call(struct net_device *dev, struct iwreq *iwr, struct iw_point iwp; iwp_compat = (struct compat_iw_point *) &iwr->u.data; + + /* struct iw_point has a 32bit hole on 64bit arches. */ + memset(&iwp, 0, sizeof(iwp)); + iwp.pointer = compat_ptr(iwp_compat->pointer); iwp.length = iwp_compat->length; iwp.flags = iwp_compat->flags; From 79fe3511db416d2f2edcfd93569807cb02736e5e Mon Sep 17 00:00:00 2001 From: ziming zhang Date: Thu, 11 Dec 2025 16:52:58 +0800 Subject: [PATCH 11/74] libceph: prevent potential out-of-bounds reads in handle_auth_done() commit 818156caffbf55cb4d368f9c3cac64e458fb49c9 upstream. Perform an explicit bounds check on payload_len to avoid a possible out-of-bounds access in the callout. [ idryomov: changelog ] Cc: stable@vger.kernel.org Signed-off-by: ziming zhang Reviewed-by: Ilya Dryomov Signed-off-by: Ilya Dryomov Signed-off-by: Greg Kroah-Hartman --- net/ceph/messenger_v2.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c index 489baf2e6176..787cde055ab1 100644 --- a/net/ceph/messenger_v2.c +++ b/net/ceph/messenger_v2.c @@ -2187,7 +2187,9 @@ static int process_auth_done(struct ceph_connection *con, void *p, void *end) ceph_decode_64_safe(&p, end, global_id, bad); ceph_decode_32_safe(&p, end, con->v2.con_mode, bad); + ceph_decode_32_safe(&p, end, payload_len, bad); + ceph_decode_need(&p, end, payload_len, bad); dout("%s con %p global_id %llu con_mode %d payload_len %d\n", __func__, con, global_id, con->v2.con_mode, payload_len); From 6afd2a4213524bc742b709599a3663aeaf77193c Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Mon, 15 Dec 2025 11:53:31 +0100 Subject: [PATCH 12/74] libceph: replace overzealous BUG_ON in osdmap_apply_incremental() commit e00c3f71b5cf75681dbd74ee3f982a99cb690c2b upstream. If the osdmap is (maliciously) corrupted such that the incremental osdmap epoch is different from what is expected, there is no need to BUG. Instead, just declare the incremental osdmap to be invalid. Cc: stable@vger.kernel.org Reported-by: ziming zhang Signed-off-by: Ilya Dryomov Signed-off-by: Greg Kroah-Hartman --- net/ceph/osdmap.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c index f5f60deb680a..0722e9347a64 100644 --- a/net/ceph/osdmap.c +++ b/net/ceph/osdmap.c @@ -1979,11 +1979,13 @@ struct ceph_osdmap *osdmap_apply_incremental(void **p, void *end, bool msgr2, sizeof(u64) + sizeof(u32), e_inval); ceph_decode_copy(p, &fsid, sizeof(fsid)); epoch = ceph_decode_32(p); - BUG_ON(epoch != map->epoch+1); ceph_decode_copy(p, &modified, sizeof(modified)); new_pool_max = ceph_decode_64(p); new_flags = ceph_decode_32(p); + if (epoch != map->epoch + 1) + goto e_inval; + /* full map? */ ceph_decode_32_safe(p, end, len, e_inval); if (len > 0) { From ec1850f663da64842614c86b20fe734be070c2ba Mon Sep 17 00:00:00 2001 From: Tuo Li Date: Sun, 21 Dec 2025 02:11:49 +0800 Subject: [PATCH 13/74] libceph: make free_choose_arg_map() resilient to partial allocation commit e3fe30e57649c551757a02e1cad073c47e1e075e upstream. free_choose_arg_map() may dereference a NULL pointer if its caller fails after a partial allocation. For example, in decode_choose_args(), if allocation of arg_map->args fails, execution jumps to the fail label and free_choose_arg_map() is called. Since arg_map->size is updated to a non-zero value before memory allocation, free_choose_arg_map() will iterate over arg_map->args and dereference a NULL pointer. To prevent this potential NULL pointer dereference and make free_choose_arg_map() more resilient, add checks for pointers before iterating. Cc: stable@vger.kernel.org Co-authored-by: Ilya Dryomov Signed-off-by: Tuo Li Reviewed-by: Viacheslav Dubeyko Signed-off-by: Ilya Dryomov Signed-off-by: Greg Kroah-Hartman --- net/ceph/osdmap.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c index 0722e9347a64..7c76eb9d6cee 100644 --- a/net/ceph/osdmap.c +++ b/net/ceph/osdmap.c @@ -241,22 +241,26 @@ static struct crush_choose_arg_map *alloc_choose_arg_map(void) static void free_choose_arg_map(struct crush_choose_arg_map *arg_map) { - if (arg_map) { - int i, j; + int i, j; - WARN_ON(!RB_EMPTY_NODE(&arg_map->node)); + if (!arg_map) + return; + WARN_ON(!RB_EMPTY_NODE(&arg_map->node)); + + if (arg_map->args) { for (i = 0; i < arg_map->size; i++) { struct crush_choose_arg *arg = &arg_map->args[i]; - - for (j = 0; j < arg->weight_set_size; j++) - kfree(arg->weight_set[j].weights); - kfree(arg->weight_set); + if (arg->weight_set) { + for (j = 0; j < arg->weight_set_size; j++) + kfree(arg->weight_set[j].weights); + kfree(arg->weight_set); + } kfree(arg->ids); } kfree(arg_map->args); - kfree(arg_map); } + kfree(arg_map); } DEFINE_RB_FUNCS(choose_arg_map, struct crush_choose_arg_map, choose_args_index, From 33908769248b38a5e77cf9292817bb28e641992d Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Mon, 29 Dec 2025 15:14:48 +0100 Subject: [PATCH 14/74] libceph: return the handler error from mon_handle_auth_done() commit e84b48d31b5008932c0a0902982809fbaa1d3b70 upstream. Currently any error from ceph_auth_handle_reply_done() is propagated via finish_auth() but isn't returned from mon_handle_auth_done(). This results in higher layers learning that (despite the monitor considering us to be successfully authenticated) something went wrong in the authentication phase and reacting accordingly, but msgr2 still trying to proceed with establishing the session in the background. In the case of secure mode this can trigger a WARN in setup_crypto() and later lead to a NULL pointer dereference inside of prepare_auth_signature(). Cc: stable@vger.kernel.org Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)") Signed-off-by: Ilya Dryomov Reviewed-by: Viacheslav Dubeyko Signed-off-by: Greg Kroah-Hartman --- net/ceph/mon_client.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ceph/mon_client.c b/net/ceph/mon_client.c index 2cf1254fd452..4f80d586fddc 100644 --- a/net/ceph/mon_client.c +++ b/net/ceph/mon_client.c @@ -1417,7 +1417,7 @@ static int mon_handle_auth_done(struct ceph_connection *con, if (!ret) finish_hunting(monc); mutex_unlock(&monc->mutex); - return 0; + return ret; } static int mon_handle_auth_bad_method(struct ceph_connection *con, From 4d3399c52e0e61720ae898f5a0b5b75d4460ae24 Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Mon, 5 Jan 2026 19:23:19 +0100 Subject: [PATCH 15/74] libceph: make calc_target() set t->paused, not just clear it commit c0fe2994f9a9d0a2ec9e42441ea5ba74b6a16176 upstream. Currently calc_target() clears t->paused if the request shouldn't be paused anymore, but doesn't ever set t->paused even though it's able to determine when the request should be paused. Setting t->paused is left to __submit_request() which is fine for regular requests but doesn't work for linger requests -- since __submit_request() doesn't operate on linger requests, there is nowhere for lreq->t.paused to be set. One consequence of this is that watches don't get reestablished on paused -> unpaused transitions in cases where requests have been paused long enough for the (paused) unwatch request to time out and for the subsequent (re)watch request to enter the paused state. On top of the watch not getting reestablished, rbd_reregister_watch() gets stuck with rbd_dev->watch_mutex held: rbd_register_watch __rbd_register_watch ceph_osdc_watch linger_reg_commit_wait It's waiting for lreq->reg_commit_wait to be completed, but for that to happen the respective request needs to end up on need_resend_linger list and be kicked when requests are unpaused. There is no chance for that if the request in question is never marked paused in the first place. The fact that rbd_dev->watch_mutex remains taken out forever then prevents the image from getting unmapped -- "rbd unmap" would inevitably hang in D state on an attempt to grab the mutex. Cc: stable@vger.kernel.org Reported-by: Raphael Zimmer Signed-off-by: Ilya Dryomov Reviewed-by: Viacheslav Dubeyko Signed-off-by: Greg Kroah-Hartman --- net/ceph/osd_client.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index c22bb06b450e..a4ec9f61ba04 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -1529,6 +1529,7 @@ static enum calc_target_result calc_target(struct ceph_osd_client *osdc, struct ceph_pg_pool_info *pi; struct ceph_pg pgid, last_pgid; struct ceph_osds up, acting; + bool should_be_paused; bool is_read = t->flags & CEPH_OSD_FLAG_READ; bool is_write = t->flags & CEPH_OSD_FLAG_WRITE; bool force_resend = false; @@ -1597,10 +1598,16 @@ static enum calc_target_result calc_target(struct ceph_osd_client *osdc, &last_pgid)) force_resend = true; - if (t->paused && !target_should_be_paused(osdc, t, pi)) { - t->paused = false; + should_be_paused = target_should_be_paused(osdc, t, pi); + if (t->paused && !should_be_paused) { unpaused = true; } + if (t->paused != should_be_paused) { + dout("%s t %p paused %d -> %d\n", __func__, t, t->paused, + should_be_paused); + t->paused = should_be_paused; + } + legacy_change = ceph_pg_compare(&t->pgid, &pgid) || ceph_osds_changed(&t->acting, &acting, t->used_replica || any_change); From 5a1cf574607441886a85f2dbc4bd44d82dd0333d Mon Sep 17 00:00:00 2001 From: Ye Bin Date: Fri, 9 Jan 2026 16:23:13 +0100 Subject: [PATCH 16/74] ext4: introduce ITAIL helper MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 69f3a3039b0d0003de008659cafd5a1eaaa0a7a4 ] Introduce ITAIL helper to get the bound of xattr in inode. Signed-off-by: Ye Bin Reviewed-by: Jan Kara Link: https://patch.msgid.link/20250208063141.1539283-2-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: David Nyström Signed-off-by: Greg Kroah-Hartman --- fs/ext4/xattr.c | 10 +++++----- fs/ext4/xattr.h | 3 +++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 7bc47d9b97f9..4217363906e0 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -599,7 +599,7 @@ ext4_xattr_ibody_get(struct inode *inode, int name_index, const char *name, return error; raw_inode = ext4_raw_inode(&iloc); header = IHDR(inode, raw_inode); - end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; + end = ITAIL(inode, raw_inode); error = xattr_check_inode(inode, header, end); if (error) goto cleanup; @@ -744,7 +744,7 @@ ext4_xattr_ibody_list(struct dentry *dentry, char *buffer, size_t buffer_size) return error; raw_inode = ext4_raw_inode(&iloc); header = IHDR(inode, raw_inode); - end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; + end = ITAIL(inode, raw_inode); error = xattr_check_inode(inode, header, end); if (error) goto cleanup; @@ -830,7 +830,7 @@ int ext4_get_inode_usage(struct inode *inode, qsize_t *usage) goto out; raw_inode = ext4_raw_inode(&iloc); header = IHDR(inode, raw_inode); - end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; + end = ITAIL(inode, raw_inode); ret = xattr_check_inode(inode, header, end); if (ret) goto out; @@ -2195,7 +2195,7 @@ int ext4_xattr_ibody_find(struct inode *inode, struct ext4_xattr_info *i, header = IHDR(inode, raw_inode); is->s.base = is->s.first = IFIRST(header); is->s.here = is->s.first; - is->s.end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; + is->s.end = ITAIL(inode, raw_inode); if (ext4_test_inode_state(inode, EXT4_STATE_XATTR)) { error = xattr_check_inode(inode, header, is->s.end); if (error) @@ -2746,7 +2746,7 @@ retry: */ base = IFIRST(header); - end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; + end = ITAIL(inode, raw_inode); min_offs = end - base; total_ino = sizeof(struct ext4_xattr_ibody_header) + sizeof(u32); diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h index 824faf0b15a8..e7417fb0eb76 100644 --- a/fs/ext4/xattr.h +++ b/fs/ext4/xattr.h @@ -68,6 +68,9 @@ struct ext4_xattr_entry { ((void *)raw_inode + \ EXT4_GOOD_OLD_INODE_SIZE + \ EXT4_I(inode)->i_extra_isize)) +#define ITAIL(inode, raw_inode) \ + ((void *)(raw_inode) + \ + EXT4_SB((inode)->i_sb)->s_inode_size) #define IFIRST(hdr) ((struct ext4_xattr_entry *)((hdr)+1)) /* From c000a8a9b5343a5ef867df173c6349672dacbd0f Mon Sep 17 00:00:00 2001 From: Ye Bin Date: Fri, 9 Jan 2026 16:23:14 +0100 Subject: [PATCH 17/74] ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 5701875f9609b000d91351eaa6bfd97fe2f157f4 ] There's issue as follows: BUG: KASAN: use-after-free in ext4_xattr_inode_dec_ref_all+0x6ff/0x790 Read of size 4 at addr ffff88807b003000 by task syz-executor.0/15172 CPU: 3 PID: 15172 Comm: syz-executor.0 Call Trace: __dump_stack lib/dump_stack.c:82 [inline] dump_stack+0xbe/0xfd lib/dump_stack.c:123 print_address_description.constprop.0+0x1e/0x280 mm/kasan/report.c:400 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560 kasan_report+0x3a/0x50 mm/kasan/report.c:585 ext4_xattr_inode_dec_ref_all+0x6ff/0x790 fs/ext4/xattr.c:1137 ext4_xattr_delete_inode+0x4c7/0xda0 fs/ext4/xattr.c:2896 ext4_evict_inode+0xb3b/0x1670 fs/ext4/inode.c:323 evict+0x39f/0x880 fs/inode.c:622 iput_final fs/inode.c:1746 [inline] iput fs/inode.c:1772 [inline] iput+0x525/0x6c0 fs/inode.c:1758 ext4_orphan_cleanup fs/ext4/super.c:3298 [inline] ext4_fill_super+0x8c57/0xba40 fs/ext4/super.c:5300 mount_bdev+0x355/0x410 fs/super.c:1446 legacy_get_tree+0xfe/0x220 fs/fs_context.c:611 vfs_get_tree+0x8d/0x2f0 fs/super.c:1576 do_new_mount fs/namespace.c:2983 [inline] path_mount+0x119a/0x1ad0 fs/namespace.c:3316 do_mount+0xfc/0x110 fs/namespace.c:3329 __do_sys_mount fs/namespace.c:3540 [inline] __se_sys_mount+0x219/0x2e0 fs/namespace.c:3514 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x67/0xd1 Memory state around the buggy address: ffff88807b002f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88807b002f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff88807b003000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88807b003080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88807b003100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Above issue happens as ext4_xattr_delete_inode() isn't check xattr is valid if xattr is in inode. To solve above issue call xattr_check_inode() check if xattr if valid in inode. In fact, we can directly verify in ext4_iget_extra_inode(), so that there is no divergent verification. Fixes: e50e5129f384 ("ext4: xattr-in-inode support") Signed-off-by: Ye Bin Reviewed-by: Jan Kara Link: https://patch.msgid.link/20250208063141.1539283-3-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: David Nyström Signed-off-by: Greg Kroah-Hartman --- fs/ext4/inode.c | 5 +++++ fs/ext4/xattr.c | 26 +------------------------- fs/ext4/xattr.h | 7 +++++++ 3 files changed, 13 insertions(+), 25 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index a94447946ff5..bf1f8319e2d7 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4752,6 +4752,11 @@ static inline int ext4_iget_extra_inode(struct inode *inode, *magic == cpu_to_le32(EXT4_XATTR_MAGIC)) { int err; + err = xattr_check_inode(inode, IHDR(inode, raw_inode), + ITAIL(inode, raw_inode)); + if (err) + return err; + ext4_set_inode_state(inode, EXT4_STATE_XATTR); err = ext4_find_inline_data_nolock(inode); if (!err && ext4_has_inline_data(inode)) diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 4217363906e0..030a7725d215 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -263,7 +263,7 @@ errout: __ext4_xattr_check_block((inode), (bh), __func__, __LINE__) -static int +int __xattr_check_inode(struct inode *inode, struct ext4_xattr_ibody_header *header, void *end, const char *function, unsigned int line) { @@ -280,9 +280,6 @@ errout: return error; } -#define xattr_check_inode(inode, header, end) \ - __xattr_check_inode((inode), (header), (end), __func__, __LINE__) - static int xattr_find_entry(struct inode *inode, struct ext4_xattr_entry **pentry, void *end, int name_index, const char *name, int sorted) @@ -600,9 +597,6 @@ ext4_xattr_ibody_get(struct inode *inode, int name_index, const char *name, raw_inode = ext4_raw_inode(&iloc); header = IHDR(inode, raw_inode); end = ITAIL(inode, raw_inode); - error = xattr_check_inode(inode, header, end); - if (error) - goto cleanup; entry = IFIRST(header); error = xattr_find_entry(inode, &entry, end, name_index, name, 0); if (error) @@ -734,7 +728,6 @@ ext4_xattr_ibody_list(struct dentry *dentry, char *buffer, size_t buffer_size) struct ext4_xattr_ibody_header *header; struct ext4_inode *raw_inode; struct ext4_iloc iloc; - void *end; int error; if (!ext4_test_inode_state(inode, EXT4_STATE_XATTR)) @@ -744,14 +737,9 @@ ext4_xattr_ibody_list(struct dentry *dentry, char *buffer, size_t buffer_size) return error; raw_inode = ext4_raw_inode(&iloc); header = IHDR(inode, raw_inode); - end = ITAIL(inode, raw_inode); - error = xattr_check_inode(inode, header, end); - if (error) - goto cleanup; error = ext4_xattr_list_entries(dentry, IFIRST(header), buffer, buffer_size); -cleanup: brelse(iloc.bh); return error; } @@ -819,7 +807,6 @@ int ext4_get_inode_usage(struct inode *inode, qsize_t *usage) struct ext4_xattr_ibody_header *header; struct ext4_xattr_entry *entry; qsize_t ea_inode_refs = 0; - void *end; int ret; lockdep_assert_held_read(&EXT4_I(inode)->xattr_sem); @@ -830,10 +817,6 @@ int ext4_get_inode_usage(struct inode *inode, qsize_t *usage) goto out; raw_inode = ext4_raw_inode(&iloc); header = IHDR(inode, raw_inode); - end = ITAIL(inode, raw_inode); - ret = xattr_check_inode(inode, header, end); - if (ret) - goto out; for (entry = IFIRST(header); !IS_LAST_ENTRY(entry); entry = EXT4_XATTR_NEXT(entry)) @@ -2197,9 +2180,6 @@ int ext4_xattr_ibody_find(struct inode *inode, struct ext4_xattr_info *i, is->s.here = is->s.first; is->s.end = ITAIL(inode, raw_inode); if (ext4_test_inode_state(inode, EXT4_STATE_XATTR)) { - error = xattr_check_inode(inode, header, is->s.end); - if (error) - return error; /* Find the named attribute. */ error = xattr_find_entry(inode, &is->s.here, is->s.end, i->name_index, i->name, 0); @@ -2750,10 +2730,6 @@ retry: min_offs = end - base; total_ino = sizeof(struct ext4_xattr_ibody_header) + sizeof(u32); - error = xattr_check_inode(inode, header, end); - if (error) - goto cleanup; - ifree = ext4_xattr_free_space(base, &min_offs, base, &total_ino); if (ifree >= isize_diff) goto shift; diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h index e7417fb0eb76..17c0d6bb230b 100644 --- a/fs/ext4/xattr.h +++ b/fs/ext4/xattr.h @@ -210,6 +210,13 @@ extern int ext4_xattr_ibody_set(handle_t *handle, struct inode *inode, extern struct mb_cache *ext4_xattr_create_cache(void); extern void ext4_xattr_destroy_cache(struct mb_cache *); +extern int +__xattr_check_inode(struct inode *inode, struct ext4_xattr_ibody_header *header, + void *end, const char *function, unsigned int line); + +#define xattr_check_inode(inode, header, end) \ + __xattr_check_inode((inode), (header), (end), __func__, __LINE__) + #ifdef CONFIG_EXT4_FS_SECURITY extern int ext4_init_security(handle_t *handle, struct inode *inode, struct inode *dir, const struct qstr *qstr); From d524beca7e87161cc4f7a59075403227f30e2756 Mon Sep 17 00:00:00 2001 From: Sharath Chandra Vurukala Date: Mon, 12 Jan 2026 06:35:45 +0000 Subject: [PATCH 18/74] net: Add locking to protect skb->dev access in ip_output [ Upstream commit 1dbf1d590d10a6d1978e8184f8dfe20af22d680a] In ip_output() skb->dev is updated from the skb_dst(skb)->dev this can become invalid when the interface is unregistered and freed, Introduced new skb_dst_dev_rcu() function to be used instead of skb_dst_dev() within rcu_locks in ip_output.This will ensure that all the skb's associated with the dev being deregistered will be transnmitted out first, before freeing the dev. Given that ip_output() is called within an rcu_read_lock() critical section or from a bottom-half context, it is safe to introduce an RCU read-side critical section within it. Multiple panic call stacks were observed when UL traffic was run in concurrency with device deregistration from different functions, pasting one sample for reference. [496733.627565][T13385] Call trace: [496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0 [496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498 [496733.627595][T13385] ip_finish_output+0xa4/0xf4 [496733.627605][T13385] ip_output+0x100/0x1a0 [496733.627613][T13385] ip_send_skb+0x68/0x100 [496733.627618][T13385] udp_send_skb+0x1c4/0x384 [496733.627625][T13385] udp_sendmsg+0x7b0/0x898 [496733.627631][T13385] inet_sendmsg+0x5c/0x7c [496733.627639][T13385] __sys_sendto+0x174/0x1e4 [496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c [496733.627653][T13385] invoke_syscall+0x58/0x11c [496733.627662][T13385] el0_svc_common+0x88/0xf4 [496733.627669][T13385] do_el0_svc+0x2c/0xb0 [496733.627676][T13385] el0_svc+0x2c/0xa4 [496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4 [496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8 Changes in v3: - Replaced WARN_ON() with WARN_ON_ONCE(), as suggested by Willem de Bruijn. - Dropped legacy lines mistakenly pulled in from an outdated branch. Changes in v2: - Addressed review comments from Eric Dumazet - Used READ_ONCE() to prevent potential load/store tearing - Added skb_dst_dev_rcu() and used along with rcu_read_lock() in ip_output Signed-off-by: Sharath Chandra Vurukala Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/20250730105118.GA26100@hu-sharathv-hyd.qualcomm.com Signed-off-by: Jakub Kicinski [ Keerthana: Backported the patch to v5.15-v6.1 ] Signed-off-by: Keerthana K Signed-off-by: Greg Kroah-Hartman --- include/net/dst.h | 12 ++++++++++++ net/ipv4/ip_output.c | 16 +++++++++++----- 2 files changed, 23 insertions(+), 5 deletions(-) diff --git a/include/net/dst.h b/include/net/dst.h index 3a1a6f94a809..20a76e532afb 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -555,6 +555,18 @@ static inline void skb_dst_update_pmtu_no_confirm(struct sk_buff *skb, u32 mtu) dst->ops->update_pmtu(dst, NULL, skb, mtu, false); } +static inline struct net_device *dst_dev_rcu(const struct dst_entry *dst) +{ + /* In the future, use rcu_dereference(dst->dev) */ + WARN_ON_ONCE(!rcu_read_lock_held()); + return READ_ONCE(dst->dev); +} + +static inline struct net_device *skb_dst_dev_rcu(const struct sk_buff *skb) +{ + return dst_dev_rcu(skb_dst(skb)); +} + struct dst_entry *dst_blackhole_check(struct dst_entry *dst, u32 cookie); void dst_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb, u32 mtu, bool confirm_neigh); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 543d029102cf..79cf1385e8d2 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -420,17 +420,23 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb) int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb) { - struct net_device *dev = skb_dst(skb)->dev, *indev = skb->dev; + struct net_device *dev, *indev = skb->dev; + int ret_val; + + rcu_read_lock(); + dev = skb_dst_dev_rcu(skb); IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len); skb->dev = dev; skb->protocol = htons(ETH_P_IP); - return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, - net, sk, skb, indev, dev, - ip_finish_output, - !(IPCB(skb)->flags & IPSKB_REROUTED)); + ret_val = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, + net, sk, skb, indev, dev, + ip_finish_output, + !(IPCB(skb)->flags & IPSKB_REROUTED)); + rcu_read_unlock(); + return ret_val; } EXPORT_SYMBOL(ip_output); From e37ca0092ddace60833790b4ad7a390408fb1be9 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Mon, 12 Jan 2026 06:35:46 +0000 Subject: [PATCH 19/74] tls: Use __sk_dst_get() and dst_dev_rcu() in get_netdev_for_sock(). [ Upstream commit c65f27b9c3be2269918e1cbad6d8884741f835c5 ] get_netdev_for_sock() is called during setsockopt(), so not under RCU. Using sk_dst_get(sk)->dev could trigger UAF. Let's use __sk_dst_get() and dst_dev_rcu(). Note that the only ->ndo_sk_get_lower_dev() user is bond_sk_get_lower_dev(), which uses RCU. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Reviewed-by: Sabrina Dubroca Link: https://patch.msgid.link/20250916214758.650211-6-kuniyu@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin [ Keerthana: Backport to v5.15-v6.1 ] Signed-off-by: Keerthana K Signed-off-by: Greg Kroah-Hartman --- net/tls/tls_device.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index c51377a159be..dd6def66bec6 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -125,17 +125,19 @@ static void tls_device_queue_ctx_destruction(struct tls_context *ctx) /* We assume that the socket is already connected */ static struct net_device *get_netdev_for_sock(struct sock *sk) { - struct dst_entry *dst = sk_dst_get(sk); - struct net_device *netdev = NULL; + struct net_device *dev, *lowest_dev = NULL; + struct dst_entry *dst; - if (likely(dst)) { - netdev = netdev_sk_get_lowest_dev(dst->dev, sk); - dev_hold(netdev); + rcu_read_lock(); + dst = __sk_dst_get(sk); + dev = dst ? dst_dev_rcu(dst) : NULL; + if (likely(dev)) { + lowest_dev = netdev_sk_get_lowest_dev(dev, sk); + dev_hold(lowest_dev); } + rcu_read_unlock(); - dst_release(dst); - - return netdev; + return lowest_dev; } static void destroy_record(struct tls_record_info *record) From 36804f7a4c8bab5823a27020b09fb08fd380d5f3 Mon Sep 17 00:00:00 2001 From: Yang Li Date: Wed, 16 Oct 2024 17:56:26 +0800 Subject: [PATCH 20/74] csky: fix csky_cmpxchg_fixup not working [ Upstream commit 809ef03d6d21d5fea016bbf6babeec462e37e68c ] In the csky_cmpxchg_fixup function, it is incorrect to use the global variable csky_cmpxchg_stw to determine the address where the exception occurred.The global variable csky_cmpxchg_stw stores the opcode at the time of the exception, while &csky_cmpxchg_stw shows the address where the exception occurred. Signed-off-by: Yang Li Signed-off-by: Guo Ren Signed-off-by: Sasha Levin --- arch/csky/mm/fault.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/csky/mm/fault.c b/arch/csky/mm/fault.c index a885518ce1dd..5226bc08c336 100644 --- a/arch/csky/mm/fault.c +++ b/arch/csky/mm/fault.c @@ -45,8 +45,8 @@ static inline void csky_cmpxchg_fixup(struct pt_regs *regs) if (trap_no(regs) != VEC_TLBMODIFIED) return; - if (instruction_pointer(regs) == csky_cmpxchg_stw) - instruction_pointer_set(regs, csky_cmpxchg_ldw); + if (instruction_pointer(regs) == (unsigned long)&csky_cmpxchg_stw) + instruction_pointer_set(regs, (unsigned long)&csky_cmpxchg_ldw); return; } #endif From 516e82a195beb4a55cfa6ff564c570b8bb390f0f Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Tue, 11 Nov 2025 16:54:37 +0100 Subject: [PATCH 21/74] ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels [ Upstream commit fedadc4137234c3d00c4785eeed3e747fe9036ae ] gup_pgd_range() is invoked with disabled interrupts and invokes __kmap_local_page_prot() via pte_offset_map(), gup_p4d_range(). With HIGHPTE enabled, __kmap_local_page_prot() invokes kmap_high_get() which uses a spinlock_t via lock_kmap_any(). This leads to an sleeping-while-atomic error on PREEMPT_RT because spinlock_t becomes a sleeping lock and must not be acquired in atomic context. The loop in map_new_virtual() uses wait_queue_head_t for wake up which also is using a spinlock_t. Since HIGHPTE is rarely needed at all, turn it off for PREEMPT_RT to allow the use of get_user_pages_fast(). [arnd: rework patch to turn off HIGHPTE instead of HAVE_PAST_GUP] Co-developed-by: Arnd Bergmann Acked-by: Linus Walleij Reviewed-by: Arnd Bergmann Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Russell King (Oracle) Signed-off-by: Sasha Levin --- arch/arm/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 6d5afe2e6ba3..f0352e5675dd 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1318,7 +1318,7 @@ config HIGHMEM config HIGHPTE bool "Allocate 2nd-level pagetables from highmem" if EXPERT - depends on HIGHMEM + depends on HIGHMEM && !PREEMPT_RT default y help The VM uses one page of physical memory for each page table. From 3af7369111c086b1b30dec7dbd657f945bc3fd65 Mon Sep 17 00:00:00 2001 From: Sam James Date: Fri, 5 Dec 2025 08:14:57 +0000 Subject: [PATCH 22/74] alpha: don't reference obsolete termio struct for TC* constants [ Upstream commit 9aeed9041929812a10a6d693af050846942a1d16 ] Similar in nature to ab107276607af90b13a5994997e19b7b9731e251. glibc-2.42 drops the legacy termio struct, but the ioctls.h header still defines some TC* constants in terms of termio (via sizeof). Hardcode the values instead. This fixes building Python for example, which falls over like: ./Modules/termios.c:1119:16: error: invalid application of 'sizeof' to incomplete type 'struct termio' Link: https://bugs.gentoo.org/961769 Link: https://bugs.gentoo.org/962600 Signed-off-by: Sam James Reviewed-by: Magnus Lindholm Link: https://lore.kernel.org/r/6ebd3451908785cad53b50ca6bc46cfe9d6bc03c.1764922497.git.sam@gentoo.org Signed-off-by: Magnus Lindholm Signed-off-by: Sasha Levin --- arch/alpha/include/uapi/asm/ioctls.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/alpha/include/uapi/asm/ioctls.h b/arch/alpha/include/uapi/asm/ioctls.h index 971311605288..a09d04b49cc6 100644 --- a/arch/alpha/include/uapi/asm/ioctls.h +++ b/arch/alpha/include/uapi/asm/ioctls.h @@ -23,10 +23,10 @@ #define TCSETSW _IOW('t', 21, struct termios) #define TCSETSF _IOW('t', 22, struct termios) -#define TCGETA _IOR('t', 23, struct termio) -#define TCSETA _IOW('t', 24, struct termio) -#define TCSETAW _IOW('t', 25, struct termio) -#define TCSETAF _IOW('t', 28, struct termio) +#define TCGETA 0x40127417 +#define TCSETA 0x80127418 +#define TCSETAW 0x80127419 +#define TCSETAF 0x8012741c #define TCSBRK _IO('t', 29) #define TCXONC _IO('t', 30) From a5c5e97945ff38626c0f71deea377d50d75dae4e Mon Sep 17 00:00:00 2001 From: Scott Mayhew Date: Mon, 3 Nov 2025 10:44:15 -0500 Subject: [PATCH 23/74] NFSv4: ensure the open stateid seqid doesn't go backwards [ Upstream commit 2e47c3cc64b44b0b06cd68c2801db92ff143f2b2 ] We have observed an NFSv4 client receiving a LOCK reply with a status of NFS4ERR_OLD_STATEID and subsequently retrying the LOCK request with an earlier seqid value in the stateid. As this was for a new lockowner, that would imply that nfs_set_open_stateid_locked() had updated the open stateid seqid with an earlier value. Looking at nfs_set_open_stateid_locked(), if the incoming seqid is out of sequence, the task will sleep on the state->waitq for up to 5 seconds. If the task waits for the full 5 seconds, then after finishing the wait it'll update the open stateid seqid with whatever value the incoming seqid has. If there are multiple waiters in this scenario, then the last one to perform said update may not be the one with the highest seqid. Add a check to ensure that the seqid can only be incremented, and add a tracepoint to indicate when old seqids are skipped. Signed-off-by: Scott Mayhew Reviewed-by: Benjamin Coddington Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin --- fs/nfs/nfs4proc.c | 13 +++++++++++-- fs/nfs/nfs4trace.h | 1 + 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index d4ae2ce56af4..8258bce82e5b 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -1700,8 +1700,17 @@ static void nfs_set_open_stateid_locked(struct nfs4_state *state, if (nfs_stateid_is_sequential(state, stateid)) break; - if (status) - break; + if (status) { + if (nfs4_stateid_match_other(stateid, &state->open_stateid) && + !nfs4_stateid_is_newer(stateid, &state->open_stateid)) { + trace_nfs4_open_stateid_update_skip(state->inode, + stateid, status); + return; + } else { + break; + } + } + /* Rely on seqids for serialisation with NFSv4.0 */ if (!nfs4_has_session(NFS_SERVER(state->inode)->nfs_client)) break; diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h index c8a57cfde64b..0fc1b4a6eab9 100644 --- a/fs/nfs/nfs4trace.h +++ b/fs/nfs/nfs4trace.h @@ -1248,6 +1248,7 @@ DEFINE_NFS4_INODE_STATEID_EVENT(nfs4_setattr); DEFINE_NFS4_INODE_STATEID_EVENT(nfs4_delegreturn); DEFINE_NFS4_INODE_STATEID_EVENT(nfs4_open_stateid_update); DEFINE_NFS4_INODE_STATEID_EVENT(nfs4_open_stateid_update_wait); +DEFINE_NFS4_INODE_STATEID_EVENT(nfs4_open_stateid_update_skip); DEFINE_NFS4_INODE_STATEID_EVENT(nfs4_close_stateid_update_wait); DECLARE_EVENT_CLASS(nfs4_getattr_event, From 4b5c2d46daa051f33d79e776434010e77f301b56 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 28 Nov 2025 18:56:46 -0500 Subject: [PATCH 24/74] NFS: Fix up the automount fs_context to use the correct cred [ Upstream commit a2a8fc27dd668e7562b5326b5ed2f1604cb1e2e9 ] When automounting, the fs_context should be fixed up to use the cred from the parent filesystem, since the operation is just extending the namespace. Authorisation to enter that namespace will already have been provided by the preceding lookup. Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin --- fs/nfs/namespace.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c index 663f1a3f7cc3..5cbbe59e5623 100644 --- a/fs/nfs/namespace.c +++ b/fs/nfs/namespace.c @@ -170,6 +170,11 @@ struct vfsmount *nfs_d_automount(struct path *path) if (!ctx->clone_data.fattr) goto out_fc; + if (fc->cred != server->cred) { + put_cred(fc->cred); + fc->cred = get_cred(server->cred); + } + if (fc->net_ns != client->cl_net) { put_net(fc->net_ns); fc->net_ns = get_net(client->cl_net); From aca8e5cc3098f601b96eaee5dead1b895e011abe Mon Sep 17 00:00:00 2001 From: ChenXiaoSong Date: Sun, 7 Dec 2025 09:22:53 +0800 Subject: [PATCH 25/74] smb/client: fix NT_STATUS_UNABLE_TO_FREE_VM value [ Upstream commit 9f99caa8950a76f560a90074e3a4b93cfa8b3d84 ] This was reported by the KUnit tests in the later patches. See MS-ERREF 2.3.1 STATUS_UNABLE_TO_FREE_VM. Keep it consistent with the value in the documentation. Signed-off-by: ChenXiaoSong Acked-by: Paulo Alcantara (Red Hat) Signed-off-by: Steve French Signed-off-by: Sasha Levin --- fs/smb/client/nterr.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/smb/client/nterr.h b/fs/smb/client/nterr.h index edd4741cab0a..7ce063a1dc3f 100644 --- a/fs/smb/client/nterr.h +++ b/fs/smb/client/nterr.h @@ -70,7 +70,7 @@ extern const struct nt_err_code_struct nt_errs[]; #define NT_STATUS_NO_MEMORY 0xC0000000 | 0x0017 #define NT_STATUS_CONFLICTING_ADDRESSES 0xC0000000 | 0x0018 #define NT_STATUS_NOT_MAPPED_VIEW 0xC0000000 | 0x0019 -#define NT_STATUS_UNABLE_TO_FREE_VM 0x80000000 | 0x001a +#define NT_STATUS_UNABLE_TO_FREE_VM 0xC0000000 | 0x001a #define NT_STATUS_UNABLE_TO_DELETE_SECTION 0xC0000000 | 0x001b #define NT_STATUS_INVALID_SYSTEM_SERVICE 0xC0000000 | 0x001c #define NT_STATUS_ILLEGAL_INSTRUCTION 0xC0000000 | 0x001d From dc13b0b404e28688bd81154d6ffcc11c183727a8 Mon Sep 17 00:00:00 2001 From: ChenXiaoSong Date: Sun, 7 Dec 2025 09:17:57 +0800 Subject: [PATCH 26/74] smb/client: fix NT_STATUS_DEVICE_DOOR_OPEN value [ Upstream commit b2b50fca34da5ec231008edba798ddf92986bd7f ] This was reported by the KUnit tests in the later patches. See MS-ERREF 2.3.1 STATUS_DEVICE_DOOR_OPEN. Keep it consistent with the value in the documentation. Signed-off-by: ChenXiaoSong Acked-by: Paulo Alcantara (Red Hat) Signed-off-by: Steve French Signed-off-by: Sasha Levin --- fs/smb/client/nterr.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/smb/client/nterr.h b/fs/smb/client/nterr.h index 7ce063a1dc3f..d46d42559eea 100644 --- a/fs/smb/client/nterr.h +++ b/fs/smb/client/nterr.h @@ -44,7 +44,7 @@ extern const struct nt_err_code_struct nt_errs[]; #define NT_STATUS_NO_DATA_DETECTED 0x8000001c #define NT_STATUS_STOPPED_ON_SYMLINK 0x8000002d #define NT_STATUS_DEVICE_REQUIRES_CLEANING 0x80000288 -#define NT_STATUS_DEVICE_DOOR_OPEN 0x80000288 +#define NT_STATUS_DEVICE_DOOR_OPEN 0x80000289 #define NT_STATUS_UNSUCCESSFUL 0xC0000000 | 0x0001 #define NT_STATUS_NOT_IMPLEMENTED 0xC0000000 | 0x0002 #define NT_STATUS_INVALID_INFO_CLASS 0xC0000000 | 0x0003 From db50a99e1b68cb284cb30dc124c63217dd177525 Mon Sep 17 00:00:00 2001 From: ChenXiaoSong Date: Sun, 7 Dec 2025 09:13:06 +0800 Subject: [PATCH 27/74] smb/client: fix NT_STATUS_NO_DATA_DETECTED value [ Upstream commit a1237c203f1757480dc2f3b930608ee00072d3cc ] This was reported by the KUnit tests in the later patches. See MS-ERREF 2.3.1 STATUS_NO_DATA_DETECTED. Keep it consistent with the value in the documentation. Signed-off-by: ChenXiaoSong Acked-by: Paulo Alcantara (Red Hat) Signed-off-by: Steve French Signed-off-by: Sasha Levin --- fs/smb/client/nterr.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/smb/client/nterr.h b/fs/smb/client/nterr.h index d46d42559eea..e3a341316a71 100644 --- a/fs/smb/client/nterr.h +++ b/fs/smb/client/nterr.h @@ -41,7 +41,7 @@ extern const struct nt_err_code_struct nt_errs[]; #define NT_STATUS_MEDIA_CHANGED 0x8000001c #define NT_STATUS_END_OF_MEDIA 0x8000001e #define NT_STATUS_MEDIA_CHECK 0x80000020 -#define NT_STATUS_NO_DATA_DETECTED 0x8000001c +#define NT_STATUS_NO_DATA_DETECTED 0x80000022 #define NT_STATUS_STOPPED_ON_SYMLINK 0x8000002d #define NT_STATUS_DEVICE_REQUIRES_CLEANING 0x80000288 #define NT_STATUS_DEVICE_DOOR_OPEN 0x80000289 From e6edaf7e41d7c3df183de0ffd74a41a6d54b3c06 Mon Sep 17 00:00:00 2001 From: Wen Xiong Date: Tue, 28 Oct 2025 09:24:26 -0500 Subject: [PATCH 28/74] scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset [ Upstream commit 6ac3484fb13b2fc7f31cfc7f56093e7d0ce646a5 ] A dynamic remove/add storage adapter test hits EEH on PowerPC: EEH: [c00000000004f75c] __eeh_send_failure_event+0x7c/0x160 EEH: [c000000000048444] eeh_dev_check_failure.part.0+0x254/0x650 EEH: [c008000001650678] eeh_readl+0x60/0x90 [ipr] EEH: [c00800000166746c] ipr_cancel_op+0x2b8/0x524 [ipr] EEH: [c008000001656524] ipr_eh_abort+0x6c/0x130 [ipr] EEH: [c000000000ab0d20] scmd_eh_abort_handler+0x140/0x440 EEH: [c00000000017e558] process_one_work+0x298/0x590 EEH: [c00000000017eef8] worker_thread+0xa8/0x620 EEH: [c00000000018be34] kthread+0x124/0x130 EEH: [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64 A PCIe bus trace reveals that a vector of MSI-X is cleared to 0 by irqbalance daemon. If we disable irqbalance daemon, we won't see the issue. With debug enabled in ipr driver: [ 44.103071] ipr: Entering __ipr_remove [ 44.103083] ipr: Entering ipr_initiate_ioa_bringdown [ 44.103091] ipr: Entering ipr_reset_shutdown_ioa [ 44.103099] ipr: Leaving ipr_reset_shutdown_ioa [ 44.103105] ipr: Leaving ipr_initiate_ioa_bringdown [ 44.149918] ipr: Entering ipr_reset_ucode_download [ 44.149935] ipr: Entering ipr_reset_alert [ 44.150032] ipr: Entering ipr_reset_start_timer [ 44.150038] ipr: Leaving ipr_reset_alert [ 44.244343] scsi 1:2:3:0: alua: Detached [ 44.254300] ipr: Entering ipr_reset_start_bist [ 44.254320] ipr: Entering ipr_reset_start_timer [ 44.254325] ipr: Leaving ipr_reset_start_bist [ 44.364329] scsi 1:2:4:0: alua: Detached [ 45.134341] scsi 1:2:5:0: alua: Detached [ 45.860949] ipr: Entering ipr_reset_shutdown_ioa [ 45.860962] ipr: Leaving ipr_reset_shutdown_ioa [ 45.860966] ipr: Entering ipr_reset_alert [ 45.861028] ipr: Entering ipr_reset_start_timer [ 45.861035] ipr: Leaving ipr_reset_alert [ 45.964302] ipr: Entering ipr_reset_start_bist [ 45.964309] ipr: Entering ipr_reset_start_timer [ 45.964313] ipr: Leaving ipr_reset_start_bist [ 46.264301] ipr: Entering ipr_reset_bist_done [ 46.264309] ipr: Leaving ipr_reset_bist_done During adapter reset, ipr device driver blocks config space access but can't block MMIO access for MSI-X entries. There is very small window: irqbalance daemon kicks in during adapter reset before ipr driver calls pci_restore_state(pdev) to restore MSI-X table. irqbalance daemon reads back all 0 for that MSI-X vector in __pci_read_msi_msg(). irqbalance daemon: msi_domain_set_affinity() ->irq_chip_set_affinity_patent() ->xive_irq_set_affinity() ->irq_chip_compose_msi_msg() ->pseries_msi_compose_msg() ->__pci_read_msi_msg(): read all 0 since didn't call pci_restore_state ->irq_chip_write_msi_msg() -> pci_write_msg_msi(): write 0 to the msix vector entry When ipr driver calls pci_restore_state(pdev) in ipr_reset_restore_cfg_space(), the MSI-X vector entry has been cleared by irqbalance daemon in pci_write_msg_msix(). pci_restore_state() ->__pci_restore_msix_state() Below is the MSI-X table for ipr adapter after irqbalance daemon kicked in during adapter reset: Dump MSIx table: index=0 address_lo=c800 address_hi=10000000 msg_data=0 Dump MSIx table: index=1 address_lo=c810 address_hi=10000000 msg_data=0 Dump MSIx table: index=2 address_lo=c820 address_hi=10000000 msg_data=0 Dump MSIx table: index=3 address_lo=c830 address_hi=10000000 msg_data=0 Dump MSIx table: index=4 address_lo=c840 address_hi=10000000 msg_data=0 Dump MSIx table: index=5 address_lo=c850 address_hi=10000000 msg_data=0 Dump MSIx table: index=6 address_lo=c860 address_hi=10000000 msg_data=0 Dump MSIx table: index=7 address_lo=c870 address_hi=10000000 msg_data=0 Dump MSIx table: index=8 address_lo=0 address_hi=0 msg_data=0 ---------> Hit EEH since msix vector of index=8 are 0 Dump MSIx table: index=9 address_lo=c890 address_hi=10000000 msg_data=0 Dump MSIx table: index=10 address_lo=c8a0 address_hi=10000000 msg_data=0 Dump MSIx table: index=11 address_lo=c8b0 address_hi=10000000 msg_data=0 Dump MSIx table: index=12 address_lo=c8c0 address_hi=10000000 msg_data=0 Dump MSIx table: index=13 address_lo=c8d0 address_hi=10000000 msg_data=0 Dump MSIx table: index=14 address_lo=c8e0 address_hi=10000000 msg_data=0 Dump MSIx table: index=15 address_lo=c8f0 address_hi=10000000 msg_data=0 [ 46.264312] ipr: Entering ipr_reset_restore_cfg_space [ 46.267439] ipr: Entering ipr_fail_all_ops [ 46.267447] ipr: Leaving ipr_fail_all_ops [ 46.267451] ipr: Leaving ipr_reset_restore_cfg_space [ 46.267454] ipr: Entering ipr_ioa_bringdown_done [ 46.267458] ipr: Leaving ipr_ioa_bringdown_done [ 46.267467] ipr: Entering ipr_worker_thread [ 46.267470] ipr: Leaving ipr_worker_thread IRQ balancing is not required during adapter reset. Enable "IRQ_NO_BALANCING" flag before starting adapter reset and disable it after calling pci_restore_state(). The irqbalance daemon is disabled for this short period of time (~2s). Co-developed-by: Kyle Mahlkuch Signed-off-by: Kyle Mahlkuch Signed-off-by: Wen Xiong Link: https://patch.msgid.link/20251028142427.3969819-2-wenxiong@linux.ibm.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/ipr.c | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c index 8c062afb2918..89b899020850 100644 --- a/drivers/scsi/ipr.c +++ b/drivers/scsi/ipr.c @@ -62,8 +62,8 @@ #include #include #include +#include #include -#include #include #include #include @@ -8669,6 +8669,30 @@ static int ipr_dump_mailbox_wait(struct ipr_cmnd *ipr_cmd) return IPR_RC_JOB_RETURN; } +/** + * ipr_set_affinity_nobalance + * @ioa_cfg: ipr_ioa_cfg struct for an ipr device + * @flag: bool + * true: ensable "IRQ_NO_BALANCING" bit for msix interrupt + * false: disable "IRQ_NO_BALANCING" bit for msix interrupt + * Description: This function will be called to disable/enable + * "IRQ_NO_BALANCING" to avoid irqbalance daemon + * kicking in during adapter reset. + **/ +static void ipr_set_affinity_nobalance(struct ipr_ioa_cfg *ioa_cfg, bool flag) +{ + int irq, i; + + for (i = 0; i < ioa_cfg->nvectors; i++) { + irq = pci_irq_vector(ioa_cfg->pdev, i); + + if (flag) + irq_set_status_flags(irq, IRQ_NO_BALANCING); + else + irq_clear_status_flags(irq, IRQ_NO_BALANCING); + } +} + /** * ipr_reset_restore_cfg_space - Restore PCI config space. * @ipr_cmd: ipr command struct @@ -8693,6 +8717,7 @@ static int ipr_reset_restore_cfg_space(struct ipr_cmnd *ipr_cmd) return IPR_RC_JOB_CONTINUE; } + ipr_set_affinity_nobalance(ioa_cfg, false); ipr_fail_all_ops(ioa_cfg); if (ioa_cfg->sis64) { @@ -8772,6 +8797,7 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd) rc = pci_write_config_byte(ioa_cfg->pdev, PCI_BIST, PCI_BIST_START); if (rc == PCIBIOS_SUCCESSFUL) { + ipr_set_affinity_nobalance(ioa_cfg, true); ipr_cmd->job_step = ipr_reset_bist_done; ipr_reset_start_timer(ipr_cmd, IPR_WAIT_FOR_BIST_TIMEOUT); rc = IPR_RC_JOB_RETURN; From 4bf9c3a7ac2787160580433f817d5838a14f04c7 Mon Sep 17 00:00:00 2001 From: Brian Kao Date: Wed, 12 Nov 2025 06:32:02 +0000 Subject: [PATCH 29/74] scsi: ufs: core: Fix EH failure after W-LUN resume error [ Upstream commit b4bb6daf4ac4d4560044ecdd81e93aa2f6acbb06 ] When a W-LUN resume fails, its parent devices in the SCSI hierarchy, including the scsi_target, may be runtime suspended. Subsequently, the error handler in ufshcd_recover_pm_error() fails to set the W-LUN device back to active because the parent target is not active. This results in the following errors: google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ... ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000 ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5 ... ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active Address this by: 1. Ensuring the W-LUN's parent scsi_target is runtime resumed before attempting to set the W-LUN to active within ufshcd_recover_pm_error(). 2. Explicitly checking for power.runtime_error on the HBA and W-LUN devices before calling pm_runtime_set_active() to clear the error state. 3. Adding pm_runtime_get_sync(hba->dev) in ufshcd_err_handling_prepare() to ensure the HBA itself is active during error recovery, even if a child device resume failed. These changes ensure the device power states are managed correctly during error recovery. Signed-off-by: Brian Kao Tested-by: Brian Kao Reviewed-by: Bart Van Assche Link: https://patch.msgid.link/20251112063214.1195761-1-powenkao@google.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/ufs/core/ufshcd.c | 40 +++++++++++++++++++++++++++++---------- 1 file changed, 30 insertions(+), 10 deletions(-) diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 2435ea7ec089..d553169b4d9a 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -6154,6 +6154,11 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend) static void ufshcd_err_handling_prepare(struct ufs_hba *hba) { + /* + * A WLUN resume failure could potentially lead to the HBA being + * runtime suspended, so take an extra reference on hba->dev. + */ + pm_runtime_get_sync(hba->dev); ufshcd_rpm_get_sync(hba); if (pm_runtime_status_suspended(&hba->ufs_device_wlun->sdev_gendev) || hba->is_sys_suspended) { @@ -6194,6 +6199,7 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba) if (ufshcd_is_clkscaling_supported(hba)) ufshcd_clk_scaling_suspend(hba, false); ufshcd_rpm_put(hba); + pm_runtime_put(hba->dev); } static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba) @@ -6208,28 +6214,42 @@ static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba) #ifdef CONFIG_PM static void ufshcd_recover_pm_error(struct ufs_hba *hba) { + struct scsi_target *starget = hba->ufs_device_wlun->sdev_target; struct Scsi_Host *shost = hba->host; struct scsi_device *sdev; struct request_queue *q; - int ret; + bool resume_sdev_queues = false; hba->is_sys_suspended = false; - /* - * Set RPM status of wlun device to RPM_ACTIVE, - * this also clears its runtime error. - */ - ret = pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev); - /* hba device might have a runtime error otherwise */ - if (ret) - ret = pm_runtime_set_active(hba->dev); + /* + * Ensure the parent's error status is cleared before proceeding + * to the child, as the parent must be active to activate the child. + */ + if (hba->dev->power.runtime_error) { + /* hba->dev has no functional parent thus simplily set RPM_ACTIVE */ + pm_runtime_set_active(hba->dev); + resume_sdev_queues = true; + } + + if (hba->ufs_device_wlun->sdev_gendev.power.runtime_error) { + /* + * starget, parent of wlun, might be suspended if wlun resume failed. + * Make sure parent is resumed before set child (wlun) active. + */ + pm_runtime_get_sync(&starget->dev); + pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev); + pm_runtime_put_sync(&starget->dev); + resume_sdev_queues = true; + } + /* * If wlun device had runtime error, we also need to resume those * consumer scsi devices in case any of them has failed to be * resumed due to supplier runtime resume failure. This is to unblock * blk_queue_enter in case there are bios waiting inside it. */ - if (!ret) { + if (resume_sdev_queues) { shost_for_each_device(sdev, shost) { q = sdev->request_queue; if (q->dev && (q->rpm_status == RPM_SUSPENDED || From c28f41538405e5b2c49b2228e3ca1e471ac6a09a Mon Sep 17 00:00:00 2001 From: Xingui Yang Date: Tue, 2 Dec 2025 14:56:27 +0800 Subject: [PATCH 30/74] scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed" [ Upstream commit 278712d20bc8ec29d1ad6ef9bdae9000ef2c220c ] This reverts commit ab2068a6fb84751836a84c26ca72b3beb349619d. When probing the exp-attached sata device, libsas/libata will issue a hard reset in sas_probe_sata() -> ata_sas_async_probe(), then a broadcast event will be received after the disk probe fails, and this commit causes the probe will be re-executed on the disk, and a faulty disk may get into an indefinite loop of probe. Therefore, revert this commit, although it can fix some temporary issues with disk probe failure. Signed-off-by: Xingui Yang Reviewed-by: Jason Yan Reviewed-by: John Garry Link: https://patch.msgid.link/20251202065627.140361-1-yangxingui@huawei.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/libsas/sas_internal.h | 14 -------------- 1 file changed, 14 deletions(-) diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h index 6ddccc67e808..a94bd0790b05 100644 --- a/drivers/scsi/libsas/sas_internal.h +++ b/drivers/scsi/libsas/sas_internal.h @@ -119,20 +119,6 @@ static inline void sas_fail_probe(struct domain_device *dev, const char *func, i func, dev->parent ? "exp-attached" : "direct-attached", SAS_ADDR(dev->sas_addr), err); - - /* - * If the device probe failed, the expander phy attached address - * needs to be reset so that the phy will not be treated as flutter - * in the next revalidation - */ - if (dev->parent && !dev_is_expander(dev->dev_type)) { - struct sas_phy *phy = dev->phy; - struct domain_device *parent = dev->parent; - struct ex_phy *ex_phy = &parent->ex_dev.ex_phy[phy->number]; - - memset(ex_phy->attached_sas_addr, 0, SAS_ADDR_SIZE); - } - sas_unregister_dev(dev->port, dev); } From 8b199bd6c4290478b111bfadcde7cad1689e8a1b Mon Sep 17 00:00:00 2001 From: Haibo Chen Date: Wed, 19 Nov 2025 11:22:40 +0800 Subject: [PATCH 31/74] arm64: dts: add off-on-delay-us for usdhc2 regulator [ Upstream commit ca643894a37a25713029b36cfe7d1bae515cac08 ] For SD card, according to the spec requirement, for sd card power reset operation, it need sd card supply voltage to be lower than 0.5v and keep over 1ms, otherwise, next time power back the sd card supply voltage to 3.3v, sd card can't support SD3.0 mode again. To match such requirement on imx8qm-mek board, add 4.8ms delay between sd power off and power on. Fixes: 307fd14d4b14 ("arm64: dts: imx: add imx8qm mek support") Reviewed-by: Frank Li Signed-off-by: Haibo Chen Signed-off-by: Shawn Guo Signed-off-by: Sasha Levin --- arch/arm64/boot/dts/freescale/imx8qm-mek.dts | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/boot/dts/freescale/imx8qm-mek.dts b/arch/arm64/boot/dts/freescale/imx8qm-mek.dts index 470e4e4aa8c7..059f8c0ab93d 100644 --- a/arch/arm64/boot/dts/freescale/imx8qm-mek.dts +++ b/arch/arm64/boot/dts/freescale/imx8qm-mek.dts @@ -34,6 +34,7 @@ regulator-max-microvolt = <3000000>; gpio = <&lsio_gpio4 7 GPIO_ACTIVE_HIGH>; enable-active-high; + off-on-delay-us = <4800>; }; }; From d0a705f10a6f0825fbc7d9d9f61aae040a4689ef Mon Sep 17 00:00:00 2001 From: Ian Ray Date: Mon, 1 Dec 2025 11:56:05 +0200 Subject: [PATCH 32/74] ARM: dts: imx6q-ba16: fix RTC interrupt level [ Upstream commit e6a4eedd49ce27c16a80506c66a04707e0ee0116 ] RTC interrupt level should be set to "LOW". This was revealed by the introduction of commit: f181987ef477 ("rtc: m41t80: use IRQ flags obtained from fwnode") which changed the way IRQ type is obtained. Fixes: 56c27310c1b4 ("ARM: dts: imx: Add Advantech BA-16 Qseven module") Signed-off-by: Ian Ray Signed-off-by: Shawn Guo Signed-off-by: Sasha Levin --- arch/arm/boot/dts/imx6q-ba16.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/imx6q-ba16.dtsi b/arch/arm/boot/dts/imx6q-ba16.dtsi index f266f1b7e0cf..0c033e69ecc0 100644 --- a/arch/arm/boot/dts/imx6q-ba16.dtsi +++ b/arch/arm/boot/dts/imx6q-ba16.dtsi @@ -335,7 +335,7 @@ pinctrl-0 = <&pinctrl_rtc>; reg = <0x32>; interrupt-parent = <&gpio4>; - interrupts = <10 IRQ_TYPE_LEVEL_HIGH>; + interrupts = <10 IRQ_TYPE_LEVEL_LOW>; }; }; From efe15ec22825a2c4563e48e869ed26a99935720d Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Tue, 2 Dec 2025 14:41:51 +0100 Subject: [PATCH 33/74] arm64: dts: imx8mp: Fix LAN8740Ai PHY reference clock on DH electronics i.MX8M Plus DHCOM [ Upstream commit c63749a7ddc59ac6ec0b05abfa0a21af9f2c1d38 ] Add missing 'clocks' property to LAN8740Ai PHY node, to allow the PHY driver to manage LAN8740Ai CLKIN reference clock supply. This fixes sporadic link bouncing caused by interruptions on the PHY reference clock, by letting the PHY driver manage the reference clock and assure there are no interruptions. This follows the matching PHY driver recommendation described in commit bedd8d78aba3 ("net: phy: smsc: LAN8710/20: add phy refclk in support") Fixes: 8d6712695bc8 ("arm64: dts: imx8mp: Add support for DH electronics i.MX8M Plus DHCOM and PDK2") Signed-off-by: Marek Vasut Tested-by: Christoph Niedermaier Signed-off-by: Shawn Guo Signed-off-by: Sasha Levin --- arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi b/arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi index 2fd50b5890af..0b81b85887f4 100644 --- a/arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi @@ -97,6 +97,7 @@ ethphy0f: ethernet-phy@1 { /* SMSC LAN8740Ai */ compatible = "ethernet-phy-id0007.c110", "ethernet-phy-ieee802.3-c22"; + clocks = <&clk IMX8MP_CLK_ENET_QOS>; interrupt-parent = <&gpio3>; interrupts = <19 IRQ_TYPE_LEVEL_LOW>; pinctrl-0 = <&pinctrl_ethphy0>; From 8ba9091d6db6c348b6681b06068ae721b14c8dfe Mon Sep 17 00:00:00 2001 From: Fernando Fernandez Mancera Date: Wed, 17 Dec 2025 21:21:59 +0100 Subject: [PATCH 34/74] netfilter: nft_synproxy: avoid possible data-race on update operation [ Upstream commit 36a3200575642846a96436d503d46544533bb943 ] During nft_synproxy eval we are reading nf_synproxy_info struct which can be modified on update operation concurrently. As nf_synproxy_info struct fits in 32 bits, use READ_ONCE/WRITE_ONCE annotations. Fixes: ee394f96ad75 ("netfilter: nft_synproxy: add synproxy stateful object support") Signed-off-by: Fernando Fernandez Mancera Signed-off-by: Florian Westphal Signed-off-by: Sasha Levin --- net/netfilter/nft_synproxy.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/netfilter/nft_synproxy.c b/net/netfilter/nft_synproxy.c index a450f28a5ef6..0cc638553aef 100644 --- a/net/netfilter/nft_synproxy.c +++ b/net/netfilter/nft_synproxy.c @@ -48,7 +48,7 @@ static void nft_synproxy_eval_v4(const struct nft_synproxy *priv, struct tcphdr *_tcph, struct synproxy_options *opts) { - struct nf_synproxy_info info = priv->info; + struct nf_synproxy_info info = READ_ONCE(priv->info); struct net *net = nft_net(pkt); struct synproxy_net *snet = synproxy_pernet(net); struct sk_buff *skb = pkt->skb; @@ -79,7 +79,7 @@ static void nft_synproxy_eval_v6(const struct nft_synproxy *priv, struct tcphdr *_tcph, struct synproxy_options *opts) { - struct nf_synproxy_info info = priv->info; + struct nf_synproxy_info info = READ_ONCE(priv->info); struct net *net = nft_net(pkt); struct synproxy_net *snet = synproxy_pernet(net); struct sk_buff *skb = pkt->skb; @@ -340,7 +340,7 @@ static void nft_synproxy_obj_update(struct nft_object *obj, struct nft_synproxy *newpriv = nft_obj_data(newobj); struct nft_synproxy *priv = nft_obj_data(obj); - priv->info = newpriv->info; + WRITE_ONCE(priv->info, newpriv->info); } static struct nft_object_type nft_synproxy_obj_type; From d7bc1101a0c7cab1b195fe3bd8027925218d136f Mon Sep 17 00:00:00 2001 From: Zilin Guan Date: Wed, 24 Dec 2025 12:48:26 +0000 Subject: [PATCH 35/74] netfilter: nf_tables: fix memory leak in nf_tables_newrule() [ Upstream commit d077e8119ddbb4fca67540f1a52453631a47f221 ] In nf_tables_newrule(), if nft_use_inc() fails, the function jumps to the err_release_rule label without freeing the allocated flow, leading to a memory leak. Fix this by adding a new label err_destroy_flow and jumping to it when nft_use_inc() fails. This ensures that the flow is properly released in this error case. Fixes: 1689f25924ada ("netfilter: nf_tables: report use refcount overflow") Signed-off-by: Zilin Guan Signed-off-by: Florian Westphal Signed-off-by: Sasha Levin --- net/netfilter/nf_tables_api.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index b278f493cc93..d154e3e0c980 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3811,7 +3811,7 @@ static int nf_tables_newrule(struct sk_buff *skb, const struct nfnl_info *info, if (!nft_use_inc(&chain->use)) { err = -EMFILE; - goto err_release_rule; + goto err_destroy_flow; } if (info->nlh->nlmsg_flags & NLM_F_REPLACE) { @@ -3861,6 +3861,7 @@ static int nf_tables_newrule(struct sk_buff *skb, const struct nfnl_info *info, err_destroy_flow_rule: nft_use_dec_restore(&chain->use); +err_destroy_flow: if (flow) nft_flow_rule_destroy(flow); err_release_rule: From 9f45588993d7f115280fc726119ca86fba32a811 Mon Sep 17 00:00:00 2001 From: Fernando Fernandez Mancera Date: Wed, 17 Dec 2025 15:46:40 +0100 Subject: [PATCH 36/74] netfilter: nf_conncount: update last_gc only when GC has been performed [ Upstream commit 7811ba452402d58628e68faedf38745b3d485e3c ] Currently last_gc is being updated everytime a new connection is tracked, that means that it is updated even if a GC wasn't performed. With a sufficiently high packet rate, it is possible to always bypass the GC, causing the list to grow infinitely. Update the last_gc value only when a GC has been actually performed. Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC") Signed-off-by: Fernando Fernandez Mancera Signed-off-by: Florian Westphal Signed-off-by: Sasha Levin --- net/netfilter/nf_conncount.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c index c00b8e522c5a..a2c5a7ba0c6f 100644 --- a/net/netfilter/nf_conncount.c +++ b/net/netfilter/nf_conncount.c @@ -229,6 +229,7 @@ static int __nf_conncount_add(struct net *net, nf_ct_put(found_ct); } + list->last_gc = (u32)jiffies; add_new_node: if (WARN_ON_ONCE(list->count > INT_MAX)) { @@ -248,7 +249,6 @@ add_new_node: conn->jiffies32 = (u32)jiffies; list_add_tail(&conn->node, &list->head); list->count++; - list->last_gc = (u32)jiffies; out_put: if (refcounted) From 325aea74be7e192b5c947c782da23b0d19a5fda2 Mon Sep 17 00:00:00 2001 From: Alok Tiwari Date: Mon, 29 Dec 2025 21:21:18 -0800 Subject: [PATCH 37/74] net: marvell: prestera: fix NULL dereference on devlink_alloc() failure [ Upstream commit a428e0da1248c353557970848994f35fd3f005e2 ] devlink_alloc() may return NULL on allocation failure, but prestera_devlink_alloc() unconditionally calls devlink_priv() on the returned pointer. This leads to a NULL pointer dereference if devlink allocation fails. Add a check for a NULL devlink pointer and return NULL early to avoid the crash. Fixes: 34dd1710f5a3 ("net: marvell: prestera: Add basic devlink support") Signed-off-by: Alok Tiwari Acked-by: Elad Nachman Link: https://patch.msgid.link/20251230052124.897012-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/marvell/prestera/prestera_devlink.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/marvell/prestera/prestera_devlink.c b/drivers/net/ethernet/marvell/prestera/prestera_devlink.c index 06279cd6da67..8f0ae62d4a89 100644 --- a/drivers/net/ethernet/marvell/prestera/prestera_devlink.c +++ b/drivers/net/ethernet/marvell/prestera/prestera_devlink.c @@ -392,6 +392,8 @@ struct prestera_switch *prestera_devlink_alloc(struct prestera_device *dev) dl = devlink_alloc(&prestera_dl_ops, sizeof(struct prestera_switch), dev->dev); + if (!dl) + return NULL; return devlink_priv(dl); } From 04440b7068ca98de7821de9736c6dfc62e57e594 Mon Sep 17 00:00:00 2001 From: Alexandre Knecht Date: Sun, 28 Dec 2025 03:00:57 +0100 Subject: [PATCH 38/74] bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress [ Upstream commit 3128df6be147768fe536986fbb85db1d37806a9f ] When using an 802.1ad bridge with vlan_tunnel, the C-VLAN tag is incorrectly stripped from frames during egress processing. br_handle_egress_vlan_tunnel() uses skb_vlan_pop() to remove the S-VLAN from hwaccel before VXLAN encapsulation. However, skb_vlan_pop() also moves any "next" VLAN from the payload into hwaccel: /* move next vlan tag to hw accel tag */ __skb_vlan_pop(skb, &vlan_tci); __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci); For QinQ frames where the C-VLAN sits in the payload, this moves it to hwaccel where it gets lost during VXLAN encapsulation. Fix by calling __vlan_hwaccel_clear_tag() directly, which clears only the hwaccel S-VLAN and leaves the payload untouched. This path is only taken when vlan_tunnel is enabled and tunnel_info is configured, so 802.1Q bridges are unaffected. Tested with 802.1ad bridge + VXLAN vlan_tunnel, verified C-VLAN preserved in VXLAN payload via tcpdump. Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths") Signed-off-by: Alexandre Knecht Reviewed-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Link: https://patch.msgid.link/20251228020057.2788865-1-knecht.alexandre@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/bridge/br_vlan_tunnel.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/net/bridge/br_vlan_tunnel.c b/net/bridge/br_vlan_tunnel.c index 6399a8a69d07..0f03572d89d0 100644 --- a/net/bridge/br_vlan_tunnel.c +++ b/net/bridge/br_vlan_tunnel.c @@ -187,7 +187,6 @@ int br_handle_egress_vlan_tunnel(struct sk_buff *skb, { struct metadata_dst *tunnel_dst; __be64 tunnel_id; - int err; if (!vlan) return 0; @@ -197,9 +196,13 @@ int br_handle_egress_vlan_tunnel(struct sk_buff *skb, return 0; skb_dst_drop(skb); - err = skb_vlan_pop(skb); - if (err) - return err; + /* For 802.1ad (QinQ), skb_vlan_pop() incorrectly moves the C-VLAN + * from payload to hwaccel after clearing S-VLAN. We only need to + * clear the hwaccel S-VLAN; the C-VLAN must stay in payload for + * correct VXLAN encapsulation. This is also correct for 802.1Q + * where no C-VLAN exists in payload. + */ + __vlan_hwaccel_clear_tag(skb); tunnel_dst = rcu_dereference(vlan->tinfo.tunnel_dst); if (tunnel_dst && dst_hold_safe(&tunnel_dst->dst)) From b17818307446c5a8d925a39a792261dbfa930041 Mon Sep 17 00:00:00 2001 From: Jerry Wu Date: Thu, 25 Dec 2025 20:36:17 +0000 Subject: [PATCH 39/74] net: mscc: ocelot: Fix crash when adding interface under a lag [ Upstream commit 34f3ff52cb9fa7dbf04f5c734fcc4cb6ed5d1a95 ] Commit 15faa1f67ab4 ("lan966x: Fix crash when adding interface under a lag") fixed a similar issue in the lan966x driver caused by a NULL pointer dereference. The ocelot_set_aggr_pgids() function in the ocelot driver has similar logic and is susceptible to the same crash. This issue specifically affects the ocelot_vsc7514.c frontend, which leaves unused ports as NULL pointers. The felix_vsc9959.c frontend is unaffected as it uses the DSA framework which registers all ports. Fix this by checking if the port pointer is valid before accessing it. Fixes: 528d3f190c98 ("net: mscc: ocelot: drop the use of the "lags" array") Signed-off-by: Jerry Wu Reviewed-by: Vladimir Oltean Link: https://patch.msgid.link/tencent_75EF812B305E26B0869C673DD1160866C90A@qq.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/mscc/ocelot.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index 203cb4978544..01417c2a61e2 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -2202,14 +2202,16 @@ static void ocelot_set_aggr_pgids(struct ocelot *ocelot) /* Now, set PGIDs for each active LAG */ for (lag = 0; lag < ocelot->num_phys_ports; lag++) { - struct net_device *bond = ocelot->ports[lag]->bond; + struct ocelot_port *ocelot_port = ocelot->ports[lag]; int num_active_ports = 0; + struct net_device *bond; unsigned long bond_mask; u8 aggr_idx[16]; - if (!bond || (visited & BIT(lag))) + if (!ocelot_port || !ocelot_port->bond || (visited & BIT(lag))) continue; + bond = ocelot_port->bond; bond_mask = ocelot_get_bond_mask(ocelot, bond); for_each_set_bit(port, &bond_mask, ocelot->num_phys_ports) { From 305e2c40ce33a065e1758d91f732ad9e4a3fb75c Mon Sep 17 00:00:00 2001 From: "yuan.gao" Date: Wed, 24 Dec 2025 14:31:45 +0800 Subject: [PATCH 40/74] inet: ping: Fix icmp out counting [ Upstream commit 4c0856c225b39b1def6c9a6bc56faca79550da13 ] When the ping program uses an IPPROTO_ICMP socket to send ICMP_ECHO messages, ICMP_MIB_OUTMSGS is counted twice. ping_v4_sendmsg ping_v4_push_pending_frames ip_push_pending_frames ip_finish_skb __ip_make_skb icmp_out_count(net, icmp_type); // first count icmp_out_count(sock_net(sk), user_icmph.type); // second count However, when the ping program uses an IPPROTO_RAW socket, ICMP_MIB_OUTMSGS is counted correctly only once. Therefore, the first count should be removed. Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: yuan.gao Reviewed-by: Ido Schimmel Tested-by: Ido Schimmel Link: https://patch.msgid.link/20251224063145.3615282-1-yuan.gao@ucloud.cn Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv4/ping.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 5178a3f3cb53..cadf743ab4f5 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -848,10 +848,8 @@ out: out_free: if (free) kfree(ipc.opt); - if (!err) { - icmp_out_count(sock_net(sk), user_icmph.type); + if (!err) return len; - } return err; do_confirm: From 8c6901aa29626e35045130bac09b75f791acca85 Mon Sep 17 00:00:00 2001 From: Weiming Shi Date: Wed, 24 Dec 2025 04:35:35 +0800 Subject: [PATCH 41/74] net: sock: fix hardened usercopy panic in sock_recv_errqueue [ Upstream commit 2a71a1a8d0ed718b1c7a9ac61f07e5755c47ae20 ] skbuff_fclone_cache was created without defining a usercopy region, [1] unlike skbuff_head_cache which properly whitelists the cb[] field. [2] This causes a usercopy BUG() when CONFIG_HARDENED_USERCOPY is enabled and the kernel attempts to copy sk_buff.cb data to userspace via sock_recv_errqueue() -> put_cmsg(). The crash occurs when: 1. TCP allocates an skb using alloc_skb_fclone() (from skbuff_fclone_cache) [1] 2. The skb is cloned via skb_clone() using the pre-allocated fclone [3] 3. The cloned skb is queued to sk_error_queue for timestamp reporting 4. Userspace reads the error queue via recvmsg(MSG_ERRQUEUE) 5. sock_recv_errqueue() calls put_cmsg() to copy serr->ee from skb->cb [4] 6. __check_heap_object() fails because skbuff_fclone_cache has no usercopy whitelist [5] When cloned skbs allocated from skbuff_fclone_cache are used in the socket error queue, accessing the sock_exterr_skb structure in skb->cb via put_cmsg() triggers a usercopy hardening violation: [ 5.379589] usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_fclone_cache' (offset 296, size 16)! [ 5.382796] kernel BUG at mm/usercopy.c:102! [ 5.383923] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI [ 5.384903] CPU: 1 UID: 0 PID: 138 Comm: poc_put_cmsg Not tainted 6.12.57 #7 [ 5.384903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 5.384903] RIP: 0010:usercopy_abort+0x6c/0x80 [ 5.384903] Code: 1a 86 51 48 c7 c2 40 15 1a 86 41 52 48 c7 c7 c0 15 1a 86 48 0f 45 d6 48 c7 c6 80 15 1a 86 48 89 c1 49 0f 45 f3 e8 84 27 88 ff <0f> 0b 490 [ 5.384903] RSP: 0018:ffffc900006f77a8 EFLAGS: 00010246 [ 5.384903] RAX: 000000000000006f RBX: ffff88800f0ad2a8 RCX: 1ffffffff0f72e74 [ 5.384903] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff87b973a0 [ 5.384903] RBP: 0000000000000010 R08: 0000000000000000 R09: fffffbfff0f72e74 [ 5.384903] R10: 0000000000000003 R11: 79706f6372657375 R12: 0000000000000001 [ 5.384903] R13: ffff88800f0ad2b8 R14: ffffea00003c2b40 R15: ffffea00003c2b00 [ 5.384903] FS: 0000000011bc4380(0000) GS:ffff8880bf100000(0000) knlGS:0000000000000000 [ 5.384903] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5.384903] CR2: 000056aa3b8e5fe4 CR3: 000000000ea26004 CR4: 0000000000770ef0 [ 5.384903] PKRU: 55555554 [ 5.384903] Call Trace: [ 5.384903] [ 5.384903] __check_heap_object+0x9a/0xd0 [ 5.384903] __check_object_size+0x46c/0x690 [ 5.384903] put_cmsg+0x129/0x5e0 [ 5.384903] sock_recv_errqueue+0x22f/0x380 [ 5.384903] tls_sw_recvmsg+0x7ed/0x1960 [ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5.384903] ? schedule+0x6d/0x270 [ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5.384903] ? mutex_unlock+0x81/0xd0 [ 5.384903] ? __pfx_mutex_unlock+0x10/0x10 [ 5.384903] ? __pfx_tls_sw_recvmsg+0x10/0x10 [ 5.384903] ? _raw_spin_lock_irqsave+0x8f/0xf0 [ 5.384903] ? _raw_read_unlock_irqrestore+0x20/0x40 [ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5 The crash offset 296 corresponds to skb2->cb within skbuff_fclones: - sizeof(struct sk_buff) = 232 - offsetof(struct sk_buff, cb) = 40 - offset of skb2.cb in fclones = 232 + 40 = 272 - crash offset 296 = 272 + 24 (inside sock_exterr_skb.ee) This patch uses a local stack variable as a bounce buffer to avoid the hardened usercopy check failure. [1] https://elixir.bootlin.com/linux/v6.12.62/source/net/ipv4/tcp.c#L885 [2] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5104 [3] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5566 [4] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5491 [5] https://elixir.bootlin.com/linux/v6.12.62/source/mm/slub.c#L5719 Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0") Reported-by: Xiang Mei Signed-off-by: Weiming Shi Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/20251223203534.1392218-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/core/sock.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index 7702033680e7..6c178b474266 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3614,7 +3614,7 @@ void sock_enable_timestamp(struct sock *sk, enum sock_flags flag) int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len, int level, int type) { - struct sock_exterr_skb *serr; + struct sock_extended_err ee; struct sk_buff *skb; int copied, err; @@ -3634,8 +3634,9 @@ int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len, sock_recv_timestamp(msg, sk, skb); - serr = SKB_EXT_ERR(skb); - put_cmsg(msg, level, type, sizeof(serr->ee), &serr->ee); + /* We must use a bounce buffer for CONFIG_HARDENED_USERCOPY=y */ + ee = SKB_EXT_ERR(skb)->ee; + put_cmsg(msg, level, type, sizeof(ee), &ee); msg->msg_flags |= MSG_ERRQUEUE; err = copied; From 806a0d964e27681dd8b82370382484bddf60474c Mon Sep 17 00:00:00 2001 From: Di Zhu Date: Wed, 24 Dec 2025 09:22:24 +0800 Subject: [PATCH 42/74] netdev: preserve NETIF_F_ALL_FOR_ALL across TSO updates [ Upstream commit 02d1e1a3f9239cdb3ecf2c6d365fb959d1bf39df ] Directly increment the TSO features incurs a side effect: it will also directly clear the flags in NETIF_F_ALL_FOR_ALL on the master device, which can cause issues such as the inability to enable the nocache copy feature on the bonding driver. The fix is to include NETIF_F_ALL_FOR_ALL in the update mask, thereby preventing it from being cleared. Fixes: b0ce3508b25e ("bonding: allow TSO being set on bonding master") Signed-off-by: Di Zhu Link: https://patch.msgid.link/20251224012224.56185-1-zhud@hygon.cn Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- include/linux/netdevice.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f44701b82ea8..1c47ab59a2c7 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4951,7 +4951,8 @@ netdev_features_t netdev_increment_features(netdev_features_t all, static inline netdev_features_t netdev_add_tso_features(netdev_features_t features, netdev_features_t mask) { - return netdev_increment_features(features, NETIF_F_ALL_TSO, mask); + return netdev_increment_features(features, NETIF_F_ALL_TSO | + NETIF_F_ALL_FOR_ALL, mask); } int __netdev_update_features(struct net_device *dev); From 98f4d7e2c6dd2f27d7798ca68e418cb27b3dac6e Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Thu, 25 Dec 2025 15:27:16 +0200 Subject: [PATCH 43/74] net/mlx5e: Don't print error message due to invalid module [ Upstream commit 144297e2a24e3e54aee1180ec21120ea38822b97 ] Dumping module EEPROM on newer modules is supported through the netlink interface only. Querying with old userspace ethtool (or other tools, such as 'lshw') which still uses the ioctl interface results in an error message that could flood dmesg (in addition to the expected error return value). The original message was added under the assumption that the driver should be able to handle all module types, but now that such flows are easily triggered from userspace, it doesn't serve its purpose. Change the log level of the print in mlx5_query_module_eeprom() to debug. Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM") Signed-off-by: Gal Pressman Reviewed-by: Tariq Toukan Signed-off-by: Mark Bloch Link: https://patch.msgid.link/20251225132717.358820-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/mellanox/mlx5/core/port.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c index a1548e6bfb35..28ec31722ec2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -432,7 +432,8 @@ int mlx5_query_module_eeprom(struct mlx5_core_dev *dev, mlx5_qsfp_eeprom_params_set(&query.i2c_address, &query.page, &offset); break; default: - mlx5_core_err(dev, "Module ID not recognized: 0x%x\n", module_id); + mlx5_core_dbg(dev, "Module ID not recognized: 0x%x\n", + module_id); return -EINVAL; } From f64c1623bb080c08d3342f38b3e7a365f8fa6ffd Mon Sep 17 00:00:00 2001 From: Zilin Guan Date: Tue, 30 Dec 2025 07:18:53 +0000 Subject: [PATCH 44/74] net: wwan: iosm: Fix memory leak in ipc_mux_deinit() [ Upstream commit 92e6e0a87f6860a4710f9494f8c704d498ae60f8 ] Commit 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support") allocated memory for pp_qlt in ipc_mux_init() but did not free it in ipc_mux_deinit(). This results in a memory leak when the driver is unloaded. Free the allocated memory in ipc_mux_deinit() to fix the leak. Fixes: 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support") Co-developed-by: Jianhao Xu Signed-off-by: Jianhao Xu Signed-off-by: Zilin Guan Reviewed-by: Loic Poulain Link: https://patch.msgid.link/20251230071853.1062223-1-zilin@seu.edu.cn Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/wwan/iosm/iosm_ipc_mux.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/wwan/iosm/iosm_ipc_mux.c b/drivers/net/wwan/iosm/iosm_ipc_mux.c index fc928b298a98..b846889fcb09 100644 --- a/drivers/net/wwan/iosm/iosm_ipc_mux.c +++ b/drivers/net/wwan/iosm/iosm_ipc_mux.c @@ -456,6 +456,7 @@ void ipc_mux_deinit(struct iosm_mux *ipc_mux) struct sk_buff_head *free_list; union mux_msg mux_msg; struct sk_buff *skb; + int i; if (!ipc_mux->initialized) return; @@ -479,5 +480,10 @@ void ipc_mux_deinit(struct iosm_mux *ipc_mux) ipc_mux->channel->dl_pipe.is_open = false; } + if (ipc_mux->protocol != MUX_LITE) { + for (i = 0; i < IPC_MEM_MUX_IP_SESSION_ENTRIES; i++) + kfree(ipc_mux->ul_adb.pp_qlt[i]); + } + kfree(ipc_mux); } From 134f7bcda274e0bf901aa3bba1307e0c5e79a422 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 19 Jul 2023 18:04:38 -0700 Subject: [PATCH 45/74] eth: bnxt: move and rename reset helpers [ Upstream commit fea2993aecd74d5d11ede1ebbd60e478ebfed996 ] Move the reset helpers, subsequent patches will need some of them on the Tx path. While at it rename bnxt_sched_reset(), on more recent chips it schedules a queue reset, instead of a fuller reset. Link: https://lore.kernel.org/r/20230720010440.1967136-2-kuba@kernel.org Reviewed-by: Michael Chan Signed-off-by: Jakub Kicinski Stable-dep-of: ffeafa65b2b2 ("bnxt_en: Fix potential data corruption with HW GRO/LRO") Signed-off-by: Sasha Levin --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 72 +++++++++++------------ 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 6b1245a3ab4b..2540402030bf 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -293,6 +293,38 @@ static void bnxt_db_cq(struct bnxt *bp, struct bnxt_db_info *db, u32 idx) BNXT_DB_CQ(db, idx); } +static void bnxt_queue_fw_reset_work(struct bnxt *bp, unsigned long delay) +{ + if (!(test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))) + return; + + if (BNXT_PF(bp)) + queue_delayed_work(bnxt_pf_wq, &bp->fw_reset_task, delay); + else + schedule_delayed_work(&bp->fw_reset_task, delay); +} + +static void bnxt_queue_sp_work(struct bnxt *bp) +{ + if (BNXT_PF(bp)) + queue_work(bnxt_pf_wq, &bp->sp_task); + else + schedule_work(&bp->sp_task); +} + +static void bnxt_sched_reset_rxr(struct bnxt *bp, struct bnxt_rx_ring_info *rxr) +{ + if (!rxr->bnapi->in_reset) { + rxr->bnapi->in_reset = true; + if (bp->flags & BNXT_FLAG_CHIP_P5) + set_bit(BNXT_RESET_TASK_SP_EVENT, &bp->sp_event); + else + set_bit(BNXT_RST_RING_SP_EVENT, &bp->sp_event); + bnxt_queue_sp_work(bp); + } + rxr->rx_next_cons = 0xffff; +} + const u16 bnxt_lhint_arr[] = { TX_BD_FLAGS_LHINT_512_AND_SMALLER, TX_BD_FLAGS_LHINT_512_TO_1023, @@ -1269,38 +1301,6 @@ static int bnxt_discard_rx(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, return 0; } -static void bnxt_queue_fw_reset_work(struct bnxt *bp, unsigned long delay) -{ - if (!(test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))) - return; - - if (BNXT_PF(bp)) - queue_delayed_work(bnxt_pf_wq, &bp->fw_reset_task, delay); - else - schedule_delayed_work(&bp->fw_reset_task, delay); -} - -static void bnxt_queue_sp_work(struct bnxt *bp) -{ - if (BNXT_PF(bp)) - queue_work(bnxt_pf_wq, &bp->sp_task); - else - schedule_work(&bp->sp_task); -} - -static void bnxt_sched_reset(struct bnxt *bp, struct bnxt_rx_ring_info *rxr) -{ - if (!rxr->bnapi->in_reset) { - rxr->bnapi->in_reset = true; - if (bp->flags & BNXT_FLAG_CHIP_P5) - set_bit(BNXT_RESET_TASK_SP_EVENT, &bp->sp_event); - else - set_bit(BNXT_RST_RING_SP_EVENT, &bp->sp_event); - bnxt_queue_sp_work(bp); - } - rxr->rx_next_cons = 0xffff; -} - static u16 bnxt_alloc_agg_idx(struct bnxt_rx_ring_info *rxr, u16 agg_id) { struct bnxt_tpa_idx_map *map = rxr->rx_tpa_idx_map; @@ -1355,7 +1355,7 @@ static void bnxt_tpa_start(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, netdev_warn(bp->dev, "TPA cons %x, expected cons %x, error code %x\n", cons, rxr->rx_next_cons, TPA_START_ERROR_CODE(tpa_start1)); - bnxt_sched_reset(bp, rxr); + bnxt_sched_reset_rxr(bp, rxr); return; } /* Store cfa_code in tpa_info to use in tpa_end @@ -1895,7 +1895,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, if (rxr->rx_next_cons != 0xffff) netdev_warn(bp->dev, "RX cons %x != expected cons %x\n", cons, rxr->rx_next_cons); - bnxt_sched_reset(bp, rxr); + bnxt_sched_reset_rxr(bp, rxr); if (rc1) return rc1; goto next_rx_no_prod_no_len; @@ -1933,7 +1933,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, !(bp->fw_cap & BNXT_FW_CAP_RING_MONITOR)) { netdev_warn_once(bp->dev, "RX buffer error %x\n", rx_err); - bnxt_sched_reset(bp, rxr); + bnxt_sched_reset_rxr(bp, rxr); } } goto next_rx_no_len; @@ -2371,7 +2371,7 @@ static int bnxt_async_event_process(struct bnxt *bp, goto async_event_process_exit; } rxr = bp->bnapi[grp_idx]->rx_ring; - bnxt_sched_reset(bp, rxr); + bnxt_sched_reset_rxr(bp, rxr); goto async_event_process_exit; } case ASYNC_EVENT_CMPL_EVENT_ID_ECHO_REQUEST: { From 43297ed063541485e9dedcda930c181a4fae8a22 Mon Sep 17 00:00:00 2001 From: Srijit Bose Date: Wed, 31 Dec 2025 00:36:25 -0800 Subject: [PATCH 46/74] bnxt_en: Fix potential data corruption with HW GRO/LRO [ Upstream commit ffeafa65b2b26df2f5b5a6118d3174f17bd12ec5 ] Fix the max number of bits passed to find_first_zero_bit() in bnxt_alloc_agg_idx(). We were incorrectly passing the number of long words. find_first_zero_bit() may fail to find a zero bit and cause a wrong ID to be used. If the wrong ID is already in use, this can cause data corruption. Sometimes an error like this can also be seen: bnxt_en 0000:83:00.0 enp131s0np0: TPA end agg_buf 2 != expected agg_bufs 1 Fix it by passing the correct number of bits MAX_TPA_P5. Use DECLARE_BITMAP() to more cleanly define the bitmap. Add a sanity check to warn if a bit cannot be found and reset the ring [MChan]. Fixes: ec4d8e7cf024 ("bnxt_en: Add TPA ID mapping logic for 57500 chips.") Reviewed-by: Ray Jui Signed-off-by: Srijit Bose Signed-off-by: Michael Chan Reviewed-by: Vadim Fedorenko Link: https://patch.msgid.link/20251231083625.3911652-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 15 ++++++++++++--- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 4 +--- 2 files changed, 13 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 2540402030bf..a70870393b65 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1306,9 +1306,11 @@ static u16 bnxt_alloc_agg_idx(struct bnxt_rx_ring_info *rxr, u16 agg_id) struct bnxt_tpa_idx_map *map = rxr->rx_tpa_idx_map; u16 idx = agg_id & MAX_TPA_P5_MASK; - if (test_bit(idx, map->agg_idx_bmap)) - idx = find_first_zero_bit(map->agg_idx_bmap, - BNXT_AGG_IDX_BMAP_SIZE); + if (test_bit(idx, map->agg_idx_bmap)) { + idx = find_first_zero_bit(map->agg_idx_bmap, MAX_TPA_P5); + if (idx >= MAX_TPA_P5) + return INVALID_HW_RING_ID; + } __set_bit(idx, map->agg_idx_bmap); map->agg_id_tbl[agg_id] = idx; return idx; @@ -1341,6 +1343,13 @@ static void bnxt_tpa_start(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, if (bp->flags & BNXT_FLAG_CHIP_P5) { agg_id = TPA_START_AGG_ID_P5(tpa_start); agg_id = bnxt_alloc_agg_idx(rxr, agg_id); + if (unlikely(agg_id == INVALID_HW_RING_ID)) { + netdev_warn(bp->dev, "Unable to allocate agg ID for ring %d, agg 0x%x\n", + rxr->bnapi->index, + TPA_START_AGG_ID_P5(tpa_start)); + bnxt_sched_reset_rxr(bp, rxr); + return; + } } else { agg_id = TPA_START_AGG_ID(tpa_start); } diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 111098b4b606..4d27636aa200 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -897,11 +897,9 @@ struct bnxt_tpa_info { struct rx_agg_cmp *agg_arr; }; -#define BNXT_AGG_IDX_BMAP_SIZE (MAX_TPA_P5 / BITS_PER_LONG) - struct bnxt_tpa_idx_map { u16 agg_id_tbl[1024]; - unsigned long agg_idx_bmap[BNXT_AGG_IDX_BMAP_SIZE]; + DECLARE_BITMAP(agg_idx_bmap, MAX_TPA_P5); }; struct bnxt_rx_ring_info { From 0b27828ebd1ed3107d7929c3737adbe862e99e74 Mon Sep 17 00:00:00 2001 From: Mohammad Heib Date: Sun, 4 Jan 2026 23:31:01 +0200 Subject: [PATCH 47/74] net: fix memory leak in skb_segment_list for GRO packets [ Upstream commit 238e03d0466239410b72294b79494e43d4fabe77 ] When skb_segment_list() is called during packet forwarding, it handles packets that were aggregated by the GRO engine. Historically, the segmentation logic in skb_segment_list assumes that individual segments are split from a parent SKB and may need to carry their own socket memory accounting. Accordingly, the code transfers truesize from the parent to the newly created segments. Prior to commit ed4cccef64c1 ("gro: fix ownership transfer"), this truesize subtraction in skb_segment_list() was valid because fragments still carry a reference to the original socket. However, commit ed4cccef64c1 ("gro: fix ownership transfer") changed this behavior by ensuring that fraglist entries are explicitly orphaned (skb->sk = NULL) to prevent illegal orphaning later in the stack. This change meant that the entire socket memory charge remained with the head SKB, but the corresponding accounting logic in skb_segment_list() was never updated. As a result, the current code unconditionally adds each fragment's truesize to delta_truesize and subtracts it from the parent SKB. Since the fragments are no longer charged to the socket, this subtraction results in an effective under-count of memory when the head is freed. This causes sk_wmem_alloc to remain non-zero, preventing socket destruction and leading to a persistent memory leak. The leak can be observed via KMEMLEAK when tearing down the networking environment: unreferenced object 0xffff8881e6eb9100 (size 2048): comm "ping", pid 6720, jiffies 4295492526 backtrace: kmem_cache_alloc_noprof+0x5c6/0x800 sk_prot_alloc+0x5b/0x220 sk_alloc+0x35/0xa00 inet6_create.part.0+0x303/0x10d0 __sock_create+0x248/0x640 __sys_socket+0x11b/0x1d0 Since skb_segment_list() is exclusively used for SKB_GSO_FRAGLIST packets constructed by GRO, the truesize adjustment is removed. The call to skb_release_head_state() must be preserved. As documented in commit cf673ed0e057 ("net: fix fraglist segmentation reference count leak"), it is still required to correctly drop references to SKB extensions that may be overwritten during __copy_skb_header(). Fixes: ed4cccef64c1 ("gro: fix ownership transfer") Signed-off-by: Mohammad Heib Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20260104213101.352887-1-mheib@redhat.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/core/skbuff.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index d8a3ada886ff..ef24911af05a 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4045,12 +4045,14 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, { struct sk_buff *list_skb = skb_shinfo(skb)->frag_list; unsigned int tnl_hlen = skb_tnl_header_len(skb); - unsigned int delta_truesize = 0; unsigned int delta_len = 0; struct sk_buff *tail = NULL; struct sk_buff *nskb, *tmp; int len_diff, err; + /* Only skb_gro_receive_list generated skbs arrive here */ + DEBUG_NET_WARN_ON_ONCE(!(skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST)); + skb_push(skb, -skb_network_offset(skb) + offset); /* Ensure the head is writeable before touching the shared info */ @@ -4064,8 +4066,9 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, nskb = list_skb; list_skb = list_skb->next; + DEBUG_NET_WARN_ON_ONCE(nskb->sk); + err = 0; - delta_truesize += nskb->truesize; if (skb_shared(nskb)) { tmp = skb_clone(nskb, GFP_ATOMIC); if (tmp) { @@ -4108,7 +4111,6 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, goto err_linearize; } - skb->truesize = skb->truesize - delta_truesize; skb->data_len = skb->data_len - delta_len; skb->len = skb->len - delta_len; From 8a029d657becb7eb089e15a97695a1cd537d7aa2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ren=C3=A9=20Rebe?= Date: Fri, 28 Nov 2025 13:46:41 +0100 Subject: [PATCH 48/74] HID: quirks: work around VID/PID conflict for appledisplay MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit c7fabe4ad9219866c203164a214c474c95b36bf2 ] For years I wondered why the Apple Cinema Display driver would not just work for me. Turns out the hidraw driver instantly takes it over. Fix by adding appledisplay VID/PIDs to hid_have_special_driver. Fixes: 069e8a65cd79 ("Driver for Apple Cinema Display") Signed-off-by: René Rebe Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin --- drivers/hid/hid-quirks.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/hid/hid-quirks.c b/drivers/hid/hid-quirks.c index 249e626d1c6a..b6bec3614cfe 100644 --- a/drivers/hid/hid-quirks.c +++ b/drivers/hid/hid-quirks.c @@ -220,6 +220,15 @@ static const struct hid_device_id hid_quirks[] = { * used as a driver. See hid_scan_report(). */ static const struct hid_device_id hid_have_special_driver[] = { +#if IS_ENABLED(CONFIG_APPLEDISPLAY) + { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, 0x9218) }, + { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, 0x9219) }, + { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, 0x921c) }, + { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, 0x921d) }, + { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, 0x9222) }, + { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, 0x9226) }, + { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, 0x9236) }, +#endif #if IS_ENABLED(CONFIG_HID_A4TECH) { HID_USB_DEVICE(USB_VENDOR_ID_A4TECH, USB_DEVICE_ID_A4TECH_WCP32PU) }, { HID_USB_DEVICE(USB_VENDOR_ID_A4TECH, USB_DEVICE_ID_A4TECH_X5_005D) }, From cdb24200b043438a144df501f1ebbd926bb1a2c7 Mon Sep 17 00:00:00 2001 From: Xiang Mei Date: Mon, 5 Jan 2026 20:41:00 -0700 Subject: [PATCH 49/74] net/sched: sch_qfq: Fix NULL deref when deactivating inactive aggregate in qfq_reset [ Upstream commit c1d73b1480235731e35c81df70b08f4714a7d095 ] `qfq_class->leaf_qdisc->q.qlen > 0` does not imply that the class itself is active. Two qfq_class objects may point to the same leaf_qdisc. This happens when: 1. one QFQ qdisc is attached to the dev as the root qdisc, and 2. another QFQ qdisc is temporarily referenced (e.g., via qdisc_get() / qdisc_put()) and is pending to be destroyed, as in function tc_new_tfilter. When packets are enqueued through the root QFQ qdisc, the shared leaf_qdisc->q.qlen increases. At the same time, the second QFQ qdisc triggers qdisc_put and qdisc_destroy: the qdisc enters qfq_reset() with its own q->q.qlen == 0, but its class's leaf qdisc->q.qlen > 0. Therefore, the qfq_reset would wrongly deactivate an inactive aggregate and trigger a null-deref in qfq_deactivate_agg: [ 0.903172] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 0.903571] #PF: supervisor write access in kernel mode [ 0.903860] #PF: error_code(0x0002) - not-present page [ 0.904177] PGD 10299b067 P4D 10299b067 PUD 10299c067 PMD 0 [ 0.904502] Oops: Oops: 0002 [#1] SMP NOPTI [ 0.904737] CPU: 0 UID: 0 PID: 135 Comm: exploit Not tainted 6.19.0-rc3+ #2 NONE [ 0.905157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 [ 0.905754] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2)) [ 0.906046] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0 Code starting with the faulting instruction =========================================== 0: 0f 84 4d 01 00 00 je 0x153 6: 48 89 70 18 mov %rsi,0x18(%rax) a: 8b 4b 10 mov 0x10(%rbx),%ecx d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 14: 48 8b 78 08 mov 0x8(%rax),%rdi 18: 48 d3 e2 shl %cl,%rdx 1b: 48 21 f2 and %rsi,%rdx 1e: 48 2b 13 sub (%rbx),%rdx 21: 48 8b 30 mov (%rax),%rsi 24: 48 d3 ea shr %cl,%rdx 27: 8b 4b 18 mov 0x18(%rbx),%ecx ... [ 0.907095] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246 [ 0.907368] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000 [ 0.907723] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 0.908100] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000 [ 0.908451] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000 [ 0.908804] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880 [ 0.909179] FS: 000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000 [ 0.909572] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.909857] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0 [ 0.910247] PKRU: 55555554 [ 0.910391] Call Trace: [ 0.910527] [ 0.910638] qfq_reset_qdisc (net/sched/sch_qfq.c:357 net/sched/sch_qfq.c:1485) [ 0.910826] qdisc_reset (include/linux/skbuff.h:2195 include/linux/skbuff.h:2501 include/linux/skbuff.h:3424 include/linux/skbuff.h:3430 net/sched/sch_generic.c:1036) [ 0.911040] __qdisc_destroy (net/sched/sch_generic.c:1076) [ 0.911236] tc_new_tfilter (net/sched/cls_api.c:2447) [ 0.911447] rtnetlink_rcv_msg (net/core/rtnetlink.c:6958) [ 0.911663] ? __pfx_rtnetlink_rcv_msg (net/core/rtnetlink.c:6861) [ 0.911894] netlink_rcv_skb (net/netlink/af_netlink.c:2550) [ 0.912100] netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344) [ 0.912296] ? __alloc_skb (net/core/skbuff.c:706) [ 0.912484] netlink_sendmsg (net/netlink/af_netlink.c:1894) [ 0.912682] sock_write_iter (net/socket.c:727 (discriminator 1) net/socket.c:742 (discriminator 1) net/socket.c:1195 (discriminator 1)) [ 0.912880] vfs_write (fs/read_write.c:593 fs/read_write.c:686) [ 0.913077] ksys_write (fs/read_write.c:738) [ 0.913252] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) [ 0.913438] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131) [ 0.913687] RIP: 0033:0x424c34 [ 0.913844] Code: 89 02 48 c7 c0 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 2d 44 09 00 00 74 13 b8 01 00 00 00 0f 05 9 Code starting with the faulting instruction =========================================== 0: 89 02 mov %eax,(%rdx) 2: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax 9: eb bd jmp 0xffffffffffffffc8 b: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) 12: 00 00 00 15: 90 nop 16: f3 0f 1e fa endbr64 1a: 80 3d 2d 44 09 00 00 cmpb $0x0,0x9442d(%rip) # 0x9444e 21: 74 13 je 0x36 23: b8 01 00 00 00 mov $0x1,%eax 28: 0f 05 syscall 2a: 09 .byte 0x9 [ 0.914807] RSP: 002b:00007ffea1938b78 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 0.915197] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000424c34 [ 0.915556] RDX: 000000000000003c RSI: 000000002af378c0 RDI: 0000000000000003 [ 0.915912] RBP: 00007ffea1938bc0 R08: 00000000004b8820 R09: 0000000000000000 [ 0.916297] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffea1938d28 [ 0.916652] R13: 00007ffea1938d38 R14: 00000000004b3828 R15: 0000000000000001 [ 0.917039] [ 0.917158] Modules linked in: [ 0.917316] CR2: 0000000000000000 [ 0.917484] ---[ end trace 0000000000000000 ]--- [ 0.917717] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2)) [ 0.917978] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0 Code starting with the faulting instruction =========================================== 0: 0f 84 4d 01 00 00 je 0x153 6: 48 89 70 18 mov %rsi,0x18(%rax) a: 8b 4b 10 mov 0x10(%rbx),%ecx d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 14: 48 8b 78 08 mov 0x8(%rax),%rdi 18: 48 d3 e2 shl %cl,%rdx 1b: 48 21 f2 and %rsi,%rdx 1e: 48 2b 13 sub (%rbx),%rdx 21: 48 8b 30 mov (%rax),%rsi 24: 48 d3 ea shr %cl,%rdx 27: 8b 4b 18 mov 0x18(%rbx),%ecx ... [ 0.918902] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246 [ 0.919198] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000 [ 0.919559] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 0.919908] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000 [ 0.920289] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000 [ 0.920648] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880 [ 0.921014] FS: 000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000 [ 0.921424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.921710] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0 [ 0.922097] PKRU: 55555554 [ 0.922240] Kernel panic - not syncing: Fatal exception [ 0.922590] Kernel Offset: disabled Fixes: 0545a3037773 ("pkt_sched: QFQ - quick fair queue scheduler") Signed-off-by: Xiang Mei Link: https://patch.msgid.link/20260106034100.1780779-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/sched/sch_qfq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c index 896ff7c74111..80a7173843b9 100644 --- a/net/sched/sch_qfq.c +++ b/net/sched/sch_qfq.c @@ -1483,7 +1483,7 @@ static void qfq_reset_qdisc(struct Qdisc *sch) for (i = 0; i < q->clhash.hashsize; i++) { hlist_for_each_entry(cl, &q->clhash.hash[i], common.hnode) { - if (cl->qdisc->q.qlen > 0) + if (cl_is_active(cl)) qfq_deactivate_class(q, cl); qdisc_reset(cl->qdisc); From ac5d92d2826dec51e5d4c6854865bc5817277452 Mon Sep 17 00:00:00 2001 From: Petko Manolov Date: Tue, 6 Jan 2026 10:48:21 +0200 Subject: [PATCH 50/74] net: usb: pegasus: fix memory leak in update_eth_regs_async() [ Upstream commit afa27621a28af317523e0836dad430bec551eb54 ] When asynchronously writing to the device registers and if usb_submit_urb() fail, the code fail to release allocated to this point resources. Fixes: 323b34963d11 ("drivers: net: usb: pegasus: fix control urb submission") Signed-off-by: Petko Manolov Link: https://patch.msgid.link/20260106084821.3746677-1-petko.manolov@konsulko.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/usb/pegasus.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c index 81ca64debc5b..c514483134f0 100644 --- a/drivers/net/usb/pegasus.c +++ b/drivers/net/usb/pegasus.c @@ -168,6 +168,8 @@ static int update_eth_regs_async(pegasus_t *pegasus) netif_device_detach(pegasus->net); netif_err(pegasus, drv, pegasus->net, "%s returned %d\n", __func__, ret); + usb_free_urb(async_urb); + kfree(req); } return ret; } From e2f0ceee265df7741190bd7ff8c480b06278eb7f Mon Sep 17 00:00:00 2001 From: Wei Fang Date: Wed, 7 Jan 2026 17:12:04 +0800 Subject: [PATCH 51/74] net: enetc: fix build warning when PAGE_SIZE is greater than 128K [ Upstream commit 4b5bdabb5449b652122e43f507f73789041d4abe ] The max buffer size of ENETC RX BD is 0xFFFF bytes, so if the PAGE_SIZE is greater than 128K, ENETC_RXB_DMA_SIZE and ENETC_RXB_DMA_SIZE_XDP will be greater than 0xFFFF, thus causing a build warning. This will not cause any practical issues because ENETC is currently only used on the ARM64 platform, and the max PAGE_SIZE is 64K. So this patch is only for fixing the build warning that occurs when compiling ENETC drivers for other platforms. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202601050637.kHEKKOG7-lkp@intel.com/ Fixes: e59bc32df2e9 ("net: enetc: correct the value of ENETC_RXB_TRUESIZE") Signed-off-by: Wei Fang Reviewed-by: Frank Li Link: https://patch.msgid.link/20260107091204.1980222-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/freescale/enetc/enetc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h index aacdfe98b65a..d2ad5be02a0d 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc.h +++ b/drivers/net/ethernet/freescale/enetc/enetc.h @@ -43,9 +43,9 @@ struct enetc_tx_swbd { #define ENETC_RXB_TRUESIZE (PAGE_SIZE >> 1) #define ENETC_RXB_PAD NET_SKB_PAD /* add extra space if needed */ #define ENETC_RXB_DMA_SIZE \ - (SKB_WITH_OVERHEAD(ENETC_RXB_TRUESIZE) - ENETC_RXB_PAD) + min(SKB_WITH_OVERHEAD(ENETC_RXB_TRUESIZE) - ENETC_RXB_PAD, 0xffff) #define ENETC_RXB_DMA_SIZE_XDP \ - (SKB_WITH_OVERHEAD(ENETC_RXB_TRUESIZE) - XDP_PACKET_HEADROOM) + min(SKB_WITH_OVERHEAD(ENETC_RXB_TRUESIZE) - XDP_PACKET_HEADROOM, 0xffff) struct enetc_rx_swbd { dma_addr_t dma; From 70bddc16491ef4681f3569b3a2c80309a3edcdd1 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 7 Jan 2026 21:22:50 +0000 Subject: [PATCH 52/74] arp: do not assume dev_hard_header() does not change skb->head [ Upstream commit c92510f5e3f82ba11c95991824a41e59a9c5ed81 ] arp_create() is the only dev_hard_header() caller making assumption about skb->head being unchanged. A recent commit broke this assumption. Initialize @arp pointer after dev_hard_header() call. Fixes: db5b4e39c4e6 ("ip6_gre: make ip6gre_header() robust") Reported-by: syzbot+58b44a770a1585795351@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet Link: https://patch.msgid.link/20260107212250.384552-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv4/arp.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index 50e2b4939d8e..fc8c7a34b53e 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -563,7 +563,7 @@ struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip, skb_reserve(skb, hlen); skb_reset_network_header(skb); - arp = skb_put(skb, arp_hdr_len(dev)); + skb_put(skb, arp_hdr_len(dev)); skb->dev = dev; skb->protocol = htons(ETH_P_ARP); if (!src_hw) @@ -571,12 +571,13 @@ struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip, if (!dest_hw) dest_hw = dev->broadcast; - /* - * Fill the device header for the ARP frame + /* Fill the device header for the ARP frame. + * Note: skb->head can be changed. */ if (dev_hard_header(skb, dev, ptype, dest_hw, src_hw, skb->len) < 0) goto out; + arp = arp_hdr(skb); /* * Fill out the arp protocol part. * From 5d27af3d5c6e3b5e6e70867467a78c4b3052b6ef Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Wed, 26 Nov 2025 13:22:19 +0100 Subject: [PATCH 53/74] pinctrl: qcom: lpass-lpi: mark the GPIO controller as sleeping commit ebc18e9854e5a2b62a041fb57b216a903af45b85 upstream. The gpio_chip settings in this driver say the controller can't sleep but it actually uses a mutex for synchronization. This triggers the following BUG(): [ 9.233659] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281 [ 9.233665] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 554, name: (udev-worker) [ 9.233669] preempt_count: 1, expected: 0 [ 9.233673] RCU nest depth: 0, expected: 0 [ 9.233688] Tainted: [W]=WARN [ 9.233690] Hardware name: Dell Inc. Latitude 7455/0FK7MX, BIOS 2.10.1 05/20/2025 [ 9.233694] Call trace: [ 9.233696] show_stack+0x24/0x38 (C) [ 9.233709] dump_stack_lvl+0x40/0x88 [ 9.233716] dump_stack+0x18/0x24 [ 9.233722] __might_resched+0x148/0x160 [ 9.233731] __might_sleep+0x38/0x98 [ 9.233736] mutex_lock+0x30/0xd8 [ 9.233749] lpi_config_set+0x2e8/0x3c8 [pinctrl_lpass_lpi] [ 9.233757] lpi_gpio_direction_output+0x58/0x90 [pinctrl_lpass_lpi] [ 9.233761] gpiod_direction_output_raw_commit+0x110/0x428 [ 9.233772] gpiod_direction_output_nonotify+0x234/0x358 [ 9.233779] gpiod_direction_output+0x38/0xd0 [ 9.233786] gpio_shared_proxy_direction_output+0xb8/0x2a8 [gpio_shared_proxy] [ 9.233792] gpiod_direction_output_raw_commit+0x110/0x428 [ 9.233799] gpiod_direction_output_nonotify+0x234/0x358 [ 9.233806] gpiod_configure_flags+0x2c0/0x580 [ 9.233812] gpiod_find_and_request+0x358/0x4f8 [ 9.233819] gpiod_get_index+0x7c/0x98 [ 9.233826] devm_gpiod_get+0x34/0xb0 [ 9.233829] reset_gpio_probe+0x58/0x128 [reset_gpio] [ 9.233836] auxiliary_bus_probe+0xb0/0xf0 [ 9.233845] really_probe+0x14c/0x450 [ 9.233853] __driver_probe_device+0xb0/0x188 [ 9.233858] driver_probe_device+0x4c/0x250 [ 9.233863] __driver_attach+0xf8/0x2a0 [ 9.233868] bus_for_each_dev+0xf8/0x158 [ 9.233872] driver_attach+0x30/0x48 [ 9.233876] bus_add_driver+0x158/0x2b8 [ 9.233880] driver_register+0x74/0x118 [ 9.233886] __auxiliary_driver_register+0x94/0xe8 [ 9.233893] init_module+0x34/0xfd0 [reset_gpio] [ 9.233898] do_one_initcall+0xec/0x300 [ 9.233903] do_init_module+0x64/0x260 [ 9.233910] load_module+0x16c4/0x1900 [ 9.233915] __arm64_sys_finit_module+0x24c/0x378 [ 9.233919] invoke_syscall+0x4c/0xe8 [ 9.233925] el0_svc_common+0x8c/0xf0 [ 9.233929] do_el0_svc+0x28/0x40 [ 9.233934] el0_svc+0x38/0x100 [ 9.233938] el0t_64_sync_handler+0x84/0x130 [ 9.233943] el0t_64_sync+0x17c/0x180 Mark the controller as sleeping. Fixes: 6e261d1090d6 ("pinctrl: qcom: Add sm8250 lpass lpi pinctrl driver") Cc: stable@vger.kernel.org Reported-by: Val Packett Closes: https://lore.kernel.org/all/98c0f185-b0e0-49ea-896c-f3972dd011ca@packett.cool/ Signed-off-by: Bartosz Golaszewski Reviewed-by: Dmitry Baryshkov Reviewed-by: Bjorn Andersson Signed-off-by: Linus Walleij Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/qcom/pinctrl-lpass-lpi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pinctrl/qcom/pinctrl-lpass-lpi.c b/drivers/pinctrl/qcom/pinctrl-lpass-lpi.c index bfcc5c45b8fa..35f46ca4cf9f 100644 --- a/drivers/pinctrl/qcom/pinctrl-lpass-lpi.c +++ b/drivers/pinctrl/qcom/pinctrl-lpass-lpi.c @@ -435,7 +435,7 @@ int lpi_pinctrl_probe(struct platform_device *pdev) pctrl->chip.ngpio = data->npins; pctrl->chip.label = dev_name(dev); pctrl->chip.of_gpio_n_cells = 2; - pctrl->chip.can_sleep = false; + pctrl->chip.can_sleep = true; mutex_init(&pctrl->lock); From 3d6219f2380b897d04a07e8c9e17957e919643cc Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Tue, 13 Jan 2026 10:01:55 -0300 Subject: [PATCH 54/74] mm/pagewalk: add walk_page_range_vma() [ Upstream commit e07cda5f232fac4de0925d8a4c92e51e41fa2f6e ] Let's add walk_page_range_vma(), which is similar to walk_page_vma(), however, is only interested in a subset of the VMA range. To be used in KSM code to stop using follow_page() next. Link: https://lkml.kernel.org/r/20221021101141.84170-8-david@redhat.com Signed-off-by: David Hildenbrand Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: Jason Gunthorpe Cc: John Hubbard Cc: Matthew Wilcox (Oracle) Cc: Peter Xu Cc: Shuah Khan Cc: Vlastimil Babka Signed-off-by: Andrew Morton Stable-dep-of: f5548c318d6 ("ksm: use range-walk function to jump over holes in scan_get_next_rmap_item") Signed-off-by: Pedro Demarchi Gomes Signed-off-by: Greg Kroah-Hartman --- include/linux/pagewalk.h | 3 +++ mm/pagewalk.c | 20 ++++++++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index f3fafb731ffd..2f8f6cc980b4 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -99,6 +99,9 @@ int walk_page_range_novma(struct mm_struct *mm, unsigned long start, unsigned long end, const struct mm_walk_ops *ops, pgd_t *pgd, void *private); +int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start, + unsigned long end, const struct mm_walk_ops *ops, + void *private); int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops, void *private); int walk_page_mapping(struct address_space *mapping, pgoff_t first_index, diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 2ff3a5bebceb..5295f6d1a4fa 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -517,6 +517,26 @@ int walk_page_range_novma(struct mm_struct *mm, unsigned long start, return walk_pgd_range(start, end, &walk); } +int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start, + unsigned long end, const struct mm_walk_ops *ops, + void *private) +{ + struct mm_walk walk = { + .ops = ops, + .mm = vma->vm_mm, + .vma = vma, + .private = private, + }; + + if (start >= end || !walk.mm) + return -EINVAL; + if (start < vma->vm_start || end > vma->vm_end) + return -EINVAL; + + mmap_assert_locked(walk.mm); + return __walk_page_range(start, end, &walk); +} + int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops, void *private) { From 67137b715b7db28d82e4ed07a7092c2fa6ba7adb Mon Sep 17 00:00:00 2001 From: Pedro Demarchi Gomes Date: Tue, 13 Jan 2026 10:01:56 -0300 Subject: [PATCH 55/74] ksm: use range-walk function to jump over holes in scan_get_next_rmap_item [ Upstream commit f5548c318d6520d4fa3c5ed6003eeb710763cbc5 ] Currently, scan_get_next_rmap_item() walks every page address in a VMA to locate mergeable pages. This becomes highly inefficient when scanning large virtual memory areas that contain mostly unmapped regions, causing ksmd to use large amount of cpu without deduplicating much pages. This patch replaces the per-address lookup with a range walk using walk_page_range(). The range walker allows KSM to skip over entire unmapped holes in a VMA, avoiding unnecessary lookups. This problem was previously discussed in [1]. Consider the following test program which creates a 32 TiB mapping in the virtual address space but only populates a single page: /* 32 TiB */ const size_t size = 32ul * 1024 * 1024 * 1024 * 1024; int main() { char *area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0); if (area == MAP_FAILED) { perror("mmap() failed\n"); return -1; } /* Populate a single page such that we get an anon_vma. */ *area = 0; /* Enable KSM. */ madvise(area, size, MADV_MERGEABLE); pause(); return 0; } $ ./ksm-sparse & $ echo 1 > /sys/kernel/mm/ksm/run Without this patch ksmd uses 100% of the cpu for a long time (more then 1 hour in my test machine) scanning all the 32 TiB virtual address space that contain only one mapped page. This makes ksmd essentially deadlocked not able to deduplicate anything of value. With this patch ksmd walks only the one mapped page and skips the rest of the 32 TiB virtual address space, making the scan fast using little cpu. Link: https://lkml.kernel.org/r/20251023035841.41406-1-pedrodemargomes@gmail.com Link: https://lkml.kernel.org/r/20251022153059.22763-1-pedrodemargomes@gmail.com Link: https://lore.kernel.org/linux-mm/423de7a3-1c62-4e72-8e79-19a6413e420c@redhat.com/ [1] Fixes: 31dbd01f3143 ("ksm: Kernel SamePage Merging") Signed-off-by: Pedro Demarchi Gomes Co-developed-by: David Hildenbrand Signed-off-by: David Hildenbrand Reported-by: craftfever Closes: https://lkml.kernel.org/r/020cf8de6e773bb78ba7614ef250129f11a63781@murena.io Suggested-by: David Hildenbrand Acked-by: David Hildenbrand Cc: Chengming Zhou Cc: xu xin Cc: Signed-off-by: Andrew Morton [ replace pmdp_get_lockless with pmd_read_atomic and pmdp_get with READ_ONCE(*pmdp) ] Signed-off-by: Pedro Demarchi Gomes Signed-off-by: Greg Kroah-Hartman --- mm/ksm.c | 126 +++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 113 insertions(+), 13 deletions(-) diff --git a/mm/ksm.c b/mm/ksm.c index cb272b6fde59..616a8f7c5c60 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include "internal.h" @@ -2223,6 +2224,94 @@ static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot, return rmap_item; } +struct ksm_next_page_arg { + struct folio *folio; + struct page *page; + unsigned long addr; +}; + +static int ksm_next_page_pmd_entry(pmd_t *pmdp, unsigned long addr, unsigned long end, + struct mm_walk *walk) +{ + struct ksm_next_page_arg *private = walk->private; + struct vm_area_struct *vma = walk->vma; + pte_t *start_ptep = NULL, *ptep, pte; + struct mm_struct *mm = walk->mm; + struct folio *folio; + struct page *page; + spinlock_t *ptl; + pmd_t pmd; + + if (ksm_test_exit(mm)) + return 0; + + cond_resched(); + + pmd = pmd_read_atomic(pmdp); + if (!pmd_present(pmd)) + return 0; + + if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && pmd_leaf(pmd)) { + ptl = pmd_lock(mm, pmdp); + pmd = READ_ONCE(*pmdp); + + if (!pmd_present(pmd)) { + goto not_found_unlock; + } else if (pmd_leaf(pmd)) { + page = vm_normal_page_pmd(vma, addr, pmd); + if (!page) + goto not_found_unlock; + folio = page_folio(page); + + if (folio_is_zone_device(folio) || !folio_test_anon(folio)) + goto not_found_unlock; + + page += ((addr & (PMD_SIZE - 1)) >> PAGE_SHIFT); + goto found_unlock; + } + spin_unlock(ptl); + } + + start_ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl); + if (!start_ptep) + return 0; + + for (ptep = start_ptep; addr < end; ptep++, addr += PAGE_SIZE) { + pte = ptep_get(ptep); + + if (!pte_present(pte)) + continue; + + page = vm_normal_page(vma, addr, pte); + if (!page) + continue; + folio = page_folio(page); + + if (folio_is_zone_device(folio) || !folio_test_anon(folio)) + continue; + goto found_unlock; + } + +not_found_unlock: + spin_unlock(ptl); + if (start_ptep) + pte_unmap(start_ptep); + return 0; +found_unlock: + folio_get(folio); + spin_unlock(ptl); + if (start_ptep) + pte_unmap(start_ptep); + private->page = page; + private->folio = folio; + private->addr = addr; + return 1; +} + +static struct mm_walk_ops ksm_next_page_ops = { + .pmd_entry = ksm_next_page_pmd_entry, +}; + static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page) { struct mm_struct *mm; @@ -2307,32 +2396,43 @@ next_mm: ksm_scan.address = vma->vm_end; while (ksm_scan.address < vma->vm_end) { + struct ksm_next_page_arg ksm_next_page_arg; + struct page *tmp_page = NULL; + struct folio *folio; + if (ksm_test_exit(mm)) break; - *page = follow_page(vma, ksm_scan.address, FOLL_GET); - if (IS_ERR_OR_NULL(*page)) { - ksm_scan.address += PAGE_SIZE; - cond_resched(); - continue; + + int found; + + found = walk_page_range_vma(vma, ksm_scan.address, + vma->vm_end, + &ksm_next_page_ops, + &ksm_next_page_arg); + + if (found > 0) { + folio = ksm_next_page_arg.folio; + tmp_page = ksm_next_page_arg.page; + ksm_scan.address = ksm_next_page_arg.addr; + } else { + VM_WARN_ON_ONCE(found < 0); + ksm_scan.address = vma->vm_end - PAGE_SIZE; } - if (is_zone_device_page(*page)) - goto next_page; - if (PageAnon(*page)) { - flush_anon_page(vma, *page, ksm_scan.address); - flush_dcache_page(*page); + if (tmp_page) { + flush_anon_page(vma, tmp_page, ksm_scan.address); + flush_dcache_page(tmp_page); rmap_item = get_next_rmap_item(mm_slot, ksm_scan.rmap_list, ksm_scan.address); if (rmap_item) { ksm_scan.rmap_list = &rmap_item->rmap_list; ksm_scan.address += PAGE_SIZE; + *page = tmp_page; } else - put_page(*page); + folio_put(folio); mmap_read_unlock(mm); return rmap_item; } -next_page: - put_page(*page); ksm_scan.address += PAGE_SIZE; cond_resched(); } From 92c54886470488527b8528e3be95ad618ab8123b Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 12 Jan 2026 12:31:08 -0500 Subject: [PATCH 56/74] ALSA: ac97bus: Use guard() for mutex locks [ Upstream commit c07824a14d99c10edd4ec4c389d219af336ecf20 ] Replace the manual mutex lock/unlock pairs with guard() for code simplification. Only code refactoring, and no behavior change. Signed-off-by: Takashi Iwai Link: https://patch.msgid.link/20250829151335.7342-18-tiwai@suse.de Stable-dep-of: 830988b6cf19 ("ALSA: ac97: fix a double free in snd_ac97_controller_register()") Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/ac97/bus.c | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/sound/ac97/bus.c b/sound/ac97/bus.c index 045330883a96..a46653eeb220 100644 --- a/sound/ac97/bus.c +++ b/sound/ac97/bus.c @@ -242,10 +242,9 @@ static ssize_t cold_reset_store(struct device *dev, { struct ac97_controller *ac97_ctrl; - mutex_lock(&ac97_controllers_mutex); + guard(mutex)(&ac97_controllers_mutex); ac97_ctrl = to_ac97_controller(dev); ac97_ctrl->ops->reset(ac97_ctrl); - mutex_unlock(&ac97_controllers_mutex); return len; } static DEVICE_ATTR_WO(cold_reset); @@ -259,10 +258,9 @@ static ssize_t warm_reset_store(struct device *dev, if (!dev) return -ENODEV; - mutex_lock(&ac97_controllers_mutex); + guard(mutex)(&ac97_controllers_mutex); ac97_ctrl = to_ac97_controller(dev); ac97_ctrl->ops->warm_reset(ac97_ctrl); - mutex_unlock(&ac97_controllers_mutex); return len; } static DEVICE_ATTR_WO(warm_reset); @@ -285,10 +283,10 @@ static const struct attribute_group *ac97_adapter_groups[] = { static void ac97_del_adapter(struct ac97_controller *ac97_ctrl) { - mutex_lock(&ac97_controllers_mutex); - ac97_ctrl_codecs_unregister(ac97_ctrl); - list_del(&ac97_ctrl->controllers); - mutex_unlock(&ac97_controllers_mutex); + scoped_guard(mutex, &ac97_controllers_mutex) { + ac97_ctrl_codecs_unregister(ac97_ctrl); + list_del(&ac97_ctrl->controllers); + } device_unregister(&ac97_ctrl->adap); } @@ -312,7 +310,7 @@ static int ac97_add_adapter(struct ac97_controller *ac97_ctrl) { int ret; - mutex_lock(&ac97_controllers_mutex); + guard(mutex)(&ac97_controllers_mutex); ret = idr_alloc(&ac97_adapter_idr, ac97_ctrl, 0, 0, GFP_KERNEL); ac97_ctrl->nr = ret; if (ret >= 0) { @@ -323,13 +321,11 @@ static int ac97_add_adapter(struct ac97_controller *ac97_ctrl) if (ret) put_device(&ac97_ctrl->adap); } - if (!ret) + if (!ret) { list_add(&ac97_ctrl->controllers, &ac97_controllers); - mutex_unlock(&ac97_controllers_mutex); - - if (!ret) dev_dbg(&ac97_ctrl->adap, "adapter registered by %s\n", dev_name(ac97_ctrl->parent)); + } return ret; } From c80f9b3349a99a9d5b295f5bbc23f544c5995ad7 Mon Sep 17 00:00:00 2001 From: Haoxiang Li Date: Mon, 12 Jan 2026 12:31:09 -0500 Subject: [PATCH 57/74] ALSA: ac97: fix a double free in snd_ac97_controller_register() [ Upstream commit 830988b6cf197e6dcffdfe2008c5738e6c6c3c0f ] If ac97_add_adapter() fails, put_device() is the correct way to drop the device reference. kfree() is not required. Add kfree() if idr_alloc() fails and in ac97_adapter_release() to do the cleanup. Found by code review. Fixes: 74426fbff66e ("ALSA: ac97: add an ac97 bus") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li Link: https://patch.msgid.link/20251219162845.657525-1-lihaoxiang@isrc.iscas.ac.cn Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/ac97/bus.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/sound/ac97/bus.c b/sound/ac97/bus.c index a46653eeb220..e4873cc680be 100644 --- a/sound/ac97/bus.c +++ b/sound/ac97/bus.c @@ -299,6 +299,7 @@ static void ac97_adapter_release(struct device *dev) idr_remove(&ac97_adapter_idr, ac97_ctrl->nr); dev_dbg(&ac97_ctrl->adap, "adapter unregistered by %s\n", dev_name(ac97_ctrl->parent)); + kfree(ac97_ctrl); } static const struct device_type ac97_adapter_type = { @@ -320,7 +321,9 @@ static int ac97_add_adapter(struct ac97_controller *ac97_ctrl) ret = device_register(&ac97_ctrl->adap); if (ret) put_device(&ac97_ctrl->adap); - } + } else + kfree(ac97_ctrl); + if (!ret) { list_add(&ac97_ctrl->controllers, &ac97_controllers); dev_dbg(&ac97_ctrl->adap, "adapter registered by %s\n", @@ -361,14 +364,11 @@ struct ac97_controller *snd_ac97_controller_register( ret = ac97_add_adapter(ac97_ctrl); if (ret) - goto err; + return ERR_PTR(ret); ac97_bus_reset(ac97_ctrl); ac97_bus_scan(ac97_ctrl); return ac97_ctrl; -err: - kfree(ac97_ctrl); - return ERR_PTR(ret); } EXPORT_SYMBOL_GPL(snd_ac97_controller_register); From 34eb22836e0cdba093baac66599d68c4cd245a9d Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 12 Jan 2026 10:33:51 -0500 Subject: [PATCH 58/74] nfsd: provide locking for v4_end_grace [ Upstream commit 2857bd59feb63fcf40fe4baf55401baea6b4feb4 ] Writing to v4_end_grace can race with server shutdown and result in memory being accessed after it was freed - reclaim_str_hashtbl in particularly. We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is held while client_tracking_op->init() is called and that can wait for an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a deadlock. nfsd4_end_grace() is also called by the landromat work queue and this doesn't require locking as server shutdown will stop the work and wait for it before freeing anything that nfsd4_end_grace() might access. However, we must be sure that writing to v4_end_grace doesn't restart the work item after shutdown has already waited for it. For this we add a new flag protected with nn->client_lock. It is set only while it is safe to make client tracking calls, and v4_end_grace only schedules work while the flag is set with the spinlock held. So this patch adds a nfsd_net field "client_tracking_active" which is set as described. Another field "grace_end_forced", is set when v4_end_grace is written. After this is set, and providing client_tracking_active is set, the laundromat is scheduled. This "grace_end_forced" field bypasses other checks for whether the grace period has finished. This resolves a race which can result in use-after-free. Reported-by: Li Lingfeng Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/T/#t Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd") Cc: stable@vger.kernel.org Signed-off-by: NeilBrown Tested-by: Li Lingfeng Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever [ Adjust context ] Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/nfsd/netns.h | 2 ++ fs/nfsd/nfs4state.c | 42 ++++++++++++++++++++++++++++++++++++++++-- fs/nfsd/nfsctl.c | 3 +-- fs/nfsd/state.h | 2 +- 4 files changed, 44 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 41c750f34473..be0c8f1ce4e3 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -64,6 +64,8 @@ struct nfsd_net { struct lock_manager nfsd4_manager; bool grace_ended; + bool grace_end_forced; + bool client_tracking_active; time64_t boot_time; struct dentry *nfsd_client_dir; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 2ba7ce076e73..ed5cc5b2330a 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -84,7 +84,7 @@ static u64 current_sessionid = 1; /* forward declarations */ static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner); static void nfs4_free_ol_stateid(struct nfs4_stid *stid); -void nfsd4_end_grace(struct nfsd_net *nn); +static void nfsd4_end_grace(struct nfsd_net *nn); static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps); static void nfsd4_file_hash_remove(struct nfs4_file *fi); @@ -5882,7 +5882,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return nfs_ok; } -void +static void nfsd4_end_grace(struct nfsd_net *nn) { /* do nothing if grace period already ended */ @@ -5915,6 +5915,33 @@ nfsd4_end_grace(struct nfsd_net *nn) */ } +/** + * nfsd4_force_end_grace - forcibly end the NFSv4 grace period + * @nn: network namespace for the server instance to be updated + * + * Forces bypass of normal grace period completion, then schedules + * the laundromat to end the grace period immediately. Does not wait + * for the grace period to fully terminate before returning. + * + * Return values: + * %true: Grace termination schedule + * %false: No action was taken + */ +bool nfsd4_force_end_grace(struct nfsd_net *nn) +{ + if (!nn->client_tracking_ops) + return false; + spin_lock(&nn->client_lock); + if (nn->grace_ended || !nn->client_tracking_active) { + spin_unlock(&nn->client_lock); + return false; + } + WRITE_ONCE(nn->grace_end_forced, true); + mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); + spin_unlock(&nn->client_lock); + return true; +} + /* * If we've waited a lease period but there are still clients trying to * reclaim, wait a little longer to give them a chance to finish. @@ -5924,6 +5951,8 @@ static bool clients_still_reclaiming(struct nfsd_net *nn) time64_t double_grace_period_end = nn->boot_time + 2 * nn->nfsd4_lease; + if (READ_ONCE(nn->grace_end_forced)) + return false; if (nn->track_reclaim_completes && atomic_read(&nn->nr_reclaim_complete) == nn->reclaim_str_hashtbl_size) @@ -8131,6 +8160,8 @@ static int nfs4_state_create_net(struct net *net) nn->unconf_name_tree = RB_ROOT; nn->boot_time = ktime_get_real_seconds(); nn->grace_ended = false; + nn->grace_end_forced = false; + nn->client_tracking_active = false; nn->nfsd4_manager.block_opens = true; INIT_LIST_HEAD(&nn->nfsd4_manager.list); INIT_LIST_HEAD(&nn->client_lru); @@ -8207,6 +8238,10 @@ nfs4_state_start_net(struct net *net) return ret; locks_start_grace(net, &nn->nfsd4_manager); nfsd4_client_tracking_init(net); + /* safe for laundromat to run now */ + spin_lock(&nn->client_lock); + nn->client_tracking_active = true; + spin_unlock(&nn->client_lock); if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0) goto skip_grace; printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n", @@ -8253,6 +8288,9 @@ nfs4_state_shutdown_net(struct net *net) unregister_shrinker(&nn->nfsd_client_shrinker); cancel_work_sync(&nn->nfsd_shrinker_work); + spin_lock(&nn->client_lock); + nn->client_tracking_active = false; + spin_unlock(&nn->client_lock); cancel_delayed_work_sync(&nn->laundromat_work); locks_end_grace(&nn->nfsd4_manager); diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 2feaa49fb9fe..07e5b1b23c91 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1117,9 +1117,8 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size) case 'Y': case 'y': case '1': - if (!nn->nfsd_serv) + if (!nfsd4_force_end_grace(nn)) return -EBUSY; - nfsd4_end_grace(nn); break; default: return -EINVAL; diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index e94634d30591..477828dbfc66 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -719,7 +719,7 @@ static inline void get_nfs4_file(struct nfs4_file *fi) struct nfsd_file *find_any_file(struct nfs4_file *f); /* grace period management */ -void nfsd4_end_grace(struct nfsd_net *nn); +bool nfsd4_force_end_grace(struct nfsd_net *nn); /* nfs4recover operations */ extern int nfsd4_client_tracking_init(struct net *net); From 2e5b886c8516747057725d65d38e435a5db1245e Mon Sep 17 00:00:00 2001 From: Chen Hanxiao Date: Mon, 12 Jan 2026 09:39:08 -0500 Subject: [PATCH 59/74] NFS: trace: show TIMEDOUT instead of 0x6e [ Upstream commit cef48236dfe55fa266d505e8a497963a7bc5ef2a ] __nfs_revalidate_inode may return ETIMEDOUT. print symbol of ETIMEDOUT in nfs trace: before: cat-5191 [005] 119.331127: nfs_revalidate_inode_exit: error=-110 (0x6e) after: cat-1738 [004] 44.365509: nfs_revalidate_inode_exit: error=-110 (TIMEDOUT) Signed-off-by: Chen Hanxiao Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Stable-dep-of: c6c209ceb87f ("NFSD: Remove NFSERR_EAGAIN") Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/trace/misc/nfs.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/trace/misc/nfs.h b/include/trace/misc/nfs.h index 0d9d48dca38a..5b6c36fe9cdf 100644 --- a/include/trace/misc/nfs.h +++ b/include/trace/misc/nfs.h @@ -52,6 +52,7 @@ TRACE_DEFINE_ENUM(NFSERR_JUKEBOX); { NFSERR_IO, "IO" }, \ { NFSERR_NXIO, "NXIO" }, \ { ECHILD, "CHILD" }, \ + { ETIMEDOUT, "TIMEDOUT" }, \ { NFSERR_EAGAIN, "AGAIN" }, \ { NFSERR_ACCES, "ACCES" }, \ { NFSERR_EXIST, "EXIST" }, \ From 41daec7ee7e9c6068628b651f5779c5b22837287 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Mon, 12 Jan 2026 09:39:09 -0500 Subject: [PATCH 60/74] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno [ Upstream commit 4806ded4c14c5e8fdc6ce885d83221a78c06a428 ] Common nfs_stat_to_errno() is used by both fs/nfs/nfs2xdr.c and fs/nfs/nfs3xdr.c Will also be used by fs/nfsd/localio.c Signed-off-by: Mike Snitzer Reviewed-by: Jeff Layton Reviewed-by: NeilBrown Signed-off-by: Anna Schumaker Stable-dep-of: c6c209ceb87f ("NFSD: Remove NFSERR_EAGAIN") Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/nfs/Kconfig | 1 + fs/nfs/nfs2xdr.c | 70 +----------------------- fs/nfs/nfs3xdr.c | 108 +++++++------------------------------ fs/nfs/nfs4xdr.c | 4 +- fs/nfs_common/Makefile | 2 + fs/nfs_common/common.c | 67 +++++++++++++++++++++++ fs/nfsd/Kconfig | 1 + include/linux/nfs_common.h | 16 ++++++ 8 files changed, 109 insertions(+), 160 deletions(-) create mode 100644 fs/nfs_common/common.c create mode 100644 include/linux/nfs_common.h diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig index 899e25e9b4eb..9c1d1aff61ee 100644 --- a/fs/nfs/Kconfig +++ b/fs/nfs/Kconfig @@ -5,6 +5,7 @@ config NFS_FS select CRC32 select LOCKD select SUNRPC + select NFS_COMMON select NFS_ACL_SUPPORT if NFS_V3_ACL help Choose Y here if you want to access files residing on other diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c index c19093814296..6e75c6c2d234 100644 --- a/fs/nfs/nfs2xdr.c +++ b/fs/nfs/nfs2xdr.c @@ -22,14 +22,12 @@ #include #include #include +#include #include "nfstrace.h" #include "internal.h" #define NFSDBG_FACILITY NFSDBG_XDR -/* Mapping from NFS error code to "errno" error code. */ -#define errno_NFSERR_IO EIO - /* * Declare the space requirements for NFS arguments and replies as * number of 32bit-words @@ -64,8 +62,6 @@ #define NFS_readdirres_sz (1+NFS_pagepad_sz) #define NFS_statfsres_sz (1+NFS_info_sz) -static int nfs_stat_to_errno(enum nfs_stat); - /* * Encode/decode NFSv2 basic data types * @@ -1054,70 +1050,6 @@ out_default: return nfs_stat_to_errno(status); } - -/* - * We need to translate between nfs status return values and - * the local errno values which may not be the same. - */ -static const struct { - int stat; - int errno; -} nfs_errtbl[] = { - { NFS_OK, 0 }, - { NFSERR_PERM, -EPERM }, - { NFSERR_NOENT, -ENOENT }, - { NFSERR_IO, -errno_NFSERR_IO}, - { NFSERR_NXIO, -ENXIO }, -/* { NFSERR_EAGAIN, -EAGAIN }, */ - { NFSERR_ACCES, -EACCES }, - { NFSERR_EXIST, -EEXIST }, - { NFSERR_XDEV, -EXDEV }, - { NFSERR_NODEV, -ENODEV }, - { NFSERR_NOTDIR, -ENOTDIR }, - { NFSERR_ISDIR, -EISDIR }, - { NFSERR_INVAL, -EINVAL }, - { NFSERR_FBIG, -EFBIG }, - { NFSERR_NOSPC, -ENOSPC }, - { NFSERR_ROFS, -EROFS }, - { NFSERR_MLINK, -EMLINK }, - { NFSERR_NAMETOOLONG, -ENAMETOOLONG }, - { NFSERR_NOTEMPTY, -ENOTEMPTY }, - { NFSERR_DQUOT, -EDQUOT }, - { NFSERR_STALE, -ESTALE }, - { NFSERR_REMOTE, -EREMOTE }, -#ifdef EWFLUSH - { NFSERR_WFLUSH, -EWFLUSH }, -#endif - { NFSERR_BADHANDLE, -EBADHANDLE }, - { NFSERR_NOT_SYNC, -ENOTSYNC }, - { NFSERR_BAD_COOKIE, -EBADCOOKIE }, - { NFSERR_NOTSUPP, -ENOTSUPP }, - { NFSERR_TOOSMALL, -ETOOSMALL }, - { NFSERR_SERVERFAULT, -EREMOTEIO }, - { NFSERR_BADTYPE, -EBADTYPE }, - { NFSERR_JUKEBOX, -EJUKEBOX }, - { -1, -EIO } -}; - -/** - * nfs_stat_to_errno - convert an NFS status code to a local errno - * @status: NFS status code to convert - * - * Returns a local errno value, or -EIO if the NFS status code is - * not recognized. This function is used jointly by NFSv2 and NFSv3. - */ -static int nfs_stat_to_errno(enum nfs_stat status) -{ - int i; - - for (i = 0; nfs_errtbl[i].stat != -1; i++) { - if (nfs_errtbl[i].stat == (int)status) - return nfs_errtbl[i].errno; - } - dprintk("NFS: Unrecognized nfs status value: %u\n", status); - return nfs_errtbl[i].errno; -} - #define PROC(proc, argtype, restype, timer) \ [NFSPROC_##proc] = { \ .p_proc = NFSPROC_##proc, \ diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c index 60f032be805a..4ae01c10b7e2 100644 --- a/fs/nfs/nfs3xdr.c +++ b/fs/nfs/nfs3xdr.c @@ -21,14 +21,13 @@ #include #include #include +#include + #include "nfstrace.h" #include "internal.h" #define NFSDBG_FACILITY NFSDBG_XDR -/* Mapping from NFS error code to "errno" error code. */ -#define errno_NFSERR_IO EIO - /* * Declare the space requirements for NFS arguments and replies as * number of 32bit-words @@ -91,8 +90,6 @@ NFS3_pagepad_sz) #define ACL3_setaclres_sz (1+NFS3_post_op_attr_sz) -static int nfs3_stat_to_errno(enum nfs_stat); - /* * Map file type to S_IFMT bits */ @@ -1406,7 +1403,7 @@ static int nfs3_xdr_dec_getattr3res(struct rpc_rqst *req, out: return error; out_default: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1445,7 +1442,7 @@ static int nfs3_xdr_dec_setattr3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1495,7 +1492,7 @@ out_default: error = decode_post_op_attr(xdr, result->dir_attr, userns); if (unlikely(error)) goto out; - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1537,7 +1534,7 @@ static int nfs3_xdr_dec_access3res(struct rpc_rqst *req, out: return error; out_default: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1578,7 +1575,7 @@ static int nfs3_xdr_dec_readlink3res(struct rpc_rqst *req, out: return error; out_default: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1658,7 +1655,7 @@ static int nfs3_xdr_dec_read3res(struct rpc_rqst *req, struct xdr_stream *xdr, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1728,7 +1725,7 @@ static int nfs3_xdr_dec_write3res(struct rpc_rqst *req, struct xdr_stream *xdr, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1795,7 +1792,7 @@ out_default: error = decode_wcc_data(xdr, result->dir_attr, userns); if (unlikely(error)) goto out; - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1835,7 +1832,7 @@ static int nfs3_xdr_dec_remove3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1881,7 +1878,7 @@ static int nfs3_xdr_dec_rename3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1926,7 +1923,7 @@ static int nfs3_xdr_dec_link3res(struct rpc_rqst *req, struct xdr_stream *xdr, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /** @@ -2101,7 +2098,7 @@ out_default: error = decode_post_op_attr(xdr, result->dir_attr, rpc_rqst_userns(req)); if (unlikely(error)) goto out; - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -2167,7 +2164,7 @@ static int nfs3_xdr_dec_fsstat3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -2243,7 +2240,7 @@ static int nfs3_xdr_dec_fsinfo3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -2304,7 +2301,7 @@ static int nfs3_xdr_dec_pathconf3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -2350,7 +2347,7 @@ static int nfs3_xdr_dec_commit3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } #ifdef CONFIG_NFS_V3_ACL @@ -2416,7 +2413,7 @@ static int nfs3_xdr_dec_getacl3res(struct rpc_rqst *req, out: return error; out_default: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } static int nfs3_xdr_dec_setacl3res(struct rpc_rqst *req, @@ -2435,76 +2432,11 @@ static int nfs3_xdr_dec_setacl3res(struct rpc_rqst *req, out: return error; out_default: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } #endif /* CONFIG_NFS_V3_ACL */ - -/* - * We need to translate between nfs status return values and - * the local errno values which may not be the same. - */ -static const struct { - int stat; - int errno; -} nfs_errtbl[] = { - { NFS_OK, 0 }, - { NFSERR_PERM, -EPERM }, - { NFSERR_NOENT, -ENOENT }, - { NFSERR_IO, -errno_NFSERR_IO}, - { NFSERR_NXIO, -ENXIO }, -/* { NFSERR_EAGAIN, -EAGAIN }, */ - { NFSERR_ACCES, -EACCES }, - { NFSERR_EXIST, -EEXIST }, - { NFSERR_XDEV, -EXDEV }, - { NFSERR_NODEV, -ENODEV }, - { NFSERR_NOTDIR, -ENOTDIR }, - { NFSERR_ISDIR, -EISDIR }, - { NFSERR_INVAL, -EINVAL }, - { NFSERR_FBIG, -EFBIG }, - { NFSERR_NOSPC, -ENOSPC }, - { NFSERR_ROFS, -EROFS }, - { NFSERR_MLINK, -EMLINK }, - { NFSERR_NAMETOOLONG, -ENAMETOOLONG }, - { NFSERR_NOTEMPTY, -ENOTEMPTY }, - { NFSERR_DQUOT, -EDQUOT }, - { NFSERR_STALE, -ESTALE }, - { NFSERR_REMOTE, -EREMOTE }, -#ifdef EWFLUSH - { NFSERR_WFLUSH, -EWFLUSH }, -#endif - { NFSERR_BADHANDLE, -EBADHANDLE }, - { NFSERR_NOT_SYNC, -ENOTSYNC }, - { NFSERR_BAD_COOKIE, -EBADCOOKIE }, - { NFSERR_NOTSUPP, -ENOTSUPP }, - { NFSERR_TOOSMALL, -ETOOSMALL }, - { NFSERR_SERVERFAULT, -EREMOTEIO }, - { NFSERR_BADTYPE, -EBADTYPE }, - { NFSERR_JUKEBOX, -EJUKEBOX }, - { -1, -EIO } -}; - -/** - * nfs3_stat_to_errno - convert an NFS status code to a local errno - * @status: NFS status code to convert - * - * Returns a local errno value, or -EIO if the NFS status code is - * not recognized. This function is used jointly by NFSv2 and NFSv3. - */ -static int nfs3_stat_to_errno(enum nfs_stat status) -{ - int i; - - for (i = 0; nfs_errtbl[i].stat != -1; i++) { - if (nfs_errtbl[i].stat == (int)status) - return nfs_errtbl[i].errno; - } - dprintk("NFS: Unrecognized nfs status value: %u\n", status); - return nfs_errtbl[i].errno; -} - - #define PROC(proc, argtype, restype, timer) \ [NFS3PROC_##proc] = { \ .p_proc = NFS3PROC_##proc, \ diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index deec76cf5afe..a9d57fcdf9b4 100644 --- a/fs/nfs/nfs4xdr.c +++ b/fs/nfs/nfs4xdr.c @@ -52,6 +52,7 @@ #include #include #include +#include #include "nfs4_fs.h" #include "nfs4trace.h" @@ -63,9 +64,6 @@ #define NFSDBG_FACILITY NFSDBG_XDR -/* Mapping from NFS error code to "errno" error code. */ -#define errno_NFSERR_IO EIO - struct compound_hdr; static int nfs4_stat_to_errno(int); static void encode_layoutget(struct xdr_stream *xdr, diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile index 119c75ab9fd0..e58b01bb8dda 100644 --- a/fs/nfs_common/Makefile +++ b/fs/nfs_common/Makefile @@ -8,3 +8,5 @@ nfs_acl-objs := nfsacl.o obj-$(CONFIG_GRACE_PERIOD) += grace.o obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o + +obj-$(CONFIG_NFS_COMMON) += common.o diff --git a/fs/nfs_common/common.c b/fs/nfs_common/common.c new file mode 100644 index 000000000000..a4ee95da2174 --- /dev/null +++ b/fs/nfs_common/common.c @@ -0,0 +1,67 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include + +/* + * We need to translate between nfs status return values and + * the local errno values which may not be the same. + */ +static const struct { + int stat; + int errno; +} nfs_errtbl[] = { + { NFS_OK, 0 }, + { NFSERR_PERM, -EPERM }, + { NFSERR_NOENT, -ENOENT }, + { NFSERR_IO, -errno_NFSERR_IO}, + { NFSERR_NXIO, -ENXIO }, +/* { NFSERR_EAGAIN, -EAGAIN }, */ + { NFSERR_ACCES, -EACCES }, + { NFSERR_EXIST, -EEXIST }, + { NFSERR_XDEV, -EXDEV }, + { NFSERR_NODEV, -ENODEV }, + { NFSERR_NOTDIR, -ENOTDIR }, + { NFSERR_ISDIR, -EISDIR }, + { NFSERR_INVAL, -EINVAL }, + { NFSERR_FBIG, -EFBIG }, + { NFSERR_NOSPC, -ENOSPC }, + { NFSERR_ROFS, -EROFS }, + { NFSERR_MLINK, -EMLINK }, + { NFSERR_NAMETOOLONG, -ENAMETOOLONG }, + { NFSERR_NOTEMPTY, -ENOTEMPTY }, + { NFSERR_DQUOT, -EDQUOT }, + { NFSERR_STALE, -ESTALE }, + { NFSERR_REMOTE, -EREMOTE }, +#ifdef EWFLUSH + { NFSERR_WFLUSH, -EWFLUSH }, +#endif + { NFSERR_BADHANDLE, -EBADHANDLE }, + { NFSERR_NOT_SYNC, -ENOTSYNC }, + { NFSERR_BAD_COOKIE, -EBADCOOKIE }, + { NFSERR_NOTSUPP, -ENOTSUPP }, + { NFSERR_TOOSMALL, -ETOOSMALL }, + { NFSERR_SERVERFAULT, -EREMOTEIO }, + { NFSERR_BADTYPE, -EBADTYPE }, + { NFSERR_JUKEBOX, -EJUKEBOX }, + { -1, -EIO } +}; + +/** + * nfs_stat_to_errno - convert an NFS status code to a local errno + * @status: NFS status code to convert + * + * Returns a local errno value, or -EIO if the NFS status code is + * not recognized. This function is used jointly by NFSv2 and NFSv3. + */ +int nfs_stat_to_errno(enum nfs_stat status) +{ + int i; + + for (i = 0; nfs_errtbl[i].stat != -1; i++) { + if (nfs_errtbl[i].stat == (int)status) + return nfs_errtbl[i].errno; + } + return nfs_errtbl[i].errno; +} +EXPORT_SYMBOL_GPL(nfs_stat_to_errno); diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index 4f704f868d9c..03b728d851db 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -8,6 +8,7 @@ config NFSD select LOCKD select SUNRPC select EXPORTFS + select NFS_COMMON select NFS_ACL_SUPPORT if NFSD_V2_ACL select NFS_ACL_SUPPORT if NFSD_V3_ACL depends on MULTIUSER diff --git a/include/linux/nfs_common.h b/include/linux/nfs_common.h new file mode 100644 index 000000000000..3395c4a4d372 --- /dev/null +++ b/include/linux/nfs_common.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * This file contains constants and methods used by both NFS client and server. + */ +#ifndef _LINUX_NFS_COMMON_H +#define _LINUX_NFS_COMMON_H + +#include +#include + +/* Mapping from NFS error code to "errno" error code. */ +#define errno_NFSERR_IO EIO + +int nfs_stat_to_errno(enum nfs_stat status); + +#endif /* _LINUX_NFS_COMMON_H */ From 5bbaed525e4876cf5679b22cd5fbad2e742110b8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Jan 2026 09:39:10 -0500 Subject: [PATCH 61/74] NFSD: Remove NFSERR_EAGAIN [ Upstream commit c6c209ceb87f64a6ceebe61761951dcbbf4a0baa ] I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881. None of these RFCs have an NFS status code that match the numeric value "11". Based on the meaning of the EAGAIN errno, I presume the use of this status in NFSD means NFS4ERR_DELAY. So replace the one usage of nfserr_eagain, and remove it from NFSD's NFS status conversion tables. As far as I can tell, NFSERR_EAGAIN has existed since the pre-git era, but was not actually used by any code until commit f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed."), at which time it become possible for NFSD to return a status code of 11 (which is not valid NFS protocol). Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Cc: stable@vger.kernel.org Reviewed-by: NeilBrown Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/nfs_common/common.c | 1 - fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfsd.h | 1 - include/trace/misc/nfs.h | 2 -- include/uapi/linux/nfs.h | 1 - 5 files changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/nfs_common/common.c b/fs/nfs_common/common.c index a4ee95da2174..5cb0781e918f 100644 --- a/fs/nfs_common/common.c +++ b/fs/nfs_common/common.c @@ -16,7 +16,6 @@ static const struct { { NFSERR_NOENT, -ENOENT }, { NFSERR_IO, -errno_NFSERR_IO}, { NFSERR_NXIO, -ENXIO }, -/* { NFSERR_EAGAIN, -EAGAIN }, */ { NFSERR_ACCES, -EACCES }, { NFSERR_EXIST, -EEXIST }, { NFSERR_XDEV, -EXDEV }, diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f78a01102c67..714e4c471e86 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1321,7 +1321,7 @@ try_again: (schedule_timeout(20*HZ) == 0)) { finish_wait(&nn->nfsd_ssc_waitq, &wait); kfree(work); - return nfserr_eagain; + return nfserr_jukebox; } finish_wait(&nn->nfsd_ssc_waitq, &wait); goto try_again; diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 996f3f62335b..1116e5fcddc5 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -201,7 +201,6 @@ void nfsd_lockd_shutdown(void); #define nfserr_noent cpu_to_be32(NFSERR_NOENT) #define nfserr_io cpu_to_be32(NFSERR_IO) #define nfserr_nxio cpu_to_be32(NFSERR_NXIO) -#define nfserr_eagain cpu_to_be32(NFSERR_EAGAIN) #define nfserr_acces cpu_to_be32(NFSERR_ACCES) #define nfserr_exist cpu_to_be32(NFSERR_EXIST) #define nfserr_xdev cpu_to_be32(NFSERR_XDEV) diff --git a/include/trace/misc/nfs.h b/include/trace/misc/nfs.h index 5b6c36fe9cdf..7d336ba1c34f 100644 --- a/include/trace/misc/nfs.h +++ b/include/trace/misc/nfs.h @@ -16,7 +16,6 @@ TRACE_DEFINE_ENUM(NFSERR_PERM); TRACE_DEFINE_ENUM(NFSERR_NOENT); TRACE_DEFINE_ENUM(NFSERR_IO); TRACE_DEFINE_ENUM(NFSERR_NXIO); -TRACE_DEFINE_ENUM(NFSERR_EAGAIN); TRACE_DEFINE_ENUM(NFSERR_ACCES); TRACE_DEFINE_ENUM(NFSERR_EXIST); TRACE_DEFINE_ENUM(NFSERR_XDEV); @@ -53,7 +52,6 @@ TRACE_DEFINE_ENUM(NFSERR_JUKEBOX); { NFSERR_NXIO, "NXIO" }, \ { ECHILD, "CHILD" }, \ { ETIMEDOUT, "TIMEDOUT" }, \ - { NFSERR_EAGAIN, "AGAIN" }, \ { NFSERR_ACCES, "ACCES" }, \ { NFSERR_EXIST, "EXIST" }, \ { NFSERR_XDEV, "XDEV" }, \ diff --git a/include/uapi/linux/nfs.h b/include/uapi/linux/nfs.h index 946cb62d64b0..5dc726070b51 100644 --- a/include/uapi/linux/nfs.h +++ b/include/uapi/linux/nfs.h @@ -49,7 +49,6 @@ NFSERR_NOENT = 2, /* v2 v3 v4 */ NFSERR_IO = 5, /* v2 v3 v4 */ NFSERR_NXIO = 6, /* v2 v3 v4 */ - NFSERR_EAGAIN = 11, /* v2 v3 */ NFSERR_ACCES = 13, /* v2 v3 v4 */ NFSERR_EXIST = 17, /* v2 v3 v4 */ NFSERR_XDEV = 18, /* v3 v4 */ From 2cc929275d9edcfa67c09384af0ef7ac0745089c Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Wed, 11 Jun 2025 20:50:32 -0700 Subject: [PATCH 62/74] bpf: Fix an issue in bpf_prog_test_run_xdp when page size greater than 4K [ Upstream commit 4fc012daf9c074772421c904357abf586336b1ca ] The bpf selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow failed on arm64 with 64KB page: xdp_adjust_tail/xdp_adjust_frags_tail_grow:FAIL In bpf_prog_test_run_xdp(), the xdp->frame_sz is set to 4K, but later on when constructing frags, with 64K page size, the frag data_len could be more than 4K. This will cause problems in bpf_xdp_frags_increase_tail(). To fix the failure, the xdp->frame_sz is set to be PAGE_SIZE so kernel can test different page size properly. With the kernel change, the user space and bpf prog needs adjustment. Currently, the MAX_SKB_FRAGS default value is 17, so for 4K page, the maximum packet size will be less than 68K. To test 64K page, a bigger maximum packet size than 68K is desired. So two different functions are implemented for subtest xdp_adjust_frags_tail_grow. Depending on different page size, different data input/output sizes are used to adapt with different page size. Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20250612035032.2207498-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov Stable-dep-of: e558cca21779 ("bpf, test_run: Subtract size of xdp_frame from allowed metadata size") Signed-off-by: Sasha Levin --- net/bpf/test_run.c | 2 +- .../bpf/prog_tests/xdp_adjust_tail.c | 96 +++++++++++++++++-- .../bpf/progs/test_xdp_adjust_tail_grow.c | 8 +- 3 files changed, 97 insertions(+), 9 deletions(-) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 77b386b76d46..939bd0d866ab 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -1322,7 +1322,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, headroom -= ctx->data; } - max_data_sz = 4096 - headroom - tailroom; + max_data_sz = PAGE_SIZE - headroom - tailroom; if (size > max_data_sz) { /* disallow live data mode for jumbo frames */ if (do_live) diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c index 89366913a251..df907798d59d 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c @@ -37,21 +37,26 @@ static void test_xdp_adjust_tail_shrink(void) bpf_object__close(obj); } -static void test_xdp_adjust_tail_grow(void) +static void test_xdp_adjust_tail_grow(bool is_64k_pagesize) { const char *file = "./test_xdp_adjust_tail_grow.bpf.o"; struct bpf_object *obj; - char buf[4096]; /* avoid segfault: large buf to hold grow results */ + char buf[8192]; /* avoid segfault: large buf to hold grow results */ __u32 expect_sz; int err, prog_fd; LIBBPF_OPTS(bpf_test_run_opts, topts, .data_in = &pkt_v4, - .data_size_in = sizeof(pkt_v4), .data_out = buf, .data_size_out = sizeof(buf), .repeat = 1, ); + /* topts.data_size_in as a special signal to bpf prog */ + if (is_64k_pagesize) + topts.data_size_in = sizeof(pkt_v4) - 1; + else + topts.data_size_in = sizeof(pkt_v4); + err = bpf_prog_test_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd); if (!ASSERT_OK(err, "test_xdp_adjust_tail_grow")) return; @@ -201,7 +206,7 @@ out: bpf_object__close(obj); } -static void test_xdp_adjust_frags_tail_grow(void) +static void test_xdp_adjust_frags_tail_grow_4k(void) { const char *file = "./test_xdp_adjust_tail_grow.bpf.o"; __u32 exp_size; @@ -266,16 +271,93 @@ out: bpf_object__close(obj); } +static void test_xdp_adjust_frags_tail_grow_64k(void) +{ + const char *file = "./test_xdp_adjust_tail_grow.bpf.o"; + __u32 exp_size; + struct bpf_program *prog; + struct bpf_object *obj; + int err, i, prog_fd; + __u8 *buf; + LIBBPF_OPTS(bpf_test_run_opts, topts); + + obj = bpf_object__open(file); + if (libbpf_get_error(obj)) + return; + + prog = bpf_object__next_program(obj, NULL); + if (bpf_object__load(obj)) + goto out; + + prog_fd = bpf_program__fd(prog); + + buf = malloc(262144); + if (!ASSERT_OK_PTR(buf, "alloc buf 256Kb")) + goto out; + + /* Test case add 10 bytes to last frag */ + memset(buf, 1, 262144); + exp_size = 90000 + 10; + + topts.data_in = buf; + topts.data_out = buf; + topts.data_size_in = 90000; + topts.data_size_out = 262144; + err = bpf_prog_test_run_opts(prog_fd, &topts); + + ASSERT_OK(err, "90Kb+10b"); + ASSERT_EQ(topts.retval, XDP_TX, "90Kb+10b retval"); + ASSERT_EQ(topts.data_size_out, exp_size, "90Kb+10b size"); + + for (i = 0; i < 90000; i++) { + if (buf[i] != 1) + ASSERT_EQ(buf[i], 1, "90Kb+10b-old"); + } + + for (i = 90000; i < 90010; i++) { + if (buf[i] != 0) + ASSERT_EQ(buf[i], 0, "90Kb+10b-new"); + } + + for (i = 90010; i < 262144; i++) { + if (buf[i] != 1) + ASSERT_EQ(buf[i], 1, "90Kb+10b-untouched"); + } + + /* Test a too large grow */ + memset(buf, 1, 262144); + exp_size = 90001; + + topts.data_in = topts.data_out = buf; + topts.data_size_in = 90001; + topts.data_size_out = 262144; + err = bpf_prog_test_run_opts(prog_fd, &topts); + + ASSERT_OK(err, "90Kb+10b"); + ASSERT_EQ(topts.retval, XDP_DROP, "90Kb+10b retval"); + ASSERT_EQ(topts.data_size_out, exp_size, "90Kb+10b size"); + + free(buf); +out: + bpf_object__close(obj); +} + void test_xdp_adjust_tail(void) { + int page_size = getpagesize(); + if (test__start_subtest("xdp_adjust_tail_shrink")) test_xdp_adjust_tail_shrink(); if (test__start_subtest("xdp_adjust_tail_grow")) - test_xdp_adjust_tail_grow(); + test_xdp_adjust_tail_grow(page_size == 65536); if (test__start_subtest("xdp_adjust_tail_grow2")) test_xdp_adjust_tail_grow2(); if (test__start_subtest("xdp_adjust_frags_tail_shrink")) test_xdp_adjust_frags_tail_shrink(); - if (test__start_subtest("xdp_adjust_frags_tail_grow")) - test_xdp_adjust_frags_tail_grow(); + if (test__start_subtest("xdp_adjust_frags_tail_grow")) { + if (page_size == 65536) + test_xdp_adjust_frags_tail_grow_64k(); + else + test_xdp_adjust_frags_tail_grow_4k(); + } } diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c index 53b64c999450..706a8d4c6e2c 100644 --- a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c +++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c @@ -13,7 +13,9 @@ int _xdp_adjust_tail_grow(struct xdp_md *xdp) /* Data length determine test case */ if (data_len == 54) { /* sizeof(pkt_v4) */ - offset = 4096; /* test too large offset */ + offset = 4096; /* test too large offset, 4k page size */ + } else if (data_len == 53) { /* sizeof(pkt_v4) - 1 */ + offset = 65536; /* test too large offset, 64k page size */ } else if (data_len == 74) { /* sizeof(pkt_v6) */ offset = 40; } else if (data_len == 64) { @@ -25,6 +27,10 @@ int _xdp_adjust_tail_grow(struct xdp_md *xdp) offset = 10; } else if (data_len == 9001) { offset = 4096; + } else if (data_len == 90000) { + offset = 10; /* test a small offset, 64k page size */ + } else if (data_len == 90001) { + offset = 65536; /* test too large offset, 64k page size */ } else { return XDP_ABORTED; /* No matching test */ } From 80c178906d06e34ab47505b874395f8508906d2a Mon Sep 17 00:00:00 2001 From: Amery Hung Date: Mon, 22 Sep 2025 16:33:53 -0700 Subject: [PATCH 63/74] bpf: Make variables in bpf_prog_test_run_xdp less confusing [ Upstream commit 7eb83bff02ad5e82e8c456c58717ef181c220870 ] Change the variable naming in bpf_prog_test_run_xdp() to make the overall logic less confusing. As different modes were added to the function over the time, some variables got overloaded, making it hard to understand and changing the code becomes error-prone. Replace "size" with "linear_sz" where it refers to the size of metadata and data. If "size" refers to input data size, use test.data_size_in directly. Replace "max_data_sz" with "max_linear_sz" to better reflect the fact that it is the maximum size of metadata and data (i.e., linear_sz). Also, xdp_rxq.frags_size is always PAGE_SIZE, so just set it directly instead of subtracting headroom and tailroom and adding them back. Signed-off-by: Amery Hung Signed-off-by: Martin KaFai Lau Link: https://patch.msgid.link/20250922233356.3356453-6-ameryhung@gmail.com Stable-dep-of: e558cca21779 ("bpf, test_run: Subtract size of xdp_frame from allowed metadata size") Signed-off-by: Sasha Levin --- net/bpf/test_run.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 939bd0d866ab..397736cc2d78 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -1277,9 +1277,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, { bool do_live = (kattr->test.flags & BPF_F_TEST_XDP_LIVE_FRAMES); u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + u32 retval = 0, duration, max_linear_sz, size; + u32 linear_sz = kattr->test.data_size_in; u32 batch_size = kattr->test.batch_size; - u32 retval = 0, duration, max_data_sz; - u32 size = kattr->test.data_size_in; u32 headroom = XDP_PACKET_HEADROOM; u32 repeat = kattr->test.repeat; struct netdev_rx_queue *rxqueue; @@ -1313,7 +1313,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, if (ctx) { /* There can't be user provided data before the meta data */ - if (ctx->data_meta || ctx->data_end != size || + if (ctx->data_meta || ctx->data_end != kattr->test.data_size_in || ctx->data > ctx->data_end || unlikely(xdp_metalen_invalid(ctx->data)) || (do_live && (kattr->test.data_out || kattr->test.ctx_out))) @@ -1322,30 +1322,30 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, headroom -= ctx->data; } - max_data_sz = PAGE_SIZE - headroom - tailroom; - if (size > max_data_sz) { - /* disallow live data mode for jumbo frames */ - if (do_live) - goto free_ctx; - size = max_data_sz; - } + max_linear_sz = PAGE_SIZE - headroom - tailroom; + linear_sz = min_t(u32, linear_sz, max_linear_sz); - data = bpf_test_init(kattr, size, max_data_sz, headroom, tailroom); + /* disallow live data mode for jumbo frames */ + if (do_live && kattr->test.data_size_in > linear_sz) + goto free_ctx; + + data = bpf_test_init(kattr, linear_sz, max_linear_sz, headroom, tailroom); if (IS_ERR(data)) { ret = PTR_ERR(data); goto free_ctx; } rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0); - rxqueue->xdp_rxq.frag_size = headroom + max_data_sz + tailroom; + rxqueue->xdp_rxq.frag_size = PAGE_SIZE; xdp_init_buff(&xdp, rxqueue->xdp_rxq.frag_size, &rxqueue->xdp_rxq); - xdp_prepare_buff(&xdp, data, headroom, size, true); + xdp_prepare_buff(&xdp, data, headroom, linear_sz, true); sinfo = xdp_get_shared_info_from_buff(&xdp); ret = xdp_convert_md_to_buff(ctx, &xdp); if (ret) goto free_data; + size = linear_sz; if (unlikely(kattr->test.data_size_in > size)) { void __user *data_in = u64_to_user_ptr(kattr->test.data_in); From 1f31ac503c189e2b3c80507268469632991d1703 Mon Sep 17 00:00:00 2001 From: Amery Hung Date: Mon, 22 Sep 2025 16:33:54 -0700 Subject: [PATCH 64/74] bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN [ Upstream commit fe9544ed1a2e9217b2c5285c3a4ac0dc5a38bd7b ] To test bpf_xdp_pull_data(), an xdp packet containing fragments as well as free linear data area after xdp->data_end needs to be created. However, bpf_prog_test_run_xdp() always fills the linear area with data_in before creating fragments, leaving no space to pull data. This patch will allow users to specify the linear data size through ctx->data_end. Currently, ctx_in->data_end must match data_size_in and will not be the final ctx->data_end seen by xdp programs. This is because ctx->data_end is populated according to the xdp_buff passed to test_run. The linear data area available in an xdp_buff, max_linear_sz, is alawys filled up before copying data_in into fragments. This patch will allow users to specify the size of data that goes into the linear area. When ctx_in->data_end is different from data_size_in, only ctx_in->data_end bytes of data will be put into the linear area when creating the xdp_buff. While ctx_in->data_end will be allowed to be different from data_size_in, it cannot be larger than the data_size_in as there will be no data to copy from user space. If it is larger than the maximum linear data area size, the layout suggested by the user will not be honored. Data beyond max_linear_sz bytes will still be copied into fragments. Finally, since it is possible for a NIC to produce a xdp_buff with empty linear data area, allow it when calling bpf_test_init() from bpf_prog_test_run_xdp() so that we can test XDP kfuncs with such xdp_buff. This is done by moving lower-bound check to callers as most of them already do except bpf_prog_test_run_skb(). The change also fixes a bug that allows passing an xdp_buff with data < ETH_HLEN. This can happen when ctx is used and metadata is at least ETH_HLEN. Signed-off-by: Amery Hung Signed-off-by: Martin KaFai Lau Link: https://patch.msgid.link/20250922233356.3356453-7-ameryhung@gmail.com Stable-dep-of: e558cca21779 ("bpf, test_run: Subtract size of xdp_frame from allowed metadata size") Signed-off-by: Sasha Levin --- net/bpf/test_run.c | 15 ++++++++++++--- .../bpf/prog_tests/xdp_context_test_run.c | 4 +--- 2 files changed, 13 insertions(+), 6 deletions(-) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 397736cc2d78..2bc83525986f 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -768,7 +768,7 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size, void __user *data_in = u64_to_user_ptr(kattr->test.data_in); void *data; - if (user_size < ETH_HLEN || user_size > PAGE_SIZE - headroom - tailroom) + if (user_size > PAGE_SIZE - headroom - tailroom) return ERR_PTR(-EINVAL); size = SKB_DATA_ALIGN(size); @@ -1097,6 +1097,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr, if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; + if (size < ETH_HLEN) + return -EINVAL; + data = bpf_test_init(kattr, kattr->test.data_size_in, size, NET_SKB_PAD + NET_IP_ALIGN, SKB_DATA_ALIGN(sizeof(struct skb_shared_info))); @@ -1277,7 +1280,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, { bool do_live = (kattr->test.flags & BPF_F_TEST_XDP_LIVE_FRAMES); u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - u32 retval = 0, duration, max_linear_sz, size; + u32 retval = 0, meta_sz = 0, duration, max_linear_sz, size; u32 linear_sz = kattr->test.data_size_in; u32 batch_size = kattr->test.batch_size; u32 headroom = XDP_PACKET_HEADROOM; @@ -1313,13 +1316,16 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, if (ctx) { /* There can't be user provided data before the meta data */ - if (ctx->data_meta || ctx->data_end != kattr->test.data_size_in || + if (ctx->data_meta || ctx->data_end > kattr->test.data_size_in || ctx->data > ctx->data_end || unlikely(xdp_metalen_invalid(ctx->data)) || (do_live && (kattr->test.data_out || kattr->test.ctx_out))) goto free_ctx; /* Meta data is allocated from the headroom */ headroom -= ctx->data; + + meta_sz = ctx->data; + linear_sz = ctx->data_end; } max_linear_sz = PAGE_SIZE - headroom - tailroom; @@ -1329,6 +1335,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, if (do_live && kattr->test.data_size_in > linear_sz) goto free_ctx; + if (kattr->test.data_size_in - meta_sz < ETH_HLEN) + return -EINVAL; + data = bpf_test_init(kattr, linear_sz, max_linear_sz, headroom, tailroom); if (IS_ERR(data)) { ret = PTR_ERR(data); diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c b/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c index ab4952b9fb1d..eab8625aad3b 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c @@ -80,9 +80,7 @@ void test_xdp_context_test_run(void) /* Meta data must be 32 bytes or smaller */ test_xdp_context_error(prog_fd, opts, 0, 36, sizeof(data), 0, 0, 0); - /* Total size of data must match data_end - data_meta */ - test_xdp_context_error(prog_fd, opts, 0, sizeof(__u32), - sizeof(data) - 1, 0, 0, 0); + /* Total size of data must be data_end - data_meta or larger */ test_xdp_context_error(prog_fd, opts, 0, sizeof(__u32), sizeof(data) + 1, 0, 0, 0); From e7440935063949d6f2c10f7328d960d0ff4bce90 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= Date: Mon, 5 Jan 2026 12:47:45 +0100 Subject: [PATCH 65/74] bpf, test_run: Subtract size of xdp_frame from allowed metadata size MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit e558cca217790286e799a8baacd1610bda31b261 ] The xdp_frame structure takes up part of the XDP frame headroom, limiting the size of the metadata. However, in bpf_test_run, we don't take this into account, which makes it possible for userspace to supply a metadata size that is too large (taking up the entire headroom). If userspace supplies such a large metadata size in live packet mode, the xdp_update_frame_from_buff() call in xdp_test_run_init_page() call will fail, after which packet transmission proceeds with an uninitialised frame structure, leading to the usual Bad Stuff. The commit in the Fixes tag fixed a related bug where the second check in xdp_update_frame_from_buff() could fail, but did not add any additional constraints on the metadata size. Complete the fix by adding an additional check on the metadata size. Reorder the checks slightly to make the logic clearer and add a comment. Link: https://lore.kernel.org/r/fa2be179-bad7-4ee3-8668-4903d1853461@hust.edu.cn Fixes: b6f1f780b393 ("bpf, test_run: Fix packet size check for live packet mode") Reported-by: Yinhao Hu Reported-by: Kaiyan Mei Signed-off-by: Toke Høiland-Jørgensen Reviewed-by: Amery Hung Link: https://lore.kernel.org/r/20260105114747.1358750-1-toke@redhat.com Signed-off-by: Alexei Starovoitov Signed-off-by: Sasha Levin --- net/bpf/test_run.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 2bc83525986f..88fdcfb2fdd3 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -1304,8 +1304,6 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, batch_size = NAPI_POLL_WEIGHT; else if (batch_size > TEST_XDP_MAX_BATCH) return -E2BIG; - - headroom += sizeof(struct xdp_page_head); } else if (batch_size) { return -EINVAL; } @@ -1318,16 +1316,26 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, /* There can't be user provided data before the meta data */ if (ctx->data_meta || ctx->data_end > kattr->test.data_size_in || ctx->data > ctx->data_end || - unlikely(xdp_metalen_invalid(ctx->data)) || (do_live && (kattr->test.data_out || kattr->test.ctx_out))) goto free_ctx; - /* Meta data is allocated from the headroom */ - headroom -= ctx->data; meta_sz = ctx->data; + if (xdp_metalen_invalid(meta_sz) || meta_sz > headroom - sizeof(struct xdp_frame)) + goto free_ctx; + + /* Meta data is allocated from the headroom */ + headroom -= meta_sz; linear_sz = ctx->data_end; } + /* The xdp_page_head structure takes up space in each page, limiting the + * size of the packet data; add the extra size to headroom here to make + * sure it's accounted in the length checks below, but not in the + * metadata size check above. + */ + if (do_live) + headroom += sizeof(struct xdp_page_head); + max_linear_sz = PAGE_SIZE - headroom - tailroom; linear_sz = min_t(u32, linear_sz, max_linear_sz); From 368569bc546d3368ee9980ba79fc42fdff9a3365 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Thu, 8 Jan 2026 21:36:48 +0900 Subject: [PATCH 66/74] bpf: Fix reference count leak in bpf_prog_test_run_xdp() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit ec69daabe45256f98ac86c651b8ad1b2574489a7 ] syzbot is reporting unregister_netdevice: waiting for sit0 to become free. Usage count = 2 problem. A debug printk() patch found that a refcount is obtained at xdp_convert_md_to_buff() from bpf_prog_test_run_xdp(). According to commit ec94670fcb3b ("bpf: Support specifying ingress via xdp_md context in BPF_PROG_TEST_RUN"), the refcount obtained by xdp_convert_md_to_buff() will be released by xdp_convert_buff_to_md(). Therefore, we can consider that the error handling path introduced by commit 1c1949982524 ("bpf: introduce frags support to bpf_prog_test_run_xdp()") forgot to call xdp_convert_buff_to_md(). Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Fixes: 1c1949982524 ("bpf: introduce frags support to bpf_prog_test_run_xdp()") Signed-off-by: Tetsuo Handa Reviewed-by: Toke Høiland-Jørgensen Link: https://lore.kernel.org/r/af090e53-9d9b-4412-8acb-957733b3975c@I-love.SAKURA.ne.jp Signed-off-by: Alexei Starovoitov Signed-off-by: Sasha Levin --- net/bpf/test_run.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 88fdcfb2fdd3..59edba6994c3 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -1373,13 +1373,13 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, if (sinfo->nr_frags == MAX_SKB_FRAGS) { ret = -ENOMEM; - goto out; + goto out_put_dev; } page = alloc_page(GFP_KERNEL); if (!page) { ret = -ENOMEM; - goto out; + goto out_put_dev; } frag = &sinfo->frags[sinfo->nr_frags++]; @@ -1392,7 +1392,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, if (copy_from_user(page_address(page), data_in + size, data_len)) { ret = -EFAULT; - goto out; + goto out_put_dev; } sinfo->xdp_frags_size += data_len; size += data_len; @@ -1407,6 +1407,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, ret = bpf_test_run_xdp_live(prog, &xdp, repeat, batch_size, &duration); else ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true); +out_put_dev: /* We convert the xdp_buff back to an xdp_md before checking the return * code so the reference count of any held netdevice will be decremented * even if the test run failed. From c20bbd3cf4b468b1ec5d4ffa7dd68a36de73320e Mon Sep 17 00:00:00 2001 From: Sumeet Pawnikar Date: Sat, 6 Dec 2025 00:32:16 +0530 Subject: [PATCH 67/74] powercap: fix race condition in register_control_type() [ Upstream commit 7bda1910c4bccd4b8d4726620bb3d6bbfb62286e ] The device becomes visible to userspace via device_register() even before it fully initialized by idr_init(). If userspace or another thread tries to register a zone immediately after device_register(), the control_type_valid() will fail because the control_type is not yet in the list. The IDR is not yet initialized, so this race condition causes zone registration failure. Move idr_init() and list addition before device_register() fix the race condition. Signed-off-by: Sumeet Pawnikar [ rjw: Subject adjustment, empty line added ] Link: https://patch.msgid.link/20251205190216.5032-1-sumeet4linux@gmail.com Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- drivers/powercap/powercap_sys.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c index fd475e463d1f..d7dadcaa3736 100644 --- a/drivers/powercap/powercap_sys.c +++ b/drivers/powercap/powercap_sys.c @@ -624,17 +624,23 @@ struct powercap_control_type *powercap_register_control_type( INIT_LIST_HEAD(&control_type->node); control_type->dev.class = &powercap_class; dev_set_name(&control_type->dev, "%s", name); - result = device_register(&control_type->dev); - if (result) { - put_device(&control_type->dev); - return ERR_PTR(result); - } idr_init(&control_type->idr); mutex_lock(&powercap_cntrl_list_lock); list_add_tail(&control_type->node, &powercap_cntrl_list); mutex_unlock(&powercap_cntrl_list_lock); + result = device_register(&control_type->dev); + if (result) { + mutex_lock(&powercap_cntrl_list_lock); + list_del(&control_type->node); + mutex_unlock(&powercap_cntrl_list_lock); + + idr_destroy(&control_type->idr); + put_device(&control_type->dev); + return ERR_PTR(result); + } + return control_type; } EXPORT_SYMBOL_GPL(powercap_register_control_type); From 5499fa1450e6365308375bd1a2c6c593963f7cb8 Mon Sep 17 00:00:00 2001 From: Sumeet Pawnikar Date: Sun, 7 Dec 2025 20:45:48 +0530 Subject: [PATCH 68/74] powercap: fix sscanf() error return value handling [ Upstream commit efc4c35b741af973de90f6826bf35d3b3ac36bf1 ] Fix inconsistent error handling for sscanf() return value check. Implicit boolean conversion is used instead of explicit return value checks. The code checks if (!sscanf(...)) which is incorrect because: 1. sscanf returns the number of successfully parsed items 2. On success, it returns 1 (one item passed) 3. On failure, it returns 0 or EOF 4. The check 'if (!sscanf(...))' is wrong because it treats success (1) as failure All occurrences of sscanf() now uses explicit return value check. With this behavior it returns '-EINVAL' when parsing fails (returns 0 or EOF), and continues when parsing succeeds (returns 1). Signed-off-by: Sumeet Pawnikar [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20251207151549.202452-1-sumeet4linux@gmail.com Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- drivers/powercap/powercap_sys.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c index d7dadcaa3736..72fa1f5affce 100644 --- a/drivers/powercap/powercap_sys.c +++ b/drivers/powercap/powercap_sys.c @@ -67,7 +67,7 @@ static ssize_t show_constraint_##_attr(struct device *dev, \ int id; \ struct powercap_zone_constraint *pconst;\ \ - if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) \ + if (sscanf(dev_attr->attr.name, "constraint_%d_", &id) != 1) \ return -EINVAL; \ if (id >= power_zone->const_id_cnt) \ return -EINVAL; \ @@ -92,7 +92,7 @@ static ssize_t store_constraint_##_attr(struct device *dev,\ int id; \ struct powercap_zone_constraint *pconst;\ \ - if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) \ + if (sscanf(dev_attr->attr.name, "constraint_%d_", &id) != 1) \ return -EINVAL; \ if (id >= power_zone->const_id_cnt) \ return -EINVAL; \ @@ -161,7 +161,7 @@ static ssize_t show_constraint_name(struct device *dev, ssize_t len = -ENODATA; struct powercap_zone_constraint *pconst; - if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) + if (sscanf(dev_attr->attr.name, "constraint_%d_", &id) != 1) return -EINVAL; if (id >= power_zone->const_id_cnt) return -EINVAL; From 46ca9dc978923c5e1247a9e9519240ba7ace413c Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Tue, 25 Nov 2025 22:39:59 +0900 Subject: [PATCH 69/74] can: j1939: make j1939_session_activate() fail if device is no longer registered [ Upstream commit 5d5602236f5db19e8b337a2cd87a90ace5ea776d ] syzbot is still reporting unregister_netdevice: waiting for vcan0 to become free. Usage count = 2 even after commit 93a27b5891b8 ("can: j1939: add missing calls in NETDEV_UNREGISTER notification handler") was added. A debug printk() patch found that j1939_session_activate() can succeed even after j1939_cancel_active_session() from j1939_netdev_notify(NETDEV_UNREGISTER) has completed. Since j1939_cancel_active_session() is processed with the session list lock held, checking ndev->reg_state in j1939_session_activate() with the session list lock held can reliably close the race window. Reported-by: syzbot Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Signed-off-by: Tetsuo Handa Acked-by: Oleksij Rempel Link: https://patch.msgid.link/b9653191-d479-4c8b-8536-1326d028db5c@I-love.SAKURA.ne.jp Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- net/can/j1939/transport.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c index 76d625c668e0..0522c223570c 100644 --- a/net/can/j1939/transport.c +++ b/net/can/j1939/transport.c @@ -1571,6 +1571,8 @@ int j1939_session_activate(struct j1939_session *session) if (active) { j1939_session_put(active); ret = -EAGAIN; + } else if (priv->ndev->reg_state != NETREG_REGISTERED) { + ret = -ENODEV; } else { WARN_ON_ONCE(session->state != J1939_SESSION_NEW); list_add_tail(&session->active_session_list_entry, From bb19a4335888ac26084c20d8b9bd842b87e6ea63 Mon Sep 17 00:00:00 2001 From: Andrew Elantsev Date: Wed, 10 Dec 2025 23:38:00 +0300 Subject: [PATCH 70/74] ASoC: amd: yc: Add quirk for Honor MagicBook X16 2025 [ Upstream commit e2cb8ef0372665854fca6fa7b30b20dd35acffeb ] Add a DMI quirk for the Honor MagicBook X16 2025 laptop fixing the issue where the internal microphone was not detected. Signed-off-by: Andrew Elantsev Link: https://patch.msgid.link/20251210203800.142822-1-elantsew.andrew@gmail.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/amd/yc/acp6x-mach.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/sound/soc/amd/yc/acp6x-mach.c b/sound/soc/amd/yc/acp6x-mach.c index fce918a089e3..cbb2a1d1a823 100644 --- a/sound/soc/amd/yc/acp6x-mach.c +++ b/sound/soc/amd/yc/acp6x-mach.c @@ -521,6 +521,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "Bravo 15 C7UCX"), } }, + { + .driver_data = &acp6x_card, + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "HONOR"), + DMI_MATCH(DMI_PRODUCT_NAME, "GOH-X"), + } + }, {} }; From 105eca525b46c490dc56d4df3dd689dfdb110db7 Mon Sep 17 00:00:00 2001 From: Alexander Stein Date: Tue, 16 Dec 2025 11:22:45 +0100 Subject: [PATCH 71/74] ASoC: fsl_sai: Add missing registers to cache default [ Upstream commit 90ed688792a6b7012b3e8a2f858bc3fe7454d0eb ] Drivers does cache sync during runtime resume, setting all writable registers. Not all writable registers are set in cache default, resulting in the erorr message: fsl-sai 30c30000.sai: using zero-initialized flat cache, this may cause unexpected behavior Fix this by adding missing writable register defaults. Signed-off-by: Alexander Stein Link: https://patch.msgid.link/20251216102246.676181-1-alexander.stein@ew.tq-group.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/fsl/fsl_sai.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c index f5266be2bbc2..9a5be1d72a90 100644 --- a/sound/soc/fsl/fsl_sai.c +++ b/sound/soc/fsl/fsl_sai.c @@ -979,6 +979,7 @@ static struct reg_default fsl_sai_reg_defaults_ofs0[] = { {FSL_SAI_TDR6, 0}, {FSL_SAI_TDR7, 0}, {FSL_SAI_TMR, 0}, + {FSL_SAI_TTCTL, 0}, {FSL_SAI_RCR1(0), 0}, {FSL_SAI_RCR2(0), 0}, {FSL_SAI_RCR3(0), 0}, @@ -1002,12 +1003,14 @@ static struct reg_default fsl_sai_reg_defaults_ofs8[] = { {FSL_SAI_TDR6, 0}, {FSL_SAI_TDR7, 0}, {FSL_SAI_TMR, 0}, + {FSL_SAI_TTCTL, 0}, {FSL_SAI_RCR1(8), 0}, {FSL_SAI_RCR2(8), 0}, {FSL_SAI_RCR3(8), 0}, {FSL_SAI_RCR4(8), 0}, {FSL_SAI_RCR5(8), 0}, {FSL_SAI_RMR, 0}, + {FSL_SAI_RTCTL, 0}, {FSL_SAI_MCTL, 0}, {FSL_SAI_MDIV, 0}, }; From 98115f8dc31e68b0c6d1de2b88c7b6a51e773f35 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20R=C3=A1bek?= Date: Fri, 12 Dec 2025 17:08:23 +0100 Subject: [PATCH 72/74] scsi: sg: Fix occasional bogus elapsed time that exceeds timeout MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 0e1677654259a2f3ccf728de1edde922a3c4ba57 ] A race condition was found in sg_proc_debug_helper(). It was observed on a system using an IBM LTO-9 SAS Tape Drive (ULTRIUM-TD9) and monitoring /proc/scsi/sg/debug every second. A very large elapsed time would sometimes appear. This is caused by two race conditions. We reproduced the issue with an IBM ULTRIUM-HH9 tape drive on an x86_64 architecture. A patched kernel was built, and the race condition could not be observed anymore after the application of this patch. A reproducer C program utilising the scsi_debug module was also built by Changhui Zhong and can be viewed here: https://github.com/MichaelRabek/linux-tests/blob/master/drivers/scsi/sg/sg_race_trigger.c The first race happens between the reading of hp->duration in sg_proc_debug_helper() and request completion in sg_rq_end_io(). The hp->duration member variable may hold either of two types of information: #1 - The start time of the request. This value is present while the request is not yet finished. #2 - The total execution time of the request (end_time - start_time). If sg_proc_debug_helper() executes *after* the value of hp->duration was changed from #1 to #2, but *before* srp->done is set to 1 in sg_rq_end_io(), a fresh timestamp is taken in the else branch, and the elapsed time (value type #2) is subtracted from a timestamp, which cannot yield a valid elapsed time (which is a type #2 value as well). To fix this issue, the value of hp->duration must change under the protection of the sfp->rq_list_lock in sg_rq_end_io(). Since sg_proc_debug_helper() takes this read lock, the change to srp->done and srp->header.duration will happen atomically from the perspective of sg_proc_debug_helper() and the race condition is thus eliminated. The second race condition happens between sg_proc_debug_helper() and sg_new_write(). Even though hp->duration is set to the current time stamp in sg_add_request() under the write lock's protection, it gets overwritten by a call to get_sg_io_hdr(), which calls copy_from_user() to copy struct sg_io_hdr from userspace into kernel space. hp->duration is set to the start time again in sg_common_write(). If sg_proc_debug_helper() is called between these two calls, an arbitrary value set by userspace (usually zero) is used to compute the elapsed time. To fix this issue, hp->duration must be set to the current timestamp again after get_sg_io_hdr() returns successfully. A small race window still exists between get_sg_io_hdr() and setting hp->duration, but this window is only a few instructions wide and does not result in observable issues in practice, as confirmed by testing. Additionally, we fix the format specifier from %d to %u for printing unsigned int values in sg_proc_debug_helper(). Signed-off-by: Michal Rábek Suggested-by: Tomas Henzl Tested-by: Changhui Zhong Reviewed-by: Ewan D. Milne Reviewed-by: John Meneghini Reviewed-by: Tomas Henzl Link: https://patch.msgid.link/20251212160900.64924-1-mrabek@redhat.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/sg.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 7a5d73f89f45..b63f7c09c97a 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -735,6 +735,8 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf, sg_remove_request(sfp, srp); return -EFAULT; } + hp->duration = jiffies_to_msecs(jiffies); + if (hp->interface_id != 'S') { sg_remove_request(sfp, srp); return -ENOSYS; @@ -819,7 +821,6 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp, return -ENODEV; } - hp->duration = jiffies_to_msecs(jiffies); if (hp->interface_id != '\0' && /* v3 (or later) interface */ (SG_FLAG_Q_AT_TAIL & hp->flags)) at_head = 0; @@ -1342,9 +1343,6 @@ sg_rq_end_io(struct request *rq, blk_status_t status) "sg_cmd_done: pack_id=%d, res=0x%x\n", srp->header.pack_id, result)); srp->header.resid = resid; - ms = jiffies_to_msecs(jiffies); - srp->header.duration = (ms > srp->header.duration) ? - (ms - srp->header.duration) : 0; if (0 != result) { struct scsi_sense_hdr sshdr; @@ -1393,6 +1391,9 @@ sg_rq_end_io(struct request *rq, blk_status_t status) done = 0; } srp->done = done; + ms = jiffies_to_msecs(jiffies); + srp->header.duration = (ms > srp->header.duration) ? + (ms - srp->header.duration) : 0; write_unlock_irqrestore(&sfp->rq_list_lock, iflags); if (likely(done)) { @@ -2529,6 +2530,7 @@ static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp) const sg_io_hdr_t *hp; const char * cp; unsigned int ms; + unsigned int duration; k = 0; list_for_each_entry(fp, &sdp->sfds, sfd_siblings) { @@ -2566,13 +2568,17 @@ static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp) seq_printf(s, " id=%d blen=%d", srp->header.pack_id, blen); if (srp->done) - seq_printf(s, " dur=%d", hp->duration); + seq_printf(s, " dur=%u", hp->duration); else { ms = jiffies_to_msecs(jiffies); - seq_printf(s, " t_o/elap=%d/%d", + duration = READ_ONCE(hp->duration); + if (duration) + duration = (ms > duration ? + ms - duration : 0); + seq_printf(s, " t_o/elap=%u/%u", (new_interface ? hp->timeout : jiffies_to_msecs(fp->timeout)), - (ms > hp->duration ? ms - hp->duration : 0)); + duration); } seq_printf(s, "ms sgat=%d op=0x%02x\n", usg, (int) srp->data.cmd_opcode); From 805e88b15d6f85d2b9a653b6a87a11b579e2668c Mon Sep 17 00:00:00 2001 From: Shardul Bankar Date: Tue, 14 Oct 2025 17:30:37 +0530 Subject: [PATCH 73/74] bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path commit 7f9ee5fc97e14682e36fe22ae2654c07e4998b82 upstream. Fix a memory leak in bpf_prog_test_run_xdp() where the context buffer allocated by bpf_ctx_init() is not freed when the function returns early due to a data size check. On the failing path: ctx = bpf_ctx_init(...); if (kattr->test.data_size_in - meta_sz < ETH_HLEN) return -EINVAL; The early return bypasses the cleanup label that kfree()s ctx, leading to a leak detectable by kmemleak under fuzzing. Change the return to jump to the existing free_ctx label. Fixes: fe9544ed1a2e ("bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN") Reported-by: BPF Runtime Fuzzer (BRF) Signed-off-by: Shardul Bankar Signed-off-by: Martin KaFai Lau Acked-by: Jiri Olsa Acked-by: Daniel Borkmann Link: https://patch.msgid.link/20251014120037.1981316-1-shardulsb08@gmail.com Signed-off-by: Greg Kroah-Hartman --- net/bpf/test_run.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 59edba6994c3..ab8d21372b7d 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -1344,7 +1344,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, goto free_ctx; if (kattr->test.data_size_in - meta_sz < ETH_HLEN) - return -EINVAL; + goto free_ctx; data = bpf_test_init(kattr, linear_sz, max_linear_sz, headroom, tailroom); if (IS_ERR(data)) { From cd9b81672742d239a3f05fe545bf4212b56d0ac9 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sat, 17 Jan 2026 16:39:34 +0100 Subject: [PATCH 74/74] Linux 6.1.161 Link: https://lore.kernel.org/r/20260115164143.482647486@linuxfoundation.org Tested-by: Brett A C Sheffield Tested-by: Slade Watkins Tested-by: Florian Fainelli Tested-by: Salvatore Bonaccorso Tested-by: Francesco Dolcini Tested-by: Peter Schneider Tested-by: Ron Economos Tested-by: Jon Hunter Tested-by: Mark Brown Tested-by: Hardik Garg Tested-by: Shuah Khan Tested-by: Miguel Ojeda Signed-off-by: Greg Kroah-Hartman --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 23f2092ee3ab..1f4696571746 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 6 PATCHLEVEL = 1 -SUBLEVEL = 160 +SUBLEVEL = 161 EXTRAVERSION = NAME = Curry Ramen