Commit Graph

1371879 Commits

Author SHA1 Message Date
Alex Deucher
b477c5668e drm/amdkfd: add proper handling for S0ix
commit 2ade36eaa9ac05e4913e9785df19c2cde8f912fb upstream.

When in S0i3, the GFX state is retained, so all we need to do
is stop the runlist so GFX can enter gfxoff.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4bfa8609934dbf39bbe6e75b4f971469384b50b1)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:49 +02:00
Maciej S. Szmigiero
0e2db61cc5 KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active
commit d02e48830e3fce9701265f6c5a58d9bdaf906a76 upstream.

Commit 3bbf3565f4 ("svm: Do not intercept CR8 when enable AVIC")
inhibited pre-VMRUN sync of TPR from LAPIC into VMCB::V_TPR in
sync_lapic_to_cr8() when AVIC is active.

AVIC does sync these two fields automatically, however it does so only
on explicit guest writes to one of these fields, not on a bare VMRUN.

This meant that, when AVIC is enabled, host changes to TPR in the LAPIC
state might not get automatically copied into the V_TPR field of the VMCB.

This is especially true when it is userspace setting the LAPIC state via
the KVM_SET_LAPIC ioctl(), since userspace does not have access to the
guest VMCB.

Practice shows that it is the V_TPR that is actually used by the AVIC to
decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
so any leftover value in V_TPR will cause serious interrupt delivery issues
in the guest when AVIC is enabled.

Fix this issue by doing pre-VMRUN TPR sync from LAPIC into VMCB::V_TPR
even when AVIC is enabled.
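
A minimal sketch of the resulting pre-VMRUN sync, assuming KVM's usual
SVM field names (illustrative, not the exact patch):

  static void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
  {
          struct vcpu_svm *svm = to_svm(vcpu);
          u64 cr8 = kvm_get_cr8(vcpu);

          /* no AVIC early-return: always refresh V_TPR from the LAPIC */
          svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
          svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
  }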

Fixes: 3bbf3565f4 ("svm: Do not intercept CR8 when enable AVIC")
Cc: stable@vger.kernel.org
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/c231be64280b1461e854e1ce3595d70cde3a2e9d.1756139678.git.maciej.szmigiero@oracle.com
[sean: tag for stable@]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:49 +02:00
Tom Lendacky
c0603b8043 x86/sev: Guard sev_evict_cache() with CONFIG_AMD_MEM_ENCRYPT
commit 7f830e126dc357fc086905ce9730140fd4528d66 upstream.

sev_evict_cache() is guest-related code and should be guarded by
CONFIG_AMD_MEM_ENCRYPT, not CONFIG_KVM_AMD_SEV.

CONFIG_AMD_MEM_ENCRYPT=y is required for a guest to run properly as an SEV-SNP
guest, but a guest kernel built with CONFIG_KVM_AMD_SEV=n would get the stub
function of sev_evict_cache() instead of the version that performs the actual
eviction. Move the function declarations under the appropriate #ifdef.

Fixes: 7b306dfa326f ("x86/sev: Evict cache lines during SNP memory validation")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@kernel.org # 6.16.x
Link: https://lore.kernel.org/r/70e38f2c4a549063de54052c9f64929705313526.1757708959.git.thomas.lendacky@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:49 +02:00
Ben Chuang
79a9ba8da9 mmc: sdhci-uhs2: Fix calling incorrect sdhci_set_clock() function
commit 09c2b628f6403ad467fc73326a50020590603871 upstream.

Fix __sdhci_uhs2_set_ios() calling the wrong sdhci_set_clock() when the
vendor defines its own sdhci_set_clock() implementation.

Fixes: 10c8298a05 ("mmc: sdhci-uhs2: add set_ios()")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Ben Chuang <ben.chuang@genesyslogic.com.tw>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:49 +02:00
Ben Chuang
7650c994ce mmc: sdhci-pci-gli: GL9767: Fix initializing the UHS-II interface during a power-on
commit 77a436c93d10d68201bfd4941d1ca3230dfd1f40 upstream.

According to the power structure of the IC hardware design for the
UHS-II interface, reset control and timing must be added to the
initialization sequence when powering on the UHS-II interface.

Fixes: 27dd3b8255 ("mmc: sdhci-pci-gli: enable UHS-II mode for GL9767")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Ben Chuang <ben.chuang@genesyslogic.com.tw>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:49 +02:00
Ben Chuang
7186d8e8bd mmc: sdhci: Move the code related to setting the clock from sdhci_set_ios_common() into sdhci_set_ios()
commit 7b7e71683b4ccbe0dbd7d434707623327e852f20 upstream.

sdhci_set_clock() is called in both sdhci_set_ios_common() and
__sdhci_uhs2_set_ios(). According to Section 3.13.2 "Card Interface
Detection Sequence" of the SD Host Controller Standard Specification
Version 7.00, the SD clock is supplied after power is supplied, so only
the call in __sdhci_uhs2_set_ios() is needed. Move the code related to
setting the clock from sdhci_set_ios_common() into sdhci_set_ios() and
modify the parameters passed to sdhci_set_clock() in
__sdhci_uhs2_set_ios().

Fixes: 10c8298a05 ("mmc: sdhci-uhs2: add set_ios()")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Ben Chuang <ben.chuang@genesyslogic.com.tw>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:49 +02:00
Thomas Fourier
d0b7ff384b mmc: mvsdio: Fix dma_unmap_sg() nents value
commit 8ab2f1c35669bff7d7ed1bb16bf5cc989b3e2e17 upstream.

dma_unmap_sg() should be called with the same nents as dma_map_sg(), not
with the value the map function returned.
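
A minimal sketch of the correct pairing (device and direction are
illustrative):

  /* dma_map_sg() may coalesce entries and return fewer than nents */
  int mapped = dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_TO_DEVICE);

  if (mapped <= 0)
          return -EIO;

  /* ... program the hardware using the 'mapped' entries ... */

  /* unmap with the original nents, not with 'mapped' */
  dma_unmap_sg(dev, sgt->sgl, sgt->nents, DMA_TO_DEVICE);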

Fixes: 236caa7cc3 ("mmc: SDIO driver for Marvell SoCs")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:49 +02:00
Mohammad Rafi Shaik
66e6d1c928 ASoC: qcom: q6apm-lpass-dais: Fix missing set_fmt DAI op for I2S
commit 33b55b94bca904ca25a9585e3cd43d15f0467969 upstream.

The q6i2s_set_fmt() function was defined but never linked into the
I2S DAI operations, resulting in DAI format settings being ignored
during stream setup. Fix the issue by properly linking the .set_fmt
handler within the DAI ops, as sketched below.
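
A minimal sketch of the wiring (the other members are assumptions made
for illustration):

  static const struct snd_soc_dai_ops q6i2s_ops = {
          .prepare  = q6apm_lpass_dai_prepare,
          .startup  = q6apm_lpass_dai_startup,
          .shutdown = q6apm_lpass_dai_shutdown,
          .set_fmt  = q6i2s_set_fmt,  /* previously defined but unused */
  };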

Fixes: 30ad723b93 ("ASoC: qdsp6: audioreach: add q6apm lpass dai support")
Cc: stable@vger.kernel.org
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Signed-off-by: Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com>
Message-ID: <20250908053631.70978-3-mohammad.rafi.shaik@oss.qualcomm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:48 +02:00
Krzysztof Kozlowski
cc336b242e ASoC: qcom: q6apm-lpass-dais: Fix NULL pointer dereference if source graph failed
commit 68f27f7c7708183e7873c585ded2f1b057ac5b97 upstream.

If the earlier opening of the source graph fails (e.g. the ADSP rejects
it due to an incorrect audioreach topology), the graph is closed and
"dai_data->graph[dai->id]" is assigned NULL.  Preparing the DAI for the
sink graph continues though, and the next call to
q6apm_lpass_dai_prepare() receives dai_data->graph[dai->id] == NULL,
leading to a NULL pointer exception:

  qcom-apm gprsvc:service:2:1: Error (1) Processing 0x01001002 cmd
  qcom-apm gprsvc:service:2:1: DSP returned error[1001002] 1
  q6apm-lpass-dais 30000000.remoteproc:glink-edge:gpr:service@1:bedais: fail to start APM port 78
  q6apm-lpass-dais 30000000.remoteproc:glink-edge:gpr:service@1:bedais: ASoC: error at snd_soc_pcm_dai_prepare on TX_CODEC_DMA_TX_3: -22
  Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8
  ...
  Call trace:
   q6apm_graph_media_format_pcm+0x48/0x120 (P)
   q6apm_lpass_dai_prepare+0x110/0x1b4
   snd_soc_pcm_dai_prepare+0x74/0x108
   __soc_pcm_prepare+0x44/0x160
   dpcm_be_dai_prepare+0x124/0x1c0
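
A guard along these lines in the prepare path would avoid the
dereference (placement and error code are assumptions):

  if (!dai_data->graph[dai->id]) {
          dev_err(dai->dev, "no started graph for DAI %d\n", dai->id);
          return -EINVAL;
  }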

Fixes: 30ad723b93 ("ASoC: qdsp6: audioreach: add q6apm lpass dai support")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Message-ID: <20250904101849.121503-2-krzysztof.kozlowski@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:48 +02:00
Mohammad Rafi Shaik
59c4accddf ASoC: qcom: audioreach: Fix lpaif_type configuration for the I2S interface
commit 5f1af203ef964e7f7bf9d32716dfa5f332cc6f09 upstream.

Fix the missing lpaif_type configuration for the I2S interface.
The proper lpaif interface type is required to allow the DSP to vote
for the appropriate clock setting for the I2S interface.

Fixes: 25ab80db6b ("ASoC: qdsp6: audioreach: add module configuration command helpers")
Cc: stable@vger.kernel.org
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Signed-off-by: Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com>
Message-ID: <20250908053631.70978-2-mohammad.rafi.shaik@oss.qualcomm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:48 +02:00
Maciej Strozek
8276c97dcc ASoC: SDCA: Add quirk for incorrect function types for 3 systems
commit 28edfaa10ca1b370b1a27fde632000d35c43402c upstream.

Certain systems have a CS42L43 DisCo that claims to conform to version
0.6.28 but uses the function types from the 1.0 spec. Add a quirk as a
workaround.

Closes: https://github.com/thesofproject/linux/issues/5515
Cc: stable@vger.kernel.org
Signed-off-by: Maciej Strozek <mstrozek@opensource.cirrus.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.dev>
Link: https://patch.msgid.link/20250901151518.3197941-1-mstrozek@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:48 +02:00
Qu Wenruo
417ed00d48 btrfs: tree-checker: fix the incorrect inode ref size check
commit 96fa515e70f3e4b98685ef8cac9d737fc62f10e1 upstream.

[BUG]
Inside check_inode_ref(), we need to make sure every structure,
including the btrfs_inode_extref header, is covered by the item.  But
our code is incorrectly using "sizeof(iref)", where @iref is just a
pointer.

This means "sizeof(iref)" will always be "sizeof(void *)", which is much
smaller than "sizeof(struct btrfs_inode_extref)".

This will allow some bad inode extrefs to sneak in, defeating tree-checker.

[FIX]
Fix the typo by calling "sizeof(*iref)", which is the same as
"sizeof(struct btrfs_inode_extref)", and will be the correct behavior we
want.
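
The pitfall in miniature (illustrative bounds check, not the exact
tree-checker code):

  struct btrfs_inode_extref *iref;

  /* BAD: sizeof(iref) is sizeof(void *), only 8 bytes on 64-bit */
  if (ptr + sizeof(iref) > end)
          return -EUCLEAN;

  /* GOOD: sizeof(*iref) covers the whole extref header */
  if (ptr + sizeof(*iref) > end)
          return -EUCLEAN;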

Fixes: 71bf92a9b8 ("btrfs: tree-checker: Add check for INODE_REF")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:48 +02:00
Niklas Schnelle
359613f2fa iommu/s390: Make attach succeed when the device was surprise removed
commit 9ffaf5229055fcfbb3b3d6f1c7e58d63715c3f73 upstream.

When a PCI device is removed by surprise hotplug, there may still be
attempts to attach the device to the default domain, either as part of
teardown via __iommu_release_dma_ownership() or because the removal
happens during probe (__iommu_probe_device()). In both cases zpci_register_ioat()
fails with a cc value indicating that the device handle is invalid. This
is because the device is no longer part of the instance as far as the
hypervisor is concerned.

Currently this leads to an error return and s390_iommu_attach_device()
fails. This triggers the WARN_ON() in __iommu_group_set_domain_nofail()
because attaching to the default domain must never fail.

With the device fenced by the hypervisor, no DMAs to or from memory are
possible, and the IOMMU translations have no effect. Proceed as if the
registration was successful and let the hotplug event handling clean up
the device.

This is similar to how devices in the error state are handled since
commit 59bbf59679 ("iommu/s390: Make attach succeed even if the device
is in error state") except that for removal the domain will not be
registered later. This approach was also previously discussed at the
link.

Handle both cases, error state and removal, in a helper which checks if
the error needs to be propagated or ignored. Avoid magic number
condition codes by using the pre-existing, but never used, defines for
PCI load/store condition codes and rename them to reflect that they
apply to all PCI instructions.

Cc: stable@vger.kernel.org # v6.2
Link: https://lore.kernel.org/linux-iommu/20240808194155.GD1985367@ziepe.ca/
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Link: https://lore.kernel.org/r/20250904-iommu_succeed_attach_removed-v1-1-e7f333d2f80f@linux.ibm.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:48 +02:00
Matthew Rosato
17a58caf38 iommu/s390: Fix memory corruption when using identity domain
commit b3506e9bcc777ed6af2ab631c86a9990ed97b474 upstream.

zpci_get_iommu_ctrs() returns counter information to be reported as part
of device statistics; these counters are stored as part of the s390_domain.
The problem, however, is that the identity domain is not backed by an
s390_domain, so the conversion via to_s390_domain() yields a bad address
that is zeroed initially and read on demand later via a sysfs read.
These counters aren't necessary for the identity domain; just return NULL
in this case.

This issue was discovered via KASAN with reports that look like:
BUG: KASAN: global-out-of-bounds in zpci_fmb_enable_device
when using the identity domain for a device on s390.

Cc: stable@vger.kernel.org
Fixes: 64af12c6ec ("iommu/s390: implement iommu passthrough via identity domain")
Reported-by: Cam Miller <cam@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Cam Miller <cam@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20250827210828.274527-1-mjrosato@linux.ibm.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:48 +02:00
Vasant Hegde
7d462bdecb iommu/amd/pgtbl: Fix possible race while increase page table level
commit 1e56310b40fd2e7e0b9493da9ff488af145bdd0c upstream.

The AMD IOMMU host page table implementation supports dynamic page table levels
(up to 6 levels), starting with a 3-level configuration that expands based on
IOVA address. The kernel maintains a root pointer and current page table level
to enable proper page table walks in alloc_pte()/fetch_pte() operations.

The IOMMU IOVA allocator initially starts with 32-bit addresses and, once
those are exhausted, switches to 64-bit addresses (the max address is
determined based on IOMMU and device DMA capability). To support larger
IOVAs, the AMD IOMMU driver increases the page table level.

But in the unmap path (iommu_v1_unmap_pages()), fetch_pte() reads
pgtable->[root/mode] without a lock. So it is possible that, in an
extreme corner case, while increase_address_space() is updating
pgtable->[root/mode], fetch_pte() reads a wrong page table level
(pgtable->mode). It then compares the value with the level encoded in the
page table and returns NULL. This causes the iommu_unmap op to fail, and
the upper layer may retry/log a WARN_ON.

CPU 0                                        CPU 1
-----                                        -----
map pages                                    unmap pages
alloc_pte() -> increase_address_space()      iommu_v1_unmap_pages() -> fetch_pte()
  pgtable->root = pte (new root value)
                                             READ pgtable->[mode/root]
                                               (reads new root, old mode)
  Updates mode (pgtable->mode += 1)

Since page table level updates are infrequent and already serialized with
a spinlock, introduce a seqcount to enable lock-free reads on the read
path.
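
A minimal sketch of the seqcount pattern, assuming a seqcount field added
to the page table structure (names illustrative):

  /* writer: already serialized by the existing spinlock */
  write_seqcount_begin(&pgtable->seqcount);
  pgtable->root = pte;    /* new root */
  pgtable->mode += 1;     /* new level */
  write_seqcount_end(&pgtable->seqcount);

  /* reader: lock-free consistent snapshot in fetch_pte() */
  unsigned int seq;

  do {
          seq  = read_seqcount_begin(&pgtable->seqcount);
          root = pgtable->root;
          mode = pgtable->mode;
  } while (read_seqcount_retry(&pgtable->seqcount, seq));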

Fixes: 754265bcab ("iommu/amd: Fix race in increase_address_space()")
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Cc: stable@vger.kernel.org
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:48 +02:00
Zhen Ni
b0c0e23106 iommu/amd: Fix ivrs_base memleak in early_amd_iommu_init()
commit 923b70581cb6acede90f8aaf4afe5d1c58c67b71 upstream.

Fix a permanent ACPI table memory leak in early_amd_iommu_init() when the
CMPXCHG16B feature is not supported.

Fixes: 82582f85ed ("iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported")
Cc: stable@vger.kernel.org
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20250822024915.673427-1-zhen.ni@easystack.cn
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:48 +02:00
Eugene Koira
7ff7d16649 iommu/vt-d: Fix __domain_mapping()'s usage of switch_to_super_page()
commit dce043c07ca1ac19cfbe2844a6dc71e35c322353 upstream.

switch_to_super_page() assumes the memory range it's working on is aligned
to the target large page level. Unfortunately, __domain_mapping() doesn't
take this into account when using it, and will pass unaligned ranges,
ultimately freeing a PTE range larger than expected.

Take for example a mapping with the following iov_pfn range [0x3fe400,
0x4c0600), which should be backed by the following mappings:

   iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
   iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
   iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages

Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff]
to switch_to_super_page() at a 1 GiB granularity, which will in turn
free PTEs all the way to iov_pfn 0x4fffff.

Mitigate this by rounding down the iov_pfn range passed to
switch_to_super_page() in __domain_mapping() to the target large page
level.

Additionally, add range alignment checks to switch_to_super_page().
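
One way to realize the rounding, as a sketch (helper names follow this
driver's conventions; exact placement and argument handling are
assumptions):

  unsigned long lvl_pages = lvl_to_nr_pages(largepage_lvl);

  /* only hand over whole large pages; keep the unaligned tail out */
  unsigned long sp_end_pfn = ALIGN_DOWN(end_pfn + 1, lvl_pages) - 1;

  if (iov_pfn <= sp_end_pfn)
          switch_to_super_page(domain, iov_pfn, sp_end_pfn, largepage_lvl);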

Fixes: 9906b9352a ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()")
Signed-off-by: Eugene Koira <eugkoira@amazon.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20250826143816.38686-1-eugkoira@amazon.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:48 +02:00
Bibo Mao
1c73128437 LoongArch: KVM: Fix VM migration failure with PTW enabled
commit f58c9aa1065f73d243904b267c71f6a9d1e9f90e upstream.

On a PTW-disabled system, bit _PAGE_DIRTY is a HW bit for page writing.
However, on a PTW-enabled system, bit _PAGE_WRITE is also a "HW bit" for
page writing, because hardware synchronizes _PAGE_WRITE to _PAGE_DIRTY
automatically. Previously, _PAGE_WRITE was treated as a SW bit to record
the page writeable attribute for fast page fault handling in the
secondary MMU; however, on a PTW-enabled machine this bit is already used
by HW (so setting it will silence the TLB modify exception).

Define KVM_PAGE_WRITEABLE with the SW bit _PAGE_MODIFIED, so that it
works on both PTW-disabled and PTW-enabled machines. As for the HW write
bits, _PAGE_DIRTY and _PAGE_WRITE are set or cleared together.
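
In miniature (the first define is from this message; KVM_PAGE_HW_WRITE is
a hypothetical name used only to illustrate the pairing):

  /* SW bit: safe on both PTW-disabled and PTW-enabled machines */
  #define KVM_PAGE_WRITEABLE      _PAGE_MODIFIED

  /* HW write bits always travel together */
  #define KVM_PAGE_HW_WRITE       (_PAGE_WRITE | _PAGE_DIRTY)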

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:48 +02:00
Bibo Mao
960eedb14c LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_pch_pic_regs_access()
commit 8dc5245673cf7f33743e5c0d2a4207c0b8df3067 upstream.

copy_from_user() and copy_to_user() may sleep because of page faults,
and they cannot be called while a spinlock is held. Move the
copy_from_user() and copy_to_user() calls out of the spinlock context in
kvm_pch_pic_regs_access(); a sketch of the resulting pattern follows the
call trace below.

Otherwise, a warning such as the following is possible:

BUG: sleeping function called from invalid context at include/linux/uaccess.h:192
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last  enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last  enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full)
Tainted: [W]=WARN
Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000
        9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8
        9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001
        0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880
        00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe
        000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0
        0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000
        0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0
        0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40
        00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d
Call Trace:
[<9000000004c2827c>] show_stack+0x5c/0x180
[<9000000004c20fac>] dump_stack_lvl+0x94/0xe4
[<9000000004c99c7c>] __might_resched+0x26c/0x290
[<9000000004f68968>] __might_fault+0x20/0x88
[<ffff800002311de0>] kvm_pch_pic_regs_access.isra.0+0x88/0x380 [kvm]
[<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm]
[<900000000506b0d8>] sys_ioctl+0x388/0x1010
[<90000000063ed210>] do_syscall+0xb0/0x2d8
[<9000000004c25ef8>] handle_syscall+0xb8/0x158
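
A minimal sketch of the resulting pattern (illustrative names, not the
exact driver code; the same shape applies to the eiointc fixes below):

  u64 val;

  /* do the user copy with no lock held: it may fault and sleep */
  if (is_write && copy_from_user(&val, argp, sizeof(val)))
          return -EFAULT;

  spin_lock(&s->lock);
  /* ... access the emulated register state using 'val' ... */
  spin_unlock(&s->lock);

  if (!is_write && copy_to_user(argp, &val, sizeof(val)))
          return -EFAULT;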

Cc: stable@vger.kernel.org
Fixes: d206d95148 ("LoongArch: KVM: Add PCHPIC user mode read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:48 +02:00
Bibo Mao
55ba91b4e0 LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_sw_status_access()
commit 01a8e68396a6d51f5ba92021ad1a4b8eaabdd0e7 upstream.

copy_from_user() and copy_to_user() may sleep because of page faults,
and they cannot be called while a spinlock is held. Move the
copy_from_user() and copy_to_user() calls out of
kvm_eiointc_sw_status_access().

Otherwise, a warning such as the following is possible:

BUG: sleeping function called from invalid context at include/linux/uaccess.h:192
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last  enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last  enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full)
Tainted: [W]=WARN
Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000
        9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8
        9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001
        0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880
        00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe
        000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0
        0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000
        0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0
        0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40
        00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d
Call Trace:
[<9000000004c2827c>] show_stack+0x5c/0x180
[<9000000004c20fac>] dump_stack_lvl+0x94/0xe4
[<9000000004c99c7c>] __might_resched+0x26c/0x290
[<9000000004f68968>] __might_fault+0x20/0x88
[<ffff800002311de0>] kvm_eiointc_sw_status_access.isra.0+0x88/0x380 [kvm]
[<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm]
[<900000000506b0d8>] sys_ioctl+0x388/0x1010
[<90000000063ed210>] do_syscall+0xb0/0x2d8
[<9000000004c25ef8>] handle_syscall+0xb8/0x158

Cc: stable@vger.kernel.org
Fixes: 1ad7efa552 ("LoongArch: KVM: Add EIOINTC user mode read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:47 +02:00
Bibo Mao
105605ca76 LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_regs_access()
commit 62f11796a0dfa1a2ef5f50a2d1bc81c81628fb8e upstream.

copy_from_user() and copy_to_user() may sleep because of page faults,
and they cannot be called while a spinlock is held. Move the
copy_from_user() and copy_to_user() calls before the spinlock context in
kvm_eiointc_regs_access().

Otherwise, a warning such as the following is possible:

BUG: sleeping function called from invalid context at include/linux/uaccess.h:192
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last  enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last  enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full)
Tainted: [W]=WARN
Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000
        9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8
        9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001
        0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880
        00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe
        000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0
        0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000
        0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0
        0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40
        00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d
Call Trace:
[<9000000004c2827c>] show_stack+0x5c/0x180
[<9000000004c20fac>] dump_stack_lvl+0x94/0xe4
[<9000000004c99c7c>] __might_resched+0x26c/0x290
[<9000000004f68968>] __might_fault+0x20/0x88
[<ffff800002311de0>] kvm_eiointc_regs_access.isra.0+0x88/0x380 [kvm]
[<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm]
[<900000000506b0d8>] sys_ioctl+0x388/0x1010
[<90000000063ed210>] do_syscall+0xb0/0x2d8
[<9000000004c25ef8>] handle_syscall+0xb8/0x158

Cc: stable@vger.kernel.org
Fixes: 1ad7efa552 ("LoongArch: KVM: Add EIOINTC user mode read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:47 +02:00
Bibo Mao
291d4b01d3 LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_ctrl_access()
commit 47256c4c8b1bfbc63223a0da2d4fa90b6ede5cbb upstream.

copy_from_user() and copy_to_user() may sleep because of page faults,
and they cannot be called while a spinlock is held. Move the
copy_from_user() and copy_to_user() calls before the spinlock context in
kvm_eiointc_ctrl_access().

Otherwise, a warning such as the following is possible:

BUG: sleeping function called from invalid context at include/linux/uaccess.h:192
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last  enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last  enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full)
Tainted: [W]=WARN
Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000
        9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8
        9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001
        0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880
        00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe
        000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0
        0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000
        0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0
        0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40
        00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d
Call Trace:
[<9000000004c2827c>] show_stack+0x5c/0x180
[<9000000004c20fac>] dump_stack_lvl+0x94/0xe4
[<9000000004c99c7c>] __might_resched+0x26c/0x290
[<9000000004f68968>] __might_fault+0x20/0x88
[<ffff800002311de0>] kvm_eiointc_ctrl_access.isra.0+0x88/0x380 [kvm]
[<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm]
[<900000000506b0d8>] sys_ioctl+0x388/0x1010
[<90000000063ed210>] do_syscall+0xb0/0x2d8
[<9000000004c25ef8>] handle_syscall+0xb8/0x158

Cc: stable@vger.kernel.org
Fixes: 1ad7efa552 ("LoongArch: KVM: Add EIOINTC user mode read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:47 +02:00
Tiezhu Yang
401363c839 LoongArch: Handle jump tables options for RUST
commit 74f8295c6fb8436bec9995baf6ba463151b6fb68 upstream.

When compiling with LLVM and CONFIG_RUST is set, there are objtool
warnings in rust/core.o and rust/kernel.o, like this:

    rust/core.o: warning: objtool:
_RNvXs1_NtNtCs5QSdWC790r4_4core5ascii10ascii_charNtB5_9AsciiCharNtNtB9_3fmt5Debug3fmt+0x54:
sibling call from callable instruction with modified stack frame

For this special case, the related object file shows that there is no
generated relocation section '.rela.discard.tablejump_annotate' for the
table jump instruction jirl, thus objtool cannot know the actual
destination address.

If rustc supports the option "-Cllvm-args=--loongarch-annotate-tablejump",
pass it to enable jump table annotations for objtool; otherwise pass
"-Zno-jump-tables" to keep compatibility with older rustc.

How to test:

  $ rustup component add rust-src
  $ make LLVM=1 rustavailable
  $ make ARCH=loongarch LLVM=1 clean defconfig
  $ scripts/config -d MODVERSIONS \
    -e RUST -e SAMPLES -e SAMPLES_RUST \
    -e SAMPLE_RUST_CONFIGFS -e SAMPLE_RUST_MINIMAL \
    -e SAMPLE_RUST_MISC_DEVICE -e SAMPLE_RUST_PRINT \
    -e SAMPLE_RUST_DMA -e SAMPLE_RUST_DRIVER_PCI \
    -e SAMPLE_RUST_DRIVER_PLATFORM -e SAMPLE_RUST_DRIVER_FAUX \
    -e SAMPLE_RUST_DRIVER_AUXILIARY -e SAMPLE_RUST_HOSTPROGS
  $ make ARCH=loongarch LLVM=1 olddefconfig all

Cc: stable@vger.kernel.org
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reported-by: Miguel Ojeda <ojeda@kernel.org>
Closes: https://lore.kernel.org/rust-for-linux/CANiq72mNeCuPkCDrG2db3w=AX+O-zYrfprisDPmRac_qh65Dmg@mail.gmail.com/
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:47 +02:00
Tiezhu Yang
1967642780 LoongArch: Make LTO case independent in Makefile
commit b15212824a01cb0b62f7b522f4ee334622cf982a upstream.

LTO is not only used for Clang, it may also be used for Rust, so move the
LTO case out of CONFIG_CC_HAS_ANNOTATE_TABLEJUMP in the Makefile.

This is preparation for a later patch; no functional changes.

Cc: stable@vger.kernel.org
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:47 +02:00
Tao Cui
db65fea5f0 LoongArch: Check the return value when creating kobj
commit 51adb03e6b865c0c6790f29659ff52d56742de2e upstream.

Add a check for the return value of kobject_create_and_add(), to ensure
that the kobj allocation succeeded before later use.
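
The shape of the check (the kobject name and parent are hypothetical):

  kobj = kobject_create_and_add("example", kernel_kobj);
  if (!kobj)
          return -ENOMEM;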

Cc: stable@vger.kernel.org
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:47 +02:00
Huacai Chen
5f2b63a398 LoongArch: Align ACPI structures if ARCH_STRICT_ALIGN enabled
commit a9d13433fe17be0e867e51e71a1acd2731fbef8d upstream.

ARCH_STRICT_ALIGN is used for hardware without UAL; currently it only
controls the -mstrict-align flag. However, ACPI structures are packed by
default and so will cause unaligned accesses.

To avoid this, define ACPI_MISALIGNMENT_NOT_SUPPORTED in asm/acenv.h to
align ACPI structures if ARCH_STRICT_ALIGN is enabled.

Cc: stable@vger.kernel.org
Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
Suggested-by: Xi Ruoyao <xry111@xry111.site>
Suggested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:47 +02:00
Guangshuo Li
a417571950 LoongArch: vDSO: Check kcalloc() result in init_vdso()
commit ac398f570724c41e5e039d54e4075519f6af7408 upstream.

Add a NULL-pointer check after the kcalloc() call in init_vdso(). If
allocation fails, return -ENOMEM to prevent a possible dereference of
vdso_info.code_mapping.pages when it is NULL.
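
A minimal sketch of the added check (the array size expression is an
assumption):

  vdso_info.code_mapping.pages =
          kcalloc(vdso_pages, sizeof(struct page *), GFP_KERNEL);
  if (!vdso_info.code_mapping.pages)
          return -ENOMEM;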

Cc: stable@vger.kernel.org
Fixes: 2ed119aef6 ("LoongArch: Set correct size for vDSO code mapping")
Signed-off-by: Guangshuo Li <202321181@mail.sdu.edu.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:47 +02:00
Tiezhu Yang
2feeecd7c6 LoongArch: Fix unreliable stack for live patching
commit 677d4a52d4dc4a147d5e84af9ff207832578be70 upstream.

When testing kernel live patching with "modprobe livepatch-sample",
there is a timeout of over 15 seconds from "starting patching transition"
to "patching complete". The dmesg command shows "unreliable stack" for
user tasks in debug mode; here is one of the messages:

  livepatch: klp_try_switch_task: bash:1193 has an unreliable stack

The "unreliable stack" is because it can not unwind from do_syscall()
to its previous frame handle_syscall(). It should use fp to find the
original stack top due to secondary stack in do_syscall(), but fp is
not used for some other functions, then fp can not be restored by the
next frame of do_syscall(), so it is necessary to save fp if task is
not current, in order to get the stack top of do_syscall().

Here are the call chains:

  klp_enable_patch()
    klp_try_complete_transition()
      klp_try_switch_task()
        klp_check_and_switch_task()
          klp_check_stack()
            stack_trace_save_tsk_reliable()
              arch_stack_walk_reliable()

When executing "rmmod livepatch-sample", there exists a similar issue.
With this patch, it takes a short time for patching and unpatching.

Before:

  # modprobe livepatch-sample
  # dmesg -T | tail -3
  [Sat Sep  6 11:00:20 2025] livepatch: 'livepatch_sample': starting patching transition
  [Sat Sep  6 11:00:35 2025] livepatch: signaling remaining tasks
  [Sat Sep  6 11:00:36 2025] livepatch: 'livepatch_sample': patching complete

  # echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
  # rmmod livepatch_sample
  rmmod: ERROR: Module livepatch_sample is in use
  # rmmod livepatch_sample
  # dmesg -T | tail -3
  [Sat Sep  6 11:06:05 2025] livepatch: 'livepatch_sample': starting unpatching transition
  [Sat Sep  6 11:06:20 2025] livepatch: signaling remaining tasks
  [Sat Sep  6 11:06:21 2025] livepatch: 'livepatch_sample': unpatching complete

After:

  # modprobe livepatch-sample
  # dmesg -T | tail -2
  [Tue Sep 16 16:19:30 2025] livepatch: 'livepatch_sample': starting patching transition
  [Tue Sep 16 16:19:31 2025] livepatch: 'livepatch_sample': patching complete

  # echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
  # rmmod livepatch_sample
  # dmesg -T | tail -2
  [Tue Sep 16 16:19:36 2025] livepatch: 'livepatch_sample': starting unpatching transition
  [Tue Sep 16 16:19:37 2025] livepatch: 'livepatch_sample': unpatching complete

Cc: stable@vger.kernel.org # v6.9+
Fixes: 199cc14cb4 ("LoongArch: Add kernel livepatching support")
Reported-by: Xi Zhang <zhangxi@kylinos.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:47 +02:00
Tiezhu Yang
5dbbc7b04c objtool/LoongArch: Mark special atomic instruction as INSN_BUG type
commit 539d7344d4feaea37e05863e9aa86bd31f28e46f upstream.

When compiling with LLVM and CONFIG_RUST is set, there exists the
following objtool warning:

  rust/compiler_builtins.o: warning: objtool: __rust__unordsf2(): unexpected end of section .text.unlikely.

objdump shows that the end of section .text.unlikely is an atomic
instruction:

  amswap.w        $zero, $ra, $zero

According to the LoongArch Reference Manual, if the amswap.w atomic
memory access instruction has the same register number as rd and rj,
the execution will trigger an Instruction Non-defined Exception, so
mark the above instruction as INSN_BUG type to fix the warning.

Cc: stable@vger.kernel.org
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:47 +02:00
Tiezhu Yang
e0aefa8f46 objtool/LoongArch: Mark types based on break immediate code
commit baad7830ee9a56756b3857348452fe756cb0a702 upstream.

If the break immediate code is 0, mark the instruction type as
INSN_TRAP. If the break immediate code is 1, mark the type as INSN_BUG.

While at it, tidy up the code style and add a code comment for nop.

Cc: stable@vger.kernel.org
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:47 +02:00
Tiezhu Yang
953138ff0f LoongArch: Update help info of ARCH_STRICT_ALIGN
commit f5003098e2f337d8e8a87dc636250e3fa978d9ad upstream.

Loongson-3A6000 and 3C6000 CPUs also support unaligned memory access, so
the current description is somewhat out of date.

Actually, all Loongson-3 series processors based on LoongArch support
unaligned memory access; this hardware capability is indicated by bit 20
(UAL) of the CPUCFG1 register. Update the help info to reflect reality.

Cc: stable@vger.kernel.org
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:47 +02:00
Hugh Dickins
1eda9ab8da mm: folio_may_be_lru_cached() unless folio_test_large()
commit 2da6de30e60dd9bb14600eff1cc99df2fa2ddae3 upstream.

mm/swap.c and mm/mlock.c agree to drain any per-CPU batch as soon as a
large folio is added: so collect_longterm_unpinnable_folios() just wastes
effort when calling lru_add_drain[_all]() on a large folio.

But although there is good reason not to batch up PMD-sized folios, we
might well benefit from batching a small number of low-order mTHPs (though
unclear how that "small number" limitation will be implemented).

So ask if folio_may_be_lru_cached() rather than !folio_test_large(), to
insulate those particular checks from future change.  Name preferred to
"folio_is_batchable" because large folios can well be put on a batch: it's
just the per-CPU LRU caches, drained much later, which need care.
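
A sketch of the helper as described here (per this message it mirrors the
old open-coded test for now):

  static inline bool folio_may_be_lru_cached(struct folio *folio)
  {
          /*
           * Only small folios go into the per-CPU LRU caches today;
           * the policy may later admit a few low-order mTHPs too.
           */
          return !folio_test_large(folio);
  }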

Marked for stable, to counter the increase in lru_add_drain_all()s from
"mm/gup: check ref_count instead of lru before migration".

Link: https://lkml.kernel.org/r/57d2eaf8-3607-f318-e0c5-be02dce61ad0@google.com
Fixes: 9a4e9f3b2d ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: Hugh Dickins <hughd@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:46 +02:00
Hugh Dickins
fb4e6d587a mm: revert "mm: vmscan.c: fix OOM on swap stress test"
commit 8d79ed36bfc83d0583ab72216b7980340478cdfb upstream.

This reverts commit 0885ef470560: that was a fix to the reverted
33dfe9204f.

Link: https://lkml.kernel.org/r/aa0e9d67-fbcd-9d79-88a1-641dfbe1d9d1@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:46 +02:00
Hugh Dickins
d0c8ba94cb mm/gup: local lru_add_drain() to avoid lru_add_drain_all()
commit a09a8a1fbb374e0053b97306da9dbc05bd384685 upstream.

In many cases, if collect_longterm_unpinnable_folios() does need to drain
the LRU cache to release a reference, the cache in question is on this
same CPU, and much more efficiently drained by a preliminary local
lru_add_drain(), than the later cross-CPU lru_add_drain_all().

Marked for stable, to counter the increase in lru_add_drain_all()s from
"mm/gup: check ref_count instead of lru before migration".  Note for clean
backports: can take 6.16 commit a03db236ae ("gup: optimize longterm
pin_user_pages() for large folio") first.

Link: https://lkml.kernel.org/r/66f2751f-283e-816d-9530-765db7edc465@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:46 +02:00
Li Zhe
163843e8c8 gup: optimize longterm pin_user_pages() for large folio
commit a03db236ae upstream.

In the current implementation of longterm pin_user_pages(), we invoke
collect_longterm_unpinnable_folios().  This function iterates through the
list to check whether each folio belongs to the "longterm_unpinnable"
category.  The folios in this list essentially correspond to a contiguous
region of userspace addresses, with each folio representing a physical
address in increments of PAGE_SIZE.

If this userspace address range is mapped with large folios, we can
optimize the performance of collect_longterm_unpinnable_folios() by
reducing the use of READ_ONCE() invoked in
pofs_get_folio()->page_folio()->_compound_head().

Also, we can simplify the logic of collect_longterm_unpinnable_folios().
Instead of comparing with prev_folio after calling pofs_get_folio(), we
can check whether the next page is within the same folio.
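
A minimal sketch of the folio-reuse idea (pofs_get_folio() is named in
this message; the pofs_get_page() accessor and the index arithmetic are
illustrative):

  struct folio *folio = NULL;

  for (i = 0; i < pofs->nr_entries; i++) {
          struct page *page = pofs_get_page(pofs, i);

          /* still inside the current large folio? skip page_folio() */
          if (folio && page >= &folio->page &&
              page <  &folio->page + folio_nr_pages(folio))
                  continue;

          folio = page_folio(page);
          /* ... classify this folio as longterm-unpinnable or not ... */
  }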

The performance test results, based on v6.15 and obtained with the
gup_test tool from the kernel source tree, are as follows.  We achieve an
improvement of over 66% for large folios with pagesize=2M.  For small
folios, we observed only a very slight performance degradation.

Without this patch:

    [root@localhost ~] ./gup_test -HL -m 8192 -n 512
    TAP version 13
    1..1
    # PIN_LONGTERM_BENCHMARK: Time: get:14391 put:10858 us#
    ok 1 ioctl status 0
    # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
    [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
    TAP version 13
    1..1
    # PIN_LONGTERM_BENCHMARK: Time: get:130538 put:31676 us#
    ok 1 ioctl status 0
    # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

With this patch:

    [root@localhost ~] ./gup_test -HL -m 8192 -n 512
    TAP version 13
    1..1
    # PIN_LONGTERM_BENCHMARK: Time: get:4867 put:10516 us#
    ok 1 ioctl status 0
    # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
    [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
    TAP version 13
    1..1
    # PIN_LONGTERM_BENCHMARK: Time: get:131798 put:31328 us#
    ok 1 ioctl status 0
    # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

[lizhe.67@bytedance.com: whitespace fix, per David]
  Link: https://lkml.kernel.org/r/20250606091917.91384-1-lizhe.67@bytedance.com
Link: https://lkml.kernel.org/r/20250606023742.58344-1-lizhe.67@bytedance.com
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:46 +02:00
Hugh Dickins
3958f9ec72 mm: revert "mm/gup: clear the LRU flag of a page before adding to LRU batch"
commit afb99e9f500485160f34b8cad6d3763ada3e80e8 upstream.

This reverts commit 33dfe9204f29: now that
collect_longterm_unpinnable_folios() is checking ref_count instead of lru,
and mlock/munlock do not participate in the revised LRU flag clearing,
those changes are misleading, and enlarge the window during which
mlock/munlock may miss an mlock_count update.

It is possible (I'd hesitate to claim probable) that the greater
likelihood of missed mlock_count updates would explain the "Realtime
threads delayed due to kcompactd0" observed on 6.12 in the Link below.  If
that is the case, this reversion will help; but a complete solution needs
also a further patch, beyond the scope of this series.

Included some 80-column cleanup around folio_batch_add_and_move().

The role of folio_test_clear_lru() (before taking per-memcg lru_lock) is
questionable since 6.13 removed mem_cgroup_move_account() etc; but perhaps
there are still some races which need it - not examined here.

Link: https://lore.kernel.org/linux-mm/DU0PR01MB10385345F7153F334100981888259A@DU0PR01MB10385.eurprd01.prod.exchangelabs.com/
Link: https://lkml.kernel.org/r/05905d7b-ed14-68b1-79d8-bdec30367eba@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:46 +02:00
Hugh Dickins
fdac0a3f58 mm/gup: check ref_count instead of lru before migration
commit 98c6d259319ecf6e8d027abd3f14b81324b8c0ad upstream.

Patch series "mm: better GUP pin lru_add_drain_all()", v2.

Series of lru_add_drain_all()-related patches, arising from recent mm/gup
migration report from Will Deacon.


This patch (of 5):

Will Deacon reports:-

When taking a longterm GUP pin via pin_user_pages(),
__gup_longterm_locked() tries to migrate target folios that should not be
longterm pinned, for example because they reside in a CMA region or
movable zone.  This is done by first pinning all of the target folios
anyway, collecting all of the longterm-unpinnable target folios into a
list, dropping the pins that were just taken and finally handing the list
off to migrate_pages() for the actual migration.

It is critically important that no unexpected references are held on the
folios being migrated, otherwise the migration will fail and
pin_user_pages() will return -ENOMEM to its caller.  Unfortunately, it is
relatively easy to observe migration failures when running pKVM (which
uses pin_user_pages() on crosvm's virtual address space to resolve stage-2
page faults from the guest) on a 6.15-based Pixel 6 device and this
results in the VM terminating prematurely.

In the failure case, 'crosvm' has called mlock(MLOCK_ONFAULT) on its
mapping of guest memory prior to the pinning.  Subsequently, when
pin_user_pages() walks the page-table, the relevant 'pte' is not present
and so the faulting logic allocates a new folio, mlocks it with
mlock_folio() and maps it in the page-table.

Since commit 2fbb0c10d1 ("mm/munlock: mlock_page() munlock_page() batch
by pagevec"), mlock/munlock operations on a folio (formerly page), are
deferred.  For example, mlock_folio() takes an additional reference on the
target folio before placing it into a per-cpu 'folio_batch' for later
processing by mlock_folio_batch(), which drops the refcount once the
operation is complete.  Processing of the batches is coupled with the LRU
batch logic and can be forcefully drained with lru_add_drain_all() but as
long as a folio remains unprocessed on the batch, its refcount will be
elevated.

This deferred batching therefore interacts poorly with the pKVM pinning
scenario as we can find ourselves in a situation where the migration code
fails to migrate a folio due to the elevated refcount from the pending
mlock operation.

Hugh Dickins adds:-

!folio_test_lru() has never been a very reliable way to tell if an
lru_add_drain_all() is worth calling, to remove LRU cache references to
make the folio migratable: the LRU flag may be set even while the folio is
held with an extra reference in a per-CPU LRU cache.

5.18 commit 2fbb0c10d1 may have made it more unreliable.  Then 6.11
commit 33dfe9204f ("mm/gup: clear the LRU flag of a page before adding
to LRU batch") tried to make it reliable, by moving LRU flag clearing; but
missed the mlock/munlock batches, so still unreliable as reported.

And it turns out to be difficult to extend 33dfe9204f29's LRU flag
clearing to the mlock/munlock batches: if they do benefit from batching,
mlock/munlock cannot be so effective when easily suppressed while !LRU.

Instead, switch to an expected ref_count check, which was more reliable
all along: some more false positives (unhelpful drains) than before, and
never a guarantee that the folio will prove migratable, but better.
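
A minimal sketch of the switched-to check (folio_expected_ref_count() is
cited below; the "+ 1" for the pin just taken and the surrounding logic
are assumptions):

  /*
   * Any reference beyond ours plus the expected ones (mappings,
   * private data) may be a per-CPU LRU or mlock batch: worth a drain.
   */
  if (folio_ref_count(folio) != folio_expected_ref_count(folio) + 1)
          drain_allow = true;   /* trigger lru_add_drain_all() later */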

Note on PG_private_2: ceph and nfs are still using the deprecated
PG_private_2 flag, with the aid of netfs and filemap support functions.
Although it is consistently matched by an increment of folio ref_count,
folio_expected_ref_count() intentionally does not recognize it, and ceph
folio migration currently depends on that for PG_private_2 folios to be
rejected.  New references to the deprecated flag are discouraged, so do
not add it into the collect_longterm_unpinnable_folios() calculation: but
longterm pinning of transiently PG_private_2 ceph and nfs folios (an
uncommon case) may invoke a redundant lru_add_drain_all().  And this makes
easy the backport to earlier releases: up to and including 6.12, btrfs
also used PG_private_2, but without a ref_count increment.

Note for stable backports: requires 6.16 commit 86ebd50224 ("mm:
add folio_expected_ref_count() for reference count calculation").

Link: https://lkml.kernel.org/r/41395944-b0e3-c3ac-d648-8ddd70451d28@google.com
Link: https://lkml.kernel.org/r/bd1f314a-fca1-8f19-cac0-b936c9614557@google.com
Fixes: 9a4e9f3b2d ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Will Deacon <will@kernel.org>
Closes: https://lore.kernel.org/linux-mm/20250815101858.24352-1-will@kernel.org/
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:46 +02:00
Mikulas Patocka
ee27658c23 dm-stripe: fix a possible integer overflow
commit 1071d560afb4c245c2076494226df47db5a35708 upstream.

There's a possible integer overflow in stripe_io_hints if the chunk size
is too large. Test whether the overflow happened, and if it did, don't
set limits->io_min and limits->io_opt.
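
A minimal sketch of such a guard, assuming the check_shl_overflow() and
check_mul_overflow() helpers from <linux/overflow.h>; variable names are
illustrative:

	unsigned int io_min, io_opt;

	/* publish the hints only when the chunk size in bytes and the
	 * optimal size (chunk * stripes) both fit in unsigned int */
	if (!check_shl_overflow(sc->chunk_size, SECTOR_SHIFT, &io_min) &&
	    !check_mul_overflow(io_min, sc->stripes, &io_opt)) {
		limits->io_min = io_min;
		limits->io_opt = io_opt;
	}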

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Suggested-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:46 +02:00
Mikulas Patocka
ba3a78db47 dm-raid: don't set io_min and io_opt for raid1
commit a86556264696b797d94238d99d8284d0d34ed960 upstream.

These commands
 modprobe brd rd_size=1048576
 vgcreate vg /dev/ram*
 lvcreate -m4 -L10 -n lv vg
trigger the following warnings:
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency

The warnings are caused by the fact that io_min is 512 and physical block
size is 4096.

For chunk-less raid levels such as raid1, io_min shouldn't be set at all:
a zero io_min is raised to the 512-byte logical block size, which
conflicts with the 4096-byte physical block size and triggers the warning.
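
A minimal sketch of the idea (variable names illustrative): only set the
hints when the raid level actually has a chunk size:

	unsigned int chunk_size_bytes = to_bytes(rs->md.chunk_sectors);

	/* raid1 is chunk-less: leaving io_min/io_opt unset avoids the
	 * bogus 512-byte io_min that conflicts with a 4096-byte
	 * physical block size */
	if (chunk_size_bytes) {
		limits->io_min = chunk_size_bytes;
		limits->io_opt = chunk_size_bytes * nr_data_stripes;
	}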

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:46 +02:00
austinchang
e8f496001e btrfs: initialize inode::file_extent_tree after i_mode has been set
commit 8679d2687c351824d08cf1f0e86f3b65f22a00fe upstream.

btrfs_init_file_extent_tree() uses S_ISREG() to determine if the file is
a regular file. At the beginning of btrfs_read_locked_inode(), i_mode
hasn't yet been read from the inode item, so S_ISREG() is false and the
file_extent_tree ends up unused on volumes without NO_HOLES.

Fix this by calling btrfs_init_file_extent_tree() after i_mode is
initialized in btrfs_read_locked_inode().
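
Schematically (simplified, not the exact diff):

	/* before: runs while i_mode is still 0, so S_ISREG() is false
	 * and the extent tree is never allocated */
	ret = btrfs_init_file_extent_tree(BTRFS_I(inode));
	/* ... read the inode item ... */
	inode->i_mode = btrfs_inode_mode(leaf, inode_item);

	/* after: initialize the tree once i_mode is known */
	inode->i_mode = btrfs_inode_mode(leaf, inode_item);
	ret = btrfs_init_file_extent_tree(BTRFS_I(inode));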

Fixes: 3d7db6e8bd ("btrfs: don't allocate file extent tree for non regular files")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: austinchang <austinchang@synology.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:46 +02:00
Andrea Righi
8ae0972677 Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"
commit 0b47b6c3543efd65f2e620e359b05f4938314fbd upstream.

scx_bpf_reenqueue_local() can be called from ops.cpu_release() when a
CPU is taken by a higher scheduling class to give tasks queued to the
CPU's local DSQ a chance to be migrated somewhere else, instead of
waiting indefinitely for that CPU to become available again.

In doing so, we decided to skip migration-disabled tasks, under the
assumption that they cannot be migrated anyway.

However, when a higher scheduling class preempts a CPU, the running task
is always inserted at the head of the local DSQ as a migration-disabled
task. This means it is always skipped by scx_bpf_reenqueue_local(), and
ends up being confined to the same CPU even if that CPU is heavily
contended by other higher scheduling class tasks.

As an example, let's consider the following scenario:

 $ schedtool -a 0,1, -e yes > /dev/null
 $ sudo schedtool -F -p 99 -a 0, -e \
   stress-ng -c 1 --cpu-load 99 --cpu-load-slice 1000

The first task (SCHED_EXT) can run on CPU0 or CPU1. The second task
(SCHED_FIFO) is pinned to CPU0 and consumes ~99% of it. If the SCHED_EXT
task initially runs on CPU0, it will remain there because it always sees
CPU0 as "idle" in the short gaps left by the RT task, resulting in ~1%
utilization while CPU1 stays idle:

    0[||||||||||||||||||||||100.0%]   8[                        0.0%]
    1[                        0.0%]   9[                        0.0%]
    2[                        0.0%]  10[                        0.0%]
    3[                        0.0%]  11[                        0.0%]
    4[                        0.0%]  12[                        0.0%]
    5[                        0.0%]  13[                        0.0%]
    6[                        0.0%]  14[                        0.0%]
    7[                        0.0%]  15[                        0.0%]
  PID USER       PRI  NI  S CPU  CPU%▽MEM%   TIME+  Command
 1067 root        RT   0  R   0  99.0  0.2  0:31.16 stress-ng-cpu [run]
  975 arighi      20   0  R   0   1.0  0.0  0:26.32 yes

By allowing scx_bpf_reenqueue_local() to re-enqueue migration-disabled
tasks, the scheduler can choose to migrate them to other CPUs (CPU1 in
this case) via ops.enqueue(), leading to better CPU utilization:

    0[||||||||||||||||||||||100.0%]   8[                        0.0%]
    1[||||||||||||||||||||||100.0%]   9[                        0.0%]
    2[                        0.0%]  10[                        0.0%]
    3[                        0.0%]  11[                        0.0%]
    4[                        0.0%]  12[                        0.0%]
    5[                        0.0%]  13[                        0.0%]
    6[                        0.0%]  14[                        0.0%]
    7[                        0.0%]  15[                        0.0%]
  PID USER       PRI  NI  S CPU  CPU%▽MEM%   TIME+  Command
  577 root        RT   0  R   0 100.0  0.2  0:23.17 stress-ng-cpu [run]
  555 arighi      20   0  R   1 100.0  0.0  0:28.67 yes

It's debatable whether per-CPU tasks should be re-enqueued as well, but
doing so is probably safer: the scheduler can recognize re-enqueued
tasks through the %SCX_ENQ_REENQ flag, reassess their placement, and
either put them back at the head of the local DSQ or let another task
attempt to take the CPU.

This also prevents giving per-CPU tasks an implicit priority boost,
which would otherwise make them more likely to reclaim CPUs preempted by
higher scheduling classes.
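
For illustration, a hedged sketch of the BPF side (scx_bpf_reenqueue_local()
and SCX_ENQ_REENQ are existing sched_ext API; the ops names and dispatch
logic are hypothetical):

	void BPF_STRUCT_OPS(sched_cpu_release, s32 cpu,
			    struct scx_cpu_release_args *args)
	{
		/* CPU claimed by a higher class: re-enqueue all tasks in
		 * its local DSQ, now including migration-disabled and
		 * per-CPU ones */
		scx_bpf_reenqueue_local();
	}

	void BPF_STRUCT_OPS(sched_enqueue, struct task_struct *p,
			    u64 enq_flags)
	{
		if (enq_flags & SCX_ENQ_REENQ) {
			/* reassess placement: migrate if possible, or put
			 * the task back at the head of its local DSQ */
		}
		/* ... normal enqueue path ... */
	}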

Fixes: 97e13ecb02 ("sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Acked-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:46 +02:00
H. Nikolaus Schaller
a4ee54e682 power: supply: bq27xxx: restrict no-battery detection to bq27000
commit 1e451977e1703b6db072719b37cd1b8e250b9cc9 upstream.

There are fuel gauges in the bq27xxx series (e.g. bq27z561) which may in some
cases report 0xff as the value of BQ27XXX_REG_FLAGS; this should not be
interpreted as "no battery" the way it is for a disconnected battery on a
board with a built-in bq27000 chip.

So restrict the no-battery detection originally introduced by

    commit 3dd843e1c2 ("bq27000: report missing device better.")

to the bq27000.
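
A minimal sketch of the restriction (simplified; the 0xff flags check
itself predates this patch, and the -ENODEV value comes from the
companion fix below):

	/* only a missing battery behind a bq27000/hdq setup legitimately
	 * reads back all-ones flags */
	if (di->chip == BQ27000 && (cache.flags & 0xff) == 0xff)
		cache.flags = -ENODEV;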

There is no need to backport further because this was hidden before

	commit f16d9fb6cf ("power: supply: bq27xxx: Retrieve again when busy")

Fixes: f16d9fb6cf ("power: supply: bq27xxx: Retrieve again when busy")
Suggested-by: Jerry Lv <Jerry.Lv@axis.com>
Cc: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Link: https://lore.kernel.org/r/dd979fa6855fd051ee5117016c58daaa05966e24.1755945297.git.hns@goldelico.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:46 +02:00
H. Nikolaus Schaller
d18d7035ec power: supply: bq27xxx: fix error return in case of no bq27000 hdq battery
commit 2c334d038466ac509468fbe06905a32d202117db upstream.

Since

	commit f16d9fb6cf ("power: supply: bq27xxx: Retrieve again when busy")

the console log of some devices with hdq enabled but no bq27000 battery
(like e.g. the Pandaboard) is flooded with messages like:

[   34.247833] power_supply bq27000-battery: driver failed to report 'status' property: -1

as soon as user-space is finding a /sys entry and trying to read the
"status" property.

It turns out that the offending commit changes the logic to now return the
value of cache.flags if it is <0. This is likely under the assumption that
it is an error number. In normal errors from bq27xxx_read() this is indeed
the case.

But there is special code that detects whether no bq27000 is installed or
accessible through hdq/1wire and wants to report this. In that case,
cache.flags has historically been set by

	commit 3dd843e1c2 ("bq27000: report missing device better.")

to the constant -1, which used to make reading properties return -ENODEV.
So everything appeared fine before the raw value was passed upwards.

Now the -1 is returned as -EPERM instead of -ENODEV, triggering the error
condition in power_supply_format_property(), which then floods the console log.

So we change the detection of missing bq27000 battery to simply set

	cache.flags = -ENODEV

instead of -1.
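
Schematically (simplified), the change and its effect:

	cache.flags = -ENODEV;	/* was: cache.flags = -1 (== -EPERM) */

	/* later, when user space reads a property: */
	if (di->cache.flags < 0)
		return di->cache.flags;	/* now -ENODEV, which the
					 * power-supply core reports only
					 * at debug level */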

Fixes: f16d9fb6cf ("power: supply: bq27xxx: Retrieve again when busy")
Cc: Jerry Lv <Jerry.Lv@axis.com>
Cc: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Link: https://lore.kernel.org/r/692f79eb6fd541adb397038ea6e750d4de2deddf.1755945297.git.hns@goldelico.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:46 +02:00
Herbert Xu
45bcf60fe4 crypto: af_alg - Disallow concurrent writes in af_alg_sendmsg
commit 1b34cbbf4f011a121ef7b2d7d6e6920a036d5285 upstream.

Issuing two writes to the same af_alg socket is bogus as the
data will be interleaved in an unpredictable fashion.  Furthermore,
concurrent writes may create inconsistencies in the internal
socket state.

Disallow this by adding a new ctx->write field that indicates
exclusive ownership for writing.
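
A minimal sketch of the exclusion, under the socket lock (the error value
here is an assumption):

	lock_sock(sk);
	if (ctx->write) {
		/* another sendmsg() is in flight on this socket */
		release_sock(sk);
		return -EBUSY;
	}
	ctx->write = true;

	/* ... copy in and queue the data ... */

	ctx->write = false;
	release_sock(sk);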

Fixes: 8ff590903d ("crypto: algif_skcipher - User-space interface for skcipher operations")
Reported-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:45 +02:00
Nathan Chancellor
7b7361da9e nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/*
commit 025e87f8ea2ae3a28bf1fe2b052bfa412c27ed4a upstream.

When accessing one of the files under /sys/fs/nilfs2/features when
CONFIG_CFI_CLANG is enabled, there is a CFI violation:

  CFI failure at kobj_attr_show+0x59/0x80 (target: nilfs_feature_revision_show+0x0/0x30; expected type: 0xfc392c4d)
  ...
  Call Trace:
   <TASK>
   sysfs_kf_seq_show+0x2a6/0x390
   ? __cfi_kobj_attr_show+0x10/0x10
   kernfs_seq_show+0x104/0x15b
   seq_read_iter+0x580/0xe2b
  ...

When the kobject of the kset for /sys/fs/nilfs2 is initialized, its ktype
is set to kset_ktype, which has a ->sysfs_ops of kobj_sysfs_ops.  When
nilfs_feature_attr_group is added to that kobject via
sysfs_create_group(), the kernfs_ops of each file is sysfs_file_kfops_rw,
which will call sysfs_kf_seq_show() when ->seq_show() is called.
sysfs_kf_seq_show() in turn calls kobj_attr_show() through
->sysfs_ops->show().  kobj_attr_show() casts the provided attribute out to
a 'struct kobj_attribute' via container_of() and calls ->show(), resulting
in the CFI violation since neither nilfs_feature_revision_show() nor
nilfs_feature_README_show() match the prototype of ->show() in 'struct
kobj_attribute'.

Resolve the CFI violation by adjusting the second parameter in
nilfs_feature_{revision,README}_show() from 'struct attribute' to 'struct
kobj_attribute' to match the expected prototype.
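
The shape of the fix, schematically (bodies elided):

	-static ssize_t nilfs_feature_revision_show(struct kobject *kobj,
	-					struct attribute *attr, char *buf)
	+static ssize_t nilfs_feature_revision_show(struct kobject *kobj,
	+					struct kobj_attribute *attr, char *buf)

With the parameter typed as 'struct kobj_attribute', the indirect call
through ->sysfs_ops->show() now has the signature CFI expects.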

Link: https://lkml.kernel.org/r/20250906144410.22511-1-konishi.ryusuke@gmail.com
Fixes: aebe17f684 ("nilfs2: add /sys/fs/nilfs2/features group")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202509021646.bc78d9ef-lkp@intel.com/
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:45 +02:00
Sergey Senozhatsky
ff750e9f2c zram: fix slot write race condition
commit ce4be9e4307c5a60701ff6e0cafa74caffdc54ce upstream.

Concurrent writes to the same zram index result in leaked zsmalloc
handles.  Schematically we can have something like this:

CPU0                              CPU1
zram_slot_lock()
zs_free(handle)
zram_slot_unlock()
				zram_slot_lock()
				zs_free(handle)
				zram_slot_unlock()

compress			compress
handle = zs_malloc()		handle = zs_malloc()
zram_slot_lock()
zram_set_handle(handle)
zram_slot_unlock()
				zram_slot_lock()
				zram_set_handle(handle)
				zram_slot_unlock()

Either the CPU0 or the CPU1 zsmalloc handle will leak because zs_free() is
done too early.  In fact, we need to reset the zram entry right before we
set its new handle, all under the same slot-lock scope.
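
A minimal sketch of the fixed write completion (arguments simplified):
reset the entry and install the new handle inside a single critical
section:

	zram_slot_lock(zram, index);
	zram_free_page(zram, index);	/* reset entry, free old handle */
	zram_set_handle(zram, index, handle);
	zram_set_obj_size(zram, index, comp_len);
	zram_slot_unlock(zram, index);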

Link: https://lkml.kernel.org/r/20250909045150.635345-1-senozhatsky@chromium.org
Fixes: 71268035f5 ("zram: free slot memory early during write")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reported-by: Changhui Zhong <czhong@redhat.com>
Closes: https://lore.kernel.org/all/CAGVVp+UtpGoW5WEdEU7uVTtsSCjPN=ksN6EcvyypAtFDOUf30A@mail.gmail.com/
Tested-by: Changhui Zhong <czhong@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:45 +02:00
Stefan Metzmacher
c64b915bb3 ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_size
commit e1868ba37fd27c6a68e31565402b154beaa65df0 upstream.

This is inspired by the check for data_offset + data_length.
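
A minimal sketch of the added bound (context in recv_done() simplified;
the error path is an assumption):

	remaining = le32_to_cpu(data_transfer->remaining_data_length);
	if (remaining > t->max_fragmented_recv_size) {
		/* the peer announces more fragmented data than was
		 * ever negotiated: treat the PDU as malformed */
		return -EINVAL;
	}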

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: stable@vger.kernel.org
Fixes: 2ea086e35c ("ksmbd: add buffer validation for smb direct")
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:45 +02:00
Namjae Jeon
529b121b00 ksmbd: smbdirect: validate data_offset and data_length field of smb_direct_data_transfer
commit 5282491fc49d5614ac6ddcd012e5743eecb6a67c upstream.

If the data_offset and data_length fields of the smb_direct_data_transfer
struct are invalid, an out-of-bounds access could happen.
This patch validates the data_offset and data_length fields in recv_done().
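
A minimal sketch of the validation with an overflow-safe sum (context
simplified; the error path is an assumption):

	data_offset = le32_to_cpu(data_transfer->data_offset);
	data_length = le32_to_cpu(data_transfer->data_length);
	if ((u64)data_offset + data_length > (u64)wc->byte_len) {
		/* payload does not fit in the received buffer: reject */
		return -EINVAL;
	}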

Cc: stable@vger.kernel.org
Fixes: 2ea086e35c ("ksmbd: add buffer validation for smb direct")
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Reported-by: Luigino Camastra, Aisle Research <luigino.camastra@aisle.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25 11:16:45 +02:00
Duoming Zhou
5ca20bb7b4 octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
[ Upstream commit f8b4687151021db61841af983f1cb7be6915d4ef ]

The original code relies on cancel_delayed_work() in otx2_ptp_destroy(),
which does not ensure that the delayed work item synctstamp_work has fully
completed if it was already running. This leads to use-after-free scenarios
where otx2_ptp is deallocated by otx2_ptp_destroy(), while synctstamp_work
remains active and attempts to dereference otx2_ptp in otx2_sync_tstamp().
Furthermore, since synctstamp_work is cyclic, the likelihood of triggering
the bug is non-negligible.

A typical race condition is illustrated below:

CPU 0 (cleanup)           | CPU 1 (delayed work callback)
otx2_remove()             |
  otx2_ptp_destroy()      | otx2_sync_tstamp()
    cancel_delayed_work() |
    kfree(ptp)            |
                          |   ptp = container_of(...); //UAF
                          |   ptp-> //UAF

This is confirmed by a KASAN report:

BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x7d7/0x8c0
Write of size 8 at addr ffff88800aa09a18 by task bash/136
...
Call Trace:
 <IRQ>
 dump_stack_lvl+0x55/0x70
 print_report+0xcf/0x610
 ? __run_timer_base.part.0+0x7d7/0x8c0
 kasan_report+0xb8/0xf0
 ? __run_timer_base.part.0+0x7d7/0x8c0
 __run_timer_base.part.0+0x7d7/0x8c0
 ? __pfx___run_timer_base.part.0+0x10/0x10
 ? __pfx_read_tsc+0x10/0x10
 ? ktime_get+0x60/0x140
 ? lapic_next_event+0x11/0x20
 ? clockevents_program_event+0x1d4/0x2a0
 run_timer_softirq+0xd1/0x190
 handle_softirqs+0x16a/0x550
 irq_exit_rcu+0xaf/0xe0
 sysvec_apic_timer_interrupt+0x70/0x80
 </IRQ>
...
Allocated by task 1:
 kasan_save_stack+0x24/0x50
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x7f/0x90
 otx2_ptp_init+0xb1/0x860
 otx2_probe+0x4eb/0xc30
 local_pci_probe+0xdc/0x190
 pci_device_probe+0x2fe/0x470
 really_probe+0x1ca/0x5c0
 __driver_probe_device+0x248/0x310
 driver_probe_device+0x44/0x120
 __driver_attach+0xd2/0x310
 bus_for_each_dev+0xed/0x170
 bus_add_driver+0x208/0x500
 driver_register+0x132/0x460
 do_one_initcall+0x89/0x300
 kernel_init_freeable+0x40d/0x720
 kernel_init+0x1a/0x150
 ret_from_fork+0x10c/0x1a0
 ret_from_fork_asm+0x1a/0x30

Freed by task 136:
 kasan_save_stack+0x24/0x50
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3a/0x60
 __kasan_slab_free+0x3f/0x50
 kfree+0x137/0x370
 otx2_ptp_destroy+0x38/0x80
 otx2_remove+0x10d/0x4c0
 pci_device_remove+0xa6/0x1d0
 device_release_driver_internal+0xf8/0x210
 pci_stop_bus_device+0x105/0x150
 pci_stop_and_remove_bus_device_locked+0x15/0x30
 remove_store+0xcc/0xe0
 kernfs_fop_write_iter+0x2c3/0x440
 vfs_write+0x871/0xd70
 ksys_write+0xee/0x1c0
 do_syscall_64+0xac/0x280
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
...

Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is properly canceled before the otx2_ptp is
deallocated.
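
Schematically, the teardown becomes (simplified):

	void otx2_ptp_destroy(struct otx2_nic *pfvf)
	{
		struct otx2_ptp *ptp = pfvf->ptp;

		/* wait for a running otx2_sync_tstamp() to finish and
		 * prevent it from re-arming itself */
		cancel_delayed_work_sync(&ptp->synctstamp_work);
		/* ... */
		kfree(ptp);
	}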

This bug was initially identified through static analysis. To reproduce
and test it, I simulated the OcteonTX2 PCI device in QEMU and introduced
artificial delays within the otx2_sync_tstamp() function to increase the
likelihood of triggering the bug.

Fixes: 2958d17a89 ("octeontx2-pf: Add support for ptp 1-step mode on CN10K silicon")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-25 11:16:45 +02:00
Duoming Zhou
0627e14816 cnic: Fix use-after-free bugs in cnic_delete_task
[ Upstream commit cfa7d9b1e3a8604afc84e9e51d789c29574fb216 ]

The original code uses cancel_delayed_work() in cnic_cm_stop_bnx2x_hw(),
which does not guarantee that the delayed work item 'delete_task' has
fully completed if it was already running. Additionally, the delayed work
item is cyclic, and the flush_workqueue() in cnic_cm_stop_bnx2x_hw() only
blocks and waits for work items that were already queued to the
workqueue prior to its invocation. Any work items submitted after
flush_workqueue() is called are not included in the set of tasks that the
flush operation awaits. This means that after the cyclic work items have
finished executing, a delayed work item may still exist in the workqueue.
This leads to use-after-free scenarios where the cnic_dev is deallocated
by cnic_free_dev(), while delete_task remains active and attempts to
dereference cnic_dev in cnic_delete_task().

A typical race condition is illustrated below:

CPU 0 (cleanup)              | CPU 1 (delayed work callback)
cnic_netdev_event()          |
  cnic_stop_hw()             | cnic_delete_task()
    cnic_cm_stop_bnx2x_hw()  | ...
      cancel_delayed_work()  | /* the queue_delayed_work()
      flush_workqueue()      |    executes after flush_workqueue()*/
                             | queue_delayed_work()
  cnic_free_dev(dev)//free   | cnic_delete_task() //new instance
                             |   dev = cp->dev; //use

Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the cyclic delayed work item is properly canceled and that any
ongoing execution of the work item completes before the cnic_dev is
deallocated. Furthermore, since cancel_delayed_work_sync() uses
__flush_work(work, true) to synchronously wait for any currently
executing instance of the work item to finish, the flush_workqueue()
becomes redundant and should be removed.
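
Schematically (simplified), the change in cnic_cm_stop_bnx2x_hw():

	-	cancel_delayed_work(&cp->delete_task);
	-	flush_workqueue(cnic_wq);
	+	/* waits for a running instance, and the cyclic work
	+	   cannot re-queue itself past the cancel */
	+	cancel_delayed_work_sync(&cp->delete_task);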

This bug was identified through static analysis. To reproduce the issue
and validate the fix, I simulated the cnic PCI device in QEMU and
introduced intentional delays, such as inserting calls to ssleep()
within the cnic_delete_task() function, to increase the likelihood
of triggering the bug.

Fixes: fdf24086f4 ("cnic: Defer iscsi connection cleanup")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-25 11:16:45 +02:00