From 9f1e8cd0b7c4c944e9921b52a6661b5eda2705ab Mon Sep 17 00:00:00 2001 From: Jinjiang Tu Date: Fri, 27 Jun 2025 20:57:46 +0800 Subject: [PATCH 01/11] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list In shrink_folio_list(), the hwpoisoned folio may be large folio, which can't be handled by unmap_poisoned_folio(). For THP, try_to_unmap_one() must be passed with TTU_SPLIT_HUGE_PMD to split huge PMD first and then retry. Without TTU_SPLIT_HUGE_PMD, we will trigger null-ptr deref of pvmw.pte. Even we passed TTU_SPLIT_HUGE_PMD, we will trigger a WARN_ON_ONCE due to the page isn't in swapcache. Since UCE is rare in real world, and race with reclaimation is more rare, just skipping the hwpoisoned large folio is enough. memory_failure() will handle it if the UCE is triggered again. This happens when memory reclaim for large folio races with memory_failure(), and will lead to kernel panic. The race is as follows: cpu0 cpu1 shrink_folio_list memory_failure TestSetPageHWPoison unmap_poisoned_folio --> trigger BUG_ON due to unmap_poisoned_folio couldn't handle large folio [tujinjiang@huawei.com: add comment to unmap_poisoned_folio()] Link: https://lkml.kernel.org/r/69fd4e00-1b13-d5f7-1c82-705c7d977ea4@huawei.com Link: https://lkml.kernel.org/r/20250627125747.3094074-2-tujinjiang@huawei.com Signed-off-by: Jinjiang Tu Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio") Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68412d57.050a0220.2461cf.000e.GAE@google.com/ Acked-by: David Hildenbrand Reviewed-by: Miaohe Lin Acked-by: Zi Yan Reviewed-by: Oscar Salvador Cc: Kefeng Wang Cc: Michal Hocko Cc: Oscar Salvador Cc: Signed-off-by: Andrew Morton --- mm/memory-failure.c | 4 ++++ mm/vmscan.c | 8 ++++++++ 2 files changed, 12 insertions(+) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index b91a33fb6c69..225dddff091d 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1561,6 +1561,10 @@ static int get_hwpoison_page(struct page *p, unsigned long flags) return ret; } +/* + * The caller must guarantee the folio isn't large folio, except hugetlb. + * try_to_unmap() can't handle it. + */ int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill) { enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON; diff --git a/mm/vmscan.c b/mm/vmscan.c index f8dfd2864bbf..424412680cfc 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1138,6 +1138,14 @@ retry: goto keep; if (folio_contain_hwpoisoned_page(folio)) { + /* + * unmap_poisoned_folio() can't handle large + * folio, just skip it. memory_failure() will + * handle it if the UCE is triggered again. + */ + if (folio_test_large(folio)) + goto keep_locked; + unmap_poisoned_folio(folio, folio_pfn(folio), false); folio_unlock(folio); folio_put(folio); From 694d6b99923eb05a8fd188be44e26077d19f0e21 Mon Sep 17 00:00:00 2001 From: Harry Yoo Date: Fri, 4 Jul 2025 19:30:53 +0900 Subject: [PATCH 02/11] mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n Commit 48b4800a1c6a ("zsmalloc: page migration support") added support for migrating zsmalloc pages using the movable_operations migration framework. However, the commit did not take into account that zsmalloc supports migration only when CONFIG_COMPACTION is enabled. Tracing shows that zsmalloc was still passing the __GFP_MOVABLE flag even when compaction is not supported. This can result in unmovable pages being allocated from movable page blocks (even without stealing page blocks), ZONE_MOVABLE and CMA area. Possible user visible effects: - Some ZONE_MOVABLE memory can be not actually movable - CMA allocation can fail because of this - Increased memory fragmentation due to ignoring the page mobility grouping feature I'm not really sure who uses kernels without compaction support, though :( To fix this, clear the __GFP_MOVABLE flag when !IS_ENABLED(CONFIG_COMPACTION). Link: https://lkml.kernel.org/r/20250704103053.6913-1-harry.yoo@oracle.com Fixes: 48b4800a1c6a ("zsmalloc: page migration support") Signed-off-by: Harry Yoo Acked-by: David Hildenbrand Reviewed-by: Sergey Senozhatsky Cc: Minchan Kim Cc: Signed-off-by: Andrew Morton --- mm/zsmalloc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 999b513c7fdf..f3e2215f95eb 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1043,6 +1043,9 @@ static struct zspage *alloc_zspage(struct zs_pool *pool, if (!zspage) return NULL; + if (!IS_ENABLED(CONFIG_COMPACTION)) + gfp &= ~__GFP_MOVABLE; + zspage->magic = ZSPAGE_MAGIC; zspage->pool = pool; zspage->class = class->index; From 091a08b002d9b5040b92531fff07b52879f29722 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 7 Jul 2025 16:52:13 +0900 Subject: [PATCH 03/11] mailmap: add entry for Senozhatsky Consolidate and map all addresses to a single one. Link: https://lkml.kernel.org/r/20250707075243.858895-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky Signed-off-by: Andrew Morton --- .mailmap | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/.mailmap b/.mailmap index b0ace71968ab..ec14977b1db5 100644 --- a/.mailmap +++ b/.mailmap @@ -693,6 +693,10 @@ Sedat Dilek Senthilkumar N L Serge Hallyn Serge Hallyn +Sergey Senozhatsky +Sergey Senozhatsky +Sergey Senozhatsky +Sergey Senozhatsky Seth Forshee Shakeel Butt Shannon Nelson From 7563fcbfd484b347b776aeed4d7dac78b30884aa Mon Sep 17 00:00:00 2001 From: Zi Yan Date: Tue, 8 Jul 2025 21:27:59 -0400 Subject: [PATCH 04/11] selftests/mm: fix split_huge_page_test for folio_split() tests PID_FMT does not have an offset field, so folio_split() tests are not performed. Add PID_FMT_OFFSET with an offset field and use it to perform folio_split() tests. Link: https://lkml.kernel.org/r/20250709012800.3225727-1-ziy@nvidia.com Fixes: 80a5c494c89f ("selftests/mm: add tests for folio_split(), buddy allocator like split") Signed-off-by: Zi Yan Tested-by: Baolin Wang Reviewed-by: Lorenzo Stoakes Reviewed-by: Donet Tom Tested-by : Donet Tom Cc: Barry Song Cc: David Hildenbrand Cc: Dev Jain Cc: Liam Howlett Cc: Mariano Pache Cc: Ryan Roberts Cc: Shuah Khan Cc: Signed-off-by: Andrew Morton --- tools/testing/selftests/mm/split_huge_page_test.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c index aa7400ed0e99..f0d9c035641d 100644 --- a/tools/testing/selftests/mm/split_huge_page_test.c +++ b/tools/testing/selftests/mm/split_huge_page_test.c @@ -31,6 +31,7 @@ uint64_t pmd_pagesize; #define INPUT_MAX 80 #define PID_FMT "%d,0x%lx,0x%lx,%d" +#define PID_FMT_OFFSET "%d,0x%lx,0x%lx,%d,%d" #define PATH_FMT "%s,0x%lx,0x%lx,%d" #define PFN_MASK ((1UL<<55)-1) @@ -483,7 +484,7 @@ void split_thp_in_pagecache_to_order_at(size_t fd_size, const char *fs_loc, write_debugfs(PID_FMT, getpid(), (uint64_t)addr, (uint64_t)addr + fd_size, order); else - write_debugfs(PID_FMT, getpid(), (uint64_t)addr, + write_debugfs(PID_FMT_OFFSET, getpid(), (uint64_t)addr, (uint64_t)addr + fd_size, order, offset); for (i = 0; i < fd_size; i++) From 4aead50caf67e01020c8be1945c3201e8a972a27 Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Thu, 10 Jul 2025 22:49:08 +0900 Subject: [PATCH 05/11] nilfs2: reject invalid file types when reading inodes To prevent inodes with invalid file types from tripping through the vfs and causing malfunctions or assertion failures, add a missing sanity check when reading an inode from a block device. If the file type is not valid, treat it as a filesystem error. Link: https://lkml.kernel.org/r/20250710134952.29862-1-konishi.ryusuke@gmail.com Fixes: 05fe58fdc10d ("nilfs2: inode operations") Signed-off-by: Ryusuke Konishi Reported-by: syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Cc: Signed-off-by: Andrew Morton --- fs/nilfs2/inode.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c index 6613b8fcceb0..5cf7328d5360 100644 --- a/fs/nilfs2/inode.c +++ b/fs/nilfs2/inode.c @@ -472,11 +472,18 @@ static int __nilfs_read_inode(struct super_block *sb, inode->i_op = &nilfs_symlink_inode_operations; inode_nohighmem(inode); inode->i_mapping->a_ops = &nilfs_aops; - } else { + } else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) || + S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) { inode->i_op = &nilfs_special_inode_operations; init_special_inode( inode, inode->i_mode, huge_decode_dev(le64_to_cpu(raw_inode->i_device_code))); + } else { + nilfs_error(sb, + "invalid file type bits in mode 0%o for inode %lu", + inode->i_mode, ino); + err = -EIO; + goto failed_unmap; } nilfs_ifile_unmap_inode(raw_inode); brelse(bh); From d367a177e2e4097065c0181ea4da7cdce4d68921 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Fri, 11 Jul 2025 19:34:44 -0300 Subject: [PATCH 06/11] mm: update MAINTAINERS entry for HMM MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Jérôme has moved on from RH and has not been looking at HMM patches for some time. I've made the most changes to the core code in the recent period and Leon is now working on the HMM side from the RDMA ODP. Link: https://lkml.kernel.org/r/0-v1-a1df5219c7a3+1d981-hmm_maintainers_jgg@nvidia.com Closes: https://lore.kernel.org/all/39d43309-9f34-48bc-a9ad-108c607ba175@samsung.com/ Signed-off-by: Jason Gunthorpe Reported-by: Marek Szyprowski Acked-by: David Hildenbrand Acked-by: Leon Romanovsky Cc: Lorenzo Stoakes Signed-off-by: Andrew Morton --- CREDITS | 4 ++++ MAINTAINERS | 3 ++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/CREDITS b/CREDITS index c30b75f9a732..cf2a53efaf4c 100644 --- a/CREDITS +++ b/CREDITS @@ -1397,6 +1397,10 @@ N: Thomas Gleixner E: tglx@linutronix.de D: NAND flash hardware support, JFFS2 on NAND flash +N: Jérôme Glisse +E: jglisse@redhat.com +D: HMM - Heterogeneous Memory Management + N: Richard E. Gooch E: rgooch@atnf.csiro.au D: parent process death signal to children diff --git a/MAINTAINERS b/MAINTAINERS index fad6cb025a19..911d0483a23b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11006,7 +11006,8 @@ F: Documentation/ABI/testing/debugfs-hisi-zip F: drivers/crypto/hisilicon/zip/ HMM - Heterogeneous Memory Management -M: Jérôme Glisse +M: Jason Gunthorpe +M: Leon Romanovsky L: linux-mm@kvack.org S: Maintained F: Documentation/mm/hmm.rst From 153ad566724fe6f57b14f66e9726d295d22e576d Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Tue, 15 Jul 2025 12:56:16 -0700 Subject: [PATCH 07/11] mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show() After a recent change in clang to expose uninitialized warnings from const variables [1], there is a false positive warning from the if statement in advisor_mode_show(). mm/ksm.c:3687:11: error: variable 'output' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] 3687 | else if (ksm_advisor == KSM_ADVISOR_SCAN_TIME) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/ksm.c:3690:33: note: uninitialized use occurs here 3690 | return sysfs_emit(buf, "%s\n", output); | ^~~~~~ Rewrite the if statement to implicitly make KSM_ADVISOR_NONE the else branch so that it is obvious to the compiler that ksm_advisor can only be KSM_ADVISOR_NONE or KSM_ADVISOR_SCAN_TIME due to the assignments in advisor_mode_store(). Link: https://lkml.kernel.org/r/20250715-ksm-fix-clang-21-uninit-warning-v1-1-f443feb4bfc4@kernel.org Fixes: 66790e9a735b ("mm/ksm: add sysfs knobs for advisor") Signed-off-by: Nathan Chancellor Closes: https://github.com/ClangBuiltLinux/linux/issues/2100 Link: https://github.com/llvm/llvm-project/commit/2464313eef01c5b1edf0eccf57a32cdee01472c7 [1] Acked-by: David Hildenbrand Cc: Chengming Zhou Cc: Stefan Roesch Cc: xu xin Cc: Signed-off-by: Andrew Morton --- mm/ksm.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/ksm.c b/mm/ksm.c index 8583fb91ef13..a9d3e719e089 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -3669,10 +3669,10 @@ static ssize_t advisor_mode_show(struct kobject *kobj, { const char *output; - if (ksm_advisor == KSM_ADVISOR_NONE) - output = "[none] scan-time"; - else if (ksm_advisor == KSM_ADVISOR_SCAN_TIME) + if (ksm_advisor == KSM_ADVISOR_SCAN_TIME) output = "none [scan-time]"; + else + output = "[none] scan-time"; return sysfs_emit(buf, "%s\n", output); } From 6ade153349c6bb990d170cecc3e8bdd8628119ab Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Wed, 16 Jul 2025 17:23:28 +0200 Subject: [PATCH 08/11] kasan: use vmalloc_dump_obj() for vmalloc error reports Since 6ee9b3d84775 ("kasan: remove kasan_find_vm_area() to prevent possible deadlock"), more detailed info about the vmalloc mapping and the origin was dropped due to potential deadlocks. While fixing the deadlock is necessary, that patch was too quick in killing an otherwise useful feature, and did no due-diligence in understanding if an alternative option is available. Restore printing more helpful vmalloc allocation info in KASAN reports with the help of vmalloc_dump_obj(). Example report: | BUG: KASAN: vmalloc-out-of-bounds in vmalloc_oob+0x4c9/0x610 | Read of size 1 at addr ffffc900002fd7f3 by task kunit_try_catch/493 | | CPU: [...] | Call Trace: | | dump_stack_lvl+0xa8/0xf0 | print_report+0x17e/0x810 | kasan_report+0x155/0x190 | vmalloc_oob+0x4c9/0x610 | [...] | | The buggy address belongs to a 1-page vmalloc region starting at 0xffffc900002fd000 allocated at vmalloc_oob+0x36/0x610 | The buggy address belongs to the physical page: | page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x126364 | flags: 0x200000000000000(node=0|zone=2) | raw: 0200000000000000 0000000000000000 dead000000000122 0000000000000000 | raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 | page dumped because: kasan: bad access detected | | [..] Link: https://lkml.kernel.org/r/20250716152448.3877201-1-elver@google.com Fixes: 6ee9b3d84775 ("kasan: remove kasan_find_vm_area() to prevent possible deadlock") Signed-off-by: Marco Elver Suggested-by: Uladzislau Rezki Acked-by: Uladzislau Rezki (Sony) Cc: Alexander Potapenko Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Sebastian Andrzej Siewior Cc: Yeoreum Yun Cc: Yunseong Kim Cc: Signed-off-by: Andrew Morton --- mm/kasan/report.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/kasan/report.c b/mm/kasan/report.c index b0877035491f..62c01b4527eb 100644 --- a/mm/kasan/report.c +++ b/mm/kasan/report.c @@ -399,7 +399,9 @@ static void print_address_description(void *addr, u8 tag, } if (is_vmalloc_addr(addr)) { - pr_err("The buggy address %px belongs to a vmalloc virtual mapping\n", addr); + pr_err("The buggy address belongs to a"); + if (!vmalloc_dump_obj(addr)) + pr_cont(" vmalloc virtual mapping\n"); page = vmalloc_to_page(addr); } From 1aef9df0ee90650ac3879dc994bb1a4b0e6dd83f Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Sat, 19 Jul 2025 11:19:32 -0700 Subject: [PATCH 09/11] mm/damon/core: commit damos_quota_goal->nid DAMOS quota goal uses 'nid' field when the metric is DAMOS_QUOTA_NODE_MEM_{USED,FREE}_BP. But the goal commit function is not updating the goal's nid field. Fix it. Link: https://lkml.kernel.org/r/20250719181932.72944-1-sj@kernel.org Fixes: 0e1c773b501f ("mm/damon/core: introduce damos quota goal metrics for memory node utilization") [6.16.x] Signed-off-by: SeongJae Park Cc: Signed-off-by: Andrew Morton --- mm/damon/core.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/mm/damon/core.c b/mm/damon/core.c index 979b29e16ef4..339116ea30e3 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -754,6 +754,19 @@ static struct damos_quota_goal *damos_nth_quota_goal( return NULL; } +static void damos_commit_quota_goal_union( + struct damos_quota_goal *dst, struct damos_quota_goal *src) +{ + switch (dst->metric) { + case DAMOS_QUOTA_NODE_MEM_USED_BP: + case DAMOS_QUOTA_NODE_MEM_FREE_BP: + dst->nid = src->nid; + break; + default: + break; + } +} + static void damos_commit_quota_goal( struct damos_quota_goal *dst, struct damos_quota_goal *src) { @@ -762,6 +775,7 @@ static void damos_commit_quota_goal( if (dst->metric == DAMOS_QUOTA_USER_INPUT) dst->current_value = src->current_value; /* keep last_psi_total as is, since it will be updated in next cycle */ + damos_commit_quota_goal_union(dst, src); } /** @@ -795,6 +809,7 @@ int damos_commit_quota_goals(struct damos_quota *dst, struct damos_quota *src) src_goal->metric, src_goal->target_value); if (!new_goal) return -ENOMEM; + damos_commit_quota_goal_union(new_goal, src_goal); damos_add_quota_goal(dst, new_goal); } return 0; From 91a229bb7ba86b2592c3f18c54b7b2c5e6fe0f95 Mon Sep 17 00:00:00 2001 From: Akinobu Mita Date: Sat, 19 Jul 2025 20:26:04 +0900 Subject: [PATCH 10/11] resource: fix false warning in __request_region() A warning is raised when __request_region() detects a conflict with a resource whose resource.desc is IORES_DESC_DEVICE_PRIVATE_MEMORY. But this warning is only valid for iomem_resources. The hmem device resource uses resource.desc as the numa node id, which can cause spurious warnings. This warning appeared on a machine with multiple cxl memory expanders. One of the NUMA node id is 6, which is the same as the value of IORES_DESC_DEVICE_PRIVATE_MEMORY. In this environment it was just a spurious warning, but when I saw the warning I suspected a real problem so it's better to fix it. This change fixes this by restricting the warning to only iomem_resource. This also adds a missing new line to the warning message. Link: https://lkml.kernel.org/r/20250719112604.25500-1-akinobu.mita@gmail.com Fixes: 7dab174e2e27 ("dax/hmem: Move hmem device registration to dax_hmem.ko") Signed-off-by: Akinobu Mita Reviewed-by: Dan Williams Cc: Signed-off-by: Andrew Morton --- kernel/resource.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/resource.c b/kernel/resource.c index 8d3e6ed0bdc1..f9bb5481501a 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -1279,8 +1279,9 @@ static int __request_region_locked(struct resource *res, struct resource *parent * become unavailable to other users. Conflicts are * not expected. Warn to aid debugging if encountered. */ - if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) { - pr_warn("Unaddressable device %s %pR conflicts with %pR", + if (parent == &iomem_resource && + conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) { + pr_warn("Unaddressable device %s %pR conflicts with %pR\n", conflict->name, conflict, res); } if (conflict != parent) { From 0dec7201788b9152f06321d0dab46eed93834cda Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Mon, 21 Jul 2025 16:15:57 +1000 Subject: [PATCH 11/11] sprintf.h requires stdarg.h In file included from drivers/crypto/intel/qat/qat_common/adf_pm_dbgfs_utils.c:4: include/linux/sprintf.h:11:54: error: unknown type name 'va_list' 11 | __printf(2, 0) int vsprintf(char *buf, const char *, va_list); | ^~~~~~~ include/linux/sprintf.h:1:1: note: 'va_list' is defined in header ''; this is probably fixable by adding '#include ' Link: https://lkml.kernel.org/r/20250721173754.42865913@canb.auug.org.au Fixes: 39ced19b9e60 ("lib/vsprintf: split out sprintf() and friends") Signed-off-by: Stephen Rothwell Cc: Andriy Shevchenko Cc: Herbert Xu Cc: Petr Mladek Cc: Steven Rostedt Cc: Rasmus Villemoes Cc: Sergey Senozhatsky Cc: Signed-off-by: Andrew Morton --- include/linux/sprintf.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/sprintf.h b/include/linux/sprintf.h index 51cab2def9ec..876130091384 100644 --- a/include/linux/sprintf.h +++ b/include/linux/sprintf.h @@ -4,6 +4,7 @@ #include #include +#include int num_to_str(char *buf, int size, unsigned long long num, unsigned int width);