linux-yocto/mm
Jann Horn 9cf5b2a3b7 mm/hugetlb: unshare page tables during VMA split, not before
commit 081056dc00 upstream.

Currently, __split_vma() triggers hugetlb page table unsharing through
vm_ops->may_split().  This happens before the VMA lock and rmap locks are
taken - which is too early, it allows racing VMA-locked page faults in our
process and racing rmap walks from other processes to cause page tables to
be shared again before we actually perform the split.

Fix it by explicitly calling into the hugetlb unshare logic from
__split_vma() in the same place where THP splitting also happens.  At that
point, both the VMA and the rmap(s) are write-locked.

An annoying detail is that we can now call into the helper
hugetlb_unshare_pmds() from two different locking contexts:

1. from hugetlb_split(), holding:
    - mmap lock (exclusively)
    - VMA lock
    - file rmap lock (exclusively)
2. hugetlb_unshare_all_pmds(), which I think is designed to be able to
   call us with only the mmap lock held (in shared mode), but currently
   only runs while holding mmap lock (exclusively) and VMA lock

Backporting note:
This commit fixes a racy protection that was introduced in commit
b30c14cd61 ("hugetlb: unshare some PMDs when splitting VMAs"); that
commit claimed to fix an issue introduced in 5.13, but it should actually
also go all the way back.

[jannh@google.com: v2]
  Link: https://lkml.kernel.org/r/20250528-hugetlb-fixes-splitrace-v2-1-1329349bad1a@google.com
Link: https://lkml.kernel.org/r/20250528-hugetlb-fixes-splitrace-v2-0-1329349bad1a@google.com
Link: https://lkml.kernel.org/r/20250527-hugetlb-fixes-splitrace-v1-1-f4136f5ec58a@google.com
Fixes: 39dde65c99 ("[PATCH] shared page table for hugetlb page")
Signed-off-by: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>	[b30c14cd6102: hugetlb: unshare some PMDs when splitting VMAs]
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[stable backport: added missing include]
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27 11:11:40 +01:00
..
damon mm/damon/ops: have damon_get_folio return folio even for tail pages 2025-04-20 10:15:48 +02:00
kasan rust: treewide: switch to the kernel Vec type 2025-03-13 13:01:46 +01:00
kfence kfence: skip __GFP_THISNODE allocations on NUMA systems 2025-02-17 10:05:31 +01:00
kmsan dma: kmsan: export kmsan_handle_dma() for modules 2025-03-13 13:01:58 +01:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c mm: remove MIGRATE_SYNC_NO_COPY mode 2024-07-03 19:30:00 -07:00
bootmem_info.c
cma_debug.c
cma_sysfs.c
cma.c mm/cma: add cma_{alloc,free}_folio() 2024-09-03 21:15:36 -07:00
cma.h
compaction.c mm/compaction: fix bug in hugetlb handling pathway 2025-04-25 10:47:53 +02:00
debug_page_alloc.c
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries 2024-09-17 01:07:01 -07:00
debug.c mm: open-code page_folio() in dump_page() 2024-12-14 20:03:33 +01:00
dmapool_test.c mm/dmapool: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
dmapool.c
early_ioremap.c
execmem.c mm/execmem, arch: convert remaining overrides of module_alloc to execmem 2024-05-14 00:31:43 -07:00
fadvise.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
fail_page_alloc.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
failslab.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
filemap.c mm: fix filemap_get_folios_contig returning batches of identical folios 2025-04-25 10:47:53 +02:00
folio-compat.c mm: remove putback_lru_page() 2024-09-09 16:38:59 -07:00
gup_test.c
gup_test.h
gup.c mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable() 2025-04-25 10:47:53 +02:00
highmem.c mm/highmem: make nr_free_highpages() return "unsigned long" 2024-07-03 19:30:06 -07:00
hmm.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
huge_memory.c mm/huge_memory: fix dereferencing invalid pmd migration entry 2025-05-18 08:24:51 +02:00
hugetlb_cgroup.c mm: memcg: don't call propagate_protected_usage() needlessly 2024-09-01 20:25:50 -07:00
hugetlb_vmemmap.c mm/hugetlb_vmemmap: don't synchronize_rcu() without HVO 2024-09-01 20:25:45 -07:00
hugetlb_vmemmap.h
hugetlb.c mm/hugetlb: unshare page tables during VMA split, not before 2025-06-27 11:11:40 +01:00
hwpoison-inject.c mm/hwpoison: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
init-mm.c
internal.h mm: fix folio_pte_batch() on XEN PV 2025-05-18 08:24:51 +02:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig resource: remove dependency on SPARSEMEM from GET_FREE_REGION 2024-10-28 21:40:39 -07:00
Kconfig.debug slub: Introduce CONFIG_SLUB_RCU_DEBUG 2024-08-27 14:12:51 +02:00
khugepaged.c mm: khugepaged: fix the incorrect statistics when collapsing large file folios 2024-10-17 00:28:10 -07:00
kmemleak.c mm: kmemleak: fix upper boundary check for physical address objects 2025-02-17 10:05:36 +01:00
ksm.c mm: remove PageSwapCache 2024-09-03 21:15:44 -07:00
list_lru.c mm: list_lru: fix UAF for memory cgroup 2024-08-07 18:33:56 -07:00
maccess.c
madvise.c mm: close theoretical race where stale TLB entries could linger 2025-06-27 11:11:38 +01:00
Makefile mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
mapping_dirty_helpers.c
memblock.c memblock: Accept allocated memory before use in memblock_double_array() 2025-05-18 08:24:54 +02:00
memcontrol-v1.c mm/thp: fix deferred split unqueue naming and locking 2024-11-05 16:49:54 -08:00
memcontrol-v1.h mm: memcg: declare do_memsw_account inline 2024-12-14 20:03:33 +01:00
memcontrol.c memcg: always call cond_resched() after fn() 2025-05-29 11:03:22 +02:00
memfd.c mm: reinstate ability to map write-sealed memfd mappings read-only 2025-01-09 13:33:54 +01:00
memory_hotplug.c mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper 2025-04-20 10:15:50 +02:00
memory-failure.c mm/hwpoison: do not send SIGBUS to processes with recovered clean pages 2025-04-20 10:15:49 +02:00
memory-tiers.c memory tiers: use default_dram_perf_ref_source in log message 2024-09-26 14:01:44 -07:00
memory.c mm: fix apply_to_existing_page_range() 2025-04-25 10:47:53 +02:00
mempolicy.c mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM 2024-12-14 20:03:32 +01:00
mempool.c mm: fix xyz_noprof functions calling profiled functions 2024-06-05 19:19:26 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c
migrate_device.c mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize() 2025-02-27 04:30:22 -08:00
migrate.c mm/migrate: fix shmem xarray update during migration 2025-03-28 22:03:30 +01:00
mincore.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
mlock.c mm/mlock: set the correct prev on failure 2024-11-07 14:14:58 -08:00
mm_init.c mm: drop CONFIG_HAVE_ARCH_NODEDATA_EXTENSION 2024-09-03 21:15:28 -07:00
mm_slot.h
mmap_lock.c mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer 2024-07-03 19:30:26 -07:00
mmap.c mm: reinstate ability to map write-sealed memfd mappings read-only 2025-01-09 13:33:54 +01:00
mmu_gather.c
mmu_notifier.c mm: move internal core VMA manipulation functions to own file 2024-09-01 20:25:54 -07:00
mmzone.c mm: improve code consistency with zonelist_* helper functions 2024-09-01 20:25:55 -07:00
mprotect.c mm: refactor map_deny_write_exec() 2024-11-05 16:49:55 -08:00
mremap.c mm/mremap: correctly handle partial mremap() of VMA starting at 0 2025-04-20 10:15:49 +02:00
mseal.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
msync.c
nommu.c nommu: pass NULL argument to vma_iter_prealloc() 2024-11-11 17:20:23 -08:00
numa_emulation.c mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
numa_memblks.c mm: numa_clear_kernel_node_hotplug: Add NUMA_NO_NODE check for node id 2024-10-28 21:40:40 -07:00
numa.c mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
oom_kill.c memcg: fix soft lockup in the OOM process 2025-02-08 09:58:19 +01:00
page_alloc.c page_pool: Move pp_magic check into helper functions 2025-06-19 15:31:42 +02:00
page_counter.c mm, memcg: cg2 memory{.swap,}.peak write handlers 2024-09-01 20:25:53 -07:00
page_ext.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
page_idle.c
page_io.c mm: count zeromap read and set for swapout and swapin 2024-11-11 00:00:37 -08:00
page_isolation.c mm/hugetlb: wait for hugetlb folios to be freed 2025-03-22 12:54:28 -07:00
page_owner.c mm/page-owner: use gfp_nested_mask() instead of open coded masking 2024-05-19 14:40:44 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-15 10:43:04 -07:00
page_vma_mapped.c mm: make page_mapped_in_vma() hugetlb walk aware 2025-04-20 10:15:49 +02:00
page-writeback.c mm: fix ratelimit_pages update error in dirty_ratio_handler() 2025-06-27 11:11:22 +01:00
pagewalk.c mm/pagewalk: fix usage of pmd_leaf()/pud_leaf() without present check 2024-10-28 21:40:38 -07:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c percpu: remove pcpu_alloc_size() 2024-09-01 20:26:04 -07:00
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-05-07 10:37:00 -07:00
process_vm_access.c
ptdump.c
readahead.c mm/readahead: fix large folio support in async readahead 2025-01-09 13:33:54 +01:00
rmap.c mm/rmap: reject hugetlb folios in folio_make_device_exclusive() 2025-04-20 10:15:49 +02:00
rodata_test.c
secretmem.c secretmem: disable memfd_secret() if arch cannot set direct map 2024-10-09 12:47:19 -07:00
shmem_quota.c shmem_quota: build the object file conditionally to the config option 2024-09-01 20:25:45 -07:00
shmem.c mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper 2025-04-20 10:15:50 +02:00
show_mem.c mm/show_mem.c: report alloc tags in human readable units 2024-09-17 01:07:00 -07:00
shrinker_debug.c mm: shrinker: use min() to improve shrinker_debugfs_scan_write() 2024-09-03 21:15:40 -07:00
shrinker.c mm: shrinker: avoid memleak in alloc_shrinker_info 2024-10-31 20:27:04 -07:00
shuffle.c
shuffle.h
slab_common.c slab: Fix too strict alignment check in create_cache() 2024-12-09 10:41:07 +01:00
slab.h mm/slub: Avoid list corruption when removing a slab from the full list 2024-12-09 10:41:04 +01:00
slub.c mm, slab: clean up slab->obj_exts always 2025-05-09 09:50:49 +02:00
sparse-vmemmap.c LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space 2024-10-21 22:11:19 +08:00
sparse.c A set of X86 fixes: 2024-09-01 14:43:08 -07:00
swap_cgroup.c mm: attempt to batch free swap entries for zap_pte_range() 2024-09-03 21:15:33 -07:00
swap_slots.c mm: swap: update get_swap_pages() to take folio order 2024-04-25 20:56:37 -07:00
swap_state.c mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios 2024-09-17 01:07:01 -07:00
swap.c mm: page_alloc: move mlocked flag clearance into free_pages_prepare() 2024-11-11 17:20:23 -08:00
swap.h mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
swapfile.c mm, swap: fix allocation and scanning race with swapoff 2024-11-14 15:25:07 -08:00
truncate.c mm: Fix missing folio invalidation calls during truncation 2024-08-24 16:09:16 +02:00
usercopy.c
userfaultfd.c mm: userfaultfd: correct dirty flags set for both present and swap pte 2025-05-22 14:29:50 +02:00
util.c mm: only enforce minimum stack gap size if it's sensible 2024-09-01 20:26:02 -07:00
vma_internal.h mm/hugetlb: unshare page tables during VMA split, not before 2025-06-27 11:11:40 +01:00
vma.c mm/hugetlb: unshare page tables during VMA split, not before 2025-06-27 11:11:40 +01:00
vma.h mm/vma: add give_up_on_oom option on modify/merge, use in uffd release 2025-04-25 10:48:06 +02:00
vmalloc.c mm: vmalloc: only zero-init on vrealloc shrink 2025-05-29 11:03:23 +02:00
vmpressure.c
vmscan.c mm/vmscan: don't try to reclaim hwpoison folio 2025-05-02 07:58:52 +02:00
vmstat.c vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event 2024-12-09 10:41:01 +01:00
workingset.c cachestat: do not flush stats in recency check 2024-07-03 22:40:37 -07:00
z3fold.c mm/z3fold: add __percpu annotation to *unbuddied pointer in struct z3fold_pool 2024-09-01 20:25:56 -07:00
zbud.c
zpool.c
zsmalloc.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
zswap.c mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead() 2025-04-10 14:39:40 +02:00