Merge tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - The series "mm: fixes for fallouts from mem_init() cleanup" from Mike
   Rapoport fixes a couple of issues with the just-merged "arch, mm:
   reduce code duplication in mem_init()" series

 - The series "MAINTAINERS: add my isub-entries to MM part." from Mike
   Rapoport does some maintenance on MAINTAINERS

 - The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some
   cleanup work to the page mapping code

 - The series "mseal system mappings" from Jeff Xu permits sealing of
   "system mappings", such as vdso, vvar, vvar_vclock, vectors (arm
   compat-mode), sigpage (arm compat-mode)

 - Plus the usual shower of singleton patches

* tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
  mseal sysmap: add arch-support txt
  mseal sysmap: enable s390
  selftest: test system mappings are sealed
  mseal sysmap: update mseal.rst
  mseal sysmap: uprobe mapping
  mseal sysmap: enable arm64
  mseal sysmap: enable x86-64
  mseal sysmap: generic vdso vvar mapping
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal sysmap: kernel config and header change
  mm: pgtable: remove tlb_remove_page_ptdesc()
  x86: pgtable: convert to use tlb_remove_ptdesc()
  riscv: pgtable: unconditionally use tlb_remove_ptdesc()
  mm: pgtable: convert some architectures to use tlb_remove_ptdesc()
  mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc*
  mm: pgtable: make generic tlb_remove_table() use struct ptdesc
  microblaze/mm: put mm_cmdline_setup() in .init.text section
  mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
  MAINTAINERS: mm: add entry for secretmem
  MAINTAINERS: mm: add entry for numa memblocks and numa emulation
  ...
Commit 8c7c1b5506 by Linus Torvalds, 2025-04-03 11:10:00 -07:00
41 changed files with 417 additions and 128 deletions


@ -0,0 +1,30 @@
#
# Feature name: mseal-system-mappings
# Kconfig: ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
# description: arch supports mseal system mappings
#
-----------------------
| arch |status|
-----------------------
| alpha: | TODO |
| arc: | N/A |
| arm: | N/A |
| arm64: | ok |
| csky: | N/A |
| hexagon: | N/A |
| loongarch: | TODO |
| m68k: | N/A |
| microblaze: | N/A |
| mips: | TODO |
| nios2: | N/A |
| openrisc: | N/A |
| parisc: | TODO |
| powerpc: | TODO |
| riscv: | TODO |
| s390: | ok |
| sh: | N/A |
| sparc: | TODO |
| um: | TODO |
| x86: | ok |
| xtensa: | N/A |
-----------------------


@ -130,6 +130,27 @@ Use cases
- Chrome browser: protect some security sensitive data structures.
- System mappings:
The system mappings are created by the kernel and include vdso, vvar,
vvar_vclock, vectors (arm compat-mode), sigpage (arm compat-mode), uprobes.
Those system mappings are read-only or execute-only; memory sealing can
protect them from ever being made writable or being unmapped/remapped with
different attributes. This is useful to mitigate memory corruption issues
where a corrupted pointer is passed to a memory management system.
If supported by an architecture (CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS),
CONFIG_MSEAL_SYSTEM_MAPPINGS seals all system mappings of that
architecture.
The following architectures currently support this feature: x86-64, arm64,
and s390.
WARNING: This feature breaks programs which rely on relocating
or unmapping system mappings. Known broken software at the time
of writing includes CHECKPOINT_RESTORE, UML, gVisor, rr. Therefore
this config can't be enabled universally.
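For illustration only (not part of this patch set), the sealing semantics
can be exercised from userspace with a small program. A minimal sketch,
assuming a 6.10+ kernel and libc headers that define SYS_mseal; there is
commonly no libc wrapper, so syscall(2) is used and flags must be 0:

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void)
	{
		size_t len = getpagesize();
		/* A private anonymous page to seal. */
		void *p = mmap(NULL, len, PROT_READ,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;
		if (syscall(SYS_mseal, p, len, 0) != 0) {
			perror("mseal");
			return 1;
		}
		/* Both must now fail with EPERM on the sealed VMA. */
		if (mprotect(p, len, PROT_READ | PROT_WRITE) != 0)
			printf("mprotect blocked: %s\n", strerror(errno));
		if (munmap(p, len) != 0)
			printf("munmap blocked: %s\n", strerror(errno));
		return 0;
	}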
When not to use mseal
=====================
Applications can apply sealing to any virtual memory region from userspace,


@ -15487,6 +15487,45 @@ F: tools/mm/
F: tools/testing/selftests/mm/
N: include/linux/page[-_]*
MEMORY MANAGEMENT - EXECMEM
M: Andrew Morton <akpm@linux-foundation.org>
M: Mike Rapoport <rppt@kernel.org>
L: linux-mm@kvack.org
S: Maintained
F: include/linux/execmem.h
F: mm/execmem.c
MEMORY MANAGEMENT - NUMA MEMBLOCKS AND NUMA EMULATION
M: Andrew Morton <akpm@linux-foundation.org>
M: Mike Rapoport <rppt@kernel.org>
L: linux-mm@kvack.org
S: Maintained
F: include/linux/numa_memblks.h
F: mm/numa.c
F: mm/numa_emulation.c
F: mm/numa_memblks.c
MEMORY MANAGEMENT - SECRETMEM
M: Andrew Morton <akpm@linux-foundation.org>
M: Mike Rapoport <rppt@kernel.org>
L: linux-mm@kvack.org
S: Maintained
F: include/linux/secretmem.h
F: mm/secretmem.c
MEMORY MANAGEMENT - USERFAULTFD
M: Andrew Morton <akpm@linux-foundation.org>
R: Peter Xu <peterx@redhat.com>
L: linux-mm@kvack.org
S: Maintained
F: Documentation/admin-guide/mm/userfaultfd.rst
F: fs/userfaultfd.c
F: include/asm-generic/pgtable_uffd.h
F: include/linux/userfaultfd_k.h
F: include/uapi/linux/userfaultfd.h
F: mm/userfaultfd.c
F: tools/testing/selftests/mm/uffd-*.[ch]
MEMORY MAPPING
M: Andrew Morton <akpm@linux-foundation.org>
M: Liam R. Howlett <Liam.Howlett@oracle.com>


@ -38,6 +38,7 @@ config ARM64
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MEM_ENCRYPT
select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT


@ -130,7 +130,8 @@ static int __setup_additional_pages(enum vdso_abi abi,
mm->context.vdso = (void *)vdso_base;
ret = _install_special_mapping(mm, vdso_base, vdso_text_len,
VM_READ|VM_EXEC|gp_flags|
-VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
+VM_SEALED_SYSMAP,
vdso_info[abi].cm);
if (IS_ERR(ret))
goto up_fail;
@ -256,7 +257,8 @@ static int aarch32_kuser_helpers_setup(struct mm_struct *mm)
*/
ret = _install_special_mapping(mm, AARCH32_VECTORS_BASE, PAGE_SIZE,
VM_READ | VM_EXEC |
-VM_MAYREAD | VM_MAYEXEC,
+VM_MAYREAD | VM_MAYEXEC |
+VM_SEALED_SYSMAP,
&aarch32_vdso_maps[AA32_MAP_VECTORS]);
return PTR_ERR_OR_ZERO(ret);
@ -279,7 +281,8 @@ static int aarch32_sigreturn_setup(struct mm_struct *mm)
*/
ret = _install_special_mapping(mm, addr, PAGE_SIZE,
VM_READ | VM_EXEC | VM_MAYREAD |
-VM_MAYWRITE | VM_MAYEXEC,
+VM_MAYWRITE | VM_MAYEXEC |
+VM_SEALED_SYSMAP,
&aarch32_vdso_maps[AA32_MAP_SIGPAGE]);
if (IS_ERR(ret))
goto out;


@ -61,11 +61,8 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
return ret;
}
-#define __pte_free_tlb(tlb, pte, address) \
-do { \
-pagetable_dtor(page_ptdesc(pte)); \
-tlb_remove_page_ptdesc(tlb, page_ptdesc(pte)); \
-} while (0)
+#define __pte_free_tlb(tlb, pte, address) \
+tlb_remove_ptdesc((tlb), page_ptdesc(pte))
extern void pagetable_init(void);
extern void mmu_init(unsigned long min_pfn, unsigned long max_pfn);


@ -87,10 +87,7 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd,
max_kernel_seg = pmdindex;
}
-#define __pte_free_tlb(tlb, pte, addr) \
-do { \
-pagetable_dtor((page_ptdesc(pte))); \
-tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte))); \
-} while (0)
+#define __pte_free_tlb(tlb, pte, addr) \
+tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#endif


@ -55,11 +55,8 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
return pte;
}
-#define __pte_free_tlb(tlb, pte, address) \
-do { \
-pagetable_dtor(page_ptdesc(pte)); \
-tlb_remove_page_ptdesc((tlb), page_ptdesc(pte)); \
-} while (0)
+#define __pte_free_tlb(tlb, pte, address) \
+tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#ifndef __PAGETABLE_PMD_FOLDED


@ -17,11 +17,8 @@
extern const char bad_pmd_string[];
-#define __pte_free_tlb(tlb, pte, addr) \
-do { \
-pagetable_dtor(page_ptdesc(pte)); \
-tlb_remove_page_ptdesc((tlb), page_ptdesc(pte)); \
-} while (0)
+#define __pte_free_tlb(tlb, pte, addr) \
+tlb_remove_ptdesc((tlb), page_ptdesc(pte))
static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte)
{


@ -118,7 +118,7 @@ int page_is_ram(unsigned long pfn)
/*
* Check for command-line options that affect what MMU_init will do.
*/
-static void mm_cmdline_setup(void)
+static void __init mm_cmdline_setup(void)
{
unsigned long maxmem = 0;
char *p = cmd_line;


@ -48,11 +48,8 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
extern void pgd_init(void *addr);
extern pgd_t *pgd_alloc(struct mm_struct *mm);
-#define __pte_free_tlb(tlb, pte, address) \
-do { \
-pagetable_dtor(page_ptdesc(pte)); \
-tlb_remove_page_ptdesc((tlb), page_ptdesc(pte)); \
-} while (0)
+#define __pte_free_tlb(tlb, pte, address) \
+tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#ifndef __PAGETABLE_PMD_FOLDED


@ -28,10 +28,7 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
extern pgd_t *pgd_alloc(struct mm_struct *mm);
-#define __pte_free_tlb(tlb, pte, addr) \
-do { \
-pagetable_dtor(page_ptdesc(pte)); \
-tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte))); \
-} while (0)
+#define __pte_free_tlb(tlb, pte, addr) \
+tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#endif /* _ASM_NIOS2_PGALLOC_H */


@ -64,10 +64,7 @@ extern inline pgd_t *pgd_alloc(struct mm_struct *mm)
extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm);
-#define __pte_free_tlb(tlb, pte, addr) \
-do { \
-pagetable_dtor(page_ptdesc(pte)); \
-tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte))); \
-} while (0)
+#define __pte_free_tlb(tlb, pte, addr) \
+tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#endif


@ -15,24 +15,6 @@
#define __HAVE_ARCH_PUD_FREE
#include <asm-generic/pgalloc.h>
-/*
-* While riscv platforms with riscv_ipi_for_rfence as true require an IPI to
-* perform TLB shootdown, some platforms with riscv_ipi_for_rfence as false use
-* SBI to perform TLB shootdown. To keep software pagetable walkers safe in this
-* case we switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the
-* comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in include/asm-generic/tlb.h
-* for more details.
-*/
-static inline void riscv_tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
-{
-if (riscv_use_sbi_for_rfence()) {
-tlb_remove_ptdesc(tlb, pt);
-} else {
-pagetable_dtor(pt);
-tlb_remove_page_ptdesc(tlb, pt);
-}
-}
static inline void pmd_populate_kernel(struct mm_struct *mm,
pmd_t *pmd, pte_t *pte)
{
@ -108,14 +90,14 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
unsigned long addr)
{
if (pgtable_l4_enabled)
-riscv_tlb_remove_ptdesc(tlb, virt_to_ptdesc(pud));
+tlb_remove_ptdesc(tlb, virt_to_ptdesc(pud));
}
static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
unsigned long addr)
{
if (pgtable_l5_enabled)
-riscv_tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
+tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
}
#endif /* __PAGETABLE_PMD_FOLDED */
@ -143,7 +125,7 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
unsigned long addr)
{
-riscv_tlb_remove_ptdesc(tlb, virt_to_ptdesc(pmd));
+tlb_remove_ptdesc(tlb, virt_to_ptdesc(pmd));
}
#endif /* __PAGETABLE_PMD_FOLDED */
@ -151,7 +133,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
unsigned long addr)
{
-riscv_tlb_remove_ptdesc(tlb, page_ptdesc(pte));
+tlb_remove_ptdesc(tlb, page_ptdesc(pte));
}
#endif /* CONFIG_MMU */


@ -137,6 +137,7 @@ config S390
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_HUGETLBFS
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 && CC_IS_CLANG
select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_USE_BUILTIN_BSWAP


@ -80,7 +80,7 @@ static int map_vdso(unsigned long addr, unsigned long vdso_mapping_len)
vdso_text_start = vvar_start + VDSO_NR_PAGES * PAGE_SIZE;
/* VM_MAYWRITE for COW so gdb can set breakpoints */
vma = _install_special_mapping(mm, vdso_text_start, vdso_text_len,
-VM_READ|VM_EXEC|
+VM_READ|VM_EXEC|VM_SEALED_SYSMAP|
VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
vdso_mapping);
if (IS_ERR(vma)) {


@ -32,10 +32,7 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
set_pmd(pmd, __pmd((unsigned long)page_address(pte)));
}
-#define __pte_free_tlb(tlb, pte, addr) \
-do { \
-pagetable_dtor(page_ptdesc(pte)); \
-tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte))); \
-} while (0)
+#define __pte_free_tlb(tlb, pte, addr) \
+tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#endif /* __ASM_SH_PGALLOC_H */


@ -25,27 +25,18 @@
*/
extern pgd_t *pgd_alloc(struct mm_struct *);
-#define __pte_free_tlb(tlb, pte, address) \
-do { \
-pagetable_dtor(page_ptdesc(pte)); \
-tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte))); \
-} while (0)
+#define __pte_free_tlb(tlb, pte, address) \
+tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#if CONFIG_PGTABLE_LEVELS > 2
-#define __pmd_free_tlb(tlb, pmd, address) \
-do { \
-pagetable_dtor(virt_to_ptdesc(pmd)); \
-tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
-} while (0)
+#define __pmd_free_tlb(tlb, pmd, address) \
+tlb_remove_ptdesc((tlb), virt_to_ptdesc(pmd))
#if CONFIG_PGTABLE_LEVELS > 3
-#define __pud_free_tlb(tlb, pud, address) \
-do { \
-pagetable_dtor(virt_to_ptdesc(pud)); \
-tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \
-} while (0)
+#define __pud_free_tlb(tlb, pud, address) \
+tlb_remove_ptdesc((tlb), virt_to_ptdesc(pud))
#endif
#endif


@ -27,6 +27,7 @@ config X86_64
# Options that are inherently 64-bit kernel only:
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_PTDUMP
select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE


@ -162,7 +162,8 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
text_start,
image->size,
VM_READ|VM_EXEC|
-VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
+VM_SEALED_SYSMAP,
&vdso_mapping);
if (IS_ERR(vma)) {
@ -181,7 +182,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
VDSO_VCLOCK_PAGES_START(addr),
VDSO_NR_VCLOCK_PAGES * PAGE_SIZE,
VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|
-VM_PFNMAP,
+VM_PFNMAP|VM_SEALED_SYSMAP,
&vvar_vclock_mapping);
if (IS_ERR(vma)) {


@ -20,7 +20,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
{
paravirt_release_pte(page_to_pfn(pte));
-tlb_remove_table(tlb, page_ptdesc(pte));
+tlb_remove_ptdesc(tlb, page_ptdesc(pte));
}
#if CONFIG_PGTABLE_LEVELS > 2
@ -34,21 +34,21 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
#ifdef CONFIG_X86_PAE
tlb->need_flush_all = 1;
#endif
-tlb_remove_table(tlb, virt_to_ptdesc(pmd));
+tlb_remove_ptdesc(tlb, virt_to_ptdesc(pmd));
}
#if CONFIG_PGTABLE_LEVELS > 3
void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
{
paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
-tlb_remove_table(tlb, virt_to_ptdesc(pud));
+tlb_remove_ptdesc(tlb, virt_to_ptdesc(pud));
}
#if CONFIG_PGTABLE_LEVELS > 4
void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
{
paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
-tlb_remove_table(tlb, virt_to_ptdesc(p4d));
+tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
}
#endif /* CONFIG_PGTABLE_LEVELS > 4 */
#endif /* CONFIG_PGTABLE_LEVELS > 3 */


@ -227,10 +227,10 @@ static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page);
*/
static inline void tlb_remove_table(struct mmu_gather *tlb, void *table)
{
-struct page *page = (struct page *)table;
+struct ptdesc *ptdesc = (struct ptdesc *)table;
-pagetable_dtor(page_ptdesc(page));
-tlb_remove_page(tlb, page);
+pagetable_dtor(ptdesc);
+tlb_remove_page(tlb, ptdesc_page(ptdesc));
}
#endif /* CONFIG_MMU_GATHER_TABLE_FREE */
@ -493,17 +493,11 @@ static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
return tlb_remove_page_size(tlb, page, PAGE_SIZE);
}
-static inline void tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
+static inline void tlb_remove_ptdesc(struct mmu_gather *tlb, struct ptdesc *pt)
{
tlb_remove_table(tlb, pt);
}
-/* Like tlb_remove_ptdesc, but for page-like page directories. */
-static inline void tlb_remove_page_ptdesc(struct mmu_gather *tlb, struct ptdesc *pt)
-{
-tlb_remove_page(tlb, ptdesc_page(pt));
-}
static inline void tlb_change_page_size(struct mmu_gather *tlb,
unsigned int page_size)
{
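Taken together with the per-arch hunks above, the conversion folds the
pagetable_dtor() call that every architecture used to open-code into the
generic path. A sketch of the resulting flow, assembled from this commit's
own pieces (the tlb_remove_table() shown is the !CONFIG_MMU_GATHER_TABLE_FREE
variant; nothing here is new API beyond what the diff introduces):

	/* arch header after conversion: one line, no do/while wrapper */
	#define __pte_free_tlb(tlb, pte, address) \
		tlb_remove_ptdesc((tlb), page_ptdesc(pte))

	/* generic side: tlb_remove_ptdesc() forwards to tlb_remove_table(),
	 * which now runs the destructor before freeing the page */
	static inline void tlb_remove_table(struct mmu_gather *tlb, void *table)
	{
		struct ptdesc *ptdesc = (struct ptdesc *)table;

		pagetable_dtor(ptdesc);
		tlb_remove_page(tlb, ptdesc_page(ptdesc));
	}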


@ -4238,4 +4238,14 @@ int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *st
int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
/*
* mseal of a userspace process's system mappings.
*/
#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
#define VM_SEALED_SYSMAP VM_SEALED
#else
#define VM_SEALED_SYSMAP VM_NONE
#endif
#endif /* _LINUX_MM_H */
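Call sites simply OR the flag in unconditionally: with
CONFIG_MSEAL_SYSTEM_MAPPINGS=n it expands to VM_NONE and changes nothing.
A sketch in the shape of the arch hunks in this commit (the mapping name
is illustrative, not a real symbol):

	vma = _install_special_mapping(mm, addr, PAGE_SIZE,
				       VM_READ | VM_EXEC |
				       VM_MAYREAD | VM_MAYEXEC |
				       VM_SEALED_SYSMAP, /* VM_NONE when off */
				       &example_special_mapping);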


@ -1888,6 +1888,28 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
config ARCH_HAS_MEMBARRIER_SYNC_CORE
bool
config ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
bool
help
Control MSEAL_SYSTEM_MAPPINGS access based on architecture.
A 64-bit kernel is required for the memory sealing feature.
No specific hardware features from the CPU are needed.
To enable this feature, the architecture needs to update its
special mapping calls to include the sealing flag and confirm
that it doesn't unmap/remap system mappings during the lifetime
of the process. The existence of this flag for an architecture
implies that it does not require the remapping of the system
mappings during process lifetime, so sealing these mappings is safe
from a kernel perspective.
After the architecture enables this, a distribution can set
CONFIG_MSEAL_SYSTEM_MAPPINGS to manage access to the feature.
For complete descriptions of memory sealing, please see
Documentation/userspace-api/mseal.rst
config HAVE_PERF_EVENTS
bool
help
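Wiring up a new architecture is then a one-line select plus passing
VM_SEALED_SYSMAP at each _install_special_mapping() call site, exactly as
the arm64, s390 and x86-64 hunks in this commit do. A hypothetical example:

	# arch/foo/Kconfig (illustrative architecture, not a real one)
	config FOO
		select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS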


@ -1703,7 +1703,8 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
}
vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE,
-VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO,
+VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|
+VM_SEALED_SYSMAP,
&xol_mapping);
if (IS_ERR(vma)) {
ret = PTR_ERR(vma);


@ -99,7 +99,8 @@ const struct vm_special_mapping vdso_vvar_mapping = {
struct vm_area_struct *vdso_install_vvar_mapping(struct mm_struct *mm, unsigned long addr)
{
return _install_special_mapping(mm, addr, VDSO_NR_PAGES * PAGE_SIZE,
-VM_READ | VM_MAYREAD | VM_IO | VM_DONTDUMP | VM_PFNMAP,
+VM_READ | VM_MAYREAD | VM_IO | VM_DONTDUMP |
+VM_PFNMAP | VM_SEALED_SYSMAP,
&vdso_vvar_mapping);
}


@ -76,14 +76,13 @@ int damon_register_ops(struct damon_operations *ops)
if (ops->id >= NR_DAMON_OPS)
return -EINVAL;
mutex_lock(&damon_ops_lock);
/* Fail for already registered ops */
-if (__damon_is_registered_ops(ops->id)) {
+if (__damon_is_registered_ops(ops->id))
err = -EINVAL;
-goto out;
-}
-damon_registered_ops[ops->id] = *ops;
-out:
+else
+damon_registered_ops[ops->id] = *ops;
mutex_unlock(&damon_ops_lock);
return err;
}


@ -1073,14 +1073,11 @@ static void kmem_cache_rcu_uaf(struct kunit *test)
kmem_cache_destroy(cache);
}
-static void empty_cache_ctor(void *object) { }
static void kmem_cache_double_destroy(struct kunit *test)
{
struct kmem_cache *cache;
-/* Provide a constructor to prevent cache merging. */
-cache = kmem_cache_create("test_cache", 200, 0, 0, empty_cache_ctor);
+cache = kmem_cache_create("test_cache", 200, 0, SLAB_NO_MERGE, NULL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, cache);
kmem_cache_destroy(cache);
KUNIT_EXPECT_KASAN_FAIL(test, kmem_cache_destroy(cache));


@ -2167,6 +2167,9 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
unsigned long start_pfn = PFN_UP(start);
unsigned long end_pfn = PFN_DOWN(end);
if (!IS_ENABLED(CONFIG_HIGHMEM) && end_pfn > max_low_pfn)
end_pfn = max_low_pfn;
if (start_pfn >= end_pfn)
return 0;


@ -1813,21 +1813,15 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
page = pfn_to_page(pfn);
folio = page_folio(page);
+/*
+* No reference or lock is held on the folio, so it might
+* be modified concurrently (e.g. split). As such,
+* folio_nr_pages() may read garbage. This is fine as the outer
+* loop will revisit the split folio later.
+*/
+if (folio_test_large(folio))
+pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
if (!folio_try_get(folio))
continue;
if (unlikely(page_folio(page) != folio))
goto put_folio;
-if (folio_test_large(folio))
-pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
if (folio_contain_hwpoisoned_page(folio)) {
if (WARN_ON(folio_test_lru(folio)))
folio_isolate_lru(folio);


@ -984,19 +984,19 @@ static void __init memmap_init(void)
}
}
-#ifdef CONFIG_SPARSEMEM
/*
* Initialize the memory map for hole in the range [memory_end,
-* section_end].
+* section_end] for SPARSEMEM and in the range [memory_end, memmap_end]
+* for FLATMEM.
* Append the pages in this hole to the highest zone in the last
* node.
+* The call to init_unavailable_range() is outside the ifdef to
+* silence the compiler warning about zone_id set but not used;
+* for FLATMEM it is a nop anyway
*/
+#ifdef CONFIG_SPARSEMEM
end_pfn = round_up(end_pfn, PAGES_PER_SECTION);
-if (hole_pfn < end_pfn)
+#else
+end_pfn = round_up(end_pfn, MAX_ORDER_NR_PAGES);
+#endif
+if (hole_pfn < end_pfn)
init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);
}


@ -1561,11 +1561,12 @@ static unsigned long expand_vma_in_place(struct vma_remap_struct *vrm)
* adjacent to the expanded vma and otherwise
* compatible.
*/
-vma = vrm->vma = vma_merge_extend(&vmi, vma, vrm->delta);
+vma = vma_merge_extend(&vmi, vma, vrm->delta);
if (!vma) {
vrm_uncharge(vrm);
return -ENOMEM;
}
+vrm->vma = vma;
vrm_stat_account(vrm, vrm->delta);


@ -1593,7 +1593,7 @@ static __always_inline void page_del_and_expand(struct zone *zone,
static void check_new_page_bad(struct page *page)
{
-if (unlikely(page->flags & __PG_HWPOISON)) {
+if (unlikely(PageHWPoison(page))) {
/* Don't complain about hwpoisoned pages */
if (PageBuddy(page))
__ClearPageBuddy(page);
@ -4604,8 +4604,8 @@ retry:
goto retry;
/* Reclaim/compaction failed to prevent the fallback */
-if (defrag_mode) {
-alloc_flags &= ALLOC_NOFRAGMENT;
+if (defrag_mode && (alloc_flags & ALLOC_NOFRAGMENT)) {
+alloc_flags &= ~ALLOC_NOFRAGMENT;
goto retry;
}


@ -51,6 +51,27 @@ config PROC_MEM_NO_FORCE
endchoice
config MSEAL_SYSTEM_MAPPINGS
bool "mseal system mappings"
depends on 64BIT
depends on ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
depends on !CHECKPOINT_RESTORE
help
Apply mseal on system mappings.
The system mappings include vdso, vvar, vvar_vclock,
vectors (arm compat-mode), sigpage (arm compat-mode), uprobes.
A 64-bit kernel is required for the memory sealing feature.
No specific hardware features from the CPU are needed.
WARNING: This feature breaks programs which rely on relocating
or unmapping system mappings. Known broken software at the time
of writing includes CHECKPOINT_RESTORE, UML, gVisor, rr. Therefore
this config can't be enabled universally.
For complete descriptions of memory sealing, please see
Documentation/userspace-api/mseal.rst
config SECURITY
bool "Enable different security models"
depends on SYSFS
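At runtime the effect shows up as the "sl" bit in the VmFlags line of
/proc/<pid>/smaps, which is exactly what the selftest below greps for.
A quick manual check (output illustrative; the flag list varies by kernel
and architecture):

	$ awk '/\[vdso\]/ {f = 1} f && /^VmFlags/ {print; exit}' /proc/self/smaps
	VmFlags: rd ex mr mw me de sl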


@ -62,6 +62,7 @@ TARGETS += mount
TARGETS += mount_setattr
TARGETS += move_mount_set_group
TARGETS += mqueue
TARGETS += mseal_system_mappings
TARGETS += nci
TARGETS += net
TARGETS += net/af_unix


@ -41,6 +41,31 @@ check_supported_x86_64()
fi
}
check_supported_ppc64()
{
local config="/proc/config.gz"
[[ -f "${config}" ]] || config="/boot/config-$(uname -r)"
[[ -f "${config}" ]] || fail "Cannot find kernel config in /proc or /boot"
local pg_table_levels=$(gzip -dcfq "${config}" | grep PGTABLE_LEVELS | cut -d'=' -f 2)
if [[ "${pg_table_levels}" -lt 5 ]]; then
echo "$0: PGTABLE_LEVELS=${pg_table_levels}, must be >= 5 to run this test"
exit $ksft_skip
fi
local mmu_support=$(grep -m1 "mmu" /proc/cpuinfo | awk '{print $3}')
if [[ "$mmu_support" != "radix" ]]; then
echo "$0: System does not use Radix MMU, required for 5-level paging"
exit $ksft_skip
fi
local hugepages_total=$(awk '/HugePages_Total/ {print $2}' /proc/meminfo)
if [[ "${hugepages_total}" -eq 0 ]]; then
echo "$0: HugePages are not enabled, required for some tests"
exit $ksft_skip
fi
}
check_test_requirements()
{
# The test supports x86_64 and powerpc64. We currently have no useful
@ -50,6 +75,9 @@ check_test_requirements()
"x86_64")
check_supported_x86_64
;;
"ppc64le"|"ppc64")
check_supported_ppc64
;;
*)
return 0
;;


@ -0,0 +1,2 @@
# SPDX-License-Identifier: GPL-2.0-only
sysmap_is_sealed


@ -0,0 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
CFLAGS += -std=c99 -pthread -Wall $(KHDR_INCLUDES)
TEST_GEN_PROGS := sysmap_is_sealed
include ../lib.mk


@ -0,0 +1 @@
CONFIG_MSEAL_SYSTEM_MAPPINGS=y


@ -0,0 +1,119 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* test system mappings are sealed when
* CONFIG_MSEAL_SYSTEM_MAPPINGS=y
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <stdbool.h>
#include "../kselftest.h"
#include "../kselftest_harness.h"
#define VMFLAGS "VmFlags:"
#define MSEAL_FLAGS "sl"
#define MAX_LINE_LEN 512
bool has_mapping(char *name, FILE *maps)
{
char line[MAX_LINE_LEN];
while (fgets(line, sizeof(line), maps)) {
if (strstr(line, name))
return true;
}
return false;
}
bool mapping_is_sealed(char *name, FILE *maps)
{
char line[MAX_LINE_LEN];
while (fgets(line, sizeof(line), maps)) {
if (!strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
if (strstr(line, MSEAL_FLAGS))
return true;
return false;
}
}
return false;
}
FIXTURE(basic) {
FILE *maps;
};
FIXTURE_SETUP(basic)
{
self->maps = fopen("/proc/self/smaps", "r");
if (!self->maps)
SKIP(return, "Could not open /proc/self/smap, errno=%d",
errno);
};
FIXTURE_TEARDOWN(basic)
{
if (self->maps)
fclose(self->maps);
};
FIXTURE_VARIANT(basic)
{
char *name;
bool sealed;
};
FIXTURE_VARIANT_ADD(basic, vdso) {
.name = "[vdso]",
.sealed = true,
};
FIXTURE_VARIANT_ADD(basic, vvar) {
.name = "[vvar]",
.sealed = true,
};
FIXTURE_VARIANT_ADD(basic, vvar_vclock) {
.name = "[vvar_vclock]",
.sealed = true,
};
FIXTURE_VARIANT_ADD(basic, sigpage) {
.name = "[sigpage]",
.sealed = true,
};
FIXTURE_VARIANT_ADD(basic, vectors) {
.name = "[vectors]",
.sealed = true,
};
FIXTURE_VARIANT_ADD(basic, uprobes) {
.name = "[uprobes]",
.sealed = true,
};
FIXTURE_VARIANT_ADD(basic, stack) {
.name = "[stack]",
.sealed = false,
};
TEST_F(basic, check_sealed)
{
if (!has_mapping(variant->name, self->maps)) {
SKIP(return, "could not find the mapping, %s",
variant->name);
}
EXPECT_EQ(variant->sealed,
mapping_is_sealed(variant->name, self->maps));
};
TEST_HARNESS_MAIN
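The test plugs into the standard kselftest harness via the Makefile and
TARGETS hunks above, so the usual invocation should work (shown as an
assumed example):

	$ make -C tools/testing/selftests TARGETS=mseal_system_mappings run_tests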


@ -14,6 +14,7 @@
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <sys/auxv.h>
@ -55,13 +56,55 @@ static int try_to_remap(void *vdso_addr, unsigned long size)
}
#define VDSO_NAME "[vdso]"
#define VMFLAGS "VmFlags:"
#define MSEAL_FLAGS "sl"
#define MAX_LINE_LEN 512
bool vdso_sealed(FILE *maps)
{
char line[MAX_LINE_LEN];
bool has_vdso = false;
while (fgets(line, sizeof(line), maps)) {
if (strstr(line, VDSO_NAME))
has_vdso = true;
if (has_vdso && !strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
if (strstr(line, MSEAL_FLAGS))
return true;
return false;
}
}
return false;
}
int main(int argc, char **argv, char **envp)
{
pid_t child;
FILE *maps;
ksft_print_header();
ksft_set_plan(1);
maps = fopen("/proc/self/smaps", "r");
if (!maps) {
ksft_test_result_skip(
"Could not open /proc/self/smaps, errno=%d\n",
errno);
return 0;
}
if (vdso_sealed(maps)) {
ksft_test_result_skip("vdso is sealed\n");
return 0;
}
fclose(maps);
child = fork();
if (child == -1)
ksft_exit_fail_msg("failed to fork (%d): %m\n", errno);