linux-yocto/arch
Dev Jain fa93b45fd3 arm64: Enable vmalloc-huge with ptdump
Our goal is to move towards enabling vmalloc-huge by default on arm64 so
as to reduce TLB pressure. Therefore, we need a way to analyze the proportion
of block mappings in vmalloc space we can get on a production system; this
can be done through ptdump, but currently we disable vmalloc-huge if
CONFIG_PTDUMP_DEBUGFS is on. The reason is that lazy freeing of kernel
pagetables via vmap_try_huge_pxd() may race with ptdump, so ptdump
may dereference a bogus address.

To solve this, we need to synchronize ptdump_walk() and ptdump_check_wx()
with pud_free_pmd_page() and pmd_free_pte_page().

Since this race is very unlikely to happen in practice, we do not want to
penalize the vmalloc pagetable teardown path by unconditionally taking the
init_mm mmap_lock. Therefore, we use a static key. The pagetable walkers,
ptdump_walk() and ptdump_check_wx(), enable the static key; once it is
enabled, the vmalloc pagetable teardown path is patched to execute an
mmap_read_lock/unlock sequence. A combination of the patched-in
mmap_read_lock/unlock, the acquire semantics of static_branch_inc(), and
the barriers in __flush_tlb_kernel_pgtable() ensures that ptdump never
obtains the address of a freed PMD or PTE table.
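The protocol can be approximated by a userspace C model. This is a sketch under stated assumptions, not the kernel code: the names ptdump_key, init_mm_lock, walker(), teardown() and stress() are hypothetical; the static key is modelled as an atomic counter that the teardown path loads (a real static key patches the instruction stream instead, which is why the litmus test uses Variant=Ifetch); a seq_cst fence stands in for the barriers in __flush_tlb_kernel_pgtable(); and "freeing" a table is modelled by poisoning it with 0xdead, as the litmus test does, so that a walker reading a freed table is detectable. A clean stress run is evidence, not proof.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define POISON 0xdeadULL

/* Hypothetical stand-ins: the static key, init_mm's mmap_lock, one PUD entry. */
static atomic_int ptdump_key;
static pthread_rwlock_t init_mm_lock = PTHREAD_RWLOCK_INITIALIZER;
static _Atomic(uint64_t *) pud;

/* ptdump_walk()/ptdump_check_wx() analogue; returns 1 if it read a "freed" table. */
static int walker(void)
{
	int saw_poison = 0;

	atomic_fetch_add(&ptdump_key, 1);          /* like static_branch_inc() */
	pthread_rwlock_wrlock(&init_mm_lock);      /* held for the whole walk */
	uint64_t *pmd = atomic_load(&pud);
	if (pmd && *pmd == POISON)
		saw_poison = 1;
	pthread_rwlock_unlock(&init_mm_lock);
	atomic_fetch_sub(&ptdump_key, 1);          /* like static_branch_dec() */
	return saw_poison;
}

/* pud_free_pmd_page() analogue. */
static void teardown(void)
{
	uint64_t *pmd = atomic_load(&pud);

	atomic_store(&pud, NULL);                  /* clear the PUD entry */
	atomic_thread_fence(memory_order_seq_cst); /* stands in for the TLB-flush barriers */
	if (atomic_load(&ptdump_key)) {            /* the "patched-in" slow path */
		pthread_rwlock_rdlock(&init_mm_lock);
		pthread_rwlock_unlock(&init_mm_lock);
	}
	*pmd = POISON;                             /* "free": poison so a late reader is caught */
}

static void *walker_thread(void *arg)
{
	*(int *)arg = walker();
	return NULL;
}

static void *teardown_thread(void *arg)
{
	(void)arg;
	teardown();
	return NULL;
}

/* Races one walker against one teardown `trials` times; returns how often
 * the walker read a poisoned table. */
static int stress(int trials)
{
	int bad = 0;

	for (int i = 0; i < trials; i++) {
		uint64_t *table = malloc(sizeof(*table));
		pthread_t w, t;
		int saw = 0;

		assert(table != NULL);
		*table = 0x1234;
		atomic_store(&pud, table);
		if (pthread_create(&w, NULL, walker_thread, &saw) ||
		    pthread_create(&t, NULL, teardown_thread, NULL))
			abort();
		pthread_join(w, NULL);
		pthread_join(t, NULL);
		bad += saw;
		free(table);
	}
	return bad;
}
```

For example, stress(500) races a walker against a teardown 500 times and is expected to return 0. The reasoning mirrors the text: if the teardown sees the key clear, the walker either finished before the load or will read the already-cleared PUD; if it sees the key set, its read lock either waits for the walk to end or is released before the walker's write lock, so the walker reads the cleared PUD.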

We can verify the correctness of the algorithm via the following litmus
test (thanks to James Houghton and Will Deacon):

AArch64 ptdump
Variant=Ifetch
{
uint64_t pud=0xa110c;
uint64_t pmd;

0:X0=label:"P1:L0"; 0:X1=instr:"NOP"; 0:X2=lock; 0:X3=pud; 0:X4=pmd;
                    1:X1=0xdead;      1:X2=lock; 1:X3=pud; 1:X4=pmd;
}
 P0				| P1				;
 (* static_key_enable *)	| (* pud_free_pmd_page *)	;
 STR	W1, [X0]		| LDR	X9, [X3]		;
 DC	CVAU,X0			| STR	XZR, [X3]		;
 DSB	ISH			| DSB	ISH			;
 IC	IVAU,X0			| ISB				;
 DSB	ISH			|				;
 ISB				| (* static key *)		;
				| L0:				;
 (* mmap_lock *)		| B	out1			;
 Lwlock:			|				;
 MOV	W7, #1			| (* mmap_lock *)		;
 SWPA	W7, W8, [X2]		| Lrlock:			;
				| MOV	W7, #1			;
				| SWPA	W7, W8, [X2]		;
 (* walk pgtable *)		|				;
 LDR	X9, [X3]		| (* mmap_unlock *)		;
 CBZ	X9, out0		| STLR	WZR, [X2]		;
 EOR	X10, X9, X9		|				;
 LDR	X11, [X4, X10]		| out1:				;
				| EOR	X10, X9, X9		;
 out0:				| STR	X1, [X4, X10]		;

exists (0:X8=0 /\ 1:X8=0 /\	(* Lock acquisitions succeed *)
	0:X9=0xa110c /\		(* P0 sees the valid PUD ...*)
	0:X11=0xdead)		(* ... but the freed PMD *)
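The litmus test targets the Arm memory model (including instruction fetch, hence Variant=Ifetch) and is meant for a tool such as herd7. As a much weaker sanity check, the same two threads can be brute-forced under sequential consistency only. The sketch below is hypothetical (struct state, step_p0(), step_p1(), explore() and count_forbidden() are illustrative names): it enumerates every interleaving, treats the lock as a mutex that P0 never releases (as in the litmus test), and counts final states matching the exists clause. Setting enable_key to false models P0 never patching P1's branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One state of the two-thread model; fields mirror the litmus test. */
struct state {
	bool enable_key;            /* false: P0 never patches P1's branch */
	int pc0, pc1;               /* program counters of P0 (ptdump), P1 (teardown) */
	uint64_t pud, pmd;          /* shared page-table words */
	int key, lock;              /* static key; mmap_lock as a plain mutex */
	uint64_t p0_x9, p0_x11, p1_x9;
};

enum { P0_DONE = 4, P1_DONE = 6 };

/* One step of P0; returns false if P0 is blocked on the lock or finished. */
static bool step_p0(struct state *s)
{
	switch (s->pc0) {
	case 0: if (s->enable_key) s->key = 1; break;          /* static_key_enable */
	case 1: if (s->lock) return false; s->lock = 1; break; /* Lwlock, never released */
	case 2: s->p0_x9 = s->pud; break;                      /* LDR X9, [X3] */
	case 3: if (s->p0_x9) s->p0_x11 = s->pmd; break;       /* CBZ, then LDR X11 */
	default: return false;
	}
	s->pc0++;
	return true;
}

/* One step of P1; returns false if P1 is blocked on the lock or finished. */
static bool step_p1(struct state *s)
{
	switch (s->pc1) {
	case 0: s->p1_x9 = s->pud; break;                      /* LDR X9, [X3] */
	case 1: s->pud = 0; break;                             /* STR XZR, [X3] */
	case 2: if (!s->key) { s->pc1 = 5; return true; } break; /* B out1 unless patched */
	case 3: if (s->lock) return false; s->lock = 1; break; /* Lrlock */
	case 4: s->lock = 0; break;                            /* STLR WZR, [X2] */
	case 5: s->pmd = 0xdead; break;                        /* STR X1, [X4, X10] */
	default: return false;
	}
	s->pc1++;
	return true;
}

/* Depth-first search over all SC interleavings; counts final states that
 * match the exists clause (P0 saw the valid PUD but the freed PMD). */
static int explore(struct state s)
{
	struct state t;
	int hits = 0;

	if (s.pc0 == P0_DONE && s.pc1 == P1_DONE)
		return s.p0_x9 == 0xa110c && s.p0_x11 == 0xdead;
	t = s;
	if (step_p0(&t))
		hits += explore(t);
	t = s;
	if (step_p1(&t))
		hits += explore(t);
	return hits;   /* stays 0 if P1 is stuck on the lock P0 never releases */
}

static int count_forbidden(bool enable_key)
{
	struct state init = { .enable_key = enable_key, .pud = 0xa110c };

	return explore(init);
}
```

With the key, count_forbidden(true) returns 0: for P0 to read the valid PUD, it must have set the key and taken the lock first, so P1 falls through to the lock and can never reach its store to the PMD. Without the key, the bad outcome is reachable, so count_forbidden(false) is non-zero.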

For an approximate written proof of why this algorithm works, please read
the code comment in [1]; that comment has been dropped from this version
for the sake of simplicity.

mm-selftests pass. No issues were observed while running
test_vmalloc.sh (which stresses the vmalloc subsystem) in parallel with
cat /sys/kernel/debug/{kernel_page_tables,check_wx_pages} in a loop.

Link: https://lore.kernel.org/all/20250723161827.15802-1-dev.jain@arm.com/ [1]
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 11:53:24 +01:00
alpha Significant patch series in this pull request: 2025-08-03 16:23:09 -07:00
arc
arm gpio updates for v6.17-rc1 2025-08-09 08:15:43 +03:00
arm64 arm64: Enable vmalloc-huge with ptdump 2025-09-22 11:53:24 +01:00
csky
hexagon
loongarch LoongArch changes for v6.17 2025-08-08 06:36:48 +03:00
m68k treewide: rename GPIO set callbacks back to their original names 2025-08-07 10:07:06 +02:00
microblaze
mips treewide: rename GPIO set callbacks back to their original names 2025-08-07 10:07:06 +02:00
nios2
openrisc OpenRISC updates for 6.17 2025-08-04 08:37:46 -07:00
parisc parisc architecture fixes for kernel v6.17-rc1: 2025-08-01 16:15:53 -07:00
powerpc treewide: rename GPIO set callbacks back to their original names 2025-08-07 10:07:06 +02:00
riscv Significant patch series in this pull request: 2025-08-03 16:23:09 -07:00
s390 more s390 updates for 6.17 merge window 2025-08-08 06:56:55 +03:00
sh Significant patch series in this pull request: 2025-08-03 16:23:09 -07:00
sparc Summary of significant series in this pull request: 2025-07-31 14:57:54 -07:00
um
x86 - Fix an interrupt vector setup race which leads to a non-functioning device 2025-08-10 08:15:32 +03:00
xtensa Xtensa updates for v6.17 2025-08-09 07:35:03 +03:00
.gitignore
Kconfig Deferred unwind changes for 6.17 2025-08-01 09:46:24 -07:00