linux-yocto/arch/x86/mm
Balbir Singh 10aecdc1c3 x86/mm/init: Handle the special case of device private pages in add_pages(), to not increase max_pfn and trigger dma_addressing_limited() bounce buffers
commit 7170130e4c upstream.

As Bert Karwatzki reported, the following recent commit causes a
performance regression on AMD iGPU and dGPU systems:

  7ffb791423 ("x86/kaslr: Reduce KASLR entropy on most x86 systems")

It exposed a bug with nokaslr and zone device interaction.

The root cause of the bug is that the GPU driver registers a zone
device private memory region. When KASLR is disabled or the above commit
is applied, direct_map_physmem_end is set much higher than 10 TiB,
typically to the 64 TiB address. When zone device private memory is added
to the system via add_pages(), it bumps max_pfn up to the same
value. This causes dma_addressing_limited() to return true, since the
device cannot address memory all the way up to max_pfn.

This caused a regression for games played on the iGPU, as it resulted in
the DMA32 zone being used for GPU allocations.

Fix this by not bumping up max_pfn on x86 systems when pgmap is passed
into add_pages(). The presence of pgmap is used to determine whether
device private memory is being added via add_pages().

More details:

devm_request_mem_region() and request_free_mem_region() are used to
request device private memory. iomem_resource is passed as the base
resource with start and end parameters. iomem_resource's end depends on
several factors, including the platform and virtualization. On bare-metal
x86, for example, this value is set from boot_cpu_data.x86_phys_bits,
which can change depending on support for MKTME. By default it is set
to log2(direct_map_physmem_end), which is 46 to 52 bits depending on
the number of levels in the page table. The allocation routines use
iomem_resource's end and direct_map_physmem_end to figure out where to
allocate the region.

[ arch/powerpc is also impacted by this problem, but this patch does not fix
  the issue for PowerPC. ]

Testing:

 1. Tested on a virtual machine with test_hmm for zone device insertion

 2. A previous version of this patch was tested by Bert, please see:
    https://lore.kernel.org/lkml/d87680bab997fdc9fb4e638983132af235d9a03a.camel@web.de/

[ mingo: Clarified the comments and the changelog. ]

Reported-by: Bert Karwatzki <spasswolf@web.de>
Tested-by: Bert Karwatzki <spasswolf@web.de>
Fixes: 7ffb791423 ("x86/kaslr: Reduce KASLR entropy on most x86 systems")
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Link: https://lore.kernel.org/r/20250401000752.249348-1-balbirs@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-04 14:42:21 +02:00
pat x86/mm: Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW 2025-04-25 10:45:10 +02:00
amdtopology.c
cpu_entry_area.c
debug_pagetables.c
dump_pagetables.c
extable.c x86/extable: Remove unused fixup type EX_TYPE_COPY 2025-05-02 07:50:36 +02:00
fault.c x86/mm: Remove broken vsyscall emulation code from the page fault code 2024-06-12 11:11:29 +02:00
highmem_32.c
hugetlbpage.c
ident_map.c x86/mm/ident_map: Use gbpages only where full GB page should be mapped. 2025-02-17 09:40:43 +01:00
init_32.c
init_64.c x86/mm/init: Handle the special case of device private pages in add_pages(), to not increase max_pfn and trigger dma_addressing_limited() bounce buffers 2025-06-04 14:42:21 +02:00
init.c x86/mm: Check return value from memblock_phys_alloc_range() 2025-06-04 14:41:56 +02:00
iomap_32.c
ioremap.c x86/mm: Fix a kdump kernel failure on SME system when CONFIG_IMA_KEXEC=y 2024-11-22 15:38:33 +01:00
kasan_init_64.c mm/treewide: replace pud_large() with pud_leaf() 2024-04-10 16:35:46 +02:00
kaslr.c x86/kaslr: Reduce KASLR entropy on most x86 systems 2025-06-04 14:42:06 +02:00
kmmio.c
kmsan_shadow.c
maccess.c
Makefile
mem_encrypt_amd.c x86/sev: Skip ROM range scans and validation for SEV-SNP guests 2024-04-03 15:29:03 +02:00
mem_encrypt_boot.S
mem_encrypt_identity.c x86/sev: Add missing RIP_REL_REF() invocations during sme_enable() 2025-04-10 14:37:25 +02:00
mem_encrypt.c
mm_internal.h
mmap.c
mmio-mod.c
numa_32.c
numa_64.c
numa_emulation.c
numa_internal.h
numa.c x86/mm/numa: Use NUMA_NO_NODE when calling memblock_set_node() 2025-01-17 13:36:14 +01:00
pf_in.c
pf_in.h
pgprot.c
pgtable_32.c
pgtable.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-06-16 13:47:40 +02:00
physaddr.c
physaddr.h
pkeys.c
pti.c x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables 2024-12-14 19:59:58 +01:00
srat.c
testmmiotrace.c
tlb.c x86/mm: Eliminate window where TLB flushes may be inadvertently skipped 2025-05-18 08:24:06 +02:00