Dave Hansen d87392094f x86/mm: Eliminate window where TLB flushes may be inadvertently skipped
commit fea4e317f9 upstream.

tl;dr: There is a window in the mm switching code where the new CR3 is
set and the CPU should be getting TLB flushes for the new mm.  But
should_flush_tlb() has a bug and suppresses the flush.  Fix it by
widening the window where should_flush_tlb() sends an IPI.

Long Version:

=== History ===

There were a few things leading up to this.

First, updating mm_cpumask() was observed to be too expensive, so it was
made lazier.  But being lazy caused too many unnecessary IPIs to CPUs
due to the now-lazy mm_cpumask().  So code was added to cull
mm_cpumask() periodically[2].  But that culling was a bit too aggressive
and skipped sending TLB flushes to CPUs that need them.  So here we are
again.

=== Problem ===

The too-aggressive code in should_flush_tlb() strikes in this window:

	// Turn on IPIs for this CPU/mm combination, but only
	// if should_flush_tlb() agrees:
	cpumask_set_cpu(cpu, mm_cpumask(next));

	next_tlb_gen = atomic64_read(&next->context.tlb_gen);
	choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
	load_new_mm_cr3(need_flush);
	// ^ After 'need_flush' is set to false, IPIs *MUST*
	// be sent to this CPU and not be ignored.

	this_cpu_write(cpu_tlbstate.loaded_mm, next);
	// ^ Not until this point does should_flush_tlb()
	// become true!

should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3()
and writing to 'loaded_mm', which is a window where they should not be
suppressed.  Whoops.

=== Solution ===

Thankfully, the fuzzy "just about to write CR3" window is already marked
with loaded_mm==LOADED_MM_SWITCHING.  Simply checking for that state in
should_flush_tlb() is sufficient to ensure that the CPU is targeted with
an IPI.
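
To make the window concrete, here is a minimal user-space sketch of the check described above. It is not the kernel's code: `struct mm`, `struct cpu_state`, `switching_marker`, `should_flush_old()` and `should_flush_new()` are hypothetical stand-ins, and the "old" predicate is simplified to an exact match on 'loaded_mm' (an assumption of this sketch). Only the names LOADED_MM_SWITCHING, 'loaded_mm' and 'is_lazy' come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are the kernel's types. */
struct mm { int id; };

/* Sentinel that models "CR3 is about to change or has just changed". */
static struct mm switching_marker;
#define LOADED_MM_SWITCHING (&switching_marker)

struct cpu_state {
	struct mm *loaded_mm;	/* models cpu_tlbstate.loaded_mm */
	bool is_lazy;		/* models the lazy-TLB flag */
};

/* Simplified "before" predicate: only an exact mm match earns an IPI. */
static bool should_flush_old(const struct cpu_state *cpu, const struct mm *target)
{
	assert(cpu != NULL && target != NULL);
	if (cpu->is_lazy)
		return false;
	return cpu->loaded_mm == target;
}

/*
 * Simplified "after" predicate: while the CPU is mid-switch it may hold
 * TLB state for either the previous or the next mm, so always flush.
 */
static bool should_flush_new(const struct cpu_state *cpu, const struct mm *target)
{
	assert(cpu != NULL && target != NULL);
	if (cpu->is_lazy)
		return false;
	if (cpu->loaded_mm == LOADED_MM_SWITCHING)
		return true;
	return cpu->loaded_mm == target;
}
```

With a CPU whose state is { LOADED_MM_SWITCHING, false }, the old predicate in this model returns false for the incoming mm and the new one returns true. That difference is the skipped flush this change is meant to close.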

This will cause more TLB flush IPIs.  But the window is relatively small
and I do not expect this to cause any kind of measurable performance
impact.

Update the comment where LOADED_MM_SWITCHING is written since it grew
yet another user.

Peter Z also raised a concern that should_flush_tlb() might not observe
'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off()
writes them.  Add a barrier to ensure that they are observed in the
order they are written.
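
The ordering concern can be illustrated with a portable C11 sketch. This is an analogy under stated assumptions, not the kernel implementation: it uses C11 release/acquire atomics rather than kernel barrier primitives, it assumes the writer clears the lazy flag before updating 'loaded_mm', and `struct tlb_model`, `writer_begin_switch()`, `writer_finish_switch()` and `reader_should_flush()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of the 'loaded_mm' states used by this model. */
enum { MM_NONE = 0, MM_SWITCHING = 1, MM_NEXT = 2 };

struct tlb_model {
	atomic_bool is_lazy;	/* models 'is_lazy' */
	atomic_int loaded_mm;	/* models 'loaded_mm' */
};

/* Writer side: clear the lazy flag, then publish the switching state. */
static void writer_begin_switch(struct tlb_model *s)
{
	atomic_store_explicit(&s->is_lazy, false, memory_order_relaxed);
	/* Release: the is_lazy store is visible to anyone who sees this store. */
	atomic_store_explicit(&s->loaded_mm, MM_SWITCHING, memory_order_release);
}

/* Writer side: the switch is complete and the new mm is loaded. */
static void writer_finish_switch(struct tlb_model *s, int next)
{
	atomic_store_explicit(&s->loaded_mm, next, memory_order_release);
}

/* Reader side: read 'loaded_mm' first, then 'is_lazy', in that order. */
static bool reader_should_flush(struct tlb_model *s, int target)
{
	assert(s != NULL);
	/* Acquire: the is_lazy load below cannot be older than this load. */
	int loaded = atomic_load_explicit(&s->loaded_mm, memory_order_acquire);

	if (atomic_load_explicit(&s->is_lazy, memory_order_relaxed))
		return false;
	return loaded == MM_SWITCHING || loaded == target;
}
```

The design point is that the reader's view of 'is_lazy' is at least as new as its view of 'loaded_mm'. So a reader that sees the switching state cannot pair it with a stale lazy flag and wrongly skip the flush.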

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/ [1]
Fixes: 6db2526c1d ("x86/mm/tlb: Only trim the mm_cpumask once a second") [2]
Reported-by: Stephen Dolan <sdolan@janestreet.com>
Cc: stable@vger.kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18 08:24:51 +02:00
arch x86/mm: Eliminate window where TLB flushes may be inadvertently skipped 2025-05-18 08:24:51 +02:00
block blk-mq: create correct map for fallback case 2025-05-09 09:50:48 +02:00
certs
crypto crypto: Kconfig - Select LIB generic option 2025-05-02 07:59:33 +02:00
Documentation bpf: Add namespace to BPF internal symbols 2025-05-02 07:59:04 +02:00
drivers staging: axis-fifo: Correct handling of tx_fifo_depth for size validation 2025-05-18 08:24:51 +02:00
fs erofs: ensure the extra temporary copy is valid for shortened bvecs 2025-05-18 08:24:48 +02:00
include net: export a helper for adding up queue stats 2025-05-18 08:24:50 +02:00
init rust: clean Rust 1.88.0's unnecessary_transmutes lint 2025-05-18 08:24:51 +02:00
io_uring io_uring: always do atomic put from iowq 2025-05-02 07:59:21 +02:00
ipc ipc: fix memleak if msg_init_ns failed in create_ipc_ns 2024-12-05 14:03:02 +01:00
kernel kernel: globalize lookup_or_create_module_kobject() 2025-05-09 09:50:52 +02:00
lib crypto: lib/Kconfig - Hide arch options from user 2025-05-02 07:59:32 +02:00
LICENSES
mm mm, slab: clean up slab->obj_exts always 2025-05-09 09:50:49 +02:00
net net: export a helper for adding up queue stats 2025-05-18 08:24:50 +02:00
rust rust: clean Rust 1.88.0's unnecessary_transmutes lint 2025-05-18 08:24:51 +02:00
samples tracing: Verify event formats that have "%*p.." 2025-05-02 07:58:52 +02:00
scripts objtool: Silence more KCOV warnings, part 2 2025-05-02 07:59:33 +02:00
security landlock: Prepare to add second errata 2025-04-20 10:15:56 +02:00
sound ASoC: simple-card-utils: Fix pointer check in graph_util_parse_link_direction 2025-05-09 09:50:46 +02:00
tools objtool/rust: add one more noreturn Rust function for Rust 1.87.0 2025-05-18 08:24:51 +02:00
usr kbuild: hdrcheck: fix cross build with clang 2025-03-13 13:02:18 +01:00
virt KVM: Allow building irqbypass.ko as as module when kvm.ko is a module 2025-04-20 10:15:54 +02:00
.clang-format
.clippy.toml rust: clean Rust 1.88.0's warning about clippy::disallowed_macros configuration 2025-05-18 08:24:51 +02:00
.cocciconfig
.editorconfig
.get_maintainer.ignore
.gitattributes
.gitignore rust: introduce .clippy.toml 2025-03-13 13:01:42 +01:00
.mailmap mailmap: add entry for Thorsten Blum 2024-11-07 14:14:59 -08:00
.rustfmt.toml
COPYING
CREDITS MAINTAINERS: Remove self from DSA entry 2024-11-03 12:52:38 -08:00
Kbuild
Kconfig
MAINTAINERS MAINTAINERS: add entry for the Rust alloc module 2025-03-13 13:01:47 +01:00
Makefile Linux 6.12.28 2025-05-09 09:50:53 +02:00
README

Linux kernel

There are several guides for kernel developers and users. These guides can be rendered in a number of formats, like HTML and PDF. Please read Documentation/admin-guide/README.rst first.

In order to build the documentation, use make htmldocs or make pdfdocs. The formatted documentation can also be read online at:

https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory, several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about the problems which may result from upgrading your kernel.