linux-yocto/kernel
Lorenzo Stoakes 609a43e107 mm: drop the assumption that VM_SHARED always implies writable
[ Upstream commit e8e17ee90e ]

Patch series "permit write-sealed memfd read-only shared mappings", v4.

The man page for fcntl() describing memfd file seals states the following
about F_SEAL_WRITE:

    Furthermore, trying to create new shared, writable memory-mappings via
    mmap(2) will also fail with EPERM.

With emphasis on 'writable'.  It turns out in fact that currently the
kernel simply disallows all new shared memory mappings for a memfd with
F_SEAL_WRITE applied, rendering this documentation inaccurate.

This matters because users cannot obtain any shared mapping of a memfd
once it has been write-sealed, which limits the usefulness of sealed memfds.
This was reported in the discussion thread [1] originating from a bug
report [2].

This is a product of both using the struct address_space->i_mmap_writable
atomic counter to determine whether writing may be permitted, and the
kernel adjusting this counter when any VM_SHARED mapping is performed and
more generally implicitly assuming VM_SHARED implies writable.

It seems sensible that we should only update this counter if VM_MAYWRITE
is specified, i.e. if it is possible that this mapping could at any
point be written to.
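
As a rough illustration of the counter semantics described above, here is a
minimal single-threaded userspace sketch. It is not the kernel code: the
struct and every model_* name are hypothetical, the flag values are
illustrative constants, and the real counter is an atomic that is updated
under the appropriate locking.

```c
#include <errno.h>

/* Illustrative flag values modelling the kernel's VM_* bits (assumption). */
#define VM_SHARED   0x08UL
#define VM_MAYWRITE 0x20UL

/* Hypothetical stand-in for struct address_space. */
struct model_mapping {
	/* > 0: writable mappings exist; < 0: writable mappings are denied */
	int i_mmap_writable;
};

/* Account for one potentially writable mapping; refuse if writes are denied. */
static int model_map_writable(struct model_mapping *m)
{
	if (m->i_mmap_writable < 0)
		return -EPERM;
	m->i_mmap_writable++;
	return 0;
}

/* Deny future writable mappings (what sealing needs); refuse if any exist. */
static int model_deny_writable(struct model_mapping *m)
{
	if (m->i_mmap_writable > 0)
		return -EBUSY;
	m->i_mmap_writable--;
	return 0;
}

/*
 * Touch the counter only when the mapping is shared AND may become
 * writable. Testing VM_SHARED alone would also count (and, once writes
 * are denied, reject) read-only shared mappings.
 */
static int model_account_mapping(struct model_mapping *m, unsigned long vm_flags)
{
	if ((vm_flags & (VM_SHARED | VM_MAYWRITE)) != (VM_SHARED | VM_MAYWRITE))
		return 0;
	return model_map_writable(m);
}
```

In this model, a shared mapping without VM_MAYWRITE neither blocks a later
seal nor is blocked by an existing one, while a shared mapping with
VM_MAYWRITE is still refused with -EPERM once writes have been denied.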

If we do so then all we need to do to permit write seals to function as
documented is to clear VM_MAYWRITE when mapping read-only.  It turns out
this functionality already exists for F_SEAL_FUTURE_WRITE - we can
therefore simply adapt this logic to do the same for F_SEAL_WRITE.
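
The seal handling described above could be sketched roughly as follows.
This is a userspace model under stated assumptions, not the memfd/shmem
implementation: model_check_write_seal() and the MODEL_SEAL_* names are
hypothetical, and the flag values are illustrative constants.

```c
#include <errno.h>

/* Illustrative flag values modelling the kernel's VM_* bits (assumption). */
#define VM_WRITE    0x02UL
#define VM_SHARED   0x08UL
#define VM_MAYWRITE 0x20UL

/* Hypothetical stand-ins for F_SEAL_WRITE and F_SEAL_FUTURE_WRITE. */
#define MODEL_SEAL_WRITE        0x08U
#define MODEL_SEAL_FUTURE_WRITE 0x10U

/*
 * Apply a write seal to the flags of a new mapping.
 * Returns 0 if the mapping is allowed, -EPERM if it must be rejected.
 */
static int model_check_write_seal(unsigned int seals, unsigned long *vm_flags)
{
	if (!(seals & (MODEL_SEAL_WRITE | MODEL_SEAL_FUTURE_WRITE)))
		return 0;		/* not write-sealed: nothing to do */

	if (!(*vm_flags & VM_SHARED))
		return 0;		/* private mappings never write to the file */

	if (*vm_flags & VM_WRITE)
		return -EPERM;		/* new shared, writable mapping: rejected */

	/*
	 * Shared and read-only: allow it, but clear VM_MAYWRITE so that a
	 * later mprotect(PROT_WRITE) cannot make the mapping writable.
	 */
	*vm_flags &= ~VM_MAYWRITE;
	return 0;
}
```

Because VM_MAYWRITE is cleared, the resulting mapping no longer counts as
potentially writable, so it does not need to be reflected in
i_mmap_writable.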

We then hit a chicken and egg situation in mmap_region() where the check
for VM_MAYWRITE occurs before we are able to clear this flag.  To work
around this, perform this check after we invoke call_mmap(), with careful
consideration of error paths.

Thanks to Andy Lutomirski for the suggestion!

[1]:https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@linux-foundation.org/
[2]:https://bugzilla.kernel.org/show_bug.cgi?id=217238

This patch (of 3):

There is a general assumption that VMAs with the VM_SHARED flag set are
writable.  If the VM_MAYWRITE flag is not set, then this is simply not the
case.

Update those checks which affect the struct address_space->i_mmap_writable
field to explicitly test for this by introducing
[vma_]is_shared_maywrite() helper functions.

This remains entirely conservative, as the lack of VM_MAYWRITE guarantees
that the VMA cannot be written to.
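
For illustration, the helpers named above might look roughly like this
when modelled in userspace. The helper names come from this commit
message; struct model_vma, the vm_flags_t typedef and the flag values are
assumptions made so that the sketch compiles on its own.

```c
#include <stdbool.h>

/* Illustrative definitions so the sketch is self-contained (assumption). */
typedef unsigned long vm_flags_t;
#define VM_SHARED   0x08UL
#define VM_MAYWRITE 0x20UL

/* Hypothetical stand-in for struct vm_area_struct. */
struct model_vma {
	vm_flags_t vm_flags;
};

/* True only if the flags describe a shared mapping that may be written. */
static inline bool is_shared_maywrite(vm_flags_t vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) ==
		(VM_SHARED | VM_MAYWRITE);
}

/* Convenience wrapper operating on a VMA. */
static inline bool vma_is_shared_maywrite(const struct model_vma *vma)
{
	return is_shared_maywrite(vma->vm_flags);
}
```

A check that previously read `vm_flags & VM_SHARED` would then become
`is_shared_maywrite(vm_flags)`, which differs only for shared mappings
that have VM_MAYWRITE cleared.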

Link: https://lkml.kernel.org/r/cover.1697116581.git.lstoakes@gmail.com
Link: https://lkml.kernel.org/r/d978aefefa83ec42d18dfa964ad180dbcde34795.1697116581.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
[isaacmanjarres: resolved merge conflicts due to
refactoring that happened in upstream commit
5de195060b ("mm: resolve faulty mmap_region() error path behaviour")]
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28 16:22:54 +02:00
..
bpf bpf: fix precision backtracking instruction iteration 2025-06-27 11:04:25 +01:00
cgroup cgroup: Fix compilation issue due to cgroup_mutex not being exported 2025-06-04 14:36:58 +02:00
configs
debug kdb: Do not assume write() callback available 2025-03-13 12:47:22 +01:00
dma dma/contiguous: avoid warning about unused size_bytes 2025-05-02 07:41:14 +02:00
entry entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set 2023-01-04 11:39:22 +01:00
events perf/core: Prevent VMA split of buffer mappings 2025-08-28 16:22:38 +02:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:06:44 +01:00
gcov gcov: add support for GCC 14 2024-07-05 09:12:41 +02:00
irq genirq: Make handle_enforce_irqctx() unconditionally available 2025-03-13 12:46:45 +01:00
kcsan kcsan: Turn report_filterlist_lock into a raw_spinlock 2024-12-14 19:48:25 +01:00
livepatch kallsyms: refactor {,module_}kallsyms_on_each_symbol 2024-06-21 14:52:58 +02:00
locking locking/lockdep: Decrease nr_unused_locks if lock unused in zap_class() 2025-05-02 07:40:55 +02:00
power PM: sleep: console: Fix the black screen issue 2025-08-28 16:22:41 +02:00
printk printk: Fix signed integer overflow when defining LOG_BUF_LEN_MAX 2025-03-13 12:47:01 +01:00
rcu rcu: Protect ->defer_qs_iw_pending from data race 2025-08-28 16:22:42 +02:00
sched cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS 2025-05-02 07:41:00 +02:00
time clocksource: Fix the CPUs' choice in the watchdog per CPU verification 2025-06-27 11:04:15 +01:00
trace tracing: Add down_write(trace_event_sem) when adding trace event 2025-08-28 16:22:53 +02:00
.gitignore kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:47:37 +02:00
acct.c acct: perform last write from workqueue 2025-03-13 12:47:35 +01:00
async.c async: Introduce async_schedule_dev_nocall() 2024-02-23 08:41:53 +01:00
audit_fsnotify.c fsnotify: make allow_dups a property of the group 2024-06-21 14:53:39 +02:00
audit_tree.c fsnotify: pass flags argument to fsnotify_alloc_group() 2024-06-21 14:53:39 +02:00
audit_watch.c fsnotify: pass flags argument to fsnotify_alloc_group() 2024-06-21 14:53:39 +02:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-23 08:42:03 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:40:00 +02:00
auditfilter.c ima: Avoid blocking in RCU read-side critical section 2024-07-18 13:05:44 +02:00
auditsc.c audit: fix possible soft lockup in __audit_inode_child() 2023-09-19 12:20:13 +02:00
backtracetest.c
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-05-02 16:23:46 +02:00
capability.c
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-05 11:23:45 +02:00
configs.c
context_tracking.c
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-09-15 09:50:40 +02:00
cpu.c hrtimers: Handle CPU state correctly on hotplug 2025-02-01 18:22:29 +01:00
crash_core.c crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo 2021-06-23 14:42:52 +02:00
crash_dump.c
cred.c cred: switch to using atomic_long_t 2023-12-20 15:44:30 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c perf: Fix sample vs do_exit() 2025-06-27 11:04:24 +01:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 16:40:18 +01:00
fork.c mm: drop the assumption that VM_SHARED always implies writable 2025-08-28 16:22:54 +02:00
freezer.c
gen_kheaders.sh kheaders: Ignore silly-rename files 2025-02-01 18:22:28 +01:00
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-20 16:05:58 +02:00
kallsyms.c kallsyms: Make kallsyms_on_each_symbol generally available 2024-10-17 15:08:29 +02:00
kcmp.c kcmp: In get_file_raw_ptr use task_lookup_fd_rcu 2024-06-21 14:52:48 +02:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: don't lose track of remote references during softirqs 2024-07-05 09:12:41 +02:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-27 08:43:40 +02:00
kexec_elf.c kexec: initialize ELF lowest address to ULONG_MAX 2025-04-10 14:30:58 +02:00
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 15:45:37 +02:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:10:29 +02:00
kexec.c panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:10:29 +02:00
kheaders.c kheaders: Use array declaration instead of char 2023-05-17 11:47:33 +02:00
kmod.c
kprobes.c x86/ibt,ftrace: Search for __fentry__ location 2024-10-17 15:07:37 +02:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2023-04-20 12:10:29 +02:00
kthread.c kthread: fix task state in kthread worker if being frozen 2024-10-17 15:07:50 +02:00
latencytop.c
Makefile futex: Move to kernel/futex/ 2023-01-14 10:15:20 +01:00
module_signature.c
module_signing.c
module-internal.h
module.c kallsyms: Make module_kallsyms_on_each_symbol generally available 2024-10-17 15:08:29 +02:00
notifier.c
nsproxy.c
padata.c padata: do not leak refcount in reorder_work 2025-06-04 14:37:07 +02:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 12:59:40 +02:00
params.c module: ensure that kobject_put() is safe for module type kobjects 2025-06-04 14:36:55 +02:00
pid_namespace.c zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING 2024-07-05 09:12:33 +02:00
pid.c kernel/pid.c: implement additional checks upon pidfd_create() parameters 2024-06-21 14:53:17 +02:00
profile.c kernel: Initialize cpumask before parsing 2025-01-09 13:25:04 +01:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-09 10:20:49 +02:00
range.c
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 16:54:58 +00:00
regset.c
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-17 11:47:34 +02:00
resource.c kernel/resource: fix kfree() of bootmem memory again 2025-05-02 07:41:09 +02:00
rseq.c rseq: Fix segfault on registration when rseq_cs is non-zero 2025-07-17 18:28:01 +02:00
scftorture.c scftorture: Forgive memory-allocation failure if KASAN 2023-09-23 11:01:05 +02:00
scs.c
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2024-03-01 13:16:46 +01:00
signal.c signal: Replace BUG_ON()s 2024-10-17 15:08:12 +02:00
smp.c smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu() 2024-09-12 11:06:48 +02:00
smpboot.c sched/core: Initialize the idle task with preemption disabled 2021-07-14 16:55:50 +02:00
smpboot.h
softirq.c
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:01:00 +01:00
stacktrace.c
static_call.c x86/static-call: provide a way to do very early static-call updates 2024-12-19 18:06:13 +01:00
stop_machine.c
sys_ni.c syscalls: fix compat_sys_io_pgetevents_time64 usage 2024-07-05 09:12:55 +02:00
sys.c fs: add file and path permissions helpers 2024-06-21 14:52:58 +02:00
sysctl-test.c
sysctl.c sysctl: introduce new proc handler proc_dobool 2024-06-21 14:53:18 +02:00
task_work.c task_work: Introduce task_work_cancel() again 2024-08-19 05:40:57 +02:00
taskstats.c
test_kprobes.c
torture.c torture: Fix hang during kthread shutdown phase 2023-08-30 16:23:17 +02:00
tracepoint.c tracepoint: Use rcu get state and cond sync for static call updates 2021-09-03 10:09:30 +02:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 10:54:33 +01:00
ucount.c fanotify: configurable limits via sysfs 2024-06-21 14:53:06 +02:00
uid16.c
uid16.h
umh.c
up.c smp: Fix smp_call_function_single_async prototype 2021-05-14 09:50:46 +02:00
user_namespace.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
user-return-notifier.c
user.c
usermode_driver.c
utsname_sysctl.c
utsname.c
watch_queue.c watch_queue: fix pipe accounting mismatch 2025-04-10 14:30:55 +02:00
watchdog_hld.c watchdog/perf: properly initialize the turbo mode timestamp and rearm counter 2024-08-19 05:41:01 +02:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 16:54:56 +00:00
workqueue_internal.h
workqueue.c Revert "workqueue: remove unused cancel_work()" 2023-12-08 08:46:13 +01:00