Commit Graph

17957 Commits

Author SHA1 Message Date
Linus Torvalds
bdd4f86c97 AT_EXECVE_CHECK update for v6.14-rc1 (fix1)
- selftests: Handle old glibc without execveat(2) (Mickaël Salaün)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ51DjgAKCRA2KwveOeQk
 u5ahAP9m2RJdQm/oW/SPdhZ3nJynrD0UXKpZPYe733E9D2mccQEAvh0LIAUJGJoK
 FbpLRWSGXOkWxAJ1oabQo8GB5v+8EQs=
 =WVRq
 -----END PGP SIGNATURE-----

Merge tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull AT_EXECVE_CHECK selftest fix from Kees Cook:
 "Fixes the AT_EXECVE_CHECK selftests which didn't run on old versions
  of glibc"

* tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests: Handle old glibc without execveat(2)
2025-01-31 17:12:31 -08:00
Linus Torvalds
1b5f3c51fb RISC-V Patches for the 6.14 Merge Window, Part 1
* The PH1520 pinctrl and dwmac drivers are enabeled in defconfig.
 * A redundant AQRL barrier has been removed from the futex cmpxchg
   implementation.
 * Support for the T-Head vector extensions, which includes exposing
   these extensions to userspace on systems that implement them.
 * Some more page table information is now printed on die() and systems
   that cause PA overflows.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmedHIoTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYievXD/4hdt8h+fMM0I9mmJS096YevRJONdfe
 Wk7D5q4PBwSHISHahuzfphieBhqPVnYkkEd7Vw6xRrLbUnhA41Fe0uvR52dx5UZd
 3LwrDV/kjGTD59x6A2Zo9bSs/qPKJ2WHmHwHM21jY5tvcIB2Lo4dF8HT63OrwVNW
 DxsujLO0jUw+HEwXPsfmUAZJWOPZuUnatl/9CaLMLwQv5N7yiMuz5oYDzJXTLnNh
 m3Hv3CCtj1EeQPqDoWzz9nZvmAKOwcblSzz6OAy+xrRk1N0N3QFQPbIaRvkI9OVz
 +wPHQiyx4KZNeAe0csV0uLQRIiXZV8rkCz5UT65s3Bfy3vukvzz+1VBdNnCqiP8Q
 RpCTcYw62Cr6BWnvyTh+s9bhHb1ijG043nXd/Ty7ZRPCNLKHY6oL1CZ0pgqbTwPs
 D2U2ZTZFTc35mPrU6QMfbTiUVWCU2XagFhI27Dgj3xh9mkBOQCHwk2Mrzn7uS4iz
 xGNnrjRnKtuwBrvD68JzxCkEi8INFn2ifbVr44VZrOdTM7XtODGAYrBohQtV62kU
 2L+q8DoHYis+0xFbR1wdrY1mRZoe45boUFgwnOpmoBr9ULe584sL+526y7IkkEHu
 /9hmLPtLg7nyoR/rO1j1Sfg4Eqdwg5HY1TKNfagJZAdu23EDRwrcW1PD0P6vtDv8
 j4og8MmL7dTt3A==
 =HbAQ
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - The PH1520 pinctrl and dwmac drivers are enabeled in defconfig

 - A redundant AQRL barrier has been removed from the futex cmpxchg
   implementation

 - Support for the T-Head vector extensions, which includes exposing
   these extensions to userspace on systems that implement them

 - Some more page table information is now printed on die() and systems
   that cause PA overflows

* tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: add a warning when physical memory address overflows
  riscv/mm/fault: add show_pte() before die()
  riscv: Add ghostwrite vulnerability
  selftests: riscv: Support xtheadvector in vector tests
  selftests: riscv: Fix vector tests
  riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
  riscv: hwprobe: Add thead vendor extension probing
  riscv: vector: Support xtheadvector save/restore
  riscv: Add xtheadvector instruction definitions
  riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
  RISC-V: define the elements of the VCSR vector CSR
  riscv: vector: Use vlenb from DT for thead
  riscv: Add thead and xtheadvector as a vendor extension
  riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
  dt-bindings: cpus: add a thead vlen register length property
  dt-bindings: riscv: Add xtheadvector ISA extension description
  RISC-V: Mark riscv_v_init() as __init
  riscv: defconfig: drop RT_GROUP_SCHED=y
  riscv/futex: Optimize atomic cmpxchg
  riscv: defconfig: enable pinctrl and dwmac support for TH1520
2025-01-31 15:13:25 -08:00
Linus Torvalds
c545cd3276 x86/mm changes for v6.14:
- The biggest changes are the TLB flushing scalability optimizations,
    to update the mm_cpumask lazily and related changes. This feature
    has both a track record and a continued risk of performance regressions,
    so it was already delayed by a cycle - but it's all 100% perfect now™.
    (Rik van Riel)
 
  - Also miscellaneous fixes and cleanups. (Gautam Somani,
    Kirill A. Shutemov, Sebastian Andrzej Siewior)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmeclXoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iDixAAjmTv/3KBuXaW/EoqGkyr/dgJld/Cww5a
 4yyM6pbVOkiP+pmSTiHChhn07A4eB1TMCP0RJHXUgsCr6VLY8+68MdafCMIn9hWK
 mZYbCFF2yWy2EP4a26ifTi/3P355x5WILxJH5K4fHxcsXjRy5LgCLaq0tObEqnZ8
 OAGIBw+g3t7CYurqlKfYiVSUiUG8PbXbS9Bh/0SjRe5FRbJDre3XJy9ks2c83wHU
 anPe5qpkw3mg8hPiFQfv3EYyGe1NhAs9hBMYLKqUyyxZEixymZDsvjYnOe154OMI
 9xk3XpeFFejwvBJ1pfSS3V5svm5sqtnRpZSivUl/gsT7LM65N8RqKMrTvcpT+fm7
 cQs8JK3LP+S2ih3S4wTZRdVGnIQGzqHkp9R6e8T4r9FQ2688mk/OvqJOCZEAcPgx
 VRHiMXtgZ3e8OsMiY+82TGt9wyujCR/kk+hzgXtNC1Lr++jCz848n3UcUe+wvzzw
 Lo8LGGdAzBRviwiwwrRxCYKtlUtkIwbIKtfswv5pfapji2cTHckhvuKAcujpvaXd
 +qgnX8XNVZWoG57tN02jZ8ZgAFgZlV2A03WG5e0c1wb4/3AnGQDGpCEWX2/lMj1J
 U/FFwNA6+jzcVMYyN/LQAETv0Go7sJOVTTie7mAHEhyHvxvb2YfV9VJ60V2WBKn5
 znIuU0l2qyQ=
 =g00u
 -----END PGP SIGNATURE-----

Merge tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Ingo Molnar:

 - The biggest changes are the TLB flushing scalability optimizations,
   to update the mm_cpumask lazily and related changes.

   This feature has both a track record and a continued risk of
   performance regressions, so it was already delayed by a cycle - but
   it's all 100% perfect now™ (Rik van Riel)

 - Also miscellaneous fixes and cleanups. (Gautam Somani, Kirill
   Shutemov, Sebastian Andrzej Siewior)

* tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Remove unnecessary include of <linux/extable.h>
  x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
  x86/mm/selftests: Fix typo in lam.c
  x86/mm/tlb: Only trim the mm_cpumask once a second
  x86/mm/tlb: Also remove local CPU from mm_cpumask if stale
  x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU
  x86/mm/tlb: Update mm_cpumask lazily
2025-01-31 10:39:07 -08:00
Linus Torvalds
c2933b2bef First batch of fixes for 6.14. Nothing really stands out,
but as usual there's a slight concentration of fixes for issues
 added in the last two weeks before the MW, and driver bugs
 from 6.13 which tend to get discovered upon wider distribution.
 
 Including fixes from IPSec, netfilter and Bluetooth.
 
 Current release - regressions:
 
  - net: revert RTNL changes in unregister_netdevice_many_notify()
 
  - Bluetooth: fix possible infinite recursion of btusb_reset
 
  - eth: adjust locking in some old drivers which protect their state
 	with spinlocks to avoid sleeping in atomic; core protects
 	netdev state with a mutex now
 
 Previous releases - regressions:
 
  - eth: mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node()
 
  - eth: bgmac: reduce max frame size to support just 1500 bytes;
 	the jumbo frame support would previously cause OOB writes,
 	but now fails outright
 
  - mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted,
 	avoid false detection of MPTCP blackholing
 
 Previous releases - always broken:
 
  - mptcp: handle fastopen disconnect correctly
 
  - xfrm: make sure skb->sk is a full sock before accessing its fields
 
  - xfrm: fix taking a lock with preempt disabled for RT kernels
 
  - usb: ipheth: improve safety of packet metadata parsing; prevent
 	potential OOB accesses
 
  - eth: renesas: fix missing rtnl lock in suspend/resume path
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmebzXsACgkQMUZtbf5S
 IrvGBQ//auOF2yY1sg40fBvc6Hr1jpZBcr+uqTL6Qka1uVOvTFY51hAN54lBt32+
 ixmcHsD0xdcHrr7VrqSXqurQLiGsdwUpnxZFCj/FymQuMunVysEqudvPeKDVHpsw
 JW5c4nJOexEA2viByK9iB23Qq0P3uBoPEnKrbSTVSDvYaXUj6y8Cvt3/vXc+H/tc
 T7GaxHH55NNNPkRz34YU3OWcaZsgkQEcdVpZf4tODPmg7J5VQj8SQeMhk/HI0sdO
 WKjWB0woZkiQECtamqAOXnv47PXd6igv8NALRPlJcKjs0EszUvuYhD/9MEOeghjI
 sjcQn9JnPpG+ca/qFVCSpEEOo2zGVn5dkJT5x26udH+5XHf7Pq+zpJwB6LHo98yF
 bGMpIrF6gi2EnBtS/tRjMyBU9Ut9KiUtjXMvn9EsD1U1FGIbz6wyQLlT0pG0hwJb
 rEdKfegrcyhWKHOD4vH9ciEg/7lgGfsGyfJDktIMdyailZc6tBcxwbdlc+5jRDA9
 0RqGASIXaiX7AC3WOSeQzMgbV+WXdhaX/yrMJL5KBfBTzxG2audnJt1tPN3mbh3z
 NM6M2cnMsoX4QLSiaukJaCL7LWHSTlVttZVg8FGXHj1PejMQQBVjGVvzk2UF55UR
 gV7X9/VkXhmIDAZgThWtOdPLz+ItksfSKiruhUsXust6JgqRuqc=
 =GjBE
 -----END PGP SIGNATURE-----

Merge tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from IPSec, netfilter and Bluetooth.

  Nothing really stands out, but as usual there's a slight concentration
  of fixes for issues added in the last two weeks before the merge
  window, and driver bugs from 6.13 which tend to get discovered upon
  wider distribution.

  Current release - regressions:

   - net: revert RTNL changes in unregister_netdevice_many_notify()

   - Bluetooth: fix possible infinite recursion of btusb_reset

   - eth: adjust locking in some old drivers which protect their state
     with spinlocks to avoid sleeping in atomic; core protects netdev
     state with a mutex now

  Previous releases - regressions:

   - eth:
      - mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node()
      - bgmac: reduce max frame size to support just 1500 bytes; the
        jumbo frame support would previously cause OOB writes, but now
        fails outright

   - mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted, avoid
     false detection of MPTCP blackholing

  Previous releases - always broken:

   - mptcp: handle fastopen disconnect correctly

   - xfrm:
      - make sure skb->sk is a full sock before accessing its fields
      - fix taking a lock with preempt disabled for RT kernels

   - usb: ipheth: improve safety of packet metadata parsing; prevent
     potential OOB accesses

   - eth: renesas: fix missing rtnl lock in suspend/resume path"

* tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  MAINTAINERS: add Neal to TCP maintainers
  net: revert RTNL changes in unregister_netdevice_many_notify()
  net: hsr: fix fill_frame_info() regression vs VLAN packets
  doc: mptcp: sysctl: blackhole_timeout is per-netns
  mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted
  netfilter: nf_tables: reject mismatching sum of field_len with set key length
  net: sh_eth: Fix missing rtnl lock in suspend/resume path
  net: ravb: Fix missing rtnl lock in suspend/resume path
  selftests/net: Add test for loading devbound XDP program in generic mode
  net: xdp: Disallow attaching device-bound programs in generic mode
  tcp: correct handling of extreme memory squeeze
  bgmac: reduce max frame size to support just MTU 1500
  vsock/test: Add test for connect() retries
  vsock/test: Add test for UAF due to socket unbinding
  vsock/test: Introduce vsock_connect_fd()
  vsock/test: Introduce vsock_bind()
  vsock: Allow retrying on connect() failure
  vsock: Keep the binding until socket destruction
  Bluetooth: L2CAP: accept zero as a special value for MTU auto-selection
  Bluetooth: btnxpuart: Fix glitches seen in dual A2DP streaming
  ...
2025-01-30 12:24:20 -08:00
Linus Torvalds
90cb220062 gpio fixes for v6.14-rc1
- update gpio-sim selftests to not fail now that we no longer allow
   rmdir() on configfs entries of active devices
 - remove leftover code from gpio-mxc
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmebRzwACgkQEacuoBRx
 13KUEQ/+KAAVHAr3l3bFG9POniRk6fZB2kyWeOUvGGmW1qBHjRF50wKsV0+EjhEi
 0jm5HX92tvqTfci/jCkZbw0eIZ72P9QXqgBKo+Hare+7Q0bX/eddBgjNTyaAyjLW
 VHwOoP3M7jdsb5vBmgMOIbsJHQ8Q3TscSN5ndFfQ04qZ0ahAczV86rQQRiFaxhle
 rr8XrICLlOnLS9YV4yYgJhNulIMvhLhcz70gyYNU6UOfEHNm3ZblBgoOWrPAvd+d
 3lZ+4ZDNaAPyBhW2a64FDs1e+/9VpFfrp3CNGecM8tV8swd33atpeNsClPqFPJYR
 pLir57iVbH3FiE76PdifR128toaH3+pSrPNuL7Gi2oVsdA5aN2l0y+YuXIFgpP+7
 /FCLKEfcY1G/Kp9nai0FG3ltAAkbaULi06dYdgXAC+xv391j1p9EI9iz9ZifpAA8
 I1Apa36E+MDgku52eiL16isNsLESE3ky/URsf8zLe8bRUEKCF6YsVoOLe7RDl5CQ
 fDgJx7xALIm8vk2jizmTmMIDVgY5leLgXS4pVHvEVBAg8zZe0OH5SRyYan82pv6t
 asw4RfEu7J7l5k5fgqWf2p6H4cpU2SuI35YdoTjYReGbExabnkylZTyoRjkGLX7o
 NqIktzN4nJzyGM0VAkOVnj5RKTlvCKSVLg0Qk3srXeG42qJ3Cqw=
 =Ql6L
 -----END PGP SIGNATURE-----

Merge tag 'gpio-fixes-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

 - update gpio-sim selftests to not fail now that we no longer allow
   rmdir() on configfs entries of active devices

 - remove leftover code from gpio-mxc

* tag 'gpio-fixes-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  selftests: gpio: gpio-sim: Fix missing chip disablements
  gpio: mxc: remove dead code after switch to DT-only
2025-01-30 10:19:30 -08:00
Linus Torvalds
d3d90cc289 Provide stable parent and name to ->d_revalidate() instances
Most of the filesystem methods where we care about dentry name
 and parent have their stability guaranteed by the callers;
 ->d_revalidate() is the major exception.
 
 It's easy enough for callers to supply stable values for
 expected name and expected parent of the dentry being
 validated.  That kills quite a bit of boilerplate in
 ->d_revalidate() instances, along with a bunch of races
 where they used to access ->d_name without sufficient
 precautions.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZ5gkoQAKCRBZ7Krx/gZQ
 6w9FAP4nyxNNWMjE1TwuWR/DNDMYYuw/qn/miZ88B5BUM8hzqgD/W2SjRvcbSaIm
 xSIYpbtKgtqNU34P1PU+dBvL8Utz2AE=
 =TWY8
 -----END PGP SIGNATURE-----

Merge tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs d_revalidate updates from Al Viro:
 "Provide stable parent and name to ->d_revalidate() instances

  Most of the filesystem methods where we care about dentry name and
  parent have their stability guaranteed by the callers;
  ->d_revalidate() is the major exception.

  It's easy enough for callers to supply stable values for expected name
  and expected parent of the dentry being validated. That kills quite a
  bit of boilerplate in ->d_revalidate() instances, along with a bunch
  of races where they used to access ->d_name without sufficient
  precautions"

* tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  9p: fix ->rename_sem exclusion
  orangefs_d_revalidate(): use stable parent inode and name passed by caller
  ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
  nfs: fix ->d_revalidate() UAF on ->d_name accesses
  nfs{,4}_lookup_validate(): use stable parent inode passed by caller
  gfs2_drevalidate(): use stable parent inode and name passed by caller
  fuse_dentry_revalidate(): use stable parent inode and name passed by caller
  vfat_revalidate{,_ci}(): use stable parent inode passed by caller
  exfat_d_revalidate(): use stable parent inode passed by caller
  fscrypt_d_revalidate(): use stable parent inode passed by caller
  ceph_d_revalidate(): propagate stable name down into request encoding
  ceph_d_revalidate(): use stable parent inode passed by caller
  afs_d_revalidate(): use stable name and parent inode passed by caller
  Pass parent directory inode and expected name to ->d_revalidate()
  generic_ci_d_compare(): use shortname_storage
  ext4 fast_commit: make use of name_snapshot primitives
  dissolve external_name.u into separate members
  make take_dentry_name_snapshot() lockless
  dcache: back inline names with a struct-wrapped array of unsigned long
  make sure that DNAME_INLINE_LEN is a multiple of word size
2025-01-30 09:13:35 -08:00
Toke Høiland-Jørgensen
f7bf624b1f selftests/net: Add test for loading devbound XDP program in generic mode
Add a test to bpf_offload.py for loading a devbound XDP program in
generic mode, checking that it fails correctly.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250127131344.238147-2-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29 19:04:23 -08:00
Michal Luczaj
4695f64e02 vsock/test: Add test for connect() retries
Deliberately fail a connect() attempt; expect error. Then verify that
subsequent attempt (using the same socket) can still succeed, rather than
fail outright.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-6-1cf57065b770@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29 18:50:37 -08:00
Michal Luczaj
301a62dfb0 vsock/test: Add test for UAF due to socket unbinding
Fail the autobind, then trigger a transport reassign. Socket might get
unbound from unbound_sockets, which then leads to a reference count
underflow.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-5-1cf57065b770@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29 18:50:37 -08:00
Michal Luczaj
ac12b7e291 vsock/test: Introduce vsock_connect_fd()
Distill timeout-guarded vsock_connect_fd(). Adapt callers.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-4-1cf57065b770@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29 18:50:37 -08:00
Michal Luczaj
852a00c428 vsock/test: Introduce vsock_bind()
Add a helper for socket()+bind(). Adapt callers.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-3-1cf57065b770@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29 18:50:37 -08:00
Linus Torvalds
9071080d1e cxl changes for v6.14
- Move HMAT printouts to pr_debug()
 - Add CXL type2 support to cxl_dvsec_rr_decode() in preparation for
   type2 support
 - A series that updates CXL event records to spec r3.1 and related
   changes
 - Refactoring of cxl_find_regblock_instance() to count regblocks
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmeaUgcACgkQYGjFFmlT
 OEqheg/+Iz7d1uBmyEcgf9JX0XAoPuG7uAPQcgSDK1u4gG0WxuBn+rc4Vprukml5
 0wn1n6LVQg3lkQp9oNQKwEMt1es7Y2RDOaPd17eiBxFoBPkmHIF8fl2nhoLzCSco
 j23kdKsySW5wcT7FdppVT0cRdMQM/9ecJhR7mribdIxNwmPP2D7knnOYjytQjRFp
 C65EGZ97cn8CAJt7zdBRLNcgzJc4iVfT7jyRbtFvA/Bx4WU7qSCiwh8c2KZ4U3wE
 piCoMnsv/RwMjbGZNmEt7dTC3RIawWajGchWJ9aolXYidWm6sxpx6JOgzaGWJK5q
 19fsb978SnqA6tr4qNmTgohl20gkRiqQZnYWZoKlyIZeBA+IqMdpl0YosSVNvS85
 kBlau9eMmZVIxJlHevQ3+5NBCzzXz2rLnORAJUUCgHnuWeKU/+t31xK2lfbMJet9
 Q6wkJC4kz1u6xLBP1aEnkVtQE0RY1UmUw4aU0L3jiLF6IGRF9wBRW/bFBKTEYLul
 7cL7R5rwFxIwvR1uNIQVVnSTUzHUeA1brQS1I6QJP2SNzVl2uew1Xlx6KrLt3kQw
 zOuQFAvMa3L/HWjfDJYvcxfzhYN2ng3ec3XjyGN75S5jrb7a0PZ5vOmxZ6PFvEZE
 8HFHfQqvpdO0e1SLWgLv7OBENM3GgLovbeSbEhQOlxzjx7QEsFc=
 =Tavn
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull Compute Express Link (CXL) updates from Dave Jiang:
 "A tweak to the HMAT output that was acked by Rafael, a prep patch for
  CXL type2 devices support that's coming soon, refactoring of the CXL
  regblock enumeration code, and a series of patches to update the event
  records to CXL spec r3.1:

   - Move HMAT printouts to pr_debug()

   - Add CXL type2 support to cxl_dvsec_rr_decode() in preparation for
     type2 support

   - A series that updates CXL event records to spec r3.1 and related
     changes

   - Refactoring of cxl_find_regblock_instance() to count regblocks"

* tag 'cxl-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/core/regs: Refactor out functions to count regblocks of given type
  cxl/test: Update test code for event records to CXL spec rev 3.1
  cxl/events: Update Memory Module Event Record to CXL spec rev 3.1
  cxl/events: Update DRAM Event Record to CXL spec rev 3.1
  cxl/events: Update General Media Event Record to CXL spec rev 3.1
  cxl/events: Add Component Identifier formatting for CXL spec rev 3.1
  cxl/events: Update Common Event Record to CXL spec rev 3.1
  cxl/pci: Add CXL Type 1/2 support to cxl_dvsec_rr_decode()
  ACPI/HMAT: Move HMAT messages to pr_debug()
2025-01-29 11:23:22 -08:00
Linus Torvalds
2ab002c755 Driver core and debugfs updates
Here is the big set of driver core and debugfs updates for 6.14-rc1.
 It's coming late in the merge cycle as there are a number of merge
 conflicts with your tree now, and I wanted to make sure they were
 working properly.  To resolve them, look in linux-next, and I will send
 the "fixup" patch as a response to the pull request.
 
 Included in here is a bunch of driver core, PCI, OF, and platform rust
 bindings (all acked by the different subsystem maintainers), hence the
 merge conflict with the rust tree, and some driver core api updates to
 mark things as const, which will also require some fixups due to new
 stuff coming in through other trees in this merge window.
 
 There are also a bunch of debugfs updates from Al, and there is at least
 one user that does have a regression with these, but Al is working on
 tracking down the fix for it.  In my use (and everyone else's linux-next
 use), it does not seem like a big issue at the moment.
 
 Here's a short list of the things in here:
   - driver core bindings for PCI, platform, OF, and some i/o functions.
     We are almost at the "write a real driver in rust" stage now,
     depending on what you want to do.
   - misc device rust bindings and a sample driver to show how to use
     them
   - debugfs cleanups in the fs as well as the users of the fs api for
     places where drivers got it wrong or were unnecessarily doing things
     in complex ways.
   - driver core const work, making more of the api take const * for
     different parameters to make the rust bindings easier overall.
   - other small fixes and updates
 
 All of these have been in linux-next with all of the aforementioned
 merge conflicts, and the one debugfs issue, which looks to be resolved
 "soon".
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ5koPA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymFHACfT5acDKf2Bov2Lc/5u3vBW/R6ChsAnj+LmgVI
 hcDSPodj4szR40RRnzBd
 =u5Ey
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core and debugfs updates from Greg KH:
 "Here is the big set of driver core and debugfs updates for 6.14-rc1.

  Included in here is a bunch of driver core, PCI, OF, and platform rust
  bindings (all acked by the different subsystem maintainers), hence the
  merge conflict with the rust tree, and some driver core api updates to
  mark things as const, which will also require some fixups due to new
  stuff coming in through other trees in this merge window.

  There are also a bunch of debugfs updates from Al, and there is at
  least one user that does have a regression with these, but Al is
  working on tracking down the fix for it. In my use (and everyone
  else's linux-next use), it does not seem like a big issue at the
  moment.

  Here's a short list of the things in here:

   - driver core rust bindings for PCI, platform, OF, and some i/o
     functions.

     We are almost at the "write a real driver in rust" stage now,
     depending on what you want to do.

   - misc device rust bindings and a sample driver to show how to use
     them

   - debugfs cleanups in the fs as well as the users of the fs api for
     places where drivers got it wrong or were unnecessarily doing
     things in complex ways.

   - driver core const work, making more of the api take const * for
     different parameters to make the rust bindings easier overall.

   - other small fixes and updates

  All of these have been in linux-next with all of the aforementioned
  merge conflicts, and the one debugfs issue, which looks to be resolved
  "soon""

* tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits)
  rust: device: Use as_char_ptr() to avoid explicit cast
  rust: device: Replace CString with CStr in property_present()
  devcoredump: Constify 'struct bin_attribute'
  devcoredump: Define 'struct bin_attribute' through macro
  rust: device: Add property_present()
  saner replacement for debugfs_rename()
  orangefs-debugfs: don't mess with ->d_name
  octeontx2: don't mess with ->d_parent or ->d_parent->d_name
  arm_scmi: don't mess with ->d_parent->d_name
  slub: don't mess with ->d_name
  sof-client-ipc-flood-test: don't mess with ->d_name
  qat: don't mess with ->d_name
  xhci: don't mess with ->d_iname
  mtu3: don't mess wiht ->d_iname
  greybus/camera - stop messing with ->d_iname
  mediatek: stop messing with ->d_iname
  netdevsim: don't embed file_operations into your structs
  b43legacy: make use of debugfs_get_aux()
  b43: stop embedding struct file_operations into their objects
  carl9170: stop embedding file_operations into their objects
  ...
2025-01-28 12:25:12 -08:00
Linus Torvalds
e2ee2e9b15 KVM/arm64 updates for 6.14
* New features:
 
   - Support for non-protected guest in protected mode, achieving near
     feature parity with the non-protected mode
 
   - Support for the EL2 timers as part of the ongoing NV support
 
   - Allow control of hardware tracing for nVHE/hVHE
 
 * Improvements, fixes and cleanups:
 
   - Massive cleanup of the debug infrastructure, making it a bit less
     awkward and definitely easier to maintain. This should pave the
     way for further optimisations
 
   - Complete rewrite of pKVM's fixed-feature infrastructure, aligning
     it with the rest of KVM and making the code easier to follow
 
   - Large simplification of pKVM's memory protection infrastructure
 
   - Better handling of RES0/RES1 fields for memory-backed system
     registers
 
   - Add a workaround for Qualcomm's Snapdragon X CPUs, which suffer
     from a pretty nasty timer bug
 
   - Small collection of cleanups and low-impact fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmeYqJcQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNLUhCACxUTMVQXhfW3qbh0UQxPd7XXvjI+Hm7SPS
 wDuVTle4jrFVGHxuZqtgWLmx8hD7bqO965qmFgbevKlwsRY33onH2nbH4i4AcwbA
 jcdM4yMHZI4+Qmnb4G5ZJ89IwjAhHPZTBOV5KRhyHQ/qtRciHHtOgJde7II9fd68
 uIESg4SSSyUzI47YSEHmGVmiBIhdQhq2qust0m6NPFalEGYstPbpluPQ6R1CsDqK
 v14TIAW7t0vSPucBeODxhA5gEa2JsvNi+sqA+DF/ELH2ZqpkuR7rofgMGblaXCSD
 JXa5xamRB9dI5zi8vatwfOzYlog+/gzmPqMh/9JXpiDGHxJe0vlz
 =tQ8F
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull KVM/arm64 updates from Will Deacon:
 "New features:

   - Support for non-protected guest in protected mode, achieving near
     feature parity with the non-protected mode

   - Support for the EL2 timers as part of the ongoing NV support

   - Allow control of hardware tracing for nVHE/hVHE

  Improvements, fixes and cleanups:

   - Massive cleanup of the debug infrastructure, making it a bit less
     awkward and definitely easier to maintain. This should pave the way
     for further optimisations

   - Complete rewrite of pKVM's fixed-feature infrastructure, aligning
     it with the rest of KVM and making the code easier to follow

   - Large simplification of pKVM's memory protection infrastructure

   - Better handling of RES0/RES1 fields for memory-backed system
     registers

   - Add a workaround for Qualcomm's Snapdragon X CPUs, which suffer
     from a pretty nasty timer bug

   - Small collection of cleanups and low-impact fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits)
  arm64/sysreg: Get rid of TRFCR_ELx SysregFields
  KVM: arm64: nv: Fix doc header layout for timers
  KVM: arm64: nv: Apply RESx settings to sysreg reset values
  KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors
  KVM: arm64: Fix selftests after sysreg field name update
  coresight: Pass guest TRFCR value to KVM
  KVM: arm64: Support trace filtering for guests
  KVM: arm64: coresight: Give TRBE enabled state to KVM
  coresight: trbe: Remove redundant disable call
  arm64/sysreg/tools: Move TRFCR definitions to sysreg
  tools: arm64: Update sysreg.h header files
  KVM: arm64: Drop pkvm_mem_transition for host/hyp donations
  KVM: arm64: Drop pkvm_mem_transition for host/hyp sharing
  KVM: arm64: Drop pkvm_mem_transition for FF-A
  KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
  KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()
  arm64: kvm: Introduce nvhe stack size constants
  KVM: arm64: Fix nVHE stacktrace VA bits mask
  KVM: arm64: Fix FEAT_MTE in pKVM
  Documentation: Update the behaviour of "kvm-arm.mode"
  ...
2025-01-28 09:01:36 -08:00
Linus Torvalds
13845bdc86 Char/Misc/IIO driver updates for 6.14-rc1
Here is the "big" set of char/misc/iio and other smaller driver
 subsystem updates for 6.14-rc1.  Loads of different things in here this
 development cycle, highlights are:
   - ntsync "driver" to handle Windows locking types enabling Wine to
     work much better on many workloads (i.e. games).  The driver
     framework was in 6.13, but now it's enabled and fully working
     properly.  Should make many SteamOS users happy.  Even comes with
     tests!
   - Large IIO driver updates and bugfixes
   - FPGA driver updates
   - Coresight driver updates
   - MHI driver updates
   - PPS driver updatesa
   - const bin_attribute reworking for many drivers
   - binder driver updates
   - smaller driver updates and fixes
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ5fGOQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynatACeLlbkhUT544Va1eOL2TkjfcGxrZUAoJ3ymGC0
 y0N7/+fWL6aS+b4sEilv
 =TU0D
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull Char/Misc/IIO driver updates from Greg KH:
 "Here is the "big" set of char/misc/iio and other smaller driver
  subsystem updates for 6.14-rc1. Loads of different things in here this
  development cycle, highlights are:

   - ntsync "driver" to handle Windows locking types enabling Wine to
     work much better on many workloads (i.e. games). The driver
     framework was in 6.13, but now it's enabled and fully working
     properly. Should make many SteamOS users happy. Even comes with
     tests!

   - Large IIO driver updates and bugfixes

   - FPGA driver updates

   - Coresight driver updates

   - MHI driver updates

   - PPS driver updatesa

   - const bin_attribute reworking for many drivers

   - binder driver updates

   - smaller driver updates and fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits)
  ntsync: Fix reference leaks in the remaining create ioctls.
  spmi: hisi-spmi-controller: Drop duplicated OF node assignment in spmi_controller_probe()
  spmi: Set fwnode for spmi devices
  ntsync: fix a file reference leak in drivers/misc/ntsync.c
  scripts/tags.sh: Don't tag usages of DECLARE_BITMAP
  dt-bindings: interconnect: qcom,msm8998-bwmon: Add SM8750 CPU BWMONs
  dt-bindings: interconnect: OSM L3: Document sm8650 OSM L3 compatible
  dt-bindings: interconnect: qcom-bwmon: Document QCS615 bwmon compatibles
  interconnect: sm8750: Add missing const to static qcom_icc_desc
  memstick: core: fix kernel-doc notation
  intel_th: core: fix kernel-doc warnings
  binder: log transaction code on failure
  iio: dac: ad3552r-hs: clear reset status flag
  iio: dac: ad3552r-common: fix ad3541/2r ranges
  iio: chemical: bme680: Fix uninitialized variable in __bme680_read_raw()
  misc: fastrpc: Fix copy buffer page size
  misc: fastrpc: Fix registered buffer page address
  misc: fastrpc: Deregister device nodes properly in error scenarios
  nvmem: core: improve range check for nvmem_cell_write()
  nvmem: qcom-spmi-sdam: Set size in struct nvmem_config
  ...
2025-01-27 16:51:51 -08:00
Jan Stancek
9b06d5b956 selftests: net/{lib,openvswitch}: extend CFLAGS to keep options from environment
Package build environments like Fedora rpmbuild introduced hardening
options (e.g. -pie -Wl,-z,now) by passing a -spec option to CFLAGS
and LDFLAGS.

Some Makefiles currently override CFLAGS but not LDFLAGS, which leads
to a mismatch and build failure, for example:
  /usr/bin/ld: /tmp/ccd2apay.o: relocation R_X86_64_32 against
    `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIE
  /usr/bin/ld: failed to set dynamic section sizes: bad value
  collect2: error: ld returned 1 exit status
  make[1]: *** [../../lib.mk:222: tools/testing/selftests/net/lib/csum] Error 1

openvswitch/Makefile CFLAGS currently do not appear to be used, but
fix it anyway for the case when new tests are introduced in future.

Fixes: 1d0dc857b5 ("selftests: drv-net: add checksum tests")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/3d173603ee258f419d0403363765c9f9494ff79a.1737635092.git.jstancek@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:45:27 -08:00
Jan Stancek
23b3a7c4a7 selftests: mptcp: extend CFLAGS to keep options from environment
Package build environments like Fedora rpmbuild introduced hardening
options (e.g. -pie -Wl,-z,now) by passing a -spec option to CFLAGS
and LDFLAGS.

mptcp Makefile currently overrides CFLAGS but not LDFLAGS, which leads
to a mismatch and build failure, for example:
  make[1]: *** [../../lib.mk:222: tools/testing/selftests/net/mptcp/mptcp_sockopt] Error 1
  /usr/bin/ld: /tmp/ccqyMVdb.o: relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a PIE object; recompile with -fPIE
  /usr/bin/ld: failed to set dynamic section sizes: bad value
  collect2: error: ld returned 1 exit status

Fixes: cc937dad85 ("selftests: centralize -D_GNU_SOURCE= to CFLAGS in lib.mk")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/7abc701da9df39c2d6cd15bc3cf9e6cee445cb96.1737621162.git.jstancek@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:45:19 -08:00
Jakub Kicinski
50bf398e1c net: netdevsim: try to close UDP port harness races
syzbot discovered that we remove the debugfs files after we free
the netdev. Try to clean up the relevant dir while the device
is still around.

Reported-by: syzbot+2e5de9e3ab986b71d2bf@syzkaller.appspotmail.com
Fixes: 424be63ad8 ("netdevsim: add UDP tunnel port offload support")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250122224503.762705-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:24:34 -08:00
Mickaël Salaün
38567b972a selftests: Handle old glibc without execveat(2)
Add an execveat(2) wrapper because glibc < 2.34 does not have one.  This
fixes the check-exec tests and samples.

Cc: Günther Noack <gnoack@google.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Roberto Sassu <roberto.sassu@huawei.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Stefan Berger <stefanb@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/r/20250114205645.GA2825031@ax162
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20250115144753.311152-1-mic@digikod.net
Signed-off-by: Kees Cook <kees@kernel.org>
2025-01-27 11:37:18 -08:00
Linus Torvalds
9c5968db9e The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs.
 
 - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the
   page allocator so we end up with the ability to allocate and free
   zero-refcount pages.  So that callers (ie, slab) can avoid a refcount
   inc & dec.
 
 - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use
   large folios other than PMD-sized ones.
 
 - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and
   fixes for this small built-in kernel selftest.
 
 - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of
   the mapletree code.
 
 - "mm: fix format issues and param types" from Keren Sun implements a
   few minor code cleanups.
 
 - "simplify split calculation" from Wei Yang provides a few fixes and a
   test for the mapletree code.
 
 - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes
   continues the work of moving vma-related code into the (relatively) new
   mm/vma.c.
 
 - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
   Hildenbrand cleans up and rationalizes handling of gfp flags in the page
   allocator.
 
 - "readahead: Reintroduce fix for improper RA window sizing" from Jan
   Kara is a second attempt at fixing a readahead window sizing issue.  It
   should reduce the amount of unnecessary reading.
 
 - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
   addresses an issue where "huge" amounts of pte pagetables are
   accumulated
   (https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/).
   Qi's series addresses this windup by synchronously freeing PTE memory
   within the context of madvise(MADV_DONTNEED).
 
 - "selftest/mm: Remove warnings found by adding compiler flags" from
   Muhammad Usama Anjum fixes some build warnings in the selftests code
   when optional compiler warnings are enabled.
 
 - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David
   Hildenbrand tightens the allocator's observance of __GFP_HARDWALL.
 
 - "pkeys kselftests improvements" from Kevin Brodsky implements various
   fixes and cleanups in the MM selftests code, mainly pertaining to the
   pkeys tests.
 
 - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
   estimate application working set size.
 
 - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
   provides some cleanups to memcg's hugetlb charging logic.
 
 - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
   removes the global swap cgroup lock.  A speedup of 10% for a tmpfs-based
   kernel build was demonstrated.
 
 - "zram: split page type read/write handling" from Sergey Senozhatsky
   has several fixes and cleaups for zram in the area of zram_write_page().
   A watchdog softlockup warning was eliminated.
 
 - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky
   cleans up the pagetable destructor implementations.  A rare
   use-after-free race is fixed.
 
 - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
   simplifies and cleans up the debugging code in the VMA merging logic.
 
 - "Account page tables at all levels" from Kevin Brodsky cleans up and
   regularizes the pagetable ctor/dtor handling.  This results in
   improvements in accounting accuracy.
 
 - "mm/damon: replace most damon_callback usages in sysfs with new core
   functions" from SeongJae Park cleans up and generalizes DAMON's sysfs
   file interface logic.
 
 - "mm/damon: enable page level properties based monitoring" from
   SeongJae Park increases the amount of information which is presented in
   response to DAMOS actions.
 
 - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes
   DAMON's long-deprecated debugfs interfaces.  Thus the migration to sysfs
   is completed.
 
 - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter
   Xu cleans up and generalizes the hugetlb reservation accounting.
 
 - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
   removes a never-used feature of the alloc_pages_bulk() interface.
 
 - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
   extends DAMOS filters to support not only exclusion (rejecting), but
   also inclusion (allowing) behavior.
 
 - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
   "introduces a new memory descriptor for zswap.zpool that currently
   overlaps with struct page for now.  This is part of the effort to reduce
   the size of struct page and to enable dynamic allocation of memory
   descriptors."
 
 - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and
   simplifies the swap allocator locking.  A speedup of 400% was
   demonstrated for one workload.  As was a 35% reduction for kernel build
   time with swap-on-zram.
 
 - "mm: update mips to use do_mmap(), make mmap_region() internal" from
   Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
   mmap_region() can be made MM-internal.
 
 - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU
   regressions and otherwise improves MGLRU performance.
 
 - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park
   updates DAMON documentation.
 
 - "Cleanup for memfd_create()" from Isaac Manjarres does that thing.
 
 - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand
   provides various cleanups in the areas of hugetlb folios, THP folios and
   migration.
 
 - "Uncached buffered IO" from Jens Axboe implements the new
   RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache
   reading and writing.  To permite userspace to address issues with
   massive buildup of useless pagecache when reading/writing fast devices.
 
 - "selftests/mm: virtual_address_range: Reduce memory" from Thomas
   Weißschuh fixes and optimizes some of the MM selftests.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5a+cwAKCRDdBJ7gKXxA
 jtoyAP9R58oaOKPJuTizEKKXvh/RpMyD6sYcz/uPpnf+cKTZxQEAqfVznfWlw/Lz
 uC3KRZYhmd5YrxU4o+qjbzp9XWX/xAE=
 =Ib2s
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "The various patchsets are summarized below. Plus of course many
  indivudual patches which are described in their changelogs.

   - "Allocate and free frozen pages" from Matthew Wilcox reorganizes
     the page allocator so we end up with the ability to allocate and
     free zero-refcount pages. So that callers (ie, slab) can avoid a
     refcount inc & dec

   - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
     use large folios other than PMD-sized ones

   - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
     and fixes for this small built-in kernel selftest

   - "mas_anode_descend() related cleanup" from Wei Yang tidies up part
     of the mapletree code

   - "mm: fix format issues and param types" from Keren Sun implements a
     few minor code cleanups

   - "simplify split calculation" from Wei Yang provides a few fixes and
     a test for the mapletree code

   - "mm/vma: make more mmap logic userland testable" from Lorenzo
     Stoakes continues the work of moving vma-related code into the
     (relatively) new mm/vma.c

   - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
     Hildenbrand cleans up and rationalizes handling of gfp flags in the
     page allocator

   - "readahead: Reintroduce fix for improper RA window sizing" from Jan
     Kara is a second attempt at fixing a readahead window sizing issue.
     It should reduce the amount of unnecessary reading

   - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
     addresses an issue where "huge" amounts of pte pagetables are
     accumulated:

       https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/

     Qi's series addresses this windup by synchronously freeing PTE
     memory within the context of madvise(MADV_DONTNEED)

   - "selftest/mm: Remove warnings found by adding compiler flags" from
     Muhammad Usama Anjum fixes some build warnings in the selftests
     code when optional compiler warnings are enabled

   - "mm: don't use __GFP_HARDWALL when migrating remote pages" from
     David Hildenbrand tightens the allocator's observance of
     __GFP_HARDWALL

   - "pkeys kselftests improvements" from Kevin Brodsky implements
     various fixes and cleanups in the MM selftests code, mainly
     pertaining to the pkeys tests

   - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
     estimate application working set size

   - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
     provides some cleanups to memcg's hugetlb charging logic

   - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
     removes the global swap cgroup lock. A speedup of 10% for a
     tmpfs-based kernel build was demonstrated

   - "zram: split page type read/write handling" from Sergey Senozhatsky
     has several fixes and cleaups for zram in the area of
     zram_write_page(). A watchdog softlockup warning was eliminated

   - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
     Brodsky cleans up the pagetable destructor implementations. A rare
     use-after-free race is fixed

   - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
     simplifies and cleans up the debugging code in the VMA merging
     logic

   - "Account page tables at all levels" from Kevin Brodsky cleans up
     and regularizes the pagetable ctor/dtor handling. This results in
     improvements in accounting accuracy

   - "mm/damon: replace most damon_callback usages in sysfs with new
     core functions" from SeongJae Park cleans up and generalizes
     DAMON's sysfs file interface logic

   - "mm/damon: enable page level properties based monitoring" from
     SeongJae Park increases the amount of information which is
     presented in response to DAMOS actions

   - "mm/damon: remove DAMON debugfs interface" from SeongJae Park
     removes DAMON's long-deprecated debugfs interfaces. Thus the
     migration to sysfs is completed

   - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
     Peter Xu cleans up and generalizes the hugetlb reservation
     accounting

   - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
     removes a never-used feature of the alloc_pages_bulk() interface

   - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
     extends DAMOS filters to support not only exclusion (rejecting),
     but also inclusion (allowing) behavior

   - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
     introduces a new memory descriptor for zswap.zpool that currently
     overlaps with struct page for now. This is part of the effort to
     reduce the size of struct page and to enable dynamic allocation of
     memory descriptors

   - "mm, swap: rework of swap allocator locks" from Kairui Song redoes
     and simplifies the swap allocator locking. A speedup of 400% was
     demonstrated for one workload. As was a 35% reduction for kernel
     build time with swap-on-zram

   - "mm: update mips to use do_mmap(), make mmap_region() internal"
     from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
     mmap_region() can be made MM-internal

   - "mm/mglru: performance optimizations" from Yu Zhao fixes a few
     MGLRU regressions and otherwise improves MGLRU performance

   - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
     Park updates DAMON documentation

   - "Cleanup for memfd_create()" from Isaac Manjarres does that thing

   - "mm: hugetlb+THP folio and migration cleanups" from David
     Hildenbrand provides various cleanups in the areas of hugetlb
     folios, THP folios and migration

   - "Uncached buffered IO" from Jens Axboe implements the new
     RWF_DONTCACHE flag which provides synchronous dropbehind for
     pagecache reading and writing. To permite userspace to address
     issues with massive buildup of useless pagecache when
     reading/writing fast devices

   - "selftests/mm: virtual_address_range: Reduce memory" from Thomas
     Weißschuh fixes and optimizes some of the MM selftests"

* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  mm/compaction: fix UBSAN shift-out-of-bounds warning
  s390/mm: add missing ctor/dtor on page table upgrade
  kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
  tools: add VM_WARN_ON_VMG definition
  mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
  seqlock: add missing parameter documentation for raw_seqcount_try_begin()
  mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
  mm/page_alloc: remove the incorrect and misleading comment
  zram: remove zcomp_stream_put() from write_incompressible_page()
  mm: separate move/undo parts from migrate_pages_batch()
  mm/kfence: use str_write_read() helper in get_access_type()
  selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
  kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
  selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
  selftests/mm: vm_util: split up /proc/self/smaps parsing
  selftests/mm: virtual_address_range: unmap chunks after validation
  selftests/mm: virtual_address_range: mmap() without PROT_WRITE
  selftests/memfd/memfd_test: fix possible NULL pointer dereference
  mm: add FGP_DONTCACHE folio creation flag
  mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
  ...
2025-01-26 18:36:23 -08:00
Linus Torvalds
c159dfbdd4 Mainly individually changelogged singleton patches. The patch series in
this pull are:
 
 - "lib min_heap: Improve min_heap safety, testing, and documentation"
   from Kuan-Wei Chiu provides various tightenings to the min_heap library
   code.
 
 - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms some
   cleanup and Rust preparation in the xarray library code.
 
 - "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes
   pathnames in some code comments.
 
 - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the
   new secs_to_jiffies() in various places where that is appropriate.
 
 - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
   switches two filesystems to the new mount API.
 
 - "Convert ocfs2 to use folios" from Matthew Wilcox does that.
 
 - "Remove get_task_comm() and print task comm directly" from Yafang Shao
   removes now-unneeded calls to get_task_comm() in various places.
 
 - "squashfs: reduce memory usage and update docs" from Phillip Lougher
   implements some memory savings in squashfs and performs some
   maintainability work.
 
 - "lib: clarify comparison function requirements" from Kuan-Wei Chiu
   tightens the sort code's behaviour and adds some maintenance work.
 
 - "nilfs2: protect busy buffer heads from being force-cleared" from
   Ryusuke Konishi fixes an issues in nlifs when the fs is presented with a
   corrupted image.
 
 - "nilfs2: fix kernel-doc comments for function return values" from
   Ryusuke Konishi fixes some nilfs kerneldoc.
 
 - "nilfs2: fix issues with rename operations" from Ryusuke Konishi
   addresses some nilfs BUG_ONs which syzbot was able to trigger.
 
 - "minmax.h: Cleanups and minor optimisations" from David Laight
   does some maintenance work on the min/max library code.
 
 - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work
   on the xarray library code.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5SP5QAKCRDdBJ7gKXxA
 jqN7AQChvwXGG43n4d5SDiA/rH7ddvowQcDqhC9cAMJ1ReR7qwEA8/LIWDE4PdMX
 mJnaZ1/ibpEpearrChCViApQtcyEGQI=
 =ti4E
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "Mainly individually changelogged singleton patches. The patch series
  in this pull are:

   - "lib min_heap: Improve min_heap safety, testing, and documentation"
     from Kuan-Wei Chiu provides various tightenings to the min_heap
     library code

   - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms
     some cleanup and Rust preparation in the xarray library code

   - "Update reference to include/asm-<arch>" from Geert Uytterhoeven
     fixes pathnames in some code comments

   - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses
     the new secs_to_jiffies() in various places where that is
     appropriate

   - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
     switches two filesystems to the new mount API

   - "Convert ocfs2 to use folios" from Matthew Wilcox does that

   - "Remove get_task_comm() and print task comm directly" from Yafang
     Shao removes now-unneeded calls to get_task_comm() in various
     places

   - "squashfs: reduce memory usage and update docs" from Phillip
     Lougher implements some memory savings in squashfs and performs
     some maintainability work

   - "lib: clarify comparison function requirements" from Kuan-Wei Chiu
     tightens the sort code's behaviour and adds some maintenance work

   - "nilfs2: protect busy buffer heads from being force-cleared" from
     Ryusuke Konishi fixes an issues in nlifs when the fs is presented
     with a corrupted image

   - "nilfs2: fix kernel-doc comments for function return values" from
     Ryusuke Konishi fixes some nilfs kerneldoc

   - "nilfs2: fix issues with rename operations" from Ryusuke Konishi
     addresses some nilfs BUG_ONs which syzbot was able to trigger

   - "minmax.h: Cleanups and minor optimisations" from David Laight does
     some maintenance work on the min/max library code

   - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance
     work on the xarray library code"

* tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits)
  ocfs2: use str_yes_no() and str_no_yes() helper functions
  include/linux/lz4.h: add some missing macros
  Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent
  Xarray: remove repeat check in xas_squash_marks()
  Xarray: distinguish large entries correctly in xas_split_alloc()
  Xarray: move forward index correctly in xas_pause()
  Xarray: do not return sibling entries from xas_find_marked()
  ipc/util.c: complete the kernel-doc function descriptions
  gcov: clang: use correct function param names
  latencytop: use correct kernel-doc format for func params
  minmax.h: remove some #defines that are only expanded once
  minmax.h: simplify the variants of clamp()
  minmax.h: move all the clamp() definitions after the min/max() ones
  minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
  minmax.h: reduce the #define expansion of min(), max() and clamp()
  minmax.h: update some comments
  minmax.h: add whitespace around operators and after commas
  nilfs2: do not update mtime of renamed directory that is not moved
  nilfs2: handle errors that nilfs_prepare_chunk() may return
  CREDITS: fix spelling mistake
  ...
2025-01-26 17:50:53 -08:00
Suren Baghdasaryan
cf929a2863 tools: add VM_WARN_ON_VMG definition
vma tests compilation yields the following error:

vma.c:732:9: error: implicit declaration of function ‘VM_WARN_ON_VMG’

Fix it by adding missing VM_WARN_ON_VMG() definition.

Link: https://lkml.kernel.org/r/20250116181538.759469-1-surenb@google.com
Fixes: e3a7ae85f87c ("mm/debug: prefer VM_WARN_ON_VMG() to report VMG debug warnings")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:46 -08:00
liuye
7882d8fc8f selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
Release memory before exception branch returns to prevent memory leaks

Checking tools/testing/selftests/mm/mkdirty.c ...
tools/testing/selftests/mm/mkdirty.c:283:3: error: Memory leak: src [memleak]
  return;
  ^

Link: https://lkml.kernel.org/r/20250114023838.48589-1-liuye@kylinos.cn
Signed-off-by: liuye <liuye@kylinos.cn>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:45 -08:00
Thomas Weißschuh
3bd6137220 selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
The virtual_address_range selftest reads from the start of each mapping
listed in /proc/self/maps.  However not all mappings are valid to be
arbitrarily accessed.

For example the vvar data used for virtual clocks on x86 [vvar_vclock] can
only be accessed if 1) the kernel configuration enables virtual clocks and
2) the hypervisor provided the data for it.  Only the VDSO itself has the
necessary information to know this.  Since commit e93d2521b2 ("x86/vdso:
Split virtual clock pages into dedicated mapping") the virtual clock data
was split out into its own mapping, leading to EFAULT from read() during
the validation.

Check for the VM_IO flag as a proxy.  It is present for the VVAR mappings
and MMIO ranges can be dangerous to access arbitrarily.

Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-4-6fd7269934a5@linutronix.de
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com
Fixes: e93d2521b2 ("x86/vdso: Split virtual clock pages into dedicated mapping")
Fixes: 0104096498 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Suggested-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/lkml/e97c2a5d-c815-4936-a767-ac42a3220a90@redhat.com/
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:44 -08:00
Thomas Weißschuh
3c479b5dc6 selftests/mm: vm_util: split up /proc/self/smaps parsing
Upcoming changes want to reuse the /proc/self/smaps parsing logic to parse
the VmFlags field.

As that works differently from the currently parsed HugePage counters,
split up the logic so common functionality can be shared.

While reworking this code, also use the correct sscanf placeholder for the
"uint64_t thp" variable.

Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-3-6fd7269934a5@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:44 -08:00
Thomas Weißschuh
b2a79f6213 selftests/mm: virtual_address_range: unmap chunks after validation
For each accessed chunk a PTE is created.  More than 1GiB of PTEs is used
in this way.  Remove each PTE after validating a chunk to reduce peak
memory usage.

It is important to only unmap memory that previously mmap()ed, as
unmapping other mappings like the stack, heap or executable mappings will
crash the process.

The mappings read from /proc/self/maps and the return values from mmap()
don't allow a simple correlation due to merging and no guaranteed order. 
To correlate the pointers and mappings use prctl(PR_SET_VMA_ANON_NAME). 
While it introduces a test dependency, other alternatives would introduce
runtime or development overhead.

Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-2-6fd7269934a5@linutronix.de
Fixes: 0104096498 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:44 -08:00
Thomas Weißschuh
a005145b9c selftests/mm: virtual_address_range: mmap() without PROT_WRITE
Patch series "selftests/mm: virtual_address_range: Reduce memory", v4.

The selftest started failing since commit e93d2521b2 ("x86/vdso: Split
virtual clock pages into dedicated mapping") was merged.  While debugging
I stumbled upon some memory usage optimizations.

With these test now runs on a VM with only 60MiB of memory.


This patch (of 4):

When mapping a larger chunk than physical memory is available with
PROT_WRITE and overcommit is disabled, the mapping will fail.  This will
prevent the test from running on systems with less then ~1GiB of memory
and triggering an inscrutinable test failure.  As the mappings are never
written to anyways, the flag can be removed.

Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-0-6fd7269934a5@linutronix.de
Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-1-6fd7269934a5@linutronix.de
Fixes: 4e5ce33ceb ("selftests/vm: add a test for virtual address range mapping")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Dev Jain <dev.jain@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:44 -08:00
liuye
73519ded99 selftests/memfd/memfd_test: fix possible NULL pointer dereference
If `name' is NULL, a NULL pointer may be accessed in printf.

Link: https://lkml.kernel.org/r/20250114032115.58638-1-liuye@kylinos.cn
Signed-off-by: liuye <liuye@kylinos.cn>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: "Isaac J. Manjarres" <isaacmanjarres@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Saurav Shah <sauravshah.31@gmail.com>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:44 -08:00
Hao Ge
bf069012df selftests/mm/cow: modify the incorrect checking parameters
In run_with_memfd_hugetlb(), some error handle have passed incorrect
parameters.  It should be "smem", but it was mistakenly written as "mem".

Let's fix it.

[gehao@kylinos.cn: fix other errant sites, per Anshuman]
  Link: https://lkml.kernel.org/r/20250113050908.93638-1-hao.ge@linux.dev
Link: https://lkml.kernel.org/r/20250113032858.63670-1-hao.ge@linux.dev
Fixes: f8664f3c4a ("selftests/vm: cow: basic COW tests for non-anonymous pages")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:41 -08:00
Zi Yan
dfe61db4a1 selftests/mm: add tests for splitting pmd THPs to all lower orders
Kernel already supports splitting a folio to any lower order. Test it.

[ziy@nvidia.com: no need to test splitting to order-1]
 Link: https://lkml.kernel.org/r/DDA202EA-4664-4F50-A7FD-B00CBB7A624B@nvidia.com
Link: https://lkml.kernel.org/r/20250110235028.96824-2-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Alexander Zhu <alexlzhu@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:40 -08:00
Zi Yan
136c5b40e0 selftests/mm: use selftests framework to print test result
Otherwise the number of tests does not match the reality.

Link: https://lkml.kernel.org/r/20250110235028.96824-1-ziy@nvidia.com
Fixes: 391e869711 ("mm: selftest to verify zero-filled pages are mapped to zeropage")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Alexander Zhu <alexlzhu@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:40 -08:00
Lorenzo Stoakes
f8d4a6cabb mm: make mmap_region() internal
Now that we have removed the one user of mmap_region() outside of mm, make
it internal and add it to vma.c so it can be userland tested.

This ensures that all external memory mappings are performed using the
appropriate interfaces and allows us to modify memory mapping logic as we
see fit.

Additionally expand test stubs to allow for the mmap_region() code to
compile and be userland testable.

Link: https://lkml.kernel.org/r/de5a3c574d35c26237edf20a1d8652d7305709c9.1735819274.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:38 -08:00
Ryan Roberts
b2466bb3b4 selftests/mm: introduce uffd-wp-mremap regression test
Introduce a test that registers a range of memory for
UFFDIO_WRITEPROTECT_MODE_WP without UFFD_FEATURE_EVENT_REMAP.  First check
that the uffd-wp bit is set for every PTE in the range.  Then mremap() the
range to a new location and check that the uffd-wp bit is clear for every
PTE in the range.

Run the test for small folios, all supported THP sizes and all supported
hugetlb sizes, and for swapped out memory, shared and private.

There was previously a bug in the kernel where the uffd-wp bits remained
set in all PTEs for this case, after fixing the kernel, the tests all
pass.

Link: https://lkml.kernel.org/r/20250107144755.1871363-3-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:31 -08:00
SeongJae Park
d8a142058f kunit: configs: remove configs for DAMON debugfs interface tests
It's time to remove DAMON debugfs interface, which has deprecated long
before in February 2023.  Read the cover letter of this patch series for
more details.

Remove kernel configs for running DAMON debugfs interface kunit tests from
the kunit all_tests configuration, to prevent unnecessary noises from
tests.

Link: https://lkml.kernel.org/r/20250106191941.107070-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Alex Shi <alexs@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rae Moar <rmoar@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yanteng Si <si.yanteng@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:29 -08:00
SeongJae Park
859de14931 selftests/damon: remove tests for DAMON debugfs interface
It's time to remove DAMON debugfs interface, which has deprecated long
before in February 2023.  Read the cover letter of this patch series for
more details.

Remove selftests for the interface, to prevent causing unnecessary test
failures.

Link: https://lkml.kernel.org/r/20250106191941.107070-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Alex Shi <alexs@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rae Moar <rmoar@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yanteng Si <si.yanteng@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:29 -08:00
SeongJae Park
ce1d750282 selftests/damon/config: remove configs for DAMON debugfs interface selftests
It's time to remove DAMON debugfs interface, which has deprecated long
before in February 2023.  Read the cover letter of this patch series for
more details.

Remove configs for selftests of it from DAMON selftests config file, to
prevent unnecessary noises from the tests.

[1] https://lore.kernel.org/20230209192009.7885-1-sj@kernel.org

Link: https://lkml.kernel.org/r/20250106191941.107070-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Alex Shi <alexs@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rae Moar <rmoar@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yanteng Si <si.yanteng@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:29 -08:00
Donet Tom
901083d8f5 selftests/mm: add new test cases to the migration test
Added three new test cases to the migration tests:

1. Shared anon THP migration test
This test will mmap shared anon memory, madvise it to
MADV_HUGEPAGE, then do migration entry testing. One thread
will move pages back and forth between nodes whilst other
threads try and access them.

2. Private anon hugetlb migration test
This test will mmap private anon hugetlb memory and then
do the migration entry testing.

3. Shared anon hugetlb migration test
This test will mmap shared anon hugetlb memory and then
do the migration entry testing.

Test results
============
 # ./tools/testing/selftests/mm/migration
 TAP version 13
 1..6
 # Starting 6 tests from 1 test cases.
 #  RUN           migration.private_anon ...
 #            OK  migration.private_anon
 ok 1 migration.private_anon
 #  RUN           migration.shared_anon ...
 #            OK  migration.shared_anon
 ok 2 migration.shared_anon
 #  RUN           migration.private_anon_thp ...
 #            OK  migration.private_anon_thp
 ok 3 migration.private_anon_thp
 #  RUN           migration.shared_anon_thp ...
 #            OK  migration.shared_anon_thp
 ok 4 migration.shared_anon_thp
 #  RUN           migration.private_anon_htlb ...
 #            OK  migration.private_anon_htlb
 ok 5 migration.private_anon_htlb
 #  RUN           migration.shared_anon_htlb ...
 #            OK  migration.shared_anon_htlb
 ok 6 migration.shared_anon_htlb
 # PASSED: 6 / 6 tests passed.
 # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
 #

Link: https://lkml.kernel.org/r/20241219102720.4487-1-donettom@linux.ibm.com
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:21 -08:00
Lorenzo Stoakes
7e8c8fd348 tools: testing: add simple __mmap_region() userland test
Introduce demonstrative, basic, __mmap_region() test upon which we can
base further work upon moving forwards.

This simply asserts that mappings can be made and merges occur as
expected.

As part of this change, fix the security_vm_enough_memory_mm() stub which
was previously incorrectly implemented.

Link: https://lkml.kernel.org/r/20241213162409.41498-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:18 -08:00
Linus Torvalds
647d69605c pci-v6.14-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmeTr8wUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxrMw//TJXH+U6x5LhYvBPD/KZ20ecGHqaA
 eGXrbHAasYbU1CfW7HM0onR8NffOIGoYvQrtefjQAln0w6rTvyFO0xJKLP15vMfN
 hnj+y1WWtKwAkSpu10Cl9nTj8uYRNNSQeoy5kS+1diwuXdby/DlgQONO2APSe9zd
 KMPXJcqSfDJlM5zHrcqqtlxauO9KHInLCc/iutd85AKjvcjOoNHNeZE0pTC0C3gE
 sXYHDqJiS3zdEG6X6mWFo3OzI/Q/7NGlHJ2j0CQaObsgQ9yA7eWkez25ifwZcugc
 TPtjm8DhaDo9/zx0NV9c2dPauHRC6NYUjAflMPK7Aye/41BE1Ag5Ka+tMDgC2i/N
 TbfBxSeArhjnjY+eZwRhrJNNC58TtHTUs69TO7Dbmuwr7cp99MIEDAYI5V6LFAdk
 plKqn1h8FztW5QKRPCgmzy6KTE+WPytiGAGAQFxzIGYkV/QqyvFaVs8FIyOJUIFM
 aDSa6Xy5WLGxmPZ9hPapzEm4ws/HTRpFjNgi/d4rRG5RWMwAxZZa44s9eldhN1D/
 ZwEmF2rJ+U8S7Q+mXPHDlwcsHe5APCbiaTEp4X+e3LNe0i9oxhhaUWG6LrDDmTlQ
 tU5j5daHiBa0nTDL1lfaayJlYX/oJ+IYQrIYzGbnivZv4ZVdPnuWSsOsMOEiLhEt
 4QqCoanqf0mCn2A=
 =O62O
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Batch sizing of multiple BARs while memory decoding is disabled
     instead of disabling/enabling decoding for each BAR individually;
     this optimizes virtualized environments where toggling decoding
     enable is expensive (Alex Williamson)

   - Add host bridge .enable_device() and .disable_device() hooks for
     bridges that need to configure things like Requester ID to StreamID
     mapping when enabling devices (Frank Li)

   - Extend struct pci_ecam_ops with .enable_device() and
     .disable_device() hooks so drivers that use pci_host_common_probe()
     instead of their own .probe() have a way to set the
     .enable_device() callbacks (Marc Zyngier)

   - Drop 'No bus range found' message so we don't complain when DTs
     don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas)

   - Rename the drivers/pci/of_property.c struct of_pci_range to
     of_pci_range_entry to avoid confusion with the global of_pci_range
     in include/linux/of_address.h (Bjorn Helgaas)

  Driver binding:

   - Update resource request API documentation to encourage callers to
     supply a driver name when requesting resources (Philipp Stanner)

   - Export pci_intx_unmanaged() and pcim_intx() (always managed) so
     callers of pci_intx() (which is sometimes managed) can explicitly
     choose the one they need (Philipp Stanner)

   - Convert drivers from pci_intx() to always-managed pcim_intx() or
     never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix,
     pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna,
     ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner)

   - Remove pci_intx_unmanaged() since pci_intx() is now always
     unmanaged and pcim_intx() is always managed (Philipp Stanner)

  Error handling:

   - Unexport pcie_read_tlp_log() to encourage drivers to use PCI core
     logging rather than building their own (Ilpo Järvinen)

   - Move TLP Log handling to its own file (Ilpo Järvinen)

   - Store number of supported End-End TLP Prefixes always so we can
     read the correct number of DWORDs from the TLP Prefix Log (Ilpo
     Järvinen)

   - Read TLP Prefixes in addition to the Header Log in
     pcie_read_tlp_log() (Ilpo Järvinen)

   - Add pcie_print_tlp_log() to consolidate printing of TLP Header and
     Prefix Log (Ilpo Järvinen)

   - Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor
     BIOSes that don't configure it correctly (Takashi Iwai)

  ASPM:

   - Save parent L1 PM Substates config so when we restore it along with
     an endpoint's config, the parent info isn't junk (Jian-Hong Pan)

  Power management:

   - Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because
     the system can't wake up from suspend (Werner Sembach)

  Endpoint framework:

   - Destroy the EPC device in devm_pci_epc_destroy(), which previously
     didn't call devres_release() (Zijun Hu)

   - Finish virtual EP removal in pci_epf_remove_vepf(), which
     previously caused a subsequent pci_epf_add_vepf() to fail with
     -EBUSY (Zijun Hu)

   - Write BAR_MASK before iATU registers in pci_epc_set_bar() so we
     don't depend on the BAR_MASK reset value being larger than the
     requested BAR size (Niklas Cassel)

   - Prevent changing BAR size/flags in pci_epc_set_bar() to prevent
     reads from bypassing the iATU if we reduced the BAR size (Niklas
     Cassel)

   - Verify address alignment when programming iATU so we don't attempt
     to write bits that are read-only because of the BAR size, which
     could lead to directing accesses to the wrong address (Niklas
     Cassel)

   - Implement artpec6 pci_epc_features so we can rely on all drivers
     supporting it so we can use it in EPC core code (Niklas Cassel)

   - Check for BARs of fixed size to prevent endpoint drivers from
     trying to change their size (Niklas Cassel)

   - Verify that requested BAR size is a power of two when endpoint
     driver sets the BAR (Niklas Cassel)

  Endpoint framework tests:

   - Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing
     dma_chan_rx (Mohamed Khalfella)

   - Correct the DMA MEMCPY test so it doesn't fail if the Endpoint
     supports both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam)

   - Add pci-epf-test and pci_endpoint_test support for capabilities
     (Niklas Cassel)

   - Add Endpoint test for consecutive BARs (Niklas Cassel)

   - Remove redundant comparison from Endpoint BAR test because a > 1MB
     BAR can always be exactly covered by iterating with a 1MB buffer
     (Hans Zhang)

   - Move and convert PCI Endpoint tests from tools/pci to Kselftests
     (Manivannan Sadhasivam)

  Apple PCIe controller driver:

   - Convert StreamID mapping configuration from a bus notifier to the
     .enable_device() and .disable_device() callbacks (Marc Zyngier)

  Freescale i.MX6 PCIe controller driver:

   - Add Requester ID to StreamID mapping configuration when enabling
     devices (Frank Li)

   - Use DWC core suspend/resume functions for imx6 (Frank Li)

   - Add suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard
     Zhu)

   - Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for
     i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank
     Li)

   - Add DT binding for optional i.MX95 Refclk and driver support to
     enable it if the platform hasn't enabled it (Richard Zhu)

   - Configure PHY based on controller being in Root Complex or Endpoint
     mode (Frank Li)

   - Rely on dbi2 and iATU base addresses from DT via
     dw_pcie_get_resources() instead of hardcoding them (Richard Zhu)

   - Deassert apps_reset in imx_pcie_deassert_core_reset() since it is
     asserted in imx_pcie_assert_core_reset() (Richard Zhu)

   - Add missing reference clock enable or disable logic for IMX6SX,
     IMX7D, IMX8MM (Richard Zhu)

   - Remove redundant imx7d_pcie_init_phy() since
     imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu)

  Freescale Layerscape PCIe controller driver:

   - Simplify by using syscon_regmap_lookup_by_phandle_args() instead
     of syscon_regmap_lookup_by_phandle() followed by
     of_property_read_u32_array() (Krzysztof Kozlowski)

  Marvell MVEBU PCIe controller driver:

   - Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen)

  MediaTek PCIe Gen3 controller driver:

   - Use clk_bulk_prepare_enable() instead of separate
     clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi)

   - Rearrange reset assert/deassert so they're both done in the
     *_power_up() callbacks (Lorenzo Bianconi)

   - Document that Airoha EN7581 requires PHY init and power-on before
     PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo
     Bianconi)

   - Move Airoha EN7581 post-reset delay from the en7581 clock .enable()
     method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi)

   - Sleep instead of delay during Airoha EN7581 power-up, since this is
     a non-atomic context (Lorenzo Bianconi)

   - Skip PERST# assertion on Airoha EN7581 during probe and
     suspend/resume to avoid a hardware defect (Lorenzo Bianconi)

   - Enable async probe to reduce system startup time (Douglas Anderson)

  Microchip PolarFlare PCIe controller driver:

   - Set up the inbound address translation based on whether the
     platform allows coherent or non-coherent DMA (Daire McNamara)

   - Update DT binding such that platforms are DMA-coherent by default
     and must specify 'dma-noncoherent' if needed (Conor Dooley)

  Mobiveil PCIe controller driver:

   - Convert mobiveil-pcie.txt to YAML and update 'interrupt-names'
     and 'reg-names' (Frank Li)

  Qualcomm PCIe controller driver:

   - Add DT SM8550 and SM8650 optional 'global' interrupt for link
     events (Neil Armstrong)

   - Add DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta
     Mylavarapu)

   - If 'global' IRQ is supported for detection of Link Up events, tell
     DWC core not to wait for link up (Krishna chaitanya chundru)

  Renesas R-Car PCIe controller driver:

   - Avoid passing stack buffer as resource name (King Dix)

  Rockchip PCIe controller driver:

   - Simplify clock and reset handling by using bulk interfaces (Anand
     Moon)

   - Pass typed rockchip_pcie (not void) pointer to
     rockchip_pcie_disable_clocks() (Anand Moon)

   - Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails
     (Dan Carpenter)

  Rockchip DesignWare PCIe controller driver:

   - Use dll_link_up IRQ to detect Link Up and enumerate devices so
     users don't have to manually rescan (Niklas Cassel)

   - Tell DWC core not to wait for link up since the 'sys' interrupt is
     required and detects Link Up events (Niklas Cassel)

  Synopsys DesignWare PCIe controller driver:

   - Don't wait for link up in DWC core if driver can detect Link Up
     event (Krishna chaitanya chundru)

   - Update ICC and OPP votes after Link Up events (Krishna chaitanya
     chundru)

   - Always stop link in dw_pcie_suspend_noirq(), which is required at
     least for i.MX8QM to re-establish link on resume (Richard Zhu)

   - Drop racy and unnecessary LTSSM state check before sending
     PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu)

   - Add struct of_pci_range.parent_bus_addr for devices that need their
     immediate parent bus address, not the CPU address, e.g., to program
     an internal Address Translation Unit (iATU) (Frank Li)

  TI DRA7xx PCIe controller driver:

   - Simplify by using syscon_regmap_lookup_by_phandle_args() instead of
     syscon_regmap_lookup_by_phandle() followed by
     of_parse_phandle_with_fixed_args() or of_property_read_u32_index()
     (Krzysztof Kozlowski)

  Xilinx Versal CPM PCIe controller driver:

   - Add DT binding and driver support for Xilinx Versal CPM5
     (Thippeswamy Havalige)

  MicroSemi Switchtec management driver:

   - Add Microchip PCI100X device IDs (Rakesh Babu Saladi)

  Miscellaneous:

   - Move reset related sysfs code from pci.c to pci-sysfs.c where other
     similar code lives (Ilpo Järvinen)

   - Simplify reset_method_store() memory management by using __free()
     instead of explicit kfree() cleanup (Ilpo Järvinen)

   - Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM
     ACPI hotplug driver (Thomas Weißschuh)

   - Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong
     Zhang)

   - Correct documentation of the 'config_acs=' kernel parameter
     (Akihiko Odaki)"

* tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (111 commits)
  PCI: Batch BAR sizing operations
  dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent
  PCI: microchip: Set inbound address translation for coherent or non-coherent mode
  Documentation: Fix pci=config_acs= example
  PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT
  PCI: Don't include 'pm_wakeup.h' directly
  selftests: pci_endpoint: Migrate to Kselftest framework
  selftests: Move PCI Endpoint tests from tools/pci to Kselftests
  misc: pci_endpoint_test: Fix IOCTL return value
  dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller
  dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt
  dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML
  PCI: switchtec: Add Microchip PCI100X device IDs
  misc: pci_endpoint_test: Remove redundant 'remainder' test
  misc: pci_endpoint_test: Add consecutive BAR test
  misc: pci_endpoint_test: Add support for capabilities
  PCI: endpoint: pci-epf-test: Add support for capabilities
  PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test
  PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error
  PCI: dwc: Simplify config resource lookup
  ...
2025-01-25 16:03:40 -08:00
Linus Torvalds
fd56e5104a OpenRISC updates for 6.14
A few updates from me and the community:
 
  * Added support for restartable sequences
  * Migration to Generic built-in DTB from Masahiro Yamada
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE2cRzVK74bBA6Je/xw7McLV5mJ+QFAmeUhJEACgkQw7McLV5m
 J+TpeA//VSJeutnK1s6u4VQbO6P0FM2JkY7yy8oAcitlG5bJ/WrRboNCgshaIRlG
 etoTwEb5llcUL2jDp7217I6Z5XDWUgQwul63PTYeoqEfwvF6kC3DEbOZ8QSfan/x
 F38kUDQWSyPKzt5aeNQZnKBr32jIZ/MfY9+XC2RHrNYhCTSoTqUcKxk8a69sx9cN
 MSxrqcfOsywakVgWsM4kPkWzQD4GI5E67BJBw0cI6ux0pqThQsiUyjtwxbIOZXUe
 /QtaBtGdqYMNEPs4RmoD2qGS3j5GtD3Nxv3adbKs0vpDXHyZj1+SFnIsBU9QBzni
 meXp4qONoIovg4Mh9VK6iZUWd6z6wegxP370bbsJGErzWS11t3p6qRYXg52DRMh8
 w8rB/rHH2n/Dx7T17ec/E+D67/9BiapZrfxaDRjZ1/aa17JYiQeuHFDHRzo4uap2
 TZ1NyPHQYLblmj0XDAsh8Me6PKI2pDL8DWZbFZ9dY7rN3lJP9G7kCdqBsM54NmBp
 M47asri2N2VRhrewWF4fNk9bbf4Ai9ERqShyg211d4fRTg1FhfpGEAKHnu9x2z1D
 isZf6JSrWhSrgQSq1dxvSjUnW7aw+NCPNqJjuKXYpCnxZePAa1O3/7HpipY58qr5
 Ki1TN6x7DuWjR2mTaiIGzT2vktdBP9qFVJYfNneEme7111nRRuA=
 =9RPl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of https://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:

 - Added support for restartable sequences (me)

 - Migration to Generic built-in DTB (Masahiro Yamada)

* tag 'for-linus' of https://github.com/openrisc/linux:
  rseq/selftests: Add support for OpenRISC
  openrisc: Add support for restartable sequences
  openrisc: Add HAVE_REGS_AND_STACK_ACCESS_API support
  openrisc: migrate to the generic rule for built-in DTB
2025-01-25 10:16:56 -08:00
Linus Torvalds
0f8e26b38d Loongarch:
* Clear LLBCTL if secondary mmu mapping changes.
 
 * Add hypercall service support for usermode VMM.
 
 x86:
 
 * Add a comment to kvm_mmu_do_page_fault() to explain why KVM performs a
   direct call to kvm_tdp_page_fault() when RETPOLINE is enabled.
 
 * Ensure that all SEV code is compiled out when disabled in Kconfig, even
   if building with less brilliant compilers.
 
 * Remove a redundant TLB flush on AMD processors when guest CR4.PGE changes.
 
 * Use str_enabled_disabled() to replace open coded strings.
 
 * Drop kvm_x86_ops.hwapic_irr_update() as KVM updates hardware's APICv cache
   prior to every VM-Enter.
 
 * Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities
   instead of just those where KVM needs to manage state and/or explicitly
   enable the feature in hardware.  Along the way, refactor the code to make
   it easier to add features, and to make it more self-documenting how KVM
   is handling each feature.
 
 * Rework KVM's handling of VM-Exits during event vectoring; this plugs holes
   where KVM unintentionally puts the vCPU into infinite loops in some scenarios
   (e.g. if emulation is triggered by the exit), and brings parity between VMX
   and SVM.
 
 * Add pending request and interrupt injection information to the kvm_exit and
   kvm_entry tracepoints respectively.
 
 * Fix a relatively benign flaw where KVM would end up redoing RDPKRU when
   loading guest/host PKRU, due to a refactoring of the kernel helpers that
   didn't account for KVM's pre-checking of the need to do WRPKRU.
 
 * Make the completion of hypercalls go through the complete_hypercall
   function pointer argument, no matter if the hypercall exits to
   userspace or not.  Previously, the code assumed that KVM_HC_MAP_GPA_RANGE
   specifically went to userspace, and all the others did not; the new code
   need not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at
   all whether there was an exit to userspace or not.
 
 * As part of enabling TDX virtual machines, support support separation of
   private/shared EPT into separate roots.  When TDX will be enabled, operations
   on private pages will need to go through the privileged TDX Module via SEAMCALLs;
   as a result, they are limited and relatively slow compared to reading a PTE.
   The patches included in 6.14 allow KVM to keep a mirror of the private EPT in
   host memory, and define entries in kvm_x86_ops to operate on external page
   tables such as the TDX private EPT.
 
 * The recently introduced conversion of the NX-page reclamation kthread to
   vhost_task moved the task under the main process.  The task is created as
   soon as KVM_CREATE_VM was invoked and this, of course, broke userspace that
   didn't expect to see any child task of the VM process until it started
   creating its own userspace threads.  In particular crosvm refuses to fork()
   if procfs shows any child task, so unbreak it by creating the task lazily.
   This is arguably a userspace bug, as there can be other kinds of legitimate
   worker tasks and they wouldn't impede fork(); but it's not like userspace
   has a way to distinguish kernel worker tasks right now.  Should they show
   as "Kthread: 1" in proc/.../status?
 
 x86 - Intel:
 
 * Fix a bug where KVM updates hardware's APICv cache of the highest ISR bit
   while L2 is active, while ultimately results in a hardware-accelerated L1
   EOI effectively being lost.
 
 * Honor event priority when emulating Posted Interrupt delivery during nested
   VM-Enter by queueing KVM_REQ_EVENT instead of immediately handling the
   interrupt.
 
 * Rework KVM's processing of the Page-Modification Logging buffer to reap
   entries in the same order they were created, i.e. to mark gfns dirty in the
   same order that hardware marked the page/PTE dirty.
 
 * Misc cleanups.
 
 Generic:
 
 * Cleanup and harden kvm_set_memory_region(); add proper lockdep assertions when
   setting memory regions and add a dedicated API for setting KVM-internal
   memory regions.  The API can then explicitly disallow all flags for
   KVM-internal memory regions.
 
 * Explicitly verify the target vCPU is online in kvm_get_vcpu() to fix a bug
   where KVM would return a pointer to a vCPU prior to it being fully online,
   and give kvm_for_each_vcpu() similar treatment to fix a similar flaw.
 
 * Wait for a vCPU to come online prior to executing a vCPU ioctl, to fix a
   bug where userspace could coerce KVM into handling the ioctl on a vCPU that
   isn't yet onlined.
 
 * Gracefully handle xarray insertion failures; even though such failures are
   impossible in practice after xa_reserve(), reserving an entry is always followed
   by xa_store() which does not know (or differentiate) whether there was an
   xa_reserve() before or not.
 
 RISC-V:
 
 * Zabha, Svvptc, and Ziccrse extension support for guests.  None of them
   require anything in KVM except for detecting them and marking them
   as supported; Zabha adds byte and halfword atomic operations, while the
   others are markers for specific operation of the TLB and of LL/SC
   instructions respectively.
 
 * Virtualize SBI system suspend extension for Guest/VM
 
 * Support firmware counters which can be used by the guests to collect
   statistics about traps that occur in the host.
 
 Selftests:
 
 * Rework vcpu_get_reg() to return a value instead of using an out-param, and
   update all affected arch code accordingly.
 
 * Convert the max_guest_memory_test into a more generic mmu_stress_test.
   The basic gist of the "conversion" is to have the test do mprotect() on
   guest memory while vCPUs are accessing said memory, e.g. to verify KVM
   and mmu_notifiers are working as intended.
 
 * Play nice with treewrite builds of unsupported architectures, e.g. arm
   (32-bit), as KVM selftests' Makefile doesn't do anything to ensure the
   target architecture is actually one KVM selftests supports.
 
 * Use the kernel's $(ARCH) definition instead of the target triple for arch
   specific directories, e.g. arm64 instead of aarch64, mainly so as not to
   be different from the rest of the kernel.
 
 * Ensure that format strings for logging statements are checked by the
   compiler even when the logging statement itself is disabled.
 
 * Attempt to whack the last LLC references/misses mole in the Intel PMU
   counters test by adding a data load and doing CLFLUSH{OPT} on the data
   instead of the code being executed.  It seems that modern Intel CPUs
   have learned new code prefetching tricks that bypass the PMU counters.
 
 * Fix a flaw in the Intel PMU counters test where it asserts that events
   are counting correctly without actually knowing what the events count
   given the underlying hardware; this can happen if Intel reuses a
   formerly microarchitecture-specific event encoding as an architectural
   event, as was the case for Top-Down Slots.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmeTuzoUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOkBwf8CRNExYaM3j9y2E7mmo6AiL2ug6+J
 Uy5Hai1poY48pPwKC6ke3EWT8WVsgj/Py5pCeHvLojQchWNjCCYNfSQluJdkRxwG
 DgP3QUljSxEJWBeSwyTRcKM+IySi5hZd1IFo3gePFRB829Jpnj05vjbvCyv8gIwU
 y3HXxSYDsViaaFoNg4OlZFsIGis7mtknsZzk++QjuCXmxNa6UCbv3qvE/UkVLhVg
 WH65RTRdjk+EsdwaOMHKuUvQoGa+iM4o39b6bqmw8+ZMK39+y33WeTX/y5RXsp1N
 tUUBRfS+MuuYgC/6LmTr66EkMzoChxk3Dp3kKUaCBcfqRC8PxQag5reZhw==
 =NEaO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "Loongarch:

   - Clear LLBCTL if secondary mmu mapping changes

   - Add hypercall service support for usermode VMM

  x86:

   - Add a comment to kvm_mmu_do_page_fault() to explain why KVM
     performs a direct call to kvm_tdp_page_fault() when RETPOLINE is
     enabled

   - Ensure that all SEV code is compiled out when disabled in Kconfig,
     even if building with less brilliant compilers

   - Remove a redundant TLB flush on AMD processors when guest CR4.PGE
     changes

   - Use str_enabled_disabled() to replace open coded strings

   - Drop kvm_x86_ops.hwapic_irr_update() as KVM updates hardware's
     APICv cache prior to every VM-Enter

   - Overhaul KVM's CPUID feature infrastructure to track all vCPU
     capabilities instead of just those where KVM needs to manage state
     and/or explicitly enable the feature in hardware. Along the way,
     refactor the code to make it easier to add features, and to make it
     more self-documenting how KVM is handling each feature

   - Rework KVM's handling of VM-Exits during event vectoring; this
     plugs holes where KVM unintentionally puts the vCPU into infinite
     loops in some scenarios (e.g. if emulation is triggered by the
     exit), and brings parity between VMX and SVM

   - Add pending request and interrupt injection information to the
     kvm_exit and kvm_entry tracepoints respectively

   - Fix a relatively benign flaw where KVM would end up redoing RDPKRU
     when loading guest/host PKRU, due to a refactoring of the kernel
     helpers that didn't account for KVM's pre-checking of the need to
     do WRPKRU

   - Make the completion of hypercalls go through the complete_hypercall
     function pointer argument, no matter if the hypercall exits to
     userspace or not.

     Previously, the code assumed that KVM_HC_MAP_GPA_RANGE specifically
     went to userspace, and all the others did not; the new code need
     not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at
     all whether there was an exit to userspace or not

   - As part of enabling TDX virtual machines, support support
     separation of private/shared EPT into separate roots.

     When TDX will be enabled, operations on private pages will need to
     go through the privileged TDX Module via SEAMCALLs; as a result,
     they are limited and relatively slow compared to reading a PTE.

     The patches included in 6.14 allow KVM to keep a mirror of the
     private EPT in host memory, and define entries in kvm_x86_ops to
     operate on external page tables such as the TDX private EPT

   - The recently introduced conversion of the NX-page reclamation
     kthread to vhost_task moved the task under the main process. The
     task is created as soon as KVM_CREATE_VM was invoked and this, of
     course, broke userspace that didn't expect to see any child task of
     the VM process until it started creating its own userspace threads.

     In particular crosvm refuses to fork() if procfs shows any child
     task, so unbreak it by creating the task lazily. This is arguably a
     userspace bug, as there can be other kinds of legitimate worker
     tasks and they wouldn't impede fork(); but it's not like userspace
     has a way to distinguish kernel worker tasks right now. Should they
     show as "Kthread: 1" in proc/.../status?

  x86 - Intel:

   - Fix a bug where KVM updates hardware's APICv cache of the highest
     ISR bit while L2 is active, while ultimately results in a
     hardware-accelerated L1 EOI effectively being lost

   - Honor event priority when emulating Posted Interrupt delivery
     during nested VM-Enter by queueing KVM_REQ_EVENT instead of
     immediately handling the interrupt

   - Rework KVM's processing of the Page-Modification Logging buffer to
     reap entries in the same order they were created, i.e. to mark gfns
     dirty in the same order that hardware marked the page/PTE dirty

   - Misc cleanups

  Generic:

   - Cleanup and harden kvm_set_memory_region(); add proper lockdep
     assertions when setting memory regions and add a dedicated API for
     setting KVM-internal memory regions. The API can then explicitly
     disallow all flags for KVM-internal memory regions

   - Explicitly verify the target vCPU is online in kvm_get_vcpu() to
     fix a bug where KVM would return a pointer to a vCPU prior to it
     being fully online, and give kvm_for_each_vcpu() similar treatment
     to fix a similar flaw

   - Wait for a vCPU to come online prior to executing a vCPU ioctl, to
     fix a bug where userspace could coerce KVM into handling the ioctl
     on a vCPU that isn't yet onlined

   - Gracefully handle xarray insertion failures; even though such
     failures are impossible in practice after xa_reserve(), reserving
     an entry is always followed by xa_store() which does not know (or
     differentiate) whether there was an xa_reserve() before or not

  RISC-V:

   - Zabha, Svvptc, and Ziccrse extension support for guests. None of
     them require anything in KVM except for detecting them and marking
     them as supported; Zabha adds byte and halfword atomic operations,
     while the others are markers for specific operation of the TLB and
     of LL/SC instructions respectively

   - Virtualize SBI system suspend extension for Guest/VM

   - Support firmware counters which can be used by the guests to
     collect statistics about traps that occur in the host

  Selftests:

   - Rework vcpu_get_reg() to return a value instead of using an
     out-param, and update all affected arch code accordingly

   - Convert the max_guest_memory_test into a more generic
     mmu_stress_test. The basic gist of the "conversion" is to have the
     test do mprotect() on guest memory while vCPUs are accessing said
     memory, e.g. to verify KVM and mmu_notifiers are working as
     intended

   - Play nice with treewrite builds of unsupported architectures, e.g.
     arm (32-bit), as KVM selftests' Makefile doesn't do anything to
     ensure the target architecture is actually one KVM selftests
     supports

   - Use the kernel's $(ARCH) definition instead of the target triple
     for arch specific directories, e.g. arm64 instead of aarch64,
     mainly so as not to be different from the rest of the kernel

   - Ensure that format strings for logging statements are checked by
     the compiler even when the logging statement itself is disabled

   - Attempt to whack the last LLC references/misses mole in the Intel
     PMU counters test by adding a data load and doing CLFLUSH{OPT} on
     the data instead of the code being executed. It seems that modern
     Intel CPUs have learned new code prefetching tricks that bypass the
     PMU counters

   - Fix a flaw in the Intel PMU counters test where it asserts that
     events are counting correctly without actually knowing what the
     events count given the underlying hardware; this can happen if
     Intel reuses a formerly microarchitecture-specific event encoding
     as an architectural event, as was the case for Top-Down Slots"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (151 commits)
  kvm: defer huge page recovery vhost task to later
  KVM: x86/mmu: Return RET_PF* instead of 1 in kvm_mmu_page_fault()
  KVM: Disallow all flags for KVM-internal memslots
  KVM: x86: Drop double-underscores from __kvm_set_memory_region()
  KVM: Add a dedicated API for setting KVM-internal memslots
  KVM: Assert slots_lock is held when setting memory regions
  KVM: Open code kvm_set_memory_region() into its sole caller (ioctl() API)
  LoongArch: KVM: Add hypercall service support for usermode VMM
  LoongArch: KVM: Clear LLBCTL if secondary mmu mapping is changed
  KVM: SVM: Use str_enabled_disabled() helper in svm_hardware_setup()
  KVM: VMX: read the PML log in the same order as it was written
  KVM: VMX: refactor PML terminology
  KVM: VMX: Fix comment of handle_vmx_instruction()
  KVM: VMX: Reinstate __exit attribute for vmx_exit()
  KVM: SVM: Use str_enabled_disabled() helper in sev_hardware_setup()
  KVM: x86: Avoid double RDPKRU when loading host/guest PKRU
  KVM: x86: Use LVT_TIMER instead of an open coded literal
  RISC-V: KVM: Add new exit statstics for redirected traps
  RISC-V: KVM: Update firmware counters for various events
  RISC-V: KVM: Redirect instruction access fault trap to guest
  ...
2025-01-25 09:55:09 -08:00
Kemeng Shi
7e060df04f Xarray: do not return sibling entries from xas_find_marked()
Patch series "Fixes and cleanups to xarray", v5.

This series contains some random fixes and cleanups to xarray.  Patch 1-2
are fixes and patch 3-6 are cleanups.  More details can be found in
respective patches.


This patch (of 5):

Similar to issue fixed in commit cbc0285433 ("XArray: Do not return
sibling entries from xa_load()"), we may return sibling entries from
xas_find_marked as following:
    Thread A:               Thread B:
                            xa_store_range(xa, entry, 6, 7, gfp);
			    xa_set_mark(xa, 6, mark)
    XA_STATE(xas, xa, 6);
    xas_find_marked(&xas, 7, mark);
    offset = xas_find_chunk(xas, advance, mark);
    [offset is 6 which points to a valid entry]
                            xa_store_range(xa, entry, 4, 7, gfp);
    entry = xa_entry(xa, node, 6);
    [entry is a sibling of 4]
    if (!xa_is_node(entry))
        return entry;

Skip sibling entry like xas_find() does to protect caller from seeing
sibling entry from xas_find_marked() or caller may use sibling entry
as a valid entry and crash the kernel.

Besides, load_race() test is modified to catch mentioned issue and modified
load_race() only passes after this fix is merged.

Here is an example how this bug could be triggerred in tmpfs which
enables large folio in mapping:
Let's take a look at involved racer:
1. How pages could be created and dirtied in shmem file.
write
 ksys_write
  vfs_write
   new_sync_write
    shmem_file_write_iter
     generic_perform_write
      shmem_write_begin
       shmem_get_folio
        shmem_allowable_huge_orders
        shmem_alloc_and_add_folios
        shmem_alloc_folio
        __folio_set_locked
        shmem_add_to_page_cache
         XA_STATE_ORDER(..., index, order)
         xax_store()
      shmem_write_end
       folio_mark_dirty()

2. How dirty pages could be deleted in shmem file.
ioctl
 do_vfs_ioctl
  file_ioctl
   ioctl_preallocate
    vfs_fallocate
     shmem_fallocate
      shmem_truncate_range
       shmem_undo_range
        truncate_inode_folio
         filemap_remove_folio
          page_cache_delete
           xas_store(&xas, NULL);

3. How dirty pages could be lockless searched
sync_file_range
 ksys_sync_file_range
  __filemap_fdatawrite_range
   filemap_fdatawrite_wbc
    do_writepages
     writeback_use_writepage
      writeback_iter
       writeback_get_folio
        filemap_get_folios_tag
         find_get_entry
          folio = xas_find_marked()
          folio_try_get(folio)

Kernel will crash as following:
1.Create               2.Search             3.Delete
/* write page 2,3 */
write
 ...
  shmem_write_begin
   XA_STATE_ORDER(xas, i_pages, index = 2, order = 1)
   xa_store(&xas, folio)
  shmem_write_end
   folio_mark_dirty()

                       /* sync page 2 and page 3 */
                       sync_file_range
                        ...
                         find_get_entry
                          folio = xas_find_marked()
                          /* offset will be 2 */
                          offset = xas_find_chunk()

                                             /* delete page 2 and page 3 */
                                             ioctl
                                              ...
                                               xas_store(&xas, NULL);

/* write page 0-3 */
write
 ...
  shmem_write_begin
   XA_STATE_ORDER(xas, i_pages, index = 0, order = 2)
   xa_store(&xas, folio)
  shmem_write_end
   folio_mark_dirty(folio)

                          /* get sibling entry from offset 2 */
                          entry = xa_entry(.., 2)
                          /* use sibling entry as folio and crash kernel */
                          folio_try_get(folio)

Link: https://lkml.kernel.org/r/20241213122523.12764-1-shikemeng@huaweicloud.com
Link: https://lkml.kernel.org/r/20241213122523.12764-2-shikemeng@huaweicloud.com
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Mattew Wilcox <willy@infradead.org> [English fixes]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:27 -08:00
Linus Torvalds
ae8b53aac3 EFI updates for v6.14
- Increase the headroom in the EFI memory map allocation created by the
   EFI stub. This is needed because event callbacks called during
   ExitBootServices() may cause fragmentation, and reallocation is not
   allowed after that.
 
 - Drop obsolete UGA graphics code and switch to a more ergonomic API to
   traverse handle buffers. Simplify some error paths using a __free()
   helper while at it.
 
 - Fix some W=1 warnings when CONFIG_EFI=n
 
 - Rely on the dentry cache to keep track of the contents of the efivarfs
   filesystem, rather than using a separate linked list.
 
 - Improve and extend efivarfs test cases.
 
 - Synchronize efivarfs with underlying variable store on resume from
   hibernation - this is needed because the firmware itself or another OS
   running on the same machine may have modified it.
 
 - Fix x86 EFI stub build with GCC 15.
 
 - Fix kexec/x86 false positive warning in EFI memory attributes table
   sanity check.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZ5IH+gAKCRAwbglWLn0t
 XHyMAP9Mqn5dD4XT22gvTRUrJuVYFLBlN+9d8ysRMjRVCzGwCQEAvCUJMy5Kje0J
 h9i2InWjjPOVATx5hTrEoIEl96BGOgk=
 =3Hnk
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:

 - Increase the headroom in the EFI memory map allocation created by the
   EFI stub. This is needed because event callbacks called during
   ExitBootServices() may cause fragmentation, and reallocation is not
   allowed after that.

 - Drop obsolete UGA graphics code and switch to a more ergonomic API to
   traverse handle buffers. Simplify some error paths using a __free()
   helper while at it.

 - Fix some W=1 warnings when CONFIG_EFI=n

 - Rely on the dentry cache to keep track of the contents of the
   efivarfs filesystem, rather than using a separate linked list.

 - Improve and extend efivarfs test cases.

 - Synchronize efivarfs with underlying variable store on resume from
   hibernation - this is needed because the firmware itself or another
   OS running on the same machine may have modified it.

 - Fix x86 EFI stub build with GCC 15.

 - Fix kexec/x86 false positive warning in EFI memory attributes table
   sanity check.

* tag 'efi-next-for-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (23 commits)
  x86/efi: skip memattr table on kexec boot
  efivarfs: add variable resync after hibernation
  efivarfs: abstract initial variable creation routine
  efi: libstub: Use '-std=gnu11' to fix build with GCC 15
  selftests/efivarfs: add concurrent update tests
  selftests/efivarfs: fix tests for failed write removal
  efivarfs: fix error on write to new variable leaving remnants
  efivarfs: remove unused efivarfs_list
  efivarfs: move variable lifetime management into the inodes
  selftests/efivarfs: add check for disallowing file truncation
  efivarfs: prevent setting of zero size on the inodes in the cache
  efi: sysfb_efi: fix W=1 warnings when EFI is not set
  efi/libstub: Use __free() helper for pool deallocations
  efi/libstub: Use cleanup helpers for freeing copies of the memory map
  efi/libstub: Simplify PCI I/O handle buffer traversal
  efi/libstub: Refactor and clean up GOP resolution picker code
  efi/libstub: Simplify GOP handling code
  efi/libstub: Use C99-style for loop to traverse handle buffer
  x86/efistub: Drop long obsolete UGA support
  efivarfs: make variable_is_present use dcache lookup
  ...
2025-01-24 15:33:33 -08:00
Linus Torvalds
bc8198dc7e sched_ext: Changes for v6.14
- scx_bpf_now() added so that BPF scheduler can access the cached timestamp
   in struct rq to avoid reading TSC multiple times within a locked
   scheduling operation.
 
 - Minor updates to the built-in idle CPU selection logic.
 
 - tool/sched_ext updates and other misc changes.
 
 Pulling sched_ext/for-6.14 into master causes a merge conflict between the
 following two commits (first commit in master, second in for-6.14):
 
   a2a3374c47 sched_ext: idle: Refresh idle masks during idle-to-idle transitions
   9cf9aceed2 sched_ext: idle: use assign_cpu() to update the idle cpumask
 
   static void update_builtin_idle(int cpu, bool idle)
   {
   <<<<<<< HEAD
           if (idle)
                   cpumask_set_cpu(cpu, idle_masks.cpu);
           else
                   cpumask_clear_cpu(cpu, idle_masks.cpu);
   =======
           int cpu = cpu_of(rq);
 
           if (SCX_HAS_OP(update_idle) && !scx_rq_bypassing(rq)) {
                   SCX_CALL_OP(SCX_KF_REST, update_idle, cpu_of(rq), idle);
                   if (!static_branch_unlikely(&scx_builtin_idle_enabled))
                           return;
           }
 
           assign_cpu(cpu, idle_masks.cpu, idle);
   >>>>>>> 987ce79b52
 
 The first commit factored out update_builtin_idle() and the second replaced
 cpumask_set/clear_cpu() calls with assign_cpu(). The conflict can be
 resolved by taking the code from the first and then replacing the
 cpumask_set/clear_cpu() calls with assign_cpu():
 
   static void update_builtin_idle(int cpu, bool idle)
   {
           assign_cpu(cpu, idle_masks.cpu, idle);
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZ4/yjw4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGc/3AQChJ2zxBLMtQOfNVWgqhAyatCnFquMncl/IeGAs
 Sf4eawD/QHNWYBPx0nOFQS8RZNmUcXuEDeHMy+1MvTUaIDL5MQM=
 =6t0t
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext updates from Tejun Heo:

 - scx_bpf_now() added so that BPF scheduler can access the cached
   timestamp in struct rq to avoid reading TSC multiple times within a
   locked scheduling operation.

 - Minor updates to the built-in idle CPU selection logic.

 - tool/sched_ext updates and other misc changes.

* tag 'sched_ext-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: fix kernel-doc warnings
  sched_ext: Use time helpers in BPF schedulers
  sched_ext: Replace bpf_ktime_get_ns() to scx_bpf_now()
  sched_ext: Add time helpers for BPF schedulers
  sched_ext: Add scx_bpf_now() for BPF scheduler
  sched_ext: Implement scx_bpf_now()
  sched_ext: Relocate scx_enabled() related code
  sched_ext: Add option -l in selftest runner to list all available tests
  sched_ext: Include remaining task time slice in error state dump
  sched_ext: update scx_bpf_dsq_insert() doc for SCX_DSQ_LOCAL_ON
  sched_ext: idle: small CPU iteration refactoring
  sched_ext: idle: introduce check_builtin_idle_enabled() helper
  sched_ext: idle: clarify comments
  sched_ext: idle: use assign_cpu() to update the idle cpumask
  sched_ext: Use str_enabled_disabled() helper in update_selcpu_topology()
  sched_ext: Use sizeof_field for key_len in dsq_hash_params
  tools/sched_ext: Receive updates from SCX repo
  sched_ext: Use the NUMA scheduling domain for NUMA optimizations
2025-01-23 18:49:43 -08:00
Linus Torvalds
e8744fbc83 tracing updates for v6.14:
- Cleanup with guard() and free() helpers
 
   There were several places in the code that had a lot of "goto out" in the
   error paths to either unlock a lock or free some memory that was
   allocated. But this is error prone. Convert the code over to use the
   guard() and free() helpers that let the compiler unlock locks or free
   memory when the function exits.
 
 - Update the Rust tracepoint code to use the C code too
 
   There was some duplication of the tracepoint code for Rust that did the
   same logic as the C code. Add a helper that makes it possible for both
   algorithms to use the same logic in one place.
 
 - Add poll to trace event hist files
 
   It is useful to know when an event is triggered, or even with some
   filtering. Since hist files of events get updated when active and the
   event is triggered, allow applications to poll the hist file and wake up
   when an event is triggered. This will let the application know that the
   event it is waiting for happened.
 
 - Add :mod: command to enable events for current or future modules
 
   The function tracer already has a way to enable functions to be traced in
   modules by writing ":mod:<module>" into set_ftrace_filter. That will
   enable either all the functions for the module if it is loaded, or if it
   is not, it will cache that command, and when the module is loaded that
   matches <module>, its functions will be enabled. This also allows init
   functions to be traced. But currently events do not have that feature.
 
   Add the command where if ':mod:<module>' is written into set_event, then
   either all the modules events are enabled if it is loaded, or cache it so
   that the module's events are enabled when it is loaded. This also works
   from the kernel command line, where "trace_event=:mod:<module>", when the
   module is loaded at boot up, its events will be enabled then.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ5EbMxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qkZsAP9Amgx9frSbR1pn1t0I3wVnQx7khgOu
 s/b8Ro+vjTx1/QD/RN2AA7f+HK4F27w3Aqfrs0nKXAPtXWsJ9Epp8raG5w8=
 =Pg+4
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Cleanup with guard() and free() helpers

   There were several places in the code that had a lot of "goto out" in
   the error paths to either unlock a lock or free some memory that was
   allocated. But this is error prone. Convert the code over to use the
   guard() and free() helpers that let the compiler unlock locks or free
   memory when the function exits.

 - Update the Rust tracepoint code to use the C code too

   There was some duplication of the tracepoint code for Rust that did
   the same logic as the C code. Add a helper that makes it possible for
   both algorithms to use the same logic in one place.

 - Add poll to trace event hist files

   It is useful to know when an event is triggered, or even with some
   filtering. Since hist files of events get updated when active and the
   event is triggered, allow applications to poll the hist file and wake
   up when an event is triggered. This will let the application know
   that the event it is waiting for happened.

 - Add :mod: command to enable events for current or future modules

   The function tracer already has a way to enable functions to be
   traced in modules by writing ":mod:<module>" into set_ftrace_filter.
   That will enable either all the functions for the module if it is
   loaded, or if it is not, it will cache that command, and when the
   module is loaded that matches <module>, its functions will be
   enabled. This also allows init functions to be traced. But currently
   events do not have that feature.

   Add the command where if ':mod:<module>' is written into set_event,
   then either all the modules events are enabled if it is loaded, or
   cache it so that the module's events are enabled when it is loaded.
   This also works from the kernel command line, where
   "trace_event=:mod:<module>", when the module is loaded at boot up,
   its events will be enabled then.

* tag 'trace-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
  tracing: Fix output of set_event for some cached module events
  tracing: Fix allocation of printing set_event file content
  tracing: Rename update_cache() to update_mod_cache()
  tracing: Fix #if CONFIG_MODULES to #ifdef CONFIG_MODULES
  selftests/ftrace: Add test that tests event :mod: commands
  tracing: Cache ":mod:" events for modules not loaded yet
  tracing: Add :mod: command to enabled module events
  selftests/tracing: Add hist poll() support test
  tracing/hist: Support POLLPRI event for poll on histogram
  tracing/hist: Add poll(POLLIN) support on hist file
  tracing: Fix using ret variable in tracing_set_tracer()
  tracepoint: Reduce duplication of __DO_TRACE_CALL
  tracing/string: Create and use __free(argv_free) in trace_dynevent.c
  tracing: Switch trace_stat.c code over to use guard()
  tracing: Switch trace_stack.c code over to use guard()
  tracing: Switch trace_osnoise.c code over to use guard() and __free()
  tracing: Switch trace_events_synth.c code over to use guard()
  tracing: Switch trace_events_filter.c code over to use guard()
  tracing: Switch trace_events_trigger.c code over to use guard()
  tracing: Switch trace_events_hist.c code over to use guard()
  ...
2025-01-23 17:51:16 -08:00
Linus Torvalds
7f71554b4e ktest updates for v6.14:
- Fix use of KERNEL_VERSION in newly created output directory
 
   If a new output directory is created (O=/dir), and one of the options uses
   KERNEL_VERSION which will run a "make kernelversion" in the output
   directory, it will fail because there is no config file yet. In this case,
   have it do a "make allnoconfig" which is the minimal needed to run the
   "make kernelversion".
 
 - Remove unused variables
 
 - Fix some typos
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ5AsqBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlfJAQCmRlwC8x+AVy2lij2sTqM8nfB7DfTF
 QMavz/I1xn0f3wD/bxVfN8SebrCBBiICW09bvnWzqzyzgFsjjxRRq+Zzvwc=
 =qfZn
 -----END PGP SIGNATURE-----

Merge tag 'ktest-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest

Pull ktest updates from Steven Rostedt:

 - Fix use of KERNEL_VERSION in newly created output directory

   If a new output directory is created (O=/dir), and one of the options
   uses KERNEL_VERSION which will run a "make kernelversion" in the
   output directory, it will fail because there is no config file yet.

   In this case, have it do a "make allnoconfig" which is the minimal
   needed to run the "make kernelversion".

 - Remove unused variables

 - Fix some typos

* tag 'ktest-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
  ktest.pl: Fix typo "accesing"
  ktest.pl: Fix typo in comment
  ktest.pl: Remove unused declarations in run_bisect_test function
  ktest.pl: Check kernelrelease return in get_version
2025-01-23 17:45:24 -08:00
Linus Torvalds
d0d106a2bd bpf-next-6.14
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmeOu1YACgkQ6rmadz2v
 bTrrHxAAn6eqEsluWnDlzhI0OGsPjvgS00sf+MOeqiXYeS2eJ8yJuKifp38+nIQZ
 lIplsWU2ReUY20eizPqLPnQ7TXZGvLgp08E8yHUoZ0siWanqr9iDRfbZCCNrDMNm
 lMqeR1SLapMws2R/UX9JbvPn2ajIJ6Lb4wxenTfdlW6q+0hAGM6Dt0k/jBod+quq
 /oo+xwG3L0q4APBovJfiAFN2z6IYN03b+zLiOrpIJtMACGewEXnl3m4mkL8ZM/FV
 nZGPIxIUPXCpKTGEkNqxfkrnHN2wZQ4ZSKEJ6lhEEp4jrgCVITaGZ/E7jlx6fZoj
 bbd4YMonIPo9Nhim8p1dt8yYBhKKiE5IXIq0GqlMv5+MvAN8ylrlydpsouW1fu66
 hZ1W1BxbxmrgyF0Bwo9JPOMhBHwMrmD6iH9LgiMpZf0ASeF+q9cJpoSOU5j5E9XB
 LpLIRf5jYTd4wZjhDmrQREReLo+Bng9DlCBu+jjh2+YTz6l6Qed+ETpENcd7lL5i
 IHZVbgD2RVPNJoUfdrd763HfYfDTk+50MF5FIMEyfKHz11if0E/LhBMzto22hm6b
 2f8ruj/8yvg8s2dxEP3ySQgcnynlwEnGxLenUVv7uEOYKeWri1rq+fvTK5ne1OLK
 oHnTlkViwQb74c0r8cFW+nkyfUYTfhhBAql14rl/fMjGDO2KZ10=
 =f2CA
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:
 "A smaller than usual release cycle.

  The main changes are:

   - Prepare selftest to run with GCC-BPF backend (Ihor Solodrai)

     In addition to LLVM-BPF runs the BPF CI now runs GCC-BPF in compile
     only mode. Half of the tests are failing, since support for
     btf_decl_tag is still WIP, but this is a great milestone.

   - Convert various samples/bpf to selftests/bpf/test_progs format
     (Alexis Lothoré and Bastien Curutchet)

   - Teach verifier to recognize that array lookup with constant
     in-range index will always succeed (Daniel Xu)

   - Cleanup migrate disable scope in BPF maps (Hou Tao)

   - Fix bpf_timer destroy path in PREEMPT_RT (Hou Tao)

   - Always use bpf_mem_alloc in bpf_local_storage in PREEMPT_RT (Martin
     KaFai Lau)

   - Refactor verifier lock support (Kumar Kartikeya Dwivedi)

     This is a prerequisite for upcoming resilient spin lock.

   - Remove excessive 'may_goto +0' instructions in the verifier that
     LLVM leaves when unrolls the loops (Yonghong Song)

   - Remove unhelpful bpf_probe_write_user() warning message (Marco
     Elver)

   - Add fd_array_cnt attribute for prog_load command (Anton Protopopov)

     This is a prerequisite for upcoming support for static_branch"

* tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (125 commits)
  selftests/bpf: Add some tests related to 'may_goto 0' insns
  bpf: Remove 'may_goto 0' instruction in opt_remove_nops()
  bpf: Allow 'may_goto 0' instruction in verifier
  selftests/bpf: Add test case for the freeing of bpf_timer
  bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT
  bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
  bpf: Bail out early in __htab_map_lookup_and_delete_elem()
  bpf: Free special fields after unlock in htab_lru_map_delete_node()
  tools: Sync if_xdp.h uapi tooling header
  libbpf: Work around kernel inconsistently stripping '.llvm.' suffix
  bpf: selftests: verifier: Add nullness elision tests
  bpf: verifier: Support eliding map lookup nullness
  bpf: verifier: Refactor helper access type tracking
  bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write
  bpf: verifier: Add missing newline on verbose() call
  selftests/bpf: Add distilled BTF test about marking BTF_IS_EMBEDDED
  libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED
  libbpf: Fix return zero when elf_begin failed
  selftests/bpf: Fix btf leak on new btf alloc failure in btf_distill test
  veristat: Load struct_ops programs only once
  ...
2025-01-23 08:04:07 -08:00
Jakub Kicinski
965adae5a3 selftests/net: packetdrill: more xfail changes (and a correction)
Recent change to add more cases to XFAIL has a broken regex,
the matching needs a real regex not a glob pattern.

While at it add the cases Willem pointed out during review.

Fixes: 3030e3d57b ("selftests/net: packetdrill: make tcp buf limited timing tests benign")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250121143423.215261-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-23 07:07:41 -08:00
Koichiro Den
f8524ac33c selftests: gpio: gpio-sim: Fix missing chip disablements
Since upstream commit 8bd76b3d3f ("gpio: sim: lock up configfs that an
instantiated device depends on"), rmdir for an active virtual devices
been prohibited.

Update gpio-sim selftest to align with the change.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202501221006.a1ca5dfa-lkp@intel.com
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Link: https://lore.kernel.org/r/20250122043309.304621-1-koichiro.den@canonical.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-01-23 15:44:48 +01:00
Linus Torvalds
21266b8df5 AT_EXECVE_CHECK introduction for v6.14-rc1
- Implement AT_EXECVE_CHECK flag to execveat(2) (Mickaël Salaün)
 
 - Implement EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
   (Mickaël Salaün)
 
 - Add selftests and samples for AT_EXECVE_CHECK (Mickaël Salaün)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ4hO7wAKCRA2KwveOeQk
 u4l+AP9UHO1KwMn3aOt6uFPj7omaoY0vpcB1rx/x5s4efNFHOAD/QjY0f+ND+HzF
 mKLYOIeacGEQi7TNhpnOkGjz6jzSiwg=
 =sMhZ
 -----END PGP SIGNATURE-----

Merge tag 'AT_EXECVE_CHECK-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull AT_EXECVE_CHECK from Kees Cook:

 - Implement AT_EXECVE_CHECK flag to execveat(2) (Mickaël Salaün)

 - Implement EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
   (Mickaël Salaün)

 - Add selftests and samples for AT_EXECVE_CHECK (Mickaël Salaün)

* tag 'AT_EXECVE_CHECK-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  ima: instantiate the bprm_creds_for_exec() hook
  samples/check-exec: Add an enlighten "inc" interpreter and 28 tests
  selftests: ktap_helpers: Fix uninitialized variable
  samples/check-exec: Add set-exec
  selftests/landlock: Add tests for execveat + AT_EXECVE_CHECK
  selftests/exec: Add 32 tests for AT_EXECVE_CHECK and exec securebits
  security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
  exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
2025-01-22 20:34:42 -08:00