Commit Graph

2174 Commits

Author SHA1 Message Date
Linus Torvalds
c604110e66 vfs-6.8.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUxRQAKCRCRxhvAZXjc
 ov/QAQDzvge3oQ9MEymmOiyzzcF+HhAXBr+9oEsYJjFc1p0TsgEA61gXjZo7F1jY
 KBqd6znOZCR+Waj0kIVJRAo/ISRBqQc=
 =0bRl
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual fses.

  Features:

   - Add Jan Kara as VFS reviewer

   - Show correct device and inode numbers in proc/<pid>/maps for vma
     files on stacked filesystems. This is now easily doable thanks to
     the backing file work from the last cycles. This comes with
     selftests

  Cleanups:

   - Remove a redundant might_sleep() from wait_on_inode()

   - Initialize pointer with NULL, not 0

   - Clarify comment on access_override_creds()

   - Rework and simplify eventfd_signal() and eventfd_signal_mask()
     helpers

   - Process aio completions in batches to avoid needless wakeups

   - Completely decouple struct mnt_idmap from namespaces. We now only
     keep the actual idmapping around and don't stash references to
     namespaces

   - Reformat maintainer entries to indicate that a given subsystem
     belongs to fs/

   - Simplify fput() for files that were never opened

   - Get rid of various pointless file helpers

   - Rename various file helpers

   - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from
     last cycle

   - Make relatime_need_update() return bool

   - Use GFP_KERNEL instead of GFP_USER when allocating superblocks

   - Replace deprecated ida_simple_*() calls with their current ida_*()
     counterparts

  Fixes:

   - Fix comments on user namespace id mapping helpers. They aren't
     kernel doc comments so they shouldn't be using /**

   - s/Retuns/Returns/g in various places

   - Add missing parameter documentation on can_move_mount_beneath()

   - Rename i_mapping->private_data to i_mapping->i_private_data

   - Fix a false-positive lockdep warning in pipe_write() for watch
     queues

   - Improve __fget_files_rcu() code generation to improve performance

   - Only notify writer that pipe resizing has finished after setting
     pipe->max_usage otherwise writers are never notified that the pipe
     has been resized and hang

   - Fix some kernel docs in hfsplus

   - s/passs/pass/g in various places

   - Fix kernel docs in ntfs

   - Fix kcalloc() arguments order reported by gcc 14

   - Fix uninitialized value in reiserfs"

* tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
  reiserfs: fix uninit-value in comp_keys
  watch_queue: fix kcalloc() arguments order
  ntfs: dir.c: fix kernel-doc function parameter warnings
  fs: fix doc comment typo fs tree wide
  selftests/overlayfs: verify device and inode numbers in /proc/pid/maps
  fs/proc: show correct device and inode numbers in /proc/pid/maps
  eventfd: Remove usage of the deprecated ida_simple_xx() API
  fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation
  fs/hfsplus: wrapper.c: fix kernel-doc warnings
  fs: add Jan Kara as reviewer
  fs/inode: Make relatime_need_update return bool
  pipe: wakeup wr_wait after setting max_usage
  file: remove __receive_fd()
  file: stop exposing receive_fd_user()
  fs: replace f_rcuhead with f_task_work
  file: remove pointless wrapper
  file: s/close_fd_get_file()/file_close_fd()/g
  Improve __fget_files_rcu() code generation (and thus __fget_light())
  file: massage cleanup of files that failed to open
  fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
  ...
2024-01-08 10:26:08 -08:00
Christian Brauner
3652117f85 eventfd: simplify eventfd_signal()
Ever since the eventfd type was introduced back in 2007 in commit
e1ad7468c7 ("signal/timer/event: eventfd core") the eventfd_signal()
function only ever passed 1 as a value for @n. There's no point in
keeping that additional argument.

Link: https://lore.kernel.org/r/20231122-vfs-eventfd-signal-v2-2-bd549b14ce0c@kernel.org
Acked-by: Xu Yilun <yilun.xu@intel.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl
Acked-by: Eric Farman <farman@linux.ibm.com>  # s390
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-28 14:08:38 +01:00
Dan Carpenter
7f3da4b698 xen/events: fix error code in xen_bind_pirq_msi_to_irq()
Return -ENOMEM if xen_irq_init() fails.  currently the code returns an
uninitialized variable or zero.

Fixes: 5dd9ad32d7 ("xen/events: drop xen_allocate_irqs_dynamic()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Juergen Gross <jgross@ssue.com>
Link: https://lore.kernel.org/r/3b9ab040-a92e-4e35-b687-3a95890a9ace@moroto.mountain
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-28 12:48:27 +01:00
Gustavo A. R. Silva
295b202227 xen: privcmd: Replace zero-length array with flex-array member and use __counted_by
Fake flexible arrays (zero-length and one-element arrays) are deprecated,
and should be replaced by flexible-array members. So, replace
zero-length array with a flexible-array member in `struct
privcmd_kernel_ioreq`.

Also annotate array `ports` with `__counted_by()` to prepare for the
coming implementation by GCC and Clang of the `__counted_by` attribute.
Flexible array members annotated with `__counted_by` can have their
accesses bounds-checked at run-time via `CONFIG_UBSAN_BOUNDS` (for array
indexing) and `CONFIG_FORTIFY_SOURCE` (for strcpy/memcpy-family functions).

This fixes multiple -Warray-bounds warnings:
drivers/xen/privcmd.c:1239:30: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
drivers/xen/privcmd.c:1240:30: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
drivers/xen/privcmd.c:1241:30: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
drivers/xen/privcmd.c:1245:33: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
drivers/xen/privcmd.c:1258:67: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]

This results in no differences in binary output.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/ZVZlg3tPMPCRdteh@work
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-17 10:47:19 +01:00
Keith Busch
bff2a2d453 swiotlb-xen: provide the "max_mapping_size" method
There's a bug that when using the XEN hypervisor with bios with large
multi-page bio vectors on NVMe, the kernel deadlocks [1].

The deadlocks are caused by inability to map a large bio vector -
dma_map_sgtable always returns an error, this gets propagated to the block
layer as BLK_STS_RESOURCE and the block layer retries the request
indefinitely.

XEN uses the swiotlb framework to map discontiguous pages into contiguous
runs that are submitted to the PCIe device. The swiotlb framework has a
limitation on the length of a mapping - this needs to be announced with
the max_mapping_size method to make sure that the hardware drivers do not
create larger mappings.

Without max_mapping_size, the NVMe block driver would create large
mappings that overrun the maximum mapping size.

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Link: https://lore.kernel.org/stable/ZTNH0qtmint%2FzLJZ@mail-itl/ [1]
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/151bef41-e817-aea9-675-a35fdac4ed@redhat.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-17 10:45:46 +01:00
Juergen Gross
cee96422e8 xen/events: remove some info_for_irq() calls in pirq handling
Instead of the IRQ number user the struct irq_info pointer as parameter
in the internal pirq related functions. This allows to drop some calls
of info_for_irq().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-15 09:51:51 +01:00
Juergen Gross
3fcdaf3d76 xen/events: modify internal [un]bind interfaces
Modify the internal bind- and unbind-interfaces to take a struct
irq_info parameter. When allocating a new IRQ pass the pointer from
the allocating function further up.

This will reduce the number of info_for_irq() calls and make the code
more efficient.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-15 09:29:39 +01:00
Juergen Gross
5dd9ad32d7 xen/events: drop xen_allocate_irqs_dynamic()
Instead of having a common function for allocating a single IRQ or a
consecutive number of IRQs, split up the functionality into the callers
of xen_allocate_irqs_dynamic().

This allows to handle any allocation error in xen_irq_init() gracefully
instead of panicing the system. Let xen_irq_init() return the irq_info
pointer or NULL in case of an allocation error.

Additionally set the IRQ into irq_info already at allocation time, as
otherwise the IRQ would be '0' (which is a valid IRQ number) until
being set.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-15 09:02:59 +01:00
Juergen Gross
3bdb0ac350 xen/events: remove some simple helpers from events_base.c
The helper functions type_from_irq() and cpu_from_irq() are just one
line functions used only internally.

Open code them where needed. At the same time modify and rename
get_evtchn_to_irq() to return a struct irq_info instead of the IRQ
number.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-14 10:16:34 +01:00
Juergen Gross
686464514f xen/events: reduce externally visible helper functions
get_evtchn_to_irq() has only one external user while irq_from_evtchn()
provides the same functionality and is exported for a wider user base.
Modify the only external user of get_evtchn_to_irq() to use
irq_from_evtchn() instead and make get_evtchn_to_irq() static.

evtchn_from_irq() and irq_from_virq() have a single external user and
can easily be combined to a new helper irq_evtchn_from_virq() allowing
to drop irq_from_virq() and to make evtchn_from_irq() static.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-14 09:29:28 +01:00
Juergen Gross
f96c6c588c xen/events: remove unused functions
There are no users of xen_irq_from_pirq() and xen_set_irq_pending().

Remove those functions.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-13 15:45:20 +01:00
Juergen Gross
47d9702040 xen/events: fix delayed eoi list handling
When delaying eoi handling of events, the related elements are queued
into the percpu lateeoi list. In case the list isn't empty, the
elements should be sorted by the time when eoi handling is to happen.

Unfortunately a new element will never be queued at the start of the
list, even if it has a handling time lower than all other list
elements.

Fix that by handling that case the same way as for an empty list.

Fixes: e99502f762 ("xen/events: defer eoi in case of excessive number of events")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-13 15:21:08 +01:00
Randy Dunlap
50e865a568 xen/shbuf: eliminate 17 kernel-doc warnings
Don't use kernel-doc markers ("/**") for comments that are not in
kernel-doc format. This prevents multiple kernel-doc warnings:

xen-front-pgdir-shbuf.c:25: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * This structure represents the structure of a shared page
xen-front-pgdir-shbuf.c:37: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Shared buffer ops which are differently implemented
xen-front-pgdir-shbuf.c:65: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Get granted reference to the very first page of the
xen-front-pgdir-shbuf.c:85: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Map granted references of the shared buffer.
xen-front-pgdir-shbuf.c:106: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Unmap granted references of the shared buffer.
xen-front-pgdir-shbuf.c:127: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Free all the resources of the shared buffer.
xen-front-pgdir-shbuf.c:154: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Get the number of pages the page directory consumes itself.
xen-front-pgdir-shbuf.c:164: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Calculate the number of grant references needed to share the buffer
xen-front-pgdir-shbuf.c:176: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Calculate the number of grant references needed to share the buffer
xen-front-pgdir-shbuf.c:194: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Unmap the buffer previously mapped with grant references
xen-front-pgdir-shbuf.c:242: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Map the buffer with grant references provided by the backend.
xen-front-pgdir-shbuf.c:324: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Fill page directory with grant references to the pages of the
xen-front-pgdir-shbuf.c:354: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Fill page directory with grant references to the pages of the
xen-front-pgdir-shbuf.c:393: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Grant references to the frontend's buffer pages.
xen-front-pgdir-shbuf.c:422: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Grant all the references needed to share the buffer.
xen-front-pgdir-shbuf.c:470: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Allocate all required structures to mange shared buffer.
xen-front-pgdir-shbuf.c:510: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Allocate a new instance of a shared buffer.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: lore.kernel.org/r/202311060203.yQrpPZhm-lkp@intel.com
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: xen-devel@lists.xenproject.org
Link: https://lore.kernel.org/r/20231106055631.21520-1-rdunlap@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-13 08:14:42 +01:00
Roger Pau Monne
bfa993b355 acpi/processor: sanitize _OSC/_PDC capabilities for Xen dom0
The Processor capability bits notify ACPI of the OS capabilities, and
so ACPI can adjust the return of other Processor methods taking the OS
capabilities into account.

When Linux is running as a Xen dom0, the hypervisor is the entity
in charge of processor power management, and hence Xen needs to make
sure the capabilities reported by _OSC/_PDC match the capabilities of
the driver in Xen.

Introduce a small helper to sanitize the buffer when running as Xen
dom0.

When Xen supports HWP, this serves as the equivalent of commit
a21211672c ("ACPI / processor: Request native thermal interrupt
handling via _OSC") to avoid SMM crashes.  Xen will set bit
ACPI_PROC_CAP_COLLAB_PROC_PERF (bit 12) in the capability bits and the
_OSC/_PDC call will apply it.

[ jandryuk: Mention Xen HWP's need.  Support _OSC & _PDC ]
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20231108212517.72279-1-jandryuk@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-13 07:22:00 +01:00
Juergen Gross
e64e7c74b9 xen/events: avoid using info_for_irq() in xen_send_IPI_one()
xen_send_IPI_one() is being used by cpuhp_report_idle_dead() after
it calls rcu_report_dead(), meaning that any RCU usage by
xen_send_IPI_one() is a bad idea.

Unfortunately xen_send_IPI_one() is using notify_remote_via_irq()
today, which is using irq_get_chip_data() via info_for_irq(). And
irq_get_chip_data() in turn is using a maple-tree lookup requiring
RCU.

Avoid this problem by caching the ipi event channels in another
percpu variable, allowing the use notify_remote_via_evtchn() in
xen_send_IPI_one().

Fixes: 721255b982 ("genirq: Use a maple tree for interrupt descriptor management")
Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-13 07:22:00 +01:00
Linus Torvalds
ecae0bd517 Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
 
 - Kemeng Shi has contributed some compation maintenance work in the
   series "Fixes and cleanups to compaction".
 
 - Joel Fernandes has a patchset ("Optimize mremap during mutual
   alignment within PMD") which fixes an obscure issue with mremap()'s
   pagetable handling during a subsequent exec(), based upon an
   implementation which Linus suggested.
 
 - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the
   following patch series:
 
 	mm/damon: misc fixups for documents, comments and its tracepoint
 	mm/damon: add a tracepoint for damos apply target regions
 	mm/damon: provide pseudo-moving sum based access rate
 	mm/damon: implement DAMOS apply intervals
 	mm/damon/core-test: Fix memory leaks in core-test
 	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval
 
 - In the series "Do not try to access unaccepted memory" Adrian Hunter
   provides some fixups for the recently-added "unaccepted memory' feature.
   To increase the feature's checking coverage.  "Plug a few gaps where
   RAM is exposed without checking if it is unaccepted memory".
 
 - In the series "cleanups for lockless slab shrink" Qi Zheng has done
   some maintenance work which is preparation for the lockless slab
   shrinking code.
 
 - Qi Zheng has redone the earlier (and reverted) attempt to make slab
   shrinking lockless in the series "use refcount+RCU method to implement
   lockless slab shrink".
 
 - David Hildenbrand contributes some maintenance work for the rmap code
   in the series "Anon rmap cleanups".
 
 - Kefeng Wang does more folio conversions and some maintenance work in
   the migration code.  Series "mm: migrate: more folio conversion and
   unification".
 
 - Matthew Wilcox has fixed an issue in the buffer_head code which was
   causing long stalls under some heavy memory/IO loads.  Some cleanups
   were added on the way.  Series "Add and use bdev_getblk()".
 
 - In the series "Use nth_page() in place of direct struct page
   manipulation" Zi Yan has fixed a potential issue with the direct
   manipulation of hugetlb page frames.
 
 - In the series "mm: hugetlb: Skip initialization of gigantic tail
   struct pages if freed by HVO" has improved our handling of gigantic
   pages in the hugetlb vmmemmep optimizaton code.  This provides
   significant boot time improvements when significant amounts of gigantic
   pages are in use.
 
 - Matthew Wilcox has sent the series "Small hugetlb cleanups" - code
   rationalization and folio conversions in the hugetlb code.
 
 - Yin Fengwei has improved mlock()'s handling of large folios in the
   series "support large folio for mlock"
 
 - In the series "Expose swapcache stat for memcg v1" Liu Shixin has
   added statistics for memcg v1 users which are available (and useful)
   under memcg v2.
 
 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
   prctl so that userspace may direct the kernel to not automatically
   propagate the denial to child processes.  The series is named "MDWE
   without inheritance".
 
 - Kefeng Wang has provided the series "mm: convert numa balancing
   functions to use a folio" which does what it says.
 
 - In the series "mm/ksm: add fork-exec support for prctl" Stefan Roesch
   makes is possible for a process to propagate KSM treatment across
   exec().
 
 - Huang Ying has enhanced memory tiering's calculation of memory
   distances.  This is used to permit the dax/kmem driver to use "high
   bandwidth memory" in addition to Optane Data Center Persistent Memory
   Modules (DCPMM).  The series is named "memory tiering: calculate
   abstract distance based on ACPI HMAT"
 
 - In the series "Smart scanning mode for KSM" Stefan Roesch has
   optimized KSM by teaching it to retain and use some historical
   information from previous scans.
 
 - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the
   series "mm: memcg: fix tracking of pending stats updates values".
 
 - In the series "Implement IOCTL to get and optionally clear info about
   PTEs" Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits
   us to atomically read-then-clear page softdirty state.  This is mainly
   used by CRIU.
 
 - Hugh Dickins contributed the series "shmem,tmpfs: general maintenance"
   - a bunch of relatively minor maintenance tweaks to this code.
 
 - Matthew Wilcox has increased the use of the VMA lock over file-backed
   page faults in the series "Handle more faults under the VMA lock".  Some
   rationalizations of the fault path became possible as a result.
 
 - In the series "mm/rmap: convert page_move_anon_rmap() to
   folio_move_anon_rmap()" David Hildenbrand has implemented some cleanups
   and folio conversions.
 
 - In the series "various improvements to the GUP interface" Lorenzo
   Stoakes has simplified and improved the GUP interface with an eye to
   providing groundwork for future improvements.
 
 - Andrey Konovalov has sent along the series "kasan: assorted fixes and
   improvements" which does those things.
 
 - Some page allocator maintenance work from Kemeng Shi in the series
   "Two minor cleanups to break_down_buddy_pages".
 
 - In thes series "New selftest for mm" Breno Leitao has developed
   another MM self test which tickles a race we had between madvise() and
   page faults.
 
 - In the series "Add folio_end_read" Matthew Wilcox provides cleanups
   and an optimization to the core pagecache code.
 
 - Nhat Pham has added memcg accounting for hugetlb memory in the series
   "hugetlb memcg accounting".
 
 - Cleanups and rationalizations to the pagemap code from Lorenzo
   Stoakes, in the series "Abstract vma_merge() and split_vma()".
 
 - Audra Mitchell has fixed issues in the procfs page_owner code's new
   timestamping feature which was causing some misbehaviours.  In the
   series "Fix page_owner's use of free timestamps".
 
 - Lorenzo Stoakes has fixed the handling of new mappings of sealed files
   in the series "permit write-sealed memfd read-only shared mappings".
 
 - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
   series "Batch hugetlb vmemmap modification operations".
 
 - Some buffer_head folio conversions and cleanups from Matthew Wilcox in
   the series "Finish the create_empty_buffers() transition".
 
 - As a page allocator performance optimization Huang Ying has added
   automatic tuning to the allocator's per-cpu-pages feature, in the series
   "mm: PCP high auto-tuning".
 
 - Roman Gushchin has contributed the patchset "mm: improve performance
   of accounted kernel memory allocations" which improves their performance
   by ~30% as measured by a micro-benchmark.
 
 - folio conversions from Kefeng Wang in the series "mm: convert page
   cpupid functions to folios".
 
 - Some kmemleak fixups in Liu Shixin's series "Some bugfix about
   kmemleak".
 
 - Qi Zheng has improved our handling of memoryless nodes by keeping them
   off the allocation fallback list.  This is done in the series "handle
   memoryless nodes more appropriately".
 
 - khugepaged conversions from Vishal Moola in the series "Some
   khugepaged folio conversions".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZULEMwAKCRDdBJ7gKXxA
 jhQHAQCYpD3g849x69DmHnHWHm/EHQLvQmRMDeYZI+nx/sCJOwEAw4AKg0Oemv9y
 FgeUPAD1oasg6CP+INZvCj34waNxwAc=
 =E+Y4
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Kemeng Shi has contributed some compation maintenance work in the
     series 'Fixes and cleanups to compaction'

   - Joel Fernandes has a patchset ('Optimize mremap during mutual
     alignment within PMD') which fixes an obscure issue with mremap()'s
     pagetable handling during a subsequent exec(), based upon an
     implementation which Linus suggested

   - More DAMON/DAMOS maintenance and feature work from SeongJae Park i
     the following patch series:

	mm/damon: misc fixups for documents, comments and its tracepoint
	mm/damon: add a tracepoint for damos apply target regions
	mm/damon: provide pseudo-moving sum based access rate
	mm/damon: implement DAMOS apply intervals
	mm/damon/core-test: Fix memory leaks in core-test
	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval

   - In the series 'Do not try to access unaccepted memory' Adrian
     Hunter provides some fixups for the recently-added 'unaccepted
     memory' feature. To increase the feature's checking coverage. 'Plug
     a few gaps where RAM is exposed without checking if it is
     unaccepted memory'

   - In the series 'cleanups for lockless slab shrink' Qi Zheng has done
     some maintenance work which is preparation for the lockless slab
     shrinking code

   - Qi Zheng has redone the earlier (and reverted) attempt to make slab
     shrinking lockless in the series 'use refcount+RCU method to
     implement lockless slab shrink'

   - David Hildenbrand contributes some maintenance work for the rmap
     code in the series 'Anon rmap cleanups'

   - Kefeng Wang does more folio conversions and some maintenance work
     in the migration code. Series 'mm: migrate: more folio conversion
     and unification'

   - Matthew Wilcox has fixed an issue in the buffer_head code which was
     causing long stalls under some heavy memory/IO loads. Some cleanups
     were added on the way. Series 'Add and use bdev_getblk()'

   - In the series 'Use nth_page() in place of direct struct page
     manipulation' Zi Yan has fixed a potential issue with the direct
     manipulation of hugetlb page frames

   - In the series 'mm: hugetlb: Skip initialization of gigantic tail
     struct pages if freed by HVO' has improved our handling of gigantic
     pages in the hugetlb vmmemmep optimizaton code. This provides
     significant boot time improvements when significant amounts of
     gigantic pages are in use

   - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code
     rationalization and folio conversions in the hugetlb code

   - Yin Fengwei has improved mlock()'s handling of large folios in the
     series 'support large folio for mlock'

   - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has
     added statistics for memcg v1 users which are available (and
     useful) under memcg v2

   - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
     prctl so that userspace may direct the kernel to not automatically
     propagate the denial to child processes. The series is named 'MDWE
     without inheritance'

   - Kefeng Wang has provided the series 'mm: convert numa balancing
     functions to use a folio' which does what it says

   - In the series 'mm/ksm: add fork-exec support for prctl' Stefan
     Roesch makes is possible for a process to propagate KSM treatment
     across exec()

   - Huang Ying has enhanced memory tiering's calculation of memory
     distances. This is used to permit the dax/kmem driver to use 'high
     bandwidth memory' in addition to Optane Data Center Persistent
     Memory Modules (DCPMM). The series is named 'memory tiering:
     calculate abstract distance based on ACPI HMAT'

   - In the series 'Smart scanning mode for KSM' Stefan Roesch has
     optimized KSM by teaching it to retain and use some historical
     information from previous scans

   - Yosry Ahmed has fixed some inconsistencies in memcg statistics in
     the series 'mm: memcg: fix tracking of pending stats updates
     values'

   - In the series 'Implement IOCTL to get and optionally clear info
     about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap
     which permits us to atomically read-then-clear page softdirty
     state. This is mainly used by CRIU

   - Hugh Dickins contributed the series 'shmem,tmpfs: general
     maintenance', a bunch of relatively minor maintenance tweaks to
     this code

   - Matthew Wilcox has increased the use of the VMA lock over
     file-backed page faults in the series 'Handle more faults under the
     VMA lock'. Some rationalizations of the fault path became possible
     as a result

   - In the series 'mm/rmap: convert page_move_anon_rmap() to
     folio_move_anon_rmap()' David Hildenbrand has implemented some
     cleanups and folio conversions

   - In the series 'various improvements to the GUP interface' Lorenzo
     Stoakes has simplified and improved the GUP interface with an eye
     to providing groundwork for future improvements

   - Andrey Konovalov has sent along the series 'kasan: assorted fixes
     and improvements' which does those things

   - Some page allocator maintenance work from Kemeng Shi in the series
     'Two minor cleanups to break_down_buddy_pages'

   - In thes series 'New selftest for mm' Breno Leitao has developed
     another MM self test which tickles a race we had between madvise()
     and page faults

   - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups
     and an optimization to the core pagecache code

   - Nhat Pham has added memcg accounting for hugetlb memory in the
     series 'hugetlb memcg accounting'

   - Cleanups and rationalizations to the pagemap code from Lorenzo
     Stoakes, in the series 'Abstract vma_merge() and split_vma()'

   - Audra Mitchell has fixed issues in the procfs page_owner code's new
     timestamping feature which was causing some misbehaviours. In the
     series 'Fix page_owner's use of free timestamps'

   - Lorenzo Stoakes has fixed the handling of new mappings of sealed
     files in the series 'permit write-sealed memfd read-only shared
     mappings'

   - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
     series 'Batch hugetlb vmemmap modification operations'

   - Some buffer_head folio conversions and cleanups from Matthew Wilcox
     in the series 'Finish the create_empty_buffers() transition'

   - As a page allocator performance optimization Huang Ying has added
     automatic tuning to the allocator's per-cpu-pages feature, in the
     series 'mm: PCP high auto-tuning'

   - Roman Gushchin has contributed the patchset 'mm: improve
     performance of accounted kernel memory allocations' which improves
     their performance by ~30% as measured by a micro-benchmark

   - folio conversions from Kefeng Wang in the series 'mm: convert page
     cpupid functions to folios'

   - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about
     kmemleak'

   - Qi Zheng has improved our handling of memoryless nodes by keeping
     them off the allocation fallback list. This is done in the series
     'handle memoryless nodes more appropriately'

   - khugepaged conversions from Vishal Moola in the series 'Some
     khugepaged folio conversions'"

[ bcachefs conflicts with the dynamically allocated shrinkers have been
  resolved as per Stephen Rothwell in

     https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/

  with help from Qi Zheng.

  The clone3 test filtering conflict was half-arsed by yours truly ]

* tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits)
  mm/damon/sysfs: update monitoring target regions for online input commit
  mm/damon/sysfs: remove requested targets when online-commit inputs
  selftests: add a sanity check for zswap
  Documentation: maple_tree: fix word spelling error
  mm/vmalloc: fix the unchecked dereference warning in vread_iter()
  zswap: export compression failure stats
  Documentation: ubsan: drop "the" from article title
  mempolicy: migration attempt to match interleave nodes
  mempolicy: mmap_lock is not needed while migrating folios
  mempolicy: alloc_pages_mpol() for NUMA policy without vma
  mm: add page_rmappable_folio() wrapper
  mempolicy: remove confusing MPOL_MF_LAZY dead code
  mempolicy: mpol_shared_policy_init() without pseudo-vma
  mempolicy trivia: use pgoff_t in shared mempolicy tree
  mempolicy trivia: slightly more consistent naming
  mempolicy trivia: delete those ancient pr_debug()s
  mempolicy: fix migrate_pages(2) syscall return nr_failed
  kernfs: drop shared NUMA mempolicy hooks
  hugetlbfs: drop shared NUMA mempolicy pretence
  mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()
  ...
2023-11-02 19:38:47 -10:00
Linus Torvalds
6ed92e559a SCSI misc on 20231102
Updates to the usual drivers (ufs, megaraid_sas, lpfc, target, ibmvfc,
 scsi_debug) plus the usual assorted minor fixes and updates.  The
 major change this time around is a prep patch for rethreading of the
 driver reset handler API not to take a scsi_cmd structure which starts
 to reduce various drivers' dependence on scsi_cmd in error handling.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZUORLiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQ4WAQDDIhzp
 /PiJBBtt0U9ii/lYqRLrOVnN0extKEgEGO+FbwEAssKgs+5Jn/7XCgdpSrx8Co3/
 0cPXrZGxs7tFpFWLZjM=
 =AlRU
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (ufs, megaraid_sas, lpfc, target, ibmvfc,
  scsi_debug) plus the usual assorted minor fixes and updates.

  The major change this time around is a prep patch for rethreading of
  the driver reset handler API not to take a scsi_cmd structure which
  starts to reduce various drivers' dependence on scsi_cmd in error
  handling"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (132 commits)
  scsi: ufs: core: Leave space for '\0' in utf8 desc string
  scsi: ufs: core: Conversion to bool not necessary
  scsi: ufs: core: Fix race between force complete and ISR
  scsi: megaraid: Fix up debug message in megaraid_abort_and_reset()
  scsi: aic79xx: Fix up NULL command in ahd_done()
  scsi: message: fusion: Initialize return value in mptfc_bus_reset()
  scsi: mpt3sas: Fix loop logic
  scsi: snic: Remove useless code in snic_dr_clean_pending_req()
  scsi: core: Add comment to target_destroy in scsi_host_template
  scsi: core: Clean up scsi_dev_queue_ready()
  scsi: pmcraid: Add missing scsi_device_put() in pmcraid_eh_target_reset_handler()
  scsi: target: core: Fix kernel-doc comment
  scsi: pmcraid: Fix kernel-doc comment
  scsi: core: Handle depopulation and restoration in progress
  scsi: ufs: core: Add support for parsing OPP
  scsi: ufs: core: Add OPP support for scaling clocks and regulators
  scsi: ufs: dt-bindings: common: Add OPP table
  scsi: scsi_debug: Add param to control sdev's allow_restart
  scsi: scsi_debug: Add debugfs interface to fail target reset
  scsi: scsi_debug: Add new error injection type: Reset LUN failed
  ...
2023-11-02 15:13:50 -10:00
Linus Torvalds
426ee5196d sysctl-6.7-rc1
To help make the move of sysctls out of kernel/sysctl.c not incur a size
 penalty sysctl has been changed to allow us to not require the sentinel, the
 final empty element on the sysctl array. Joel Granados has been doing all this
 work. On the v6.6 kernel we got the major infrastructure changes required to
 support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove
 the sentinel. Both arch and driver changes have been on linux-next for a bit
 less than a month. It is worth re-iterating the value:
 
   - this helps reduce the overall build time size of the kernel and run time
      memory consumed by the kernel by about ~64 bytes per array
   - the extra 64-byte penalty is no longer inncurred now when we move sysctls
     out from kernel/sysctl.c to their own files
 
 For v6.8-rc1 expect removal of all the sentinels and also then the unneeded
 check for procname == NULL.
 
 The last 2 patches are fixes recently merged by Krister Johansen which allow
 us again to use softlockup_panic early on boot. This used to work but the
 alias work broke it. This is useful for folks who want to detect softlockups
 super early rather than wait and spend money on cloud solutions with nothing
 but an eventual hung kernel. Although this hadn't gone through linux-next it's
 also a stable fix, so we might as well roll through the fixes now.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmVCqKsSHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoinEgYQAIpkqRL85DBwems19Uk9A27lkctwZ6Fc
 HdslQCObQTsbuKVimZFP4IL2beUfUE0cfLZCXlzp+4nRDOf6vyhyf3w19jPQtI0Q
 YdqwTk9y6G5VjDsb35QK0+UBloY/kZ1H3/LW4uCwjXTuksUGmWW2Qvey35696Scv
 hDMLADqKQmdpYxLUaNi9QyYbEAjYtOai2ezg3+i7hTG168t1k/Ab2BxIFrPVsCR2
 FAiq05L4ugWjNskdsWBjck05JZsx9SK/qcAxpIPoUm4nGiFNHApXE0E0hs3vsnmn
 WIHIbxCQw8ZlUDlmw4S+0YH3NFFzFbWfmW8k2b0f2qZTJm/rU4KiJfcJVknkAUVF
 raFox6XDW0AUQ9L/NOUJ9ip5rup57GcFrMYocdJ3PPAvvmHKOb1D1O741p75RRcc
 9j7zwfIRrzjPUqzhsQS/GFjdJu3lJNmEBK1AcgrVry6WoItrAzJHKPPDC7TwaNmD
 eXpjxMl1sYzzHqtVh4hn+xkUYphj/6gTGMV8zdo+/FopFswgeJW9G8kHtlEWKDPk
 MRIKwACmfetP6f3ngHunBg+BOipbjCANL7JI0nOhVOQoaULxCCPx+IPJ6GfSyiuH
 AbcjH8DGI7fJbUkBFoF0dsRFZ2gH8ds1PYMbWUJ6x3FtuCuv5iIuvQYoaWU6itm7
 6f0KvCogg0fU
 =Qf50
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull sysctl updates from Luis Chamberlain:
 "To help make the move of sysctls out of kernel/sysctl.c not incur a
  size penalty sysctl has been changed to allow us to not require the
  sentinel, the final empty element on the sysctl array. Joel Granados
  has been doing all this work. On the v6.6 kernel we got the major
  infrastructure changes required to support this. For v6.7-rc1 we have
  all arch/ and drivers/ modified to remove the sentinel. Both arch and
  driver changes have been on linux-next for a bit less than a month. It
  is worth re-iterating the value:

   - this helps reduce the overall build time size of the kernel and run
     time memory consumed by the kernel by about ~64 bytes per array

   - the extra 64-byte penalty is no longer inncurred now when we move
     sysctls out from kernel/sysctl.c to their own files

  For v6.8-rc1 expect removal of all the sentinels and also then the
  unneeded check for procname == NULL.

  The last two patches are fixes recently merged by Krister Johansen
  which allow us again to use softlockup_panic early on boot. This used
  to work but the alias work broke it. This is useful for folks who want
  to detect softlockups super early rather than wait and spend money on
  cloud solutions with nothing but an eventual hung kernel. Although
  this hadn't gone through linux-next it's also a stable fix, so we
  might as well roll through the fixes now"

* tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits)
  watchdog: move softlockup_panic back to early_param
  proc: sysctl: prevent aliased sysctls from getting passed to init
  intel drm: Remove now superfluous sentinel element from ctl_table array
  Drivers: hv: Remove now superfluous sentinel element from ctl_table array
  raid: Remove now superfluous sentinel element from ctl_table array
  fw loader: Remove the now superfluous sentinel element from ctl_table array
  sgi-xp: Remove the now superfluous sentinel element from ctl_table array
  vrf: Remove the now superfluous sentinel element from ctl_table array
  char-misc: Remove the now superfluous sentinel element from ctl_table array
  infiniband: Remove the now superfluous sentinel element from ctl_table array
  macintosh: Remove the now superfluous sentinel element from ctl_table array
  parport: Remove the now superfluous sentinel element from ctl_table array
  scsi: Remove now superfluous sentinel element from ctl_table array
  tty: Remove now superfluous sentinel element from ctl_table array
  xen: Remove now superfluous sentinel element from ctl_table array
  hpet: Remove now superfluous sentinel element from ctl_table array
  c-sky: Remove now superfluous sentinel element from ctl_talbe array
  powerpc: Remove now superfluous sentinel element from ctl_table arrays
  riscv: Remove now superfluous sentinel element from ctl_table array
  x86/vdso: Remove now superfluous sentinel element from ctl_table array
  ...
2023-11-01 20:51:41 -10:00
Linus Torvalds
ca995ce438 xen: branch for v6.7-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZT0uPgAKCRCAXGG7T9hj
 vmIPAQDiCkJzskB9tgvic6/l7HnOUxcKcUmhHSpXbZ21sx9S6QEA06HEXHgs14Ax
 +t51DVNMk4bBm5O8I7z650JpoTbN5ww=
 =Pt1M
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - two small cleanup patches

 - a fix for PCI passthrough under Xen

 - a four patch series speeding up virtio under Xen with user space
   backends

* tag 'for-linus-6.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen-pciback: Consider INTx disabled when MSI/MSI-X is enabled
  xen: privcmd: Add support for ioeventfd
  xen: evtchn: Allow shared registration of IRQ handers
  xen: irqfd: Use _IOW instead of the internal _IOC() macro
  xen: Make struct privcmd_irqfd's layout architecture independent
  xen/xenbus: Add __counted_by for struct read_buffer and use struct_size()
  xenbus: fix error exit in xenbus_init()
2023-11-01 10:46:48 -10:00
Linus Torvalds
3cf3fabccb Locking changes in this cycle are:
- Futex improvements:
 
     - Add the 'futex2' syscall ABI, which is an attempt to get away from the
       multiplex syscall and adds a little room for extentions, while lifting
       some limitations.
 
     - Fix futex PI recursive rt_mutex waiter state bug
 
     - Fix inter-process shared futexes on no-MMU systems
 
     - Use folios instead of pages
 
  - Micro-optimizations of locking primitives:
 
     - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
       architectures, to improve lockref code generation.
 
     - Improve the x86-32 lockref_get_not_zero() main loop by adding
       build-time CMPXCHG8B support detection for the relevant lockref code,
       and by better interfacing the CMPXCHG8B assembly code with the compiler.
 
     - Introduce arch_sync_try_cmpxchg() on x86 to improve sync_try_cmpxchg()
       code generation. Convert some sync_cmpxchg() users to sync_try_cmpxchg().
 
     - Micro-optimize rcuref_put_slowpath()
 
  - Locking debuggability improvements:
 
     - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well
 
     - Enforce atomicity of sched_submit_work(), which is de-facto atomic but
       was un-enforced previously.
 
     - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check semantics
 
     - Fix ww_mutex self-tests
 
     - Clean up const-propagation in <linux/seqlock.h> and simplify
       the API-instantiation macros a bit.
 
  - RT locking improvements:
 
     - Provide the rt_mutex_*_schedule() primitives/helpers and use them
       in the rtmutex code to avoid recursion vs. rtlock on the PI state.
 
     - Add nested blocking lockdep asserts to rt_mutex_lock(), rtlock_lock()
       and rwbase_read_lock().
 
  - Plus misc fixes & cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU877IRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g9jw/+N7rxQ78dmFCYh4UWnLCYvuKP0/ivHErG
 493JcB8MupuA2tfJHIkDdr4aM2mNq2E61w69/WlZAQWWD6pdOhwgF5Xf5eoEcJm0
 vsAhWBGLxihXdtevPuMAx0dEpg3AMp2wc6i5PkN831KdPUgCNsrKq9Bfnfef7/G8
 MQTSHjmtba6jxleyxfEa4tE2xe5PJX825nRfkX2e1cf+stkYua+uJFxVxUfxFWGE
 4pBy70D9OC7MsJ44WWOA1gwkVtMMiBTmRPNjlP8Gz2GQ0f3ERHRwYk3jDHOPHZI6
 0GNt7pE3IMXQn2UuDtfkvv9IFTd+U5qD+APnWIn2ntWXqzGLFqOlmovMrobVn7El
 olYDCyweWPG71m1Qblsb1VK2QjRPQVJ9NAEg8RlDHIu2ThxHbMysDVGPVOYnPFq4
 S8QFpmldzbNoPU4rDJyT1fAmoUIrusBHkl+Us3yGfC74iM+fHnDEvaSoMZbzEdY1
 x/Nocj9XgKEgfXdYzrCWFmZ9xXqHkO25/wDL6yKqBdQtvaEalXuHTT6mQcYxrUPm
 Xx1BPan2Jg7p4u2oOFcVtKewUtRH9KBx8qytr5S+JK4PJbrBsixMnr84HLd/3X2V
 ykYkO+367T5MTYv4TnJDE5vdurzUqekKSCFPY3skPujPJfdLj1vsPzYf9iMkCLdo
 hU2f/R+Wpdk=
 =36Ff
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Info Molnar:
 "Futex improvements:

   - Add the 'futex2' syscall ABI, which is an attempt to get away from
     the multiplex syscall and adds a little room for extentions, while
     lifting some limitations.

   - Fix futex PI recursive rt_mutex waiter state bug

   - Fix inter-process shared futexes on no-MMU systems

   - Use folios instead of pages

  Micro-optimizations of locking primitives:

   - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
     architectures, to improve lockref code generation

   - Improve the x86-32 lockref_get_not_zero() main loop by adding
     build-time CMPXCHG8B support detection for the relevant lockref
     code, and by better interfacing the CMPXCHG8B assembly code with
     the compiler

   - Introduce arch_sync_try_cmpxchg() on x86 to improve
     sync_try_cmpxchg() code generation. Convert some sync_cmpxchg()
     users to sync_try_cmpxchg().

   - Micro-optimize rcuref_put_slowpath()

  Locking debuggability improvements:

   - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well

   - Enforce atomicity of sched_submit_work(), which is de-facto atomic
     but was un-enforced previously.

   - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check
     semantics

   - Fix ww_mutex self-tests

   - Clean up const-propagation in <linux/seqlock.h> and simplify the
     API-instantiation macros a bit

  RT locking improvements:

   - Provide the rt_mutex_*_schedule() primitives/helpers and use them
     in the rtmutex code to avoid recursion vs. rtlock on the PI state.

   - Add nested blocking lockdep asserts to rt_mutex_lock(),
     rtlock_lock() and rwbase_read_lock()

  .. plus misc fixes & cleanups"

* tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
  futex: Don't include process MM in futex key on no-MMU
  locking/seqlock: Fix grammar in comment
  alpha: Fix up new futex syscall numbers
  locking/seqlock: Propagate 'const' pointers within read-only methods, remove forced type casts
  locking/lockdep: Fix string sizing bug that triggers a format-truncation compiler-warning
  locking/seqlock: Change __seqprop() to return the function pointer
  locking/seqlock: Simplify SEQCOUNT_LOCKNAME()
  locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()
  locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg()
  locking/atomic/x86: Introduce arch_sync_try_cmpxchg()
  locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback
  locking/seqlock: Fix typo in comment
  futex/requeue: Remove unnecessary ‘NULL’ initialization from futex_proxy_trylock_atomic()
  locking/local, arch: Rewrite local_add_unless() as a static inline function
  locking/debug: Fix debugfs API return value checks to use IS_ERR()
  locking/ww_mutex/test: Make sure we bail out instead of livelock
  locking/ww_mutex/test: Fix potential workqueue corruption
  locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootup
  futex: Add sys_futex_requeue()
  futex: Add flags2 argument to futex_requeue()
  ...
2023-10-30 12:38:48 -10:00
Marek Marczykowski-Górecki
2c269f42d0 xen-pciback: Consider INTx disabled when MSI/MSI-X is enabled
Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until
the table is filled. Then it disables INTx just before clearing MASKALL
bit. Currently this approach is rejected by xen-pciback.
According to the PCIe spec, device cannot use INTx when MSI/MSI-X is
enabled (in other words: enabling MSI/MSI-X implicitly disables INTx).

Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This
applies to three places:
 - checking currently enabled interrupts type,
 - transition to MSI/MSI-X - where INTx would be implicitly disabled,
 - clearing INTx disable bit - which can be allowed even if MSI/MSI-X is
   enabled, as device should consider INTx disabled anyway in that case

Fixes: 5e29500eba ("xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20231016131348.1734721-1-marmarek@invisiblethingslab.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-10-16 15:20:31 +02:00
Viresh Kumar
f0d7db7b33 xen: privcmd: Add support for ioeventfd
Virtio guests send VIRTIO_MMIO_QUEUE_NOTIFY notification when they need
to notify the backend of an update to the status of the virtqueue. The
backend or another entity, polls the MMIO address for updates to know
when the notification is sent.

It works well if the backend does this polling by itself. But as we move
towards generic backend implementations, we end up implementing this in
a separate user-space program.

Generally, the Virtio backends are implemented to work with the Eventfd
based mechanism. In order to make such backends work with Xen, another
software layer needs to do the polling and send an event via eventfd to
the backend once the notification from guest is received. This results
in an extra context switch.

This is not a new problem in Linux though. It is present with other
hypervisors like KVM, etc. as well. The generic solution implemented in
the kernel for them is to provide an IOCTL call to pass the address to
poll and eventfd, which lets the kernel take care of polling and raise
an event on the eventfd, instead of handling this in user space (which
involves an extra context switch).

This patch adds similar support for xen.

Inspired by existing implementations for KVM, etc..

This also copies ioreq.h header file (only struct ioreq and related
macros) from Xen's source tree (Top commit 5d84f07fe6bf ("xen/pci: drop
remaining uses of bool_t")).

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/b20d83efba6453037d0c099912813c79c81f7714.1697439990.git.viresh.kumar@linaro.org
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-10-16 15:18:33 +02:00
Viresh Kumar
9e90e58c11 xen: evtchn: Allow shared registration of IRQ handers
Currently the handling of events is supported either in the kernel or
userspace, but not both.

In order to support fast delivery of interrupts from the guest to the
backend, we need to handle the Queue notify part of Virtio protocol in
kernel and the rest in userspace.

Update the interrupt handler registration flag to IRQF_SHARED for event
channels, which would allow multiple entities to bind their interrupt
handler for the same event channel port.

Also increment the reference count of irq_info when multiple entities
try to bind event channel to irqchip, so the unbinding happens only
after all the users are gone.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/99b1edfd3147c6b5d22a5139dab5861e767dc34a.1697439990.git.viresh.kumar@linaro.org
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-10-16 15:18:33 +02:00
Viresh Kumar
8dd765a5d7 xen: Make struct privcmd_irqfd's layout architecture independent
Using indirect pointers in an ioctl command argument means that the
layout is architecture specific, in particular we can't use the same one
from 32-bit compat tasks. The general recommendation is to have __u64
members and use u64_to_user_ptr() to access it from the kernel if we are
unable to avoid the pointers altogether.

Fixes: f8941e6c4c ("xen: privcmd: Add support for irqfd")
Reported-by: Arnd Bergmann <arnd@kernel.org>
Closes: https://lore.kernel.org/all/268a2031-63b8-4c7d-b1e5-8ab83ca80b4a@app.fastmail.com/
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/a4ef0d4a68fc858b34a81fd3f9877d9b6898eb77.1697439990.git.viresh.kumar@linaro.org
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-10-16 15:18:33 +02:00
Gustavo A. R. Silva
d3a2b6b48f xen/xenbus: Add __counted_by for struct read_buffer and use struct_size()
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

While there, use struct_size() helper, instead of the open-coded
version, to calculate the size for the allocation of the whole
flexible structure, including of course, the flexible-array member.

This code was found with the help of Coccinelle, and audited and
fixed manually.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Link: https://lore.kernel.org/r/ZSRMosLuJJS5Y/io@work
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-10-16 15:18:33 +02:00
Juergen Gross
44961b81a9 xenbus: fix error exit in xenbus_init()
In case an error occurs in xenbus_init(), xen_store_domain_type should
be set to XS_UNKNOWN.

Fix one instance where this action is missing.

Fixes: 5b3353949e ("xen: add support for initializing xenstore later as HVM domain")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202304200845.w7m4kXZr-lkp@intel.com/
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Link: https://lore.kernel.org/r/20230822091138.4765-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-10-16 15:18:33 +02:00
Mike Christie
194605d45d scsi: target: Have drivers report if they support direct submissions
In some cases, like with multiple LUN targets or where the target has to
respond to transport level requests from the receiving context it can be
better to defer cmd submission to a helper thread. If the backend driver
blocks on something like request/tag allocation it can block the entire
target submission path and other LUs and transport IO on that session.

In other cases like single LUN targets with storage that can support all
the commands that the target can queue, then it's best to submit the cmd
to the backend from the target's cmd receiving context.

Subsequent commits will allow the user to config what they prefer, but
drivers like loop can't directly submit because they can be called from a
context that can't sleep. And, drivers like vhost-scsi can support direct
submission, but need to keep their default behavior of deferring execution
to avoid possible regressions where the backend can block.

Make the drivers tell LIO core if they support direct submissions and their
current default, so we can prevent users from misconfiguring the system and
initialize devices correctly.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20230928020907.5730-2-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13 15:53:57 -04:00
Joel Granados
a5b2aeeac1 xen: Remove now superfluous sentinel element from ctl_table array
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)

Remove sentinel from balloon_table

Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-10-11 12:16:13 -07:00
Uros Bizjak
ad0a2e4c2f locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg()
Use sync_try_cmpxchg() instead of sync_cmpxchg(*ptr, old, new) == old
in clear_masked_cond(), clear_linked() and
gnttab_end_foreign_access_ref_v1(). x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg
(and related move instruction in front of cmpxchg), improving the
cmpxchg loop in gnttab_end_foreign_access_ref_v1() from:

     174:	eb 0e                	jmp    184 <...>
     176:	89 d0                	mov    %edx,%eax
     178:	f0 66 0f b1 31       	lock cmpxchg %si,(%rcx)
     17d:	66 39 c2             	cmp    %ax,%dx
     180:	74 11                	je     193 <...>
     182:	89 c2                	mov    %eax,%edx
     184:	89 d6                	mov    %edx,%esi
     186:	66 83 e6 18          	and    $0x18,%si
     18a:	74 ea                	je     176 <...>

to:

     614:	89 c1                	mov    %eax,%ecx
     616:	66 83 e1 18          	and    $0x18,%cx
     61a:	75 11                	jne    62d <...>
     61c:	f0 66 0f b1 0a       	lock cmpxchg %cx,(%rdx)
     621:	75 f1                	jne    614 <...>

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
2023-10-09 18:14:34 +02:00
Juergen Gross
87797fad6c xen/events: replace evtchn_rwlock with RCU
In unprivileged Xen guests event handling can cause a deadlock with
Xen console handling. The evtchn_rwlock and the hvc_lock are taken in
opposite sequence in __hvc_poll() and in Xen console IRQ handling.
Normally this is no problem, as the evtchn_rwlock is taken as a reader
in both paths, but as soon as an event channel is being closed, the
lock will be taken as a writer, which will cause read_lock() to block:

CPU0                     CPU1                CPU2
(IRQ handling)           (__hvc_poll())      (closing event channel)

read_lock(evtchn_rwlock)
                         spin_lock(hvc_lock)
                                             write_lock(evtchn_rwlock)
                                                 [blocks]
spin_lock(hvc_lock)
    [blocks]
                        read_lock(evtchn_rwlock)
                            [blocks due to writer waiting,
                             and not in_interrupt()]

This issue can be avoided by replacing evtchn_rwlock with RCU in
xen_free_irq(). Note that RCU is used only to delay freeing of the
irq_info memory. There is no RCU based dereferencing or replacement of
pointers involved.

In order to avoid potential races between removing the irq_info
reference and handling of interrupts, set the irq_info pointer to NULL
only when freeing its memory. The IRQ itself must be freed at that
time, too, as otherwise the same IRQ number could be allocated again
before handling of the old instance would have been finished.

This is XSA-441 / CVE-2023-34324.

Fixes: 54c9de8989 ("xen/events: add a new "late EOI" evtchn framework")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-10-09 09:21:16 +02:00
Qi Zheng
1ec016bfa1 xenbus/backend: dynamically allocate the xen-backend shrinker
Use new APIs to dynamically allocate the xen-backend shrinker.

Link: https://lkml.kernel.org/r/20230911094444.68966-6-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Chuck Lever <cel@kernel.org>
Cc: Coly Li <colyli@suse.de>
Cc: Dai Ngo <Dai.Ngo@oracle.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Marijn Suijten <marijn.suijten@somainline.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nadav Amit <namit@vmware.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Olga Kornievskaia <kolga@netapp.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sean Paul <sean@poorly.run>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Song Liu <song@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-04 10:32:23 -07:00
Juergen Gross
37510dd566 xen: simplify evtchn_do_upcall() call maze
There are several functions involved for performing the functionality
of evtchn_do_upcall():

- __xen_evtchn_do_upcall() doing the real work
- xen_hvm_evtchn_do_upcall() just being a wrapper for
  __xen_evtchn_do_upcall(), exposed for external callers
- xen_evtchn_do_upcall() calling __xen_evtchn_do_upcall(), too, but
  without any user

Simplify this maze by:

- removing the unused xen_evtchn_do_upcall()
- removing xen_hvm_evtchn_do_upcall() as the only left caller of
  __xen_evtchn_do_upcall(), while renaming __xen_evtchn_do_upcall() to
  xen_evtchn_do_upcall()

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-09-19 07:04:49 +02:00
Linus Torvalds
6c1b980a7e dma-maping updates for Linux 6.6
- allow dynamic sizing of the swiotlb buffer, to cater for secure
    virtualization workloads that require all I/O to be bounce buffered
    (Petr Tesarik)
  - move a declaration to a header (Arnd Bergmann)
  - check for memory region overlap in dma-contiguous (Binglei Wang)
  - remove the somewhat dangerous runtime swiotlb-xen enablement and
    unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross)
  - per-node CMA improvements (Yajun Deng)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmTuDHkLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOqvhAApMk2/ceTgVH17sXaKE822+xKvgv377O6TlggMeGG
 W4zA0KD69DNz0AfaaCc5U5f7n8Ld/YY1RsvkHW4b3jgw+KRTeQr0jjitBgP5kP2M
 A1+qxdyJpCTwiPt9s2+JFVPeyZ0s52V6OJODKRG3s0ore55R+U09VySKtASON+q3
 GMKfWqQteKC+thg7NkrQ7JUixuo84oICws+rZn4K9ifsX2O0HYW6aMW0feRfZjJH
 r0TgqZc4RdPTSaF22oapR9Ls39+7hp/pBvoLm5sBNA3cl5C3X4VWo9ERMU1jW9h+
 VYQv39NycUspgskWJmpbU06/+ooYqQlwHSR/vdNusmFIvxo4tf6/UX72YO5F8Dar
 ap0wYGauiEwTjSnhVxPTXk3obWyWEsgFAeRnPdTlH2CNmv38QZU2HLb8eU1pcXxX
 j+WI2Ewy9z22uBVYiPOKpdW1jkSfmlmfPp/8SbAdua7I3YQ90rQN6AvU06zAi/cL
 NQTgO81E4jPkygqAVgS/LeYziWAQ73yM7m9ExThtTgqFtHortwhJ4Fd8XKtvtvEb
 viXAZ/WZtQBv/CIKAW98NhgIDP/SPOT8ym6V35WK+kkNFMS6LMSQUfl9GgbHGyFa
 n9icMm7BmbDtT1+AKNafG9En4DtAf9M9QNidAVOyfrsIk6S0gZoZwvIStkA7on8a
 cNY=
 =kVVr
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-maping updates from Christoph Hellwig:

 - allow dynamic sizing of the swiotlb buffer, to cater for secure
   virtualization workloads that require all I/O to be bounce buffered
   (Petr Tesarik)

 - move a declaration to a header (Arnd Bergmann)

 - check for memory region overlap in dma-contiguous (Binglei Wang)

 - remove the somewhat dangerous runtime swiotlb-xen enablement and
   unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross)

 - per-node CMA improvements (Yajun Deng)

* tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: optimize get_max_slots()
  swiotlb: move slot allocation explanation comment where it belongs
  swiotlb: search the software IO TLB only if the device makes use of it
  swiotlb: allocate a new memory pool when existing pools are full
  swiotlb: determine potential physical address limit
  swiotlb: if swiotlb is full, fall back to a transient memory pool
  swiotlb: add a flag whether SWIOTLB is allowed to grow
  swiotlb: separate memory pool data from other allocator data
  swiotlb: add documentation and rename swiotlb_do_find_slots()
  swiotlb: make io_tlb_default_mem local to swiotlb.c
  swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated
  dma-contiguous: check for memory region overlap
  dma-contiguous: support numa CMA for specified node
  dma-contiguous: support per-numa CMA for all architectures
  dma-mapping: move arch_dma_set_mask() declaration to header
  swiotlb: unexport is_swiotlb_active
  x86: always initialize xen-swiotlb when xen-pcifront is enabling
  xen/pci: add flag for PCI passthrough being possible
2023-08-29 20:32:10 -07:00
Viresh Kumar
f8941e6c4c xen: privcmd: Add support for irqfd
Xen provides support for injecting interrupts to the guests via the
HYPERVISOR_dm_op() hypercall. The same is used by the Virtio based
device backend implementations, in an inefficient manner currently.

Generally, the Virtio backends are implemented to work with the Eventfd
based mechanism. In order to make such backends work with Xen, another
software layer needs to poll the Eventfds and raise an interrupt to the
guest using the Xen based mechanism. This results in an extra context
switch.

This is not a new problem in Linux though. It is present with other
hypervisors like KVM, etc. as well. The generic solution implemented in
the kernel for them is to provide an IOCTL call to pass the interrupt
details and eventfd, which lets the kernel take care of polling the
eventfd and raising of the interrupt, instead of handling this in user
space (which involves an extra context switch).

This patch adds support to inject a specific interrupt to guest using
the eventfd mechanism, by preventing the extra context switch.

Inspired by existing implementations for KVM, etc..

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/8e724ac1f50c2bc1eb8da9b3ff6166f1372570aa.1692697321.git.viresh.kumar@linaro.org
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-22 12:12:50 +02:00
Petr Pavlu
442466e04f xen/xenbus: Avoid a lockdep warning when adding a watch
The following lockdep warning appears during boot on a Xen dom0 system:

[   96.388794] ======================================================
[   96.388797] WARNING: possible circular locking dependency detected
[   96.388799] 6.4.0-rc5-default+ #8 Tainted: G            EL
[   96.388803] ------------------------------------------------------
[   96.388804] xenconsoled/1330 is trying to acquire lock:
[   96.388808] ffffffff82acdd10 (xs_watch_rwsem){++++}-{3:3}, at: register_xenbus_watch+0x45/0x140
[   96.388847]
               but task is already holding lock:
[   96.388849] ffff888100c92068 (&u->msgbuffer_mutex){+.+.}-{3:3}, at: xenbus_file_write+0x2c/0x600
[   96.388862]
               which lock already depends on the new lock.

[   96.388864]
               the existing dependency chain (in reverse order) is:
[   96.388866]
               -> #2 (&u->msgbuffer_mutex){+.+.}-{3:3}:
[   96.388874]        __mutex_lock+0x85/0xb30
[   96.388885]        xenbus_dev_queue_reply+0x48/0x2b0
[   96.388890]        xenbus_thread+0x1d7/0x950
[   96.388897]        kthread+0xe7/0x120
[   96.388905]        ret_from_fork+0x2c/0x50
[   96.388914]
               -> #1 (xs_response_mutex){+.+.}-{3:3}:
[   96.388923]        __mutex_lock+0x85/0xb30
[   96.388930]        xenbus_backend_ioctl+0x56/0x1c0
[   96.388935]        __x64_sys_ioctl+0x90/0xd0
[   96.388942]        do_syscall_64+0x5c/0x90
[   96.388950]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   96.388957]
               -> #0 (xs_watch_rwsem){++++}-{3:3}:
[   96.388965]        __lock_acquire+0x1538/0x2260
[   96.388972]        lock_acquire+0xc6/0x2b0
[   96.388976]        down_read+0x2d/0x160
[   96.388983]        register_xenbus_watch+0x45/0x140
[   96.388990]        xenbus_file_write+0x53d/0x600
[   96.388994]        vfs_write+0xe4/0x490
[   96.389003]        ksys_write+0xb8/0xf0
[   96.389011]        do_syscall_64+0x5c/0x90
[   96.389017]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   96.389023]
               other info that might help us debug this:

[   96.389025] Chain exists of:
                 xs_watch_rwsem --> xs_response_mutex --> &u->msgbuffer_mutex

[   96.413429]  Possible unsafe locking scenario:

[   96.413430]        CPU0                    CPU1
[   96.413430]        ----                    ----
[   96.413431]   lock(&u->msgbuffer_mutex);
[   96.413432]                                lock(xs_response_mutex);
[   96.413433]                                lock(&u->msgbuffer_mutex);
[   96.413434]   rlock(xs_watch_rwsem);
[   96.413436]
                *** DEADLOCK ***

[   96.413436] 1 lock held by xenconsoled/1330:
[   96.413438]  #0: ffff888100c92068 (&u->msgbuffer_mutex){+.+.}-{3:3}, at: xenbus_file_write+0x2c/0x600
[   96.413446]

An ioctl call IOCTL_XENBUS_BACKEND_SETUP (record #1 in the report)
results in calling xenbus_alloc() -> xs_suspend() which introduces
ordering xs_watch_rwsem --> xs_response_mutex. The xenbus_thread()
operation (record #2) creates xs_response_mutex --> &u->msgbuffer_mutex.
An XS_WATCH write to the xenbus file then results in a complain about
the opposite lock order &u->msgbuffer_mutex --> xs_watch_rwsem.

The dependency xs_watch_rwsem --> xs_response_mutex is spurious. Avoid
it and the warning by changing the ordering in xs_suspend(), first
acquire xs_response_mutex and then xs_watch_rwsem. Reverse also the
unlocking order in xs_suspend_cancel() for consistency, but keep
xs_resume() as is because it needs to have xs_watch_rwsem unlocked only
after exiting xs suspend and re-adding all watches.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230607123624.15739-1-petr.pavlu@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-22 08:04:59 +02:00
Yang Li
187b4c0d34 xen: Fix one kernel-doc comment
Use colon to separate parameter name from their specific meaning.
silence the warning:

drivers/xen/grant-table.c:1051: warning: Function parameter or member 'nr_pages' not described in 'gnttab_free_pages'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6030
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230731030037.123946-1-yang.lee@linux.alibaba.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-21 15:58:57 +02:00
Li Zetao
035a69586f xen: xenbus: Use helper function IS_ERR_OR_NULL()
Use IS_ERR_OR_NULL() to detect an error pointer or a null pointer
open-coding to simplify the code.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230817014736.3094289-1-lizetao1@huawei.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-21 09:55:11 +02:00
Ruan Jinjie
71281ec9c8 xen: Switch to use kmemdup() helper
Use kmemdup() helper instead of open-coding to
simplify the code.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Chen Jiahao <chenjiahao16@huawei.com>
Link: https://lore.kernel.org/r/20230815092434.1206386-1-ruanjinjie@huawei.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-21 09:54:05 +02:00
Yue Haibing
3e0d473dcb xen-pciback: Remove unused function declarations
Commit a92336a117 ("xen/pciback: Drop two backends, squash and cleanup some code.")
declared but never implemented these functions.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230808150912.43416-1-yuehaibing@huawei.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-21 09:53:22 +02:00
Petr Tesarik
05ee774122 swiotlb: make io_tlb_default_mem local to swiotlb.c
SWIOTLB implementation details should not be exposed to the rest of the
kernel. This will allow to make changes to the implementation without
modifying non-swiotlb code.

To avoid breaking existing users, provide helper functions for the few
required fields.

As a bonus, using a helper function to initialize struct device allows to
get rid of an #ifdef in driver core.

Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-08-01 18:02:09 +02:00
Demi Marie Obenour
c04e989484 xen: speed up grant-table reclaim
When a grant entry is still in use by the remote domain, Linux must put
it on a deferred list.  Normally, this list is very short, because
the PV network and block protocols expect the backend to unmap the grant
first.  However, Qubes OS's GUI protocol is subject to the constraints
of the X Window System, and as such winds up with the frontend unmapping
the window first.  As a result, the list can grow very large, resulting
in a massive memory leak and eventual VM freeze.

To partially solve this problem, make the number of entries that the VM
will attempt to free at each iteration tunable.  The default is still
10, but it can be overridden via a module parameter.

This is Cc: stable because (when combined with appropriate userspace
changes) it fixes a severe performance and stability problem for Qubes
OS users.

Cc: stable@vger.kernel.org
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230726165354.1252-1-demi@invisiblethingslab.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-07-27 07:53:12 +02:00
Rahul Singh
58f6259b7a xen/evtchn: Introduce new IOCTL to bind static evtchn
Xen 4.17 supports the creation of static evtchns. To allow user space
application to bind static evtchns introduce new ioctl
"IOCTL_EVTCHN_BIND_STATIC". Existing IOCTL doing more than binding
that’s why we need to introduce the new IOCTL to only bind the static
event channels.

Static evtchns to be available for use during the lifetime of the
guest. When the application exits, __unbind_from_irq() ends up being
called from release() file operations because of that static evtchns
are getting closed. To avoid closing the static event channel, add the
new bool variable "is_static" in "struct irq_info" to mark the event
channel static when creating the event channel to avoid closing the
static evtchn.

Also, take this opportunity to remove the open-coded version of the
evtchn close in drivers/xen/evtchn.c file and use xen_evtchn_close().

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/ae7329bf1713f83e4aad4f3fa0f316258c40a3e9.1689677042.git.rahul.singh@arm.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-07-26 08:42:34 +02:00
Stefano Stabellini
0d8f7cc805 xenbus: check xen_domain in xenbus_probe_initcall
The same way we already do in xenbus_init.
Fixes the following warning:

[  352.175563] Trying to free already-free IRQ 0
[  352.177355] WARNING: CPU: 1 PID: 88 at kernel/irq/manage.c:1893 free_irq+0xbf/0x350
[...]
[  352.213951] Call Trace:
[  352.214390]  <TASK>
[  352.214717]  ? __warn+0x81/0x170
[  352.215436]  ? free_irq+0xbf/0x350
[  352.215906]  ? report_bug+0x10b/0x200
[  352.216408]  ? prb_read_valid+0x17/0x20
[  352.216926]  ? handle_bug+0x44/0x80
[  352.217409]  ? exc_invalid_op+0x13/0x60
[  352.217932]  ? asm_exc_invalid_op+0x16/0x20
[  352.218497]  ? free_irq+0xbf/0x350
[  352.218979]  ? __pfx_xenbus_probe_thread+0x10/0x10
[  352.219600]  xenbus_probe+0x7a/0x80
[  352.221030]  xenbus_probe_thread+0x76/0xc0

Fixes: 5b3353949e ("xen: add support for initializing xenstore later as HVM domain")
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2307211609140.3118466@ubuntu-linux-20-04-desktop
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-07-25 17:36:36 +02:00
Linus Torvalds
1599932894 xen: branch for v6.5-rc2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZK/pZgAKCRCAXGG7T9hj
 vmQlAQD/xi8BUlCe0a7l6kf7+nMkOWmvpVIrmdxrqQ1Wj4c9FAEA0FuI+XXz2sow
 ov+il7z3UnViGsieeSHTW+Gxdn6Blgc=
 =LzAo
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - a cleanup of the Xen related ELF-notes

 - a fix for virtio handling in Xen dom0 when running Xen in a VM

* tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/virtio: Fix NULL deref when a bridge of PCI root bus has no parent
  x86/Xen: tidy xen-head.S
2023-07-13 13:39:36 -07:00
Petr Pavlu
21a235bce1 xen/virtio: Fix NULL deref when a bridge of PCI root bus has no parent
When attempting to run Xen on a QEMU/KVM virtual machine with virtio
devices (all x86_64), function xen_dt_get_node() crashes on accessing
bus->bridge->parent->of_node because a bridge of the PCI root bus has no
parent set:

[    1.694192][    T1] BUG: kernel NULL pointer dereference, address: 0000000000000288
[    1.695688][    T1] #PF: supervisor read access in kernel mode
[    1.696297][    T1] #PF: error_code(0x0000) - not-present page
[    1.696297][    T1] PGD 0 P4D 0
[    1.696297][    T1] Oops: 0000 [#1] PREEMPT SMP NOPTI
[    1.696297][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.7-1-default #1 openSUSE Tumbleweed a577eae57964bb7e83477b5a5645a1781df990f0
[    1.696297][    T1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[    1.696297][    T1] RIP: e030:xen_virtio_restricted_mem_acc+0xd9/0x1c0
[    1.696297][    T1] Code: 45 0c 83 e8 c9 a3 ea ff 31 c0 eb d7 48 8b 87 40 ff ff ff 48 89 c2 48 8b 40 10 48 85 c0 75 f4 48 8b 82 10 01 00 00 48 8b 40 40 <48> 83 b8 88 02 00 00 00 0f 84 45 ff ff ff 66 90 31 c0 eb a5 48 89
[    1.696297][    T1] RSP: e02b:ffffc90040013cc8 EFLAGS: 00010246
[    1.696297][    T1] RAX: 0000000000000000 RBX: ffff888006c75000 RCX: 0000000000000029
[    1.696297][    T1] RDX: ffff888005ed1000 RSI: ffffc900400f100c RDI: ffff888005ee30d0
[    1.696297][    T1] RBP: ffff888006c75010 R08: 0000000000000001 R09: 0000000330000006
[    1.696297][    T1] R10: ffff888005850028 R11: 0000000000000002 R12: ffffffff830439a0
[    1.696297][    T1] R13: 0000000000000000 R14: ffff888005657900 R15: ffff888006e3e1e8
[    1.696297][    T1] FS:  0000000000000000(0000) GS:ffff88804a000000(0000) knlGS:0000000000000000
[    1.696297][    T1] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.696297][    T1] CR2: 0000000000000288 CR3: 0000000002e36000 CR4: 0000000000050660
[    1.696297][    T1] Call Trace:
[    1.696297][    T1]  <TASK>
[    1.696297][    T1]  virtio_features_ok+0x1b/0xd0
[    1.696297][    T1]  virtio_dev_probe+0x19c/0x270
[    1.696297][    T1]  really_probe+0x19b/0x3e0
[    1.696297][    T1]  __driver_probe_device+0x78/0x160
[    1.696297][    T1]  driver_probe_device+0x1f/0x90
[    1.696297][    T1]  __driver_attach+0xd2/0x1c0
[    1.696297][    T1]  bus_for_each_dev+0x74/0xc0
[    1.696297][    T1]  bus_add_driver+0x116/0x220
[    1.696297][    T1]  driver_register+0x59/0x100
[    1.696297][    T1]  virtio_console_init+0x7f/0x110
[    1.696297][    T1]  do_one_initcall+0x47/0x220
[    1.696297][    T1]  kernel_init_freeable+0x328/0x480
[    1.696297][    T1]  kernel_init+0x1a/0x1c0
[    1.696297][    T1]  ret_from_fork+0x29/0x50
[    1.696297][    T1]  </TASK>
[    1.696297][    T1] Modules linked in:
[    1.696297][    T1] CR2: 0000000000000288
[    1.696297][    T1] ---[ end trace 0000000000000000 ]---

The PCI root bus is in this case created from ACPI description via
acpi_pci_root_add() -> pci_acpi_scan_root() -> acpi_pci_root_create() ->
pci_create_root_bus() where the last function is called with
parent=NULL. It indicates that no parent is present and then
bus->bridge->parent is NULL too.

Fix the problem by checking bus->bridge->parent in xen_dt_get_node() for
NULL first.

Fixes: ef8ae384b4 ("xen/virtio: Handle PCI devices which Host controller is described in DT")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/20230621131214.9398-2-petr.pavlu@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-07-04 08:18:21 +02:00
Linus Torvalds
6e17c6de3d - Yosry Ahmed brought back some cgroup v1 stats in OOM logs.
- Yosry has also eliminated cgroup's atomic rstat flushing.
 
 - Nhat Pham adds the new cachestat() syscall.  It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability.
 
 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning.
 
 - Lorenzo Stoakes has done some maintanance work on the get_user_pages()
   interface.
 
 - Liam Howlett continues with cleanups and maintenance work to the maple
   tree code.  Peng Zhang also does some work on maple tree.
 
 - Johannes Weiner has done some cleanup work on the compaction code.
 
 - David Hildenbrand has contributed additional selftests for
   get_user_pages().
 
 - Thomas Gleixner has contributed some maintenance and optimization work
   for the vmalloc code.
 
 - Baolin Wang has provided some compaction cleanups,
 
 - SeongJae Park continues maintenance work on the DAMON code.
 
 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting.
 
 - Christoph Hellwig has some cleanups for the filemap/directio code.
 
 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the provided
   APIs rather than open-coding accesses.
 
 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings.
 
 - John Hubbard has a series of fixes to the MM selftesting code.
 
 - ZhangPeng continues the folio conversion campaign.
 
 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock.
 
 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from
   128 to 8.
 
 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management.
 
 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code.
 
 - Vishal Moola also has done some folio conversion work.
 
 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJejewAKCRDdBJ7gKXxA
 joggAPwKMfT9lvDBEUnJagY7dbDPky1cSYZdJKxxM2cApGa42gEA6Cl8HRAWqSOh
 J0qXCzqaaN8+BuEyLGDVPaXur9KirwY=
 =B7yQ
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:

 - Yosry Ahmed brought back some cgroup v1 stats in OOM logs

 - Yosry has also eliminated cgroup's atomic rstat flushing

 - Nhat Pham adds the new cachestat() syscall. It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability

 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning

 - Lorenzo Stoakes has done some maintanance work on the
   get_user_pages() interface

 - Liam Howlett continues with cleanups and maintenance work to the
   maple tree code. Peng Zhang also does some work on maple tree

 - Johannes Weiner has done some cleanup work on the compaction code

 - David Hildenbrand has contributed additional selftests for
   get_user_pages()

 - Thomas Gleixner has contributed some maintenance and optimization
   work for the vmalloc code

 - Baolin Wang has provided some compaction cleanups,

 - SeongJae Park continues maintenance work on the DAMON code

 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting

 - Christoph Hellwig has some cleanups for the filemap/directio code

 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the
   provided APIs rather than open-coding accesses

 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings

 - John Hubbard has a series of fixes to the MM selftesting code

 - ZhangPeng continues the folio conversion campaign

 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock

 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
   from 128 to 8

 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management

 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code

 - Vishal Moola also has done some folio conversion work

 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch

* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
  mm/hugetlb: remove hugetlb_set_page_subpool()
  mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
  hugetlb: revert use of page_cache_next_miss()
  Revert "page cache: fix page_cache_next/prev_miss off by one"
  mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
  mm: memcg: rename and document global_reclaim()
  mm: kill [add|del]_page_to_lru_list()
  mm: compaction: convert to use a folio in isolate_migratepages_block()
  mm: zswap: fix double invalidate with exclusive loads
  mm: remove unnecessary pagevec includes
  mm: remove references to pagevec
  mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
  mm: remove struct pagevec
  net: convert sunrpc from pagevec to folio_batch
  i915: convert i915_gpu_error to use a folio_batch
  pagevec: rename fbatch_count()
  mm: remove check_move_unevictable_pages()
  drm: convert drm_gem_put_pages() to use a folio_batch
  i915: convert shmem_sg_free_table() to use a folio_batch
  scatterlist: add sg_set_folio()
  ...
2023-06-28 10:28:11 -07:00
Linus Torvalds
72dc6db7e3 workqueue: Ordered workqueue creation cleanups
For historical reasons, unbound workqueues with max concurrency limit of 1
 are considered ordered, even though the concurrency limit hasn't been
 system-wide for a long time. This creates ambiguity around whether ordered
 execution is actually required for correctness, which was actually confusing
 for e.g. btrfs (btrfs updates are being routed through the btrfs tree).
 
 There aren't that many users in the tree which use the combination and there
 are pending improvements to unbound workqueue affinity handling which will
 make inadvertent use of ordered workqueue a bigger loss. This pull request
 clarifies the situation for most of them by updating the ones which require
 ordered execution to use alloc_ordered_workqueue().
 
 There are some conversions being routed through subsystem-specific trees and
 likely a few stragglers. Once they're all converted, workqueue can trigger a
 warning on unbound + @max_active==1 usages and eventually drop the implicit
 ordered behavior.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJoKnA4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGc5SAQDOtjML7Cx9AYzbY5+nYc0wTebRRTXGeOu7A3Xy
 j50rVgEAjHgvHLIdmeYmVhCeHOSN4q7Wn5AOwaIqZalOhfLyKQk=
 =hs79
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull ordered workqueue creation updates from Tejun Heo:
 "For historical reasons, unbound workqueues with max concurrency limit
  of 1 are considered ordered, even though the concurrency limit hasn't
  been system-wide for a long time.

  This creates ambiguity around whether ordered execution is actually
  required for correctness, which was actually confusing for e.g. btrfs
  (btrfs updates are being routed through the btrfs tree).

  There aren't that many users in the tree which use the combination and
  there are pending improvements to unbound workqueue affinity handling
  which will make inadvertent use of ordered workqueue a bigger loss.

  This clarifies the situation for most of them by updating the ones
  which require ordered execution to use alloc_ordered_workqueue().

  There are some conversions being routed through subsystem-specific
  trees and likely a few stragglers. Once they're all converted,
  workqueue can trigger a warning on unbound + @max_active==1 usages and
  eventually drop the implicit ordered behavior"

* tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  rxrpc: Use alloc_ordered_workqueue() to create ordered workqueues
  net: qrtr: Use alloc_ordered_workqueue() to create ordered workqueues
  net: wwan: t7xx: Use alloc_ordered_workqueue() to create ordered workqueues
  dm integrity: Use alloc_ordered_workqueue() to create ordered workqueues
  media: amphion: Use alloc_ordered_workqueue() to create ordered workqueues
  scsi: NCR5380: Use default @max_active for hostdata->work_q
  media: coda: Use alloc_ordered_workqueue() to create ordered workqueues
  crypto: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues
  wifi: ath10/11/12k: Use alloc_ordered_workqueue() to create ordered workqueues
  wifi: mwifiex: Use default @max_active for workqueues
  wifi: iwlwifi: Use default @max_active for trans_pcie->rba.alloc_wq
  xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueues
  virt: acrn: Use alloc_ordered_workqueue() to create ordered workqueues
  net: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues
  net: thunderx: Use alloc_ordered_workqueue() to create ordered workqueues
  greybus: Use alloc_ordered_workqueue() to create ordered workqueues
  powerpc, workqueue: Use alloc_ordered_workqueue() to create ordered workqueues
2023-06-27 16:46:06 -07:00
Ryan Roberts
c33c794828 mm: ptep_get() conversion
Convert all instances of direct pte_t* dereferencing to instead use
ptep_get() helper.  This means that by default, the accesses change from a
C dereference to a READ_ONCE().  This is technically the correct thing to
do since where pgtables are modified by HW (for access/dirty) they are
volatile and therefore we should always ensure READ_ONCE() semantics.

But more importantly, by always using the helper, it can be overridden by
the architecture to fully encapsulate the contents of the pte.  Arch code
is deliberately not converted, as the arch code knows best.  It is
intended that arch code (arm64) will override the default with its own
implementation that can (e.g.) hide certain bits from the core code, or
determine young/dirty status by mixing in state from another source.

Conversion was done using Coccinelle:

----

// $ make coccicheck \
//          COCCI=ptepget.cocci \
//          SPFLAGS="--include-headers" \
//          MODE=patch

virtual patch

@ depends on patch @
pte_t *v;
@@

- *v
+ ptep_get(v)

----

Then reviewed and hand-edited to avoid multiple unnecessary calls to
ptep_get(), instead opting to store the result of a single call in a
variable, where it is correct to do so.  This aims to negate any cost of
READ_ONCE() and will benefit arch-overrides that may be more complex.

Included is a fix for an issue in an earlier version of this patch that
was pointed out by kernel test robot.  The issue arose because config
MMU=n elides definition of the ptep helper functions, including
ptep_get().  HUGETLB_PAGE=n configs still define a simple
huge_ptep_clear_flush() for linking purposes, which dereferences the ptep.
So when both configs are disabled, this caused a build error because
ptep_get() is not defined.  Fix by continuing to do a direct dereference
when MMU=n.  This is safe because for this config the arch code cannot be
trying to virtualize the ptes because none of the ptep helpers are
defined.

Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:25 -07:00
Linus Torvalds
4e893b5aa4 xen: branch for v6.4-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZHGVRAAKCRCAXGG7T9hj
 vqtJAQDizKasLE7tSnfs/FrZ/4xPaDLe3bQifMx2C1dtYCjRcAD/ciZSa1L0WzZP
 dpEZnlYRzsR3bwLktQEMQFOvlbh1SwE=
 =K860
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - a double free fix in the Xen pvcalls backend driver

 - a fix for a regression causing the MSI related sysfs entries to not
   being created in Xen PV guests

 - a fix in the Xen blkfront driver for handling insane input data
   better

* tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/pci/xen: populate MSI sysfs entries
  xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
  xen/blkfront: Only check REQ_FUA for writes
2023-05-27 09:42:56 -07:00
Dan Carpenter
8fafac202d xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
In the pvcalls_new_active_socket() function, most error paths call
pvcalls_back_release_active(fedata->dev, fedata, map) which calls
sock_release() on "sock".  The bug is that the caller also frees sock.

Fix this by making every error path in pvcalls_new_active_socket()
release the sock, and don't free it in the caller.

Fixes: 5db4d286a8 ("xen/pvcalls: implement connect command")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/e5f98dc2-0305-491f-a860-71bbd1398a2f@kili.mountain
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-05-24 17:25:43 +02:00
Tejun Heo
715557b02c xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueues
BACKGROUND
==========

When multiple work items are queued to a workqueue, their execution order
doesn't match the queueing order. They may get executed in any order and
simultaneously. When fully serialized execution - one by one in the queueing
order - is needed, an ordered workqueue should be used which can be created
with alloc_ordered_workqueue().

However, alloc_ordered_workqueue() was a later addition. Before it, an
ordered workqueue could be obtained by creating an UNBOUND workqueue with
@max_active==1. This originally was an implementation side-effect which was
broken by 4c16bd327c ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered"). Because there were users that depended on the ordered execution,
5c0338c687 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
made workqueue allocation path to implicitly promote UNBOUND workqueues w/
@max_active==1 to ordered workqueues.

While this has worked okay, overloading the UNBOUND allocation interface
this way creates other issues. It's difficult to tell whether a given
workqueue actually needs to be ordered and users that legitimately want a
min concurrency level wq unexpectedly gets an ordered one instead. With
planned UNBOUND workqueue updates to improve execution locality and more
prevalence of chiplet designs which can benefit from such improvements, this
isn't a state we wanna be in forever.

This patch series audits all callsites that create an UNBOUND workqueue w/
@max_active==1 and converts them to alloc_ordered_workqueue() as necessary.

WHAT TO LOOK FOR
================

The conversions are from

  alloc_workqueue(WQ_UNBOUND | flags, 1, args..)

to 

  alloc_ordered_workqueue(flags, args...)

which don't cause any functional changes. If you know that fully ordered
execution is not ncessary, please let me know. I'll drop the conversion and
instead add a comment noting the fact to reduce confusion while conversion
is in progress.

If you aren't fully sure, it's completely fine to let the conversion
through. The behavior will stay exactly the same and we can always
reconsider later.

As there are follow-up workqueue core changes, I'd really appreciate if the
patch can be routed through the workqueue tree w/ your acks. Thanks.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: xen-devel@lists.xenproject.org
2023-05-08 15:33:12 -10:00
Linus Torvalds
35fab9271b xen: branch for v6.4-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZEolJQAKCRCAXGG7T9hj
 vuVMAP9B3WLzszen3/XCM2E6sZurtmD+YPkUrbES2AsEE1PH3gEA73ZxM1C+gvKS
 5be7Dksgeyqyqrwhb9/VOyHU3pmyrAw=
 =zirQ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - some cleanups in the Xen blkback driver

 - fix potential sleeps under lock in various Xen drivers

* tag 'for-linus-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/blkback: move blkif_get_x86_*_req() into blkback.c
  xen/blkback: simplify free_persistent_gnts() interface
  xen/blkback: remove stale prototype
  xen/blkback: fix white space code style issues
  xen/pvcalls: don't call bind_evtchn_to_irqhandler() under lock
  xen/scsiback: don't call scsiback_free_translation_entry() under lock
  xen/pciback: don't call pcistub_device_put() under lock
2023-04-27 17:27:06 -07:00
Linus Torvalds
888d3c9f7f sysctl-6.4-rc1
This pull request goes with only a few sysctl moves from the
 kernel/sysctl.c file, the rest of the work has been put towards
 deprecating two API calls which incur recursion and prevent us
 from simplifying the registration process / saving memory per
 move. Most of the changes have been soaking on linux-next since
 v6.3-rc3.
 
 I've slowed down the kernel/sysctl.c moves due to Matthew Wilcox's
 feedback that we should see if we could *save* memory with these
 moves instead of incurring more memory. We currently incur more
 memory since when we move a syctl from kernel/sysclt.c out to its
 own file we end up having to add a new empty sysctl used to register
 it. To achieve saving memory we want to allow syctls to be passed
 without requiring the end element being empty, and just have our
 registration process rely on ARRAY_SIZE(). Without this, supporting
 both styles of sysctls would make the sysctl registration pretty
 brittle, hard to read and maintain as can be seen from Meng Tang's
 efforts to do just this [0]. Fortunately, in order to use ARRAY_SIZE()
 for all sysctl registrations also implies doing the work to deprecate
 two API calls which use recursion in order to support sysctl
 declarations with subdirectories.
 
 And so during this development cycle quite a bit of effort went into
 this deprecation effort. I've annotated the following two APIs are
 deprecated and in few kernel releases we should be good to remove them:
 
   * register_sysctl_table()
   * register_sysctl_paths()
 
 During this merge window we should be able to deprecate and unexport
 register_sysctl_paths(), we can probably do that towards the end
 of this merge window.
 
 Deprecating register_sysctl_table() will take a bit more time but
 this pull request goes with a few example of how to do this.
 
 As it turns out each of the conversions to move away from either of
 these two API calls *also* saves memory. And so long term, all these
 changes *will* prove to have saved a bit of memory on boot.
 
 The way I see it then is if remove a user of one deprecated call, it
 gives us enough savings to move one kernel/sysctl.c out from the
 generic arrays as we end up with about the same amount of bytes.
 
 Since deprecating register_sysctl_table() and register_sysctl_paths()
 does not require maintainer coordination except the final unexport
 you'll see quite a bit of these changes from other pull requests, I've
 just kept the stragglers after rc3.
 
 Most of these changes have been soaking on linux-next since around rc3.
 
 [0] https://lkml.kernel.org/r/ZAD+cpbrqlc5vmry@bombadil.infradead.org
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmRHAjQSHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoinTzgQAI/uKHKi0VlUR1l2Psl0XbseUVueuyj3
 ZDxSJpbVUmsoDf2MlLjzB8mYE3ricnNTDbLr7qOyA6pXdM1N0mY5LQmRVRu8/ffd
 2T1hQ5pl7YnJdWP5dPhcF9Y+jnu1tjX1MW5DS4fzllwK7FnD86HuIruGq52RAPS/
 /FH+BD9eodLWWXk6A/o2GFqoWxPKQI0GLxEYWa7Hg7yt8E/3PQL9QsRzn8i6U+HW
 BrN/+G3YD1VCCzXu0UAeXnm+i1Z7CdvqNdZuSkvE3DObiZ5WpOS+/i7FrDB7zdiu
 zAbHaifHnDPtcK3w2ZodbLAAwEWD/mG4iwIjE2kgIMVYxBv7TFDBRREXAWYAevIT
 UUuZnWDQsGaWdjywrebaUycEfd6dytKyan0fTXgMFkcoWRjejhitfdM2iZDdQROg
 q453p4HqOw4vTrhy4ov4zOX7J3EFiBzpZdl+SmLqcXk+jbLVb/Q9snUWz1AFtHBl
 gHoP5bS82uVktGG3MsObjgTzYYMQjO9YGIrVuW1VP9uWs8WaoWx6M9FQJIIhtwE+
 h6wG2s7CjuFWnS0/IxWmDOn91QyUn1w7ohiz9TuvYj/5GLSBpBDGCJHsNB5T2WS1
 qbQRaZ2Kg3j9TeyWfXxdlxBx7bt3ni+J/IXDY0zom2sTpGHKl8D2g5AzmEXJDTpl
 kd7Z3gsmwhDh
 =0U0W
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull sysctl updates from Luis Chamberlain:
 "This only does a few sysctl moves from the kernel/sysctl.c file, the
  rest of the work has been put towards deprecating two API calls which
  incur recursion and prevent us from simplifying the registration
  process / saving memory per move. Most of the changes have been
  soaking on linux-next since v6.3-rc3.

  I've slowed down the kernel/sysctl.c moves due to Matthew Wilcox's
  feedback that we should see if we could *save* memory with these moves
  instead of incurring more memory. We currently incur more memory since
  when we move a syctl from kernel/sysclt.c out to its own file we end
  up having to add a new empty sysctl used to register it. To achieve
  saving memory we want to allow syctls to be passed without requiring
  the end element being empty, and just have our registration process
  rely on ARRAY_SIZE(). Without this, supporting both styles of sysctls
  would make the sysctl registration pretty brittle, hard to read and
  maintain as can be seen from Meng Tang's efforts to do just this [0].
  Fortunately, in order to use ARRAY_SIZE() for all sysctl registrations
  also implies doing the work to deprecate two API calls which use
  recursion in order to support sysctl declarations with subdirectories.

  And so during this development cycle quite a bit of effort went into
  this deprecation effort. I've annotated the following two APIs are
  deprecated and in few kernel releases we should be good to remove
  them:

   - register_sysctl_table()
   - register_sysctl_paths()

  During this merge window we should be able to deprecate and unexport
  register_sysctl_paths(), we can probably do that towards the end of
  this merge window.

  Deprecating register_sysctl_table() will take a bit more time but this
  pull request goes with a few example of how to do this.

  As it turns out each of the conversions to move away from either of
  these two API calls *also* saves memory. And so long term, all these
  changes *will* prove to have saved a bit of memory on boot.

  The way I see it then is if remove a user of one deprecated call, it
  gives us enough savings to move one kernel/sysctl.c out from the
  generic arrays as we end up with about the same amount of bytes.

  Since deprecating register_sysctl_table() and register_sysctl_paths()
  does not require maintainer coordination except the final unexport
  you'll see quite a bit of these changes from other pull requests, I've
  just kept the stragglers after rc3"

Link: https://lkml.kernel.org/r/ZAD+cpbrqlc5vmry@bombadil.infradead.org [0]

* tag 'sysctl-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (29 commits)
  fs: fix sysctls.c built
  mm: compaction: remove incorrect #ifdef checks
  mm: compaction: move compaction sysctl to its own file
  mm: memory-failure: Move memory failure sysctls to its own file
  arm: simplify two-level sysctl registration for ctl_isa_vars
  ia64: simplify one-level sysctl registration for kdump_ctl_table
  utsname: simplify one-level sysctl registration for uts_kern_table
  ntfs: simplfy one-level sysctl registration for ntfs_sysctls
  coda: simplify one-level sysctl registration for coda_table
  fs/cachefiles: simplify one-level sysctl registration for cachefiles_sysctls
  xfs: simplify two-level sysctl registration for xfs_table
  nfs: simplify two-level sysctl registration for nfs_cb_sysctls
  nfs: simplify two-level sysctl registration for nfs4_cb_sysctls
  lockd: simplify two-level sysctl registration for nlm_sysctls
  proc_sysctl: enhance documentation
  xen: simplify sysctl registration for balloon
  md: simplify sysctl registration
  hv: simplify sysctl registration
  scsi: simplify sysctl registration with register_sysctl()
  csky: simplify alignment sysctl registration
  ...
2023-04-27 16:52:33 -07:00
Linus Torvalds
b68ee1c613 SCSI misc on 20230426
Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target,
 mpi3mr, hisi_sas, arcmsr).  The major core change is the
 constification of the host templates (which touches everything) along
 with other minor fixups and clean ups.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZEmJACYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishU4FAP0WYhFC
 rkbY203/+ErUuwvOKum0VwJKUowCaUD0MBwScAD+Ok/NWobmjdXUBbPUbvVkr+hE
 8B/xs9hodX+1fVJcVG0=
 =fS/j
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target,
  mpi3mr, hisi_sas, arcmsr).

  The major core change is the constification of the host templates
  (which touches everything) along with other minor fixups and clean
  ups"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits)
  scsi: ufs: mcq: Use pointer arithmetic in ufshcd_send_command()
  scsi: ufs: mcq: Annotate ufshcd_inc_sq_tail() appropriately
  scsi: cxlflash: s/semahpore/semaphore/
  scsi: lpfc: Silence an incorrect device output
  scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame allocation
  scsi: scsi_debug: Fix missing error code in scsi_debug_init()
  scsi: hisi_sas: Work around build failure in suspend function
  scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup()
  scsi: mpt3sas: Fix an issue when driver is being removed
  scsi: mpt3sas: Remove HBA BIOS version in the kernel log
  scsi: target: core: Fix invalid memory access
  scsi: scsi_debug: Drop sdebug_queue
  scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts
  scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store()
  scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued()
  scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll()
  scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd
  scsi: scsi_debug: Use scsi_block_requests() to block queues
  scsi: scsi_debug: Protect block_unblock_all_queues() with mutex
  scsi: scsi_debug: Change shost list lock to a mutex
  ...
2023-04-26 15:39:25 -07:00
Juergen Gross
c66bb48edd xen/pvcalls: don't call bind_evtchn_to_irqhandler() under lock
bind_evtchn_to_irqhandler() shouldn't be called under spinlock, as it
can sleep.

This requires to move the calls of create_active() out of the locked
regions. This is no problem, as the worst which could happen would be
a spurious call of the interrupt handler, causing a spurious wake_up().

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/lkml/Y+JUIl64UDmdkboh@kadam/
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/20230403092711.15285-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-04-24 07:27:10 +02:00
Juergen Gross
b2c042cc80 xen/scsiback: don't call scsiback_free_translation_entry() under lock
scsiback_free_translation_entry() shouldn't be called under spinlock,
as it can sleep.

This requires to split removing a translation entry from the v2p list
from actually calling kref_put() for the entry.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/lkml/Y+JUIl64UDmdkboh@kadam/
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Link: https://lore.kernel.org/r/20230328084602.20729-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-04-24 07:27:10 +02:00
Juergen Gross
fae65ef3a1 xen/pciback: don't call pcistub_device_put() under lock
pcistub_device_put() shouldn't be called under spinlock, as it can
sleep.

For this reason pcistub_device_get_pci_dev() needs to be modified:
instead of always calling pcistub_device_get() just do the call of
pcistub_device_get() only if it is really needed. This removes the
need to call pcistub_device_put().

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/lkml/Y+JUIl64UDmdkboh@kadam/
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Link: https://lore.kernel.org/r/20230328084549.20695-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-04-24 07:27:10 +02:00
Luis Chamberlain
9f17a75b2d xen: simplify sysctl registration for balloon
register_sysctl_table() is a deprecated compatibility wrapper.
register_sysctl_init() can do the directory creation for you so just
use that.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
2023-04-13 11:49:20 -07:00
Roger Pau Monne
073828e954 ACPI: processor: Fix evaluating _PDC method when running as Xen dom0
In ACPI systems, the OS can direct power management, as opposed to the
firmware.  This OS-directed Power Management is called OSPM.  Part of
telling the firmware that the OS going to direct power management is
making ACPI "_PDC" (Processor Driver Capabilities) calls.  These _PDC
methods must be evaluated for every processor object.  If these _PDC
calls are not completed for every processor it can lead to
inconsistency and later failures in things like the CPU frequency
driver.

In a Xen system, the dom0 kernel is responsible for system-wide power
management.  The dom0 kernel is in charge of OSPM.  However, the
number of CPUs available to dom0 can be different than the number of
CPUs physically present on the system.

This leads to a problem: the dom0 kernel needs to evaluate _PDC for
all the processors, but it can't always see them.

In dom0 kernels, ignore the existing ACPI method for determining if a
processor is physically present because it might not be accurate.
Instead, ask the hypervisor for this information.

Fix this by introducing a custom function to use when running as Xen
dom0 in order to check whether a processor object matches a CPU that's
online.  Such checking is done using the existing information fetched
by the Xen pCPU subsystem, extending it to also store the ACPI ID.

This ensures that _PDC method gets evaluated for all physically online
CPUs, regardless of the number of CPUs made available to dom0.

Fixes: 5d554a7bb0 ("ACPI: processor: add internal processor_physically_present()")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-22 19:36:31 +01:00
Linus Torvalds
0eb392ec09 xen: branch for v6.3-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZBQKJwAKCRCAXGG7T9hj
 vuVgAQDhvr5mBFNqFxIfTnE8+oEsnYb0OgmR+9U3h+ECDB0P0gEAmR1fAee441YE
 2DWOAlvjmqoI2K8DTTabizXvm7x3bQk=
 =jcYl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - cleanup for xen time handling

 - enable the VGA console in a Xen PVH dom0

 - cleanup in the xenfs driver

* tag 'for-linus-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: remove unnecessary (void*) conversions
  x86/PVH: obtain VGA console info in Dom0
  x86/xen/time: cleanup xen_tsc_safe_clocksource
  xen: update arch/x86/include/asm/xen/cpuid.h
2023-03-17 10:45:49 -07:00
Dmitry Bogdanov
355c3d6135 scsi: xen-scsiback: Remove default fabric ops callouts
Remove callouts that are identical to the default implementations in TCM
Core.

Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Link: https://lore.kernel.org/r/20230313181110.20566-10-d.bogdanov@yadro.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-16 23:36:36 -04:00
Yu Zhe
7ad2c39860 xen: remove unnecessary (void*) conversions
Pointer variables of void * type do not require type cast.

Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230316083954.4223-1-yuzhe@nfschina.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-03-16 12:04:00 +01:00
Linus Torvalds
a93e884edf Driver core changes for 6.3-rc1
Here is the large set of driver core changes for 6.3-rc1.
 
 There's a lot of changes this development cycle, most of the work falls
 into two different categories:
   - fw_devlink fixes and updates.  This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.
   - driver core changes to work to make struct bus_type able to be moved
     into read-only memory (i.e. const)  The recent work with Rust has
     pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only making
     things safer overall.  This is the contuation of that work (started
     last release with kobject changes) in moving struct bus_type to be
     constant.  We didn't quite make it for this release, but the
     remaining patches will be finished up for the release after this
     one, but the groundwork has been laid for this effort.
 
 Other than that we have in here:
   - debugfs memory leak fixes in some subsystems
   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.
   - cacheinfo rework and fixes
   - Other tiny fixes, full details are in the shortlog
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/ipdg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynL3gCgwzbcWu0So3piZyLiJKxsVo9C2EsAn3sZ9gN6
 6oeFOjD3JDju3cQsfGgd
 =Su6W
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.3-rc1.

  There's a lot of changes this development cycle, most of the work
  falls into two different categories:

   - fw_devlink fixes and updates. This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.

   - driver core changes to work to make struct bus_type able to be
     moved into read-only memory (i.e. const) The recent work with Rust
     has pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only
     making things safer overall. This is the contuation of that work
     (started last release with kobject changes) in moving struct
     bus_type to be constant. We didn't quite make it for this release,
     but the remaining patches will be finished up for the release after
     this one, but the groundwork has been laid for this effort.

  Other than that we have in here:

   - debugfs memory leak fixes in some subsystems

   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.

   - cacheinfo rework and fixes

   - Other tiny fixes, full details are in the shortlog

  All of these have been in linux-next for a while with no reported
  problems"

[ Geert Uytterhoeven points out that that last sentence isn't true, and
  that there's a pending report that has a fix that is queued up - Linus ]

* tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits)
  debugfs: drop inline constant formatting for ERR_PTR(-ERROR)
  OPP: fix error checking in opp_migrate_dentry()
  debugfs: update comment of debugfs_rename()
  i3c: fix device.h kernel-doc warnings
  dma-mapping: no need to pass a bus_type into get_arch_dma_ops()
  driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place
  Revert "driver core: add error handling for devtmpfs_create_node()"
  Revert "devtmpfs: add debug info to handle()"
  Revert "devtmpfs: remove return value of devtmpfs_delete_node()"
  driver core: cpu: don't hand-override the uevent bus_type callback.
  devtmpfs: remove return value of devtmpfs_delete_node()
  devtmpfs: add debug info to handle()
  driver core: add error handling for devtmpfs_create_node()
  driver core: bus: update my copyright notice
  driver core: bus: add bus_get_dev_root() function
  driver core: bus: constify bus_unregister()
  driver core: bus: constify some internal functions
  driver core: bus: constify bus_get_kset()
  driver core: bus: constify bus_register/unregister_notifier()
  driver core: remove private pointer from struct bus_type
  ...
2023-02-24 12:58:55 -08:00
Linus Torvalds
3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Linus Torvalds
06e1a81c48 A healthy mix of EFI contributions this time:
- Performance tweaks for efifb earlycon by Andy
 
 - Preparatory refactoring and cleanup work in the efivar layer by Johan,
   which is needed to accommodate the Snapdragon arm64 laptops that
   expose their EFI variable store via a TEE secure world API.
 
 - Enhancements to the EFI memory map handling so that Xen dom0 can
   safely access EFI configuration tables (Demi Marie)
 
 - Wire up the newly introduced IBT/BTI flag in the EFI memory attributes
   table, so that firmware that is generated with ENDBR/BTI landing pads
   will be mapped with enforcement enabled.
 
 - Clean up how we check and print the EFI revision exposed by the
   firmware.
 
 - Incorporate EFI memory attributes protocol definition contributed by
   Evgeniy and wire it up in the EFI zboot code. This ensures that these
   images can execute under new and stricter rules regarding the default
   memory permissions for EFI page allocations. (More work is in progress
   here)
 
 - CPER header cleanup by Dan Williams
 
 - Use a raw spinlock to protect the EFI runtime services stack on arm64
   to ensure the correct semantics under -rt. (Pierre)
 
 - EFI framebuffer quirk for Lenovo Ideapad by Darrell.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPzuwsACgkQw08iOZLZ
 jyS7dwwAm95DlDxFIQi4FmTm2mqJws9PyDrkfaAK1CoyqCgeOLQT2FkVolgr8jne
 pwpwCTXtYP8y0BZvdQEIjpAq/BHKaD3GJSPfl7lo+pnUu68PpsFWaV6EdT33KKfj
 QeF0MnUvrqUeTFI77D+S0ZW2zxdo9eCcahF3TPA52/bEiiDHWBF8Qm9VHeQGklik
 zoXA15ft3mgITybgjEA0ncGrVZiBMZrYoMvbdkeoedfw02GN/eaQn8d2iHBtTDEh
 3XNlo7ONX0v50cjt0yvwFEA0AKo0o7R1cj+ziKH/bc4KjzIiCbINhy7blroSq+5K
 YMlnPHuj8Nhv3I+MBdmn/nxRCQeQsE4RfRru04hfNfdcqjAuqwcBvRXvVnjWKZHl
 CmUYs+p/oqxrQ4BjiHfw0JKbXRsgbFI6o3FeeLH9kzI9IDUPpqu3Ma814FVok9Ai
 zbOCrJf5tEtg5tIavcUESEMBuHjEafqzh8c7j7AAqbaNjlihsqosDy9aYoarEi5M
 f/tLec86
 =+pOz
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "A healthy mix of EFI contributions this time:

   - Performance tweaks for efifb earlycon (Andy)

   - Preparatory refactoring and cleanup work in the efivar layer, which
     is needed to accommodate the Snapdragon arm64 laptops that expose
     their EFI variable store via a TEE secure world API (Johan)

   - Enhancements to the EFI memory map handling so that Xen dom0 can
     safely access EFI configuration tables (Demi Marie)

   - Wire up the newly introduced IBT/BTI flag in the EFI memory
     attributes table, so that firmware that is generated with ENDBR/BTI
     landing pads will be mapped with enforcement enabled

   - Clean up how we check and print the EFI revision exposed by the
     firmware

   - Incorporate EFI memory attributes protocol definition and wire it
     up in the EFI zboot code (Evgeniy)

     This ensures that these images can execute under new and stricter
     rules regarding the default memory permissions for EFI page
     allocations (More work is in progress here)

   - CPER header cleanup (Dan Williams)

   - Use a raw spinlock to protect the EFI runtime services stack on
     arm64 to ensure the correct semantics under -rt (Pierre)

   - EFI framebuffer quirk for Lenovo Ideapad (Darrell)"

* tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
  firmware/efi sysfb_efi: Add quirk for Lenovo IdeaPad Duet 3
  arm64: efi: Make efi_rt_lock a raw_spinlock
  efi: Add mixed-mode thunk recipe for GetMemoryAttributes
  efi: x86: Wire up IBT annotation in memory attributes table
  efi: arm64: Wire up BTI annotation in memory attributes table
  efi: Discover BTI support in runtime services regions
  efi/cper, cxl: Remove cxl_err.h
  efi: Use standard format for printing the EFI revision
  efi: Drop minimum EFI version check at boot
  efi: zboot: Use EFI protocol to remap code/data with the right attributes
  efi/libstub: Add memory attribute protocol definitions
  efi: efivars: prevent double registration
  efi: verify that variable services are supported
  efivarfs: always register filesystem
  efi: efivars: add efivars printk prefix
  efi: Warn if trying to reserve memory under Xen
  efi: Actually enable the ESRT under Xen
  efi: Apply allowlist to EFI configuration tables when running under Xen
  efi: xen: Implement memory descriptor lookup based on hypercall
  efi: memmap: Disregard bogus entries instead of returning them
  ...
2023-02-23 14:41:48 -08:00
Linus Torvalds
5b7c4cabbb Networking changes for 6.3.
Core
 ----
 
  - Add dedicated kmem_cache for typical/small skb->head, avoid having
    to access struct page at kfree time, and improve memory use.
 
  - Introduce sysctl to set default RPS configuration for new netdevs.
 
  - Define Netlink protocol specification format which can be used
    to describe messages used by each family and auto-generate parsers.
    Add tools for generating kernel data structures and uAPI headers.
 
  - Expose all net/core sysctls inside netns.
 
  - Remove 4s sleep in netpoll if carrier is instantly detected on boot.
 
  - Add configurable limit of MDB entries per port, and port-vlan.
 
  - Continue populating drop reasons throughout the stack.
 
  - Retire a handful of legacy Qdiscs and classifiers.
 
 Protocols
 ---------
 
  - Support IPv4 big TCP (TSO frames larger than 64kB).
 
  - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
    on socket by socket basis.
 
  - Track and report in procfs number of MPTCP sockets used.
 
  - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP
    path manager.
 
  - IPv6: don't check net.ipv6.route.max_size and rely on garbage
    collection to free memory (similarly to IPv4).
 
  - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
 
  - ICMP: add per-rate limit counters.
 
  - Add support for user scanning requests in ieee802154.
 
  - Remove static WEP support.
 
  - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
    reporting.
 
  - WiFi 7 EHT channel puncturing support (client & AP).
 
 BPF
 ---
 
  - Add a rbtree data structure following the "next-gen data structure"
    precedent set by recently added linked list, that is, by using
    kfunc + kptr instead of adding a new BPF map type.
 
  - Expose XDP hints via kfuncs with initial support for RX hash and
    timestamp metadata.
 
  - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key
    to better support decap on GRE tunnel devices not operating
    in collect metadata.
 
  - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
 
  - Remove the need for trace_printk_lock for bpf_trace_printk
    and bpf_trace_vprintk helpers.
 
  - Extend libbpf's bpf_tracing.h support for tracing arguments of
    kprobes/uprobes and syscall as a special case.
 
  - Significantly reduce the search time for module symbols
    by livepatch and BPF.
 
  - Enable cpumasks to be used as kptrs, which is useful for tracing
    programs tracking which tasks end up running on which CPUs in
    different time intervals.
 
  - Add support for BPF trampoline on s390x and riscv64.
 
  - Add capability to export the XDP features supported by the NIC.
 
  - Add __bpf_kfunc tag for marking kernel functions as kfuncs.
 
  - Add cgroup.memory=nobpf kernel parameter option to disable BPF
    memory accounting for container environments.
 
 Netfilter
 ---------
 
  - Remove the CLUSTERIP target. It has been marked as obsolete
    for years, and we still have WARN splats wrt. races of
    the out-of-band /proc interface installed by this target.
 
  - Add 'destroy' commands to nf_tables. They are identical to
    the existing 'delete' commands, but do not return an error if
    the referenced object (set, chain, rule...) did not exist.
 
 Driver API
 ----------
 
  - Improve cpumask_local_spread() locality to help NICs set the right
    IRQ affinity on AMD platforms.
 
  - Separate C22 and C45 MDIO bus transactions more clearly.
 
  - Introduce new DCB table to control DSCP rewrite on egress.
 
  - Support configuration of Physical Layer Collision Avoidance (PLCA)
    Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
    shared medium Ethernet.
 
  - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
    preemption of low priority frames by high priority frames.
 
  - Add support for controlling MACSec offload using netlink SET.
 
  - Rework devlink instance refcounts to allow registration and
    de-registration under the instance lock. Split the code into multiple
    files, drop some of the unnecessarily granular locks and factor out
    common parts of netlink operation handling.
 
  - Add TX frame aggregation parameters (for USB drivers).
 
  - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
    messages with notifications for debug.
 
  - Allow offloading of UDP NEW connections via act_ct.
 
  - Add support for per action HW stats in TC.
 
  - Support hardware miss to TC action (continue processing in SW from
    a specific point in the action chain).
 
  - Warn if old Wireless Extension user space interface is used with
    modern cfg80211/mac80211 drivers. Do not support Wireless Extensions
    for Wi-Fi 7 devices at all. Everyone should switch to using nl80211
    interface instead.
 
  - Improve the CAN bit timing configuration. Use extack to return error
    messages directly to user space, update the SJW handling, including
    the definition of a new default value that will benefit CAN-FD
    controllers, by increasing their oscillator tolerance.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - nVidia BlueField-3 support (control traffic driver)
    - Ethernet support for imx93 SoCs
    - Motorcomm yt8531 gigabit Ethernet PHY
    - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
    - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
    - Amlogic gxl MDIO mux
 
  - WiFi:
    - RealTek RTL8188EU (rtl8xxxu)
    - Qualcomm Wi-Fi 7 devices (ath12k)
 
  - CAN:
    - Renesas R-Car V4H
 
 Drivers
 -------
 
  - Bluetooth:
    - Set Per Platform Antenna Gain (PPAG) for Intel controllers.
 
  - Ethernet NICs:
    - Intel (1G, igc):
      - support TSN / Qbv / packet scheduling features of i226 model
    - Intel (100G, ice):
      - use GNSS subsystem instead of TTY
      - multi-buffer XDP support
      - extend support for GPIO pins to E823 devices
    - nVidia/Mellanox:
      - update the shared buffer configuration on PFC commands
      - implement PTP adjphase function for HW offset control
      - TC support for Geneve and GRE with VF tunnel offload
      - more efficient crypto key management method
      - multi-port eswitch support
    - Netronome/Corigine:
      - add DCB IEEE support
      - support IPsec offloading for NFP3800
    - Freescale/NXP (enetc):
      - enetc: support XDP_REDIRECT for XDP non-linear buffers
      - enetc: improve reconfig, avoid link flap and waiting for idle
      - enetc: support MAC Merge layer
    - Other NICs:
      - sfc/ef100: add basic devlink support for ef100
      - ionic: rx_push mode operation (writing descriptors via MMIO)
      - bnxt: use the auxiliary bus abstraction for RDMA
      - r8169: disable ASPM and reset bus in case of tx timeout
      - cpsw: support QSGMII mode for J721e CPSW9G
      - cpts: support pulse-per-second output
      - ngbe: add an mdio bus driver
      - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
      - r8152: handle devices with FW with NCM support
      - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
      - virtio-net: support multi buffer XDP
      - virtio/vsock: replace virtio_vsock_pkt with sk_buff
      - tsnep: XDP support
 
  - Ethernet high-speed switches:
    - nVidia/Mellanox (mlxsw):
      - add support for latency TLV (in FW control messages)
    - Microchip (sparx5):
      - separate explicit and implicit traffic forwarding rules, make
        the implicit rules always active
      - add support for egress DSCP rewrite
      - IS0 VCAP support (Ingress Classification)
      - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.)
      - ES2 VCAP support (Egress Access Control)
      - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1)
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - add MAB (port auth) offload support
      - enable PTP receive for mv88e6390
    - NXP (ocelot):
      - support MAC Merge layer
      - support for the the vsc7512 internal copper phys
    - Microchip:
      - lan9303: convert to PHYLINK
      - lan966x: support TC flower filter statistics
      - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
      - lan937x: support Credit Based Shaper configuration
      - ksz9477: support Energy Efficient Ethernet
    - other:
      - qca8k: convert to regmap read/write API, use bulk operations
      - rswitch: Improve TX timestamp accuracy
 
  - Intel WiFi (iwlwifi):
    - EHT (Wi-Fi 7) rate reporting
    - STEP equalizer support: transfer some STEP (connection to radio
      on platforms with integrated wifi) related parameters from the
      BIOS to the firmware.
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - IPQ5018 support
    - Fine Timing Measurement (FTM) responder role support
    - channel 177 support
 
  - MediaTek WiFi (mt76):
    - per-PHY LED support
    - mt7996: EHT (Wi-Fi 7) support
    - Wireless Ethernet Dispatch (WED) reset support
    - switch to using page pool allocator
 
  - RealTek WiFi (rtw89):
    - support new version of Bluetooth co-existance
 
  - Mobile:
    - rmnet: support TX aggregation.
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmP1VIYACgkQMUZtbf5S
 IrvsChAApz0rNL/sPKxXTEfxZ1tN7D3sYxYKQPomxvl5BV+MvicrLddJy3KmzEFK
 nnJNO3nuRNuH422JQ/ylZ4mGX1opa6+5QJb0UINImXUI7Fm8HHBIuPGkv7d5CheZ
 7JexFqjPJXUy9nPyh1Rra+IA9AcRd2U7jeGEZR38wb99bHJQj5Bzdk20WArEB0el
 n44aqg49LXH71bSeXRz77x5SjkwVtYiccQxLcnmTbjLU2xVraLvI2J+wAhHnVXWW
 9lrU1+V4Ex2Xcd1xR0L0cHeK+meP1TrPRAeF+JDpVI3a/zJiE7cZjfHdG/jH5xWl
 leZJqghVozrZQNtewWWO7XhUFhMDgFu3W/1vNLjSHPZEqaz1JpM67J1+ql6s63l4
 LMWoXbcYZz+SL9ZRCoPkbGue/5fKSHv8/Jl9Sh58+eTS+c/zgN8uFGRNFXLX1+EP
 n8uvt985PxMd6x1+dHumhOUzxnY4Sfi1vjitSunTsNFQ3Cmp4SO0IfBVJWfLUCuC
 xz5hbJGJJbSpvUsO+HWyCg83E5OWghRE/Onpt2jsQSZCrO9HDg4FRTEf3WAMgaqc
 edb5KfbRZPTJQM08gWdluXzSk1nw3FNP2tXW4XlgUrEbjb+fOk0V9dQg2gyYTxQ1
 Nhvn8ZQPi6/GMMELHAIPGmmW1allyOGiAzGlQsv8EmL+OFM6WDI=
 =xXhC
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Add dedicated kmem_cache for typical/small skb->head, avoid having
     to access struct page at kfree time, and improve memory use.

   - Introduce sysctl to set default RPS configuration for new netdevs.

   - Define Netlink protocol specification format which can be used to
     describe messages used by each family and auto-generate parsers.
     Add tools for generating kernel data structures and uAPI headers.

   - Expose all net/core sysctls inside netns.

   - Remove 4s sleep in netpoll if carrier is instantly detected on
     boot.

   - Add configurable limit of MDB entries per port, and port-vlan.

   - Continue populating drop reasons throughout the stack.

   - Retire a handful of legacy Qdiscs and classifiers.

  Protocols:

   - Support IPv4 big TCP (TSO frames larger than 64kB).

   - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
     on socket by socket basis.

   - Track and report in procfs number of MPTCP sockets used.

   - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
     manager.

   - IPv6: don't check net.ipv6.route.max_size and rely on garbage
     collection to free memory (similarly to IPv4).

   - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).

   - ICMP: add per-rate limit counters.

   - Add support for user scanning requests in ieee802154.

   - Remove static WEP support.

   - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
     reporting.

   - WiFi 7 EHT channel puncturing support (client & AP).

  BPF:

   - Add a rbtree data structure following the "next-gen data structure"
     precedent set by recently added linked list, that is, by using
     kfunc + kptr instead of adding a new BPF map type.

   - Expose XDP hints via kfuncs with initial support for RX hash and
     timestamp metadata.

   - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
     better support decap on GRE tunnel devices not operating in collect
     metadata.

   - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.

   - Remove the need for trace_printk_lock for bpf_trace_printk and
     bpf_trace_vprintk helpers.

   - Extend libbpf's bpf_tracing.h support for tracing arguments of
     kprobes/uprobes and syscall as a special case.

   - Significantly reduce the search time for module symbols by
     livepatch and BPF.

   - Enable cpumasks to be used as kptrs, which is useful for tracing
     programs tracking which tasks end up running on which CPUs in
     different time intervals.

   - Add support for BPF trampoline on s390x and riscv64.

   - Add capability to export the XDP features supported by the NIC.

   - Add __bpf_kfunc tag for marking kernel functions as kfuncs.

   - Add cgroup.memory=nobpf kernel parameter option to disable BPF
     memory accounting for container environments.

  Netfilter:

   - Remove the CLUSTERIP target. It has been marked as obsolete for
     years, and we still have WARN splats wrt races of the out-of-band
     /proc interface installed by this target.

   - Add 'destroy' commands to nf_tables. They are identical to the
     existing 'delete' commands, but do not return an error if the
     referenced object (set, chain, rule...) did not exist.

  Driver API:

   - Improve cpumask_local_spread() locality to help NICs set the right
     IRQ affinity on AMD platforms.

   - Separate C22 and C45 MDIO bus transactions more clearly.

   - Introduce new DCB table to control DSCP rewrite on egress.

   - Support configuration of Physical Layer Collision Avoidance (PLCA)
     Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
     shared medium Ethernet.

   - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
     preemption of low priority frames by high priority frames.

   - Add support for controlling MACSec offload using netlink SET.

   - Rework devlink instance refcounts to allow registration and
     de-registration under the instance lock. Split the code into
     multiple files, drop some of the unnecessarily granular locks and
     factor out common parts of netlink operation handling.

   - Add TX frame aggregation parameters (for USB drivers).

   - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
     messages with notifications for debug.

   - Allow offloading of UDP NEW connections via act_ct.

   - Add support for per action HW stats in TC.

   - Support hardware miss to TC action (continue processing in SW from
     a specific point in the action chain).

   - Warn if old Wireless Extension user space interface is used with
     modern cfg80211/mac80211 drivers. Do not support Wireless
     Extensions for Wi-Fi 7 devices at all. Everyone should switch to
     using nl80211 interface instead.

   - Improve the CAN bit timing configuration. Use extack to return
     error messages directly to user space, update the SJW handling,
     including the definition of a new default value that will benefit
     CAN-FD controllers, by increasing their oscillator tolerance.

  New hardware / drivers:

   - Ethernet:
      - nVidia BlueField-3 support (control traffic driver)
      - Ethernet support for imx93 SoCs
      - Motorcomm yt8531 gigabit Ethernet PHY
      - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
      - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
      - Amlogic gxl MDIO mux

   - WiFi:
      - RealTek RTL8188EU (rtl8xxxu)
      - Qualcomm Wi-Fi 7 devices (ath12k)

   - CAN:
      - Renesas R-Car V4H

  Drivers:

   - Bluetooth:
      - Set Per Platform Antenna Gain (PPAG) for Intel controllers.

   - Ethernet NICs:
      - Intel (1G, igc):
         - support TSN / Qbv / packet scheduling features of i226 model
      - Intel (100G, ice):
         - use GNSS subsystem instead of TTY
         - multi-buffer XDP support
         - extend support for GPIO pins to E823 devices
      - nVidia/Mellanox:
         - update the shared buffer configuration on PFC commands
         - implement PTP adjphase function for HW offset control
         - TC support for Geneve and GRE with VF tunnel offload
         - more efficient crypto key management method
         - multi-port eswitch support
      - Netronome/Corigine:
         - add DCB IEEE support
         - support IPsec offloading for NFP3800
      - Freescale/NXP (enetc):
         - support XDP_REDIRECT for XDP non-linear buffers
         - improve reconfig, avoid link flap and waiting for idle
         - support MAC Merge layer
      - Other NICs:
         - sfc/ef100: add basic devlink support for ef100
         - ionic: rx_push mode operation (writing descriptors via MMIO)
         - bnxt: use the auxiliary bus abstraction for RDMA
         - r8169: disable ASPM and reset bus in case of tx timeout
         - cpsw: support QSGMII mode for J721e CPSW9G
         - cpts: support pulse-per-second output
         - ngbe: add an mdio bus driver
         - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
         - r8152: handle devices with FW with NCM support
         - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
         - virtio-net: support multi buffer XDP
         - virtio/vsock: replace virtio_vsock_pkt with sk_buff
         - tsnep: XDP support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add support for latency TLV (in FW control messages)
      - Microchip (sparx5):
         - separate explicit and implicit traffic forwarding rules, make
           the implicit rules always active
         - add support for egress DSCP rewrite
         - IS0 VCAP support (Ingress Classification)
         - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
           etc.)
         - ES2 VCAP support (Egress Access Control)
         - support for Per-Stream Filtering and Policing (802.1Q,
           8.6.5.1)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - add MAB (port auth) offload support
         - enable PTP receive for mv88e6390
      - NXP (ocelot):
         - support MAC Merge layer
         - support for the the vsc7512 internal copper phys
      - Microchip:
         - lan9303: convert to PHYLINK
         - lan966x: support TC flower filter statistics
         - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
         - lan937x: support Credit Based Shaper configuration
         - ksz9477: support Energy Efficient Ethernet
      - other:
         - qca8k: convert to regmap read/write API, use bulk operations
         - rswitch: Improve TX timestamp accuracy

   - Intel WiFi (iwlwifi):
      - EHT (Wi-Fi 7) rate reporting
      - STEP equalizer support: transfer some STEP (connection to radio
        on platforms with integrated wifi) related parameters from the
        BIOS to the firmware.

   - Qualcomm 802.11ax WiFi (ath11k):
      - IPQ5018 support
      - Fine Timing Measurement (FTM) responder role support
      - channel 177 support

   - MediaTek WiFi (mt76):
      - per-PHY LED support
      - mt7996: EHT (Wi-Fi 7) support
      - Wireless Ethernet Dispatch (WED) reset support
      - switch to using page pool allocator

   - RealTek WiFi (rtw89):
      - support new version of Bluetooth co-existance

   - Mobile:
      - rmnet: support TX aggregation"

* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
  page_pool: add a comment explaining the fragment counter usage
  net: ethtool: fix __ethtool_dev_mm_supported() implementation
  ethtool: pse-pd: Fix double word in comments
  xsk: add linux/vmalloc.h to xsk.c
  sefltests: netdevsim: wait for devlink instance after netns removal
  selftest: fib_tests: Always cleanup before exit
  net/mlx5e: Align IPsec ASO result memory to be as required by hardware
  net/mlx5e: TC, Set CT miss to the specific ct action instance
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/sched: flower: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: Rename user cookie and act cookie
  sfc: fix builds without CONFIG_RTC_LIB
  sfc: clean up some inconsistent indentings
  net/mlx4_en: Introduce flexible array to silence overflow warning
  net: lan966x: Fix possible deadlock inside PTP
  net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
  ...
2023-02-21 18:24:12 -08:00
Thomas Weißschuh
4ecc96cba8 xen: sysfs: make kobj_type structure constant
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definition to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230216-kobj_type-xen-v1-1-742423de7d71@weissschuh.net
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-18 16:50:21 +01:00
Gustavo A. R. Silva
9a0450ada8 xen: Replace one-element array with flexible-array member
One-element arrays are deprecated, and we are replacing them with flexible
array members instead. So, replace one-element array with flexible-array
member in struct xen_page_directory.

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].

This results in no differences in binary output.

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/255
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/Y9xjN6Wa3VslgXeX@work
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13 09:15:45 +01:00
Oleksandr Tyshchenko
2062f9fb64 xen/grant-dma-iommu: Implement a dummy probe_device() callback
Update stub IOMMU driver (which main purpose is to reuse generic
IOMMU device-tree bindings by Xen grant DMA-mapping layer on Arm)
according to the recent changes done in the following
commit 57365a04c9 ("iommu: Move bus setup to IOMMU device registration").

With probe_device() callback being called during IOMMU device registration,
the uninitialized callback just leads to the "kernel NULL pointer
dereference" issue during boot. Fix that by adding a dummy callback.

Looks like the release_device() callback is not mandatory to be
implemented as IOMMU framework makes sure that callback is initialized
before dereferencing.

Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Fixes: 57365a04c9 ("iommu: Move bus setup to IOMMU device registration")
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/20230208153649.3604857-1-olekstysh@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13 07:22:08 +01:00
Volodymyr Babchuk
c70b7741dd xen/pvcalls-back: fix permanently masked event channel
There is a sequence of events that can lead to a permanently masked
event channel, because xen_irq_lateeoi() is newer called. This happens
when a backend receives spurious write event from a frontend. In this
case pvcalls_conn_back_write() returns early and it does not clears the
map->write counter. As map->write > 0, pvcalls_back_ioworker() returns
without calling xen_irq_lateeoi(). This leaves the event channel in
masked state, a backend does not receive any new events from a
frontend and the whole communication stops.

Move atomic_set(&map->write, 0) to the very beginning of
pvcalls_conn_back_write() to fix this issue.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reported-by: Oleksii Moisieiev <oleksii_moisieiev@epam.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230119211037.1234931-1-volodymyr_babchuk@epam.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13 06:53:20 +01:00
David Woodhouse
3e8cd711c3 xen: Allow platform PCI interrupt to be shared
When we don't use the per-CPU vector callback, we ask Xen to deliver event
channel interrupts as INTx on the PCI platform device. As such, it can be
shared with INTx on other PCI devices.

Set IRQF_SHARED, and make it return IRQ_HANDLED or IRQ_NONE according to
whether the evtchn_upcall_pending flag was actually set. Now I can share
the interrupt:

 11:         82          0   IO-APIC  11-fasteoi   xen-platform-pci, ens4

Drop the IRQF_TRIGGER_RISING. It has no effect when the IRQ is shared,
and besides, the only effect it was having even beforehand was to trigger
a debug message in both I/OAPIC and legacy PIC cases:

[    0.915441] genirq: No set_type function for IRQ 11 (IO-APIC)
[    0.951939] genirq: No set_type function for IRQ 11 (XT-PIC)

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/f9a29a68d05668a3636dd09acd94d970269eaec6.camel@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13 06:53:20 +01:00
Per Bilse
415dab3c17 drivers/xen/hypervisor: Expose Xen SIF flags to userspace
/proc/xen is a legacy pseudo filesystem which predates Xen support
getting merged into Linux.  It has largely been replaced with more
normal locations for data (/sys/hypervisor/ for info, /dev/xen/ for
user devices).  We want to compile xenfs support out of the dom0 kernel.

There is one item which only exists in /proc/xen, namely
/proc/xen/capabilities with "control_d" being the signal of "you're in
the control domain".  This ultimately comes from the SIF flags provided
at VM start.

This patch exposes all SIF flags in /sys/hypervisor/start_flags/ as
boolean files, one for each bit, returning '1' if set, '0' otherwise.
Two known flags, 'privileged' and 'initdomain', are explicitly named,
and all remaining flags can be accessed via generically named files,
as suggested by Andrew Cooper.

Signed-off-by: Per Bilse <per.bilse@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230103130213.2129753-1-per.bilse@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13 06:53:19 +01:00
Suren Baghdasaryan
1c71222e5f mm: replace vma->vm_flags direct modifications with modifier calls
Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

[akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo]
Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09 16:51:39 -08:00
Greg Kroah-Hartman
2a81ada32f driver core: make struct bus_type.uevent() take a const *
The uevent() callback in struct bus_type should not be modifying the
device that is passed into it, so mark it as a const * and propagate the
function signature changes out into all relevant subsystems that use
this callback.

Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230111113018.459199-16-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-27 13:45:52 +01:00
Peilin Ye
40e0b09081 net/sock: Introduce trace_sk_data_ready()
As suggested by Cong, introduce a tracepoint for all ->sk_data_ready()
callback implementations.  For example:

<...>
  iperf-609  [002] .....  70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable
  iperf-609  [002] .....  70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable
<...>

Suggested-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23 11:26:50 +00:00
Demi Marie Obenour
c0fecaa44d efi: Apply allowlist to EFI configuration tables when running under Xen
As it turns out, Xen does not guarantee that EFI boot services data
regions in memory are preserved, which means that EFI configuration
tables pointing into such memory regions may be corrupted before the
dom0 OS has had a chance to inspect them.

This is causing problems for Qubes OS when it attempts to perform system
firmware updates, which requires that the contents of the EFI System
Resource Table are valid when the fwupd userspace program runs.

However, other configuration tables such as the memory attributes table
or the runtime properties table are equally affected, and so we need a
comprehensive workaround that works for any table type.

So when running under Xen, check the EFI memory descriptor covering the
start of the table, and disregard the table if it does not reside in
memory that is preserved by Xen.

Co-developed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-23 11:33:24 +01:00
Demi Marie Obenour
aca1d27ac3 efi: xen: Implement memory descriptor lookup based on hypercall
Xen on x86 boots dom0 in EFI mode but without providing a memory map.
This means that some consistency checks we would like to perform on
configuration tables or other data structures in memory are not
currently possible.  Xen does, however, expose EFI memory descriptor
info via a Xen hypercall, so let's wire that up instead.  It turns out
that the returned information is not identical to what Linux's
efi_mem_desc_lookup would return: the address returned is the address
passed to the hypercall, and the size returned is the number of bytes
remaining in the configuration table.  However, none of the callers of
efi_mem_desc_lookup() currently care about this.  In the future, Xen may
gain a hypercall that returns the actual start address, which can be
used instead.

Co-developed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-22 10:14:15 +01:00
Linus Torvalds
bad8c4a850 xen: branch for v6.2-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCY76ohgAKCRCAXGG7T9hj
 vo8fAP0XJ94B7asqcN4W3EyeyfqxUf1eZvmWRhrbKqpLnmHLaQEA/uJBkXL49Zj7
 TTcbxR1coJ/hPwhtmONU4TNtCZ+RXw0=
 =2Ib5
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - two cleanup patches

 - a fix of a memory leak in the Xen pvfront driver

 - a fix of a locking issue in the Xen hypervisor console driver

* tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pvcalls: free active map buffer on pvcalls_front_free_map
  hvc/xen: lock console list traversal
  x86/xen: Remove the unused function p2m_index()
  xen: make remove callback of xen driver void returned
2023-01-12 17:02:20 -06:00
Oleksii Moisieiev
f57034cede xen/pvcalls: free active map buffer on pvcalls_front_free_map
Data buffer for active map is allocated in alloc_active_ring and freed
in free_active_ring function, which is used only for the error
cleanup. pvcalls_front_release is calling pvcalls_front_free_map which
ends foreign access for this buffer, but doesn't free allocated pages.
Call free_active_ring to clean all allocated resources.

Signed-off-by: Oleksii Moisieiev <oleksii_moisieiev@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/6a762ee32dd655cbb09a4aa0e2307e8919761311.1671531297.git.oleksii_moisieiev@epam.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-01-09 08:05:20 +01:00
Dawei Li
7cffcade57 xen: make remove callback of xen driver void returned
Since commit fc7a6209d5 ("bus: Make remove callback return void")
forces bus_type::remove be void-returned, it doesn't make much sense for
any bus based driver implementing remove callbalk to return non-void to
its caller.

This change is for xen bus based drivers.

Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dawei Li <set_pte_at@outlook.com>
Link: https://lore.kernel.org/r/TYCP286MB23238119AB4DF190997075C9CAE39@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-12-15 16:06:10 +01:00
Linus Torvalds
a594533df0 drm for 6.2:
Initial accel subsystem support. There are no drivers yet, just the framework.
 
 New driver:
 - ofdrm - replacement for offb
 
 fbdev:
 - add support for nomodeset
 
 fourcc:
 - add Vivante tiled modifier
 
 core:
 - atomic-helpers: CRTC primary plane test fixes, fb access hooks
 - connector: TV API consistency, cmdline parser improvements
 - send connector hotplug on cleanup
 - sort makefile objects
 
 tests:
 - sort kunit tests
 - improve DP-MST tests
 - add kunit helpers to create a device
 
 sched:
 - module param for scheduling policy
 - refcounting fix
 
 buddy:
 - add back random seed log
 
 ttm:
 - convert ttm_resource to size_t
 - optimize pool allocations
 
 edid:
 - HFVSDB parsing support fixes
 - logging/debug improvements
 - DSC quirks
 
 dma-buf:
 - Add unlocked vmap and attachment mapping
 - move drivers to common locking convention
 - locking improvements
 
 firmware:
 - new API for rPI firmware and vc4
 
 xilinx:
 - zynqmp: displayport bridge support
 - dpsub fix
 
 bridge:
 - adv7533: Remove dynamic lane switching
 - it6505: Runtime PM support, sync improvements
 - ps8640: Handle AUX defer messages
 - tc358775: Drop soft-reset over I2C
 
 panel:
 - panel-edp: Add INX N116BGE-EA2 C2 and C4 support.
 - Jadard JD9365DA-H3
 - NewVision NV3051D
 
 amdgpu:
 - DCN support on ARM
 - DCN 2.1 secure display
 - Sienna Cichlid mode2 reset fixes
 - new GC 11.x firmware versions
 - drop AMD specific DSC workarounds in favour of drm code
 - clang warning fixes
 - scheduler rework
 - SR-IOV fixes
 - GPUVM locking fixes
 - fix memory leak in CS IOCTL error path
 - flexible array updates
 - enable new GC/PSP/SMU/NBIO IP
 - GFX preemption support for gfx9
 
 amdkfd:
 - cache size fixes
 - userptr fixes
 - enable cooperative launch on gfx 10.3
 - enable GC 11.0.4 KFD support
 
 radeon:
 - replace kmap with kmap_local_page
 - ACPI ref count fix
 - HDA audio notifier support
 
 i915:
 - DG2 enabled by default
 - MTL enablement work
 - hotplug refactoring
 - VBT improvements
 - Display and watermark refactoring
 - ADL-P workaround
 - temp disable runtime_pm for discrete-
 - fix for A380 as a secondary GPU
 - Wa_18017747507 for DG2
 - CS timestamp support fixes for gen5 and earlier
 - never purge busy TTM objects
 - use i915_sg_dma_sizes for all backends
 - demote GuC kernel contexts to normal priority
 - gvt: refactor for new MDEV interface
 - enable DC power states on eDP ports
 - fix gen 2/3 workarounds
 
 nouveau:
 - fix page fault handling
 - Ampere acceleration support
 - driver stability improvements
 - nva3 backlight support
 
 msm:
 - MSM_INFO_GET_FLAGS support
 - DPU: XR30 and P010 image formats
 - Qualcomm SM6115 support
 - DSI PHY support for QCM2290
 - HDMI: refactored dev init path
 - remove exclusive-fence hack
 - fix speed-bin detection
 - enable clamp to idle on 7c3
 - improved hangcheck detection
 
 vmwgfx:
 - fb and cursor refactoring
 - convert to generic hashtable
 - cursor improvements
 
 etnaviv:
 - hw workarounds
 - softpin MMU fixes
 
 ast:
 - atomic gamma LUT support
 - convert to SHMEM
 
 lcdif:
 - support YUV planes
 - Increase DMA burst size
 - FIFO threshold tuning
 
 meson:
 - fix return type of cvbs mode_valid
 
 mgag200:
 - fix PLL setup on some revisions
 
 sun4i:
 - A100 and D1 support
 
 udl:
 - modesetting improvements
 - hot unplug support
 
 vc4:
 - support PAL-M
 - fix regression preventing 4K @ 60Hz
 - fix NULL ptr deref
 
 v3d:
 - switch to drm managed resources
 
 renesas:
 - RZ/G2L DSI support
 - DU Kconfig cleanup
 
 mediatek:
 - fixup dpi and hdmi
 - MT8188 dpi support
 - MT8195 AFBC support
 
 tegra:
 - NVDEC hardware on Tegra234 SoC
 
 hdlcd:
 - switch to drm managed resources
 
 ingenic:
 - fix registration error path
 
 hisilicon:
 - convert to drm_mode_init
 
 maildp:
 - use managed resources
 
 mtk:
 - use drm_mode_init
 
 rockchip:
 - use drm_mode_copy
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmOXxI0ACgkQDHTzWXnE
 hr4NyBAAojK3N+XJf2b8LWuRKsShCr5FXlteEDxiYGLeB8/g4x3LztSfHgUg0iuS
 nP1m7Cx4snXcVNS6iyOsoZVq1EGUAWvv+mPWJe1UywjpyqtciTVQ11GEHRvI/w+V
 GRvkhmt/TsoZA0QIlS2MaOmhn9j17QOcuYTUjYdyRL4tsrHWrTASH5W1Jt2xmDyw
 5FUJvfukPWm100DVWbh6hWbCKL22bDDF/nj1H+G6hYSyTjVbk7wZ0vy2m6TnIHNF
 iyBHBIzFPg3BveiSlKe6aVX7Gq2d8bfqjHsgN5f1qcS4ejWEkHLVxJtBdOB+fOSC
 7o8Ms7WHi1AmnkOVCGRIjJ0cJrLZu2HDlyhViguAO1XQ3Jvuo/4WW3mplv+YPOMc
 c+P/zuPG42d4lrISuB8wspTdOgxmqpZDkg3HE6n1+jiVR0u4hTTYktoPnLsHX6KG
 l/l2B6aVAxE4b6P0q3ofYoAnk5rNsb1YUS+a8kC6f97TQ3gmOsN75iZXD/ASHg2r
 ozhh2wcFxIPkJhE7vqLWPIBCWQs93sGyQXoI7Q0TJaIAZTXV0VmO1BIofetpVImE
 7FhDC4wvBedXywN8NYUEFbCTOnIcDMteM/i6S1ns78s5UjDa5osPuS5I02VT1lbN
 tvnJoHNkhCt13lJz63b0HNFm3cPKoRosCQhJeshyUYaFKs+evL0=
 =pABG
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2022-12-13' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "The biggest highlight is that the accel subsystem framework is merged.
  Hopefully for 6.3 we will be able to line up a driver to use it.

  In drivers land, i915 enables DG2 support by default now, and nouveau
  has a big stability refactoring and initial ampere support, AMD
  includes new hw IP support and should build on ARM again. There is
  also an ofdrm driver to take over offb on platforms it's used.

  Stuff outside my tree, the dma-buf patches hit a few places, the vc4
  firmware changes also do, and i915 has some interactions with MEI for
  discrete GPUs. I think all of those should have been acked/reviewed by
  relevant parties.

  New driver:
   - ofdrm - replacement for offb

  fbdev:
   - add support for nomodeset

  fourcc:
   - add Vivante tiled modifier

  core:
   - atomic-helpers: CRTC primary plane test fixes, fb access hooks
   - connector: TV API consistency, cmdline parser improvements
   - send connector hotplug on cleanup
   - sort makefile objects

  tests:
   - sort kunit tests
   - improve DP-MST tests
   - add kunit helpers to create a device

  sched:
   - module param for scheduling policy
   - refcounting fix

  buddy:
   - add back random seed log

  ttm:
   - convert ttm_resource to size_t
   - optimize pool allocations

  edid:
   - HFVSDB parsing support fixes
   - logging/debug improvements
   - DSC quirks

  dma-buf:
   - Add unlocked vmap and attachment mapping
   - move drivers to common locking convention
   - locking improvements

  firmware:
   - new API for rPI firmware and vc4

  xilinx:
   - zynqmp: displayport bridge support
   - dpsub fix

  bridge:
   - adv7533: Remove dynamic lane switching
   - it6505: Runtime PM support, sync improvements
   - ps8640: Handle AUX defer messages
   - tc358775: Drop soft-reset over I2C

  panel:
   - panel-edp: Add INX N116BGE-EA2 C2 and C4 support.
   - Jadard JD9365DA-H3
   - NewVision NV3051D

  amdgpu:
   - DCN support on ARM
   - DCN 2.1 secure display
   - Sienna Cichlid mode2 reset fixes
   - new GC 11.x firmware versions
   - drop AMD specific DSC workarounds in favour of drm code
   - clang warning fixes
   - scheduler rework
   - SR-IOV fixes
   - GPUVM locking fixes
   - fix memory leak in CS IOCTL error path
   - flexible array updates
   - enable new GC/PSP/SMU/NBIO IP
   - GFX preemption support for gfx9

  amdkfd:
   - cache size fixes
   - userptr fixes
   - enable cooperative launch on gfx 10.3
   - enable GC 11.0.4 KFD support

  radeon:
   - replace kmap with kmap_local_page
   - ACPI ref count fix
   - HDA audio notifier support

  i915:
   - DG2 enabled by default
   - MTL enablement work
   - hotplug refactoring
   - VBT improvements
   - Display and watermark refactoring
   - ADL-P workaround
   - temp disable runtime_pm for discrete-
   - fix for A380 as a secondary GPU
   - Wa_18017747507 for DG2
   - CS timestamp support fixes for gen5 and earlier
   - never purge busy TTM objects
   - use i915_sg_dma_sizes for all backends
   - demote GuC kernel contexts to normal priority
   - gvt: refactor for new MDEV interface
   - enable DC power states on eDP ports
   - fix gen 2/3 workarounds

  nouveau:
   - fix page fault handling
   - Ampere acceleration support
   - driver stability improvements
   - nva3 backlight support

  msm:
   - MSM_INFO_GET_FLAGS support
   - DPU: XR30 and P010 image formats
   - Qualcomm SM6115 support
   - DSI PHY support for QCM2290
   - HDMI: refactored dev init path
   - remove exclusive-fence hack
   - fix speed-bin detection
   - enable clamp to idle on 7c3
   - improved hangcheck detection

  vmwgfx:
   - fb and cursor refactoring
   - convert to generic hashtable
   - cursor improvements

  etnaviv:
   - hw workarounds
   - softpin MMU fixes

  ast:
   - atomic gamma LUT support
   - convert to SHMEM

  lcdif:
   - support YUV planes
   - Increase DMA burst size
   - FIFO threshold tuning

  meson:
   - fix return type of cvbs mode_valid

  mgag200:
   - fix PLL setup on some revisions

  sun4i:
   - A100 and D1 support

  udl:
   - modesetting improvements
   - hot unplug support

  vc4:
   - support PAL-M
   - fix regression preventing 4K @ 60Hz
   - fix NULL ptr deref

  v3d:
   - switch to drm managed resources

  renesas:
   - RZ/G2L DSI support
   - DU Kconfig cleanup

  mediatek:
   - fixup dpi and hdmi
   - MT8188 dpi support
   - MT8195 AFBC support

  tegra:
   - NVDEC hardware on Tegra234 SoC

  hdlcd:
   - switch to drm managed resources

  ingenic:
   - fix registration error path

  hisilicon:
   - convert to drm_mode_init

  maildp:
   - use managed resources

  mtk:
   - use drm_mode_init

  rockchip:
   - use drm_mode_copy"

* tag 'drm-next-2022-12-13' of git://anongit.freedesktop.org/drm/drm: (1397 commits)
  drm/amdgpu: fix mmhub register base coding error
  drm/amdgpu: add tmz support for GC IP v11.0.4
  drm/amdgpu: enable GFX Clock Gating control for GC IP v11.0.4
  drm/amdgpu: enable GFX Power Gating for GC IP v11.0.4
  drm/amdgpu: enable GFX IP v11.0.4 CG support
  drm/amdgpu: Make amdgpu_ring_mux functions as static
  drm/amdgpu: generally allow over-commit during BO allocation
  drm/amd/display: fix array index out of bound error in DCN32 DML
  drm/amd/display: 3.2.215
  drm/amd/display: set optimized required for comp buf changes
  drm/amd/display: Add debug option to skip PSR CRTC disable
  drm/amd/display: correct DML calc error of UrgentLatency
  drm/amd/display: correct static_screen_event_mask
  drm/amd/display: Ensure commit_streams returns the DC return code
  drm/amd/display: read invalid ddc pin status cause engine busy
  drm/amd/display: Bypass DET swath fill check for max clocks
  drm/amd/display: Disable uclk pstate for subvp pipes
  drm/amd/display: Fix DCN2.1 default DSC clocks
  drm/amd/display: Enable dp_hdmi21_pcon support
  drm/amd/display: prevent seamless boot on displays that don't have the preferred dig
  ...
2022-12-13 11:59:58 -08:00
Linus Torvalds
75f4d9af8b iov_iter work; most of that is about getting rid of
direction misannotations and (hopefully) preventing
 more of the same for the future.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY5ZzQAAKCRBZ7Krx/gZQ
 65RZAP4nTkvOn0NZLVFkuGOx8pgJelXAvrteyAuecVL8V6CR4AD40qCVY51PJp8N
 MzwiRTeqnGDxTTF7mgd//IB6hoatAA==
 =bcvF
 -----END PGP SIGNATURE-----

Merge tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull iov_iter updates from Al Viro:
 "iov_iter work; most of that is about getting rid of direction
  misannotations and (hopefully) preventing more of the same for the
  future"

* tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  use less confusing names for iov_iter direction initializers
  iov_iter: saner checks for attempt to copy to/from iterator
  [xen] fix "direction" argument of iov_iter_kvec()
  [vhost] fix 'direction' argument of iov_iter_{init,bvec}()
  [target] fix iov_iter_bvec() "direction" argument
  [s390] memcpy_real(): WRITE is "data source", not destination...
  [s390] zcore: WRITE is "data source", not destination...
  [infiniband] READ is "data destination", not source...
  [fsi] WRITE is "data source", not destination...
  [s390] copy_oldmem_kernel() - WRITE is "data source", not destination
  csum_and_copy_to_iter(): handle ITER_DISCARD
  get rid of unlikely() on page_copy_sane() calls
2022-12-12 18:29:54 -08:00
Linus Torvalds
456ed864fd ACPI updates for 6.2-rc1
- Update the ACPICA code in the kernel to the 20221020 upstream
    version and fix a couple of issues in it:
 
    * Make acpi_ex_load_op() match upstream implementation (Rafael
      Wysocki).
    * Add support for loong_arch-specific APICs in MADT (Huacai Chen).
    * Add support for fixed PCIe wake event (Huacai Chen).
    * Add EBDA pointer sanity checks (Vit Kabele).
    * Avoid accessing VGA memory when EBDA < 1KiB (Vit Kabele).
    * Add CCEL table support to both compiler/disassembler (Kuppuswamy
      Sathyanarayanan).
    * Add a couple of new UUIDs to the known UUID list (Bob Moore).
    * Add support for FFH Opregion special context data (Sudeep Holla).
    * Improve warning message for "invalid ACPI name" (Bob Moore).
    * Add support for CXL 3.0 structures (CXIMS & RDPAS) in the CEDT
      table (Alison Schofield).
    * Prepare IORT support for revision E.e (Robin Murphy).
    * Finish support for the CDAT table (Bob Moore).
    * Fix error code path in acpi_ds_call_control_method() (Rafael
      Wysocki).
    * Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() (Li
      Zetao).
    * Update the version of the ACPICA code in the kernel (Bob Moore).
 
  - Use ZERO_PAGE(0) instead of empty_zero_page in the ACPI device
    enumeration code (Giulio Benetti).
 
  - Change the return type of the ACPI driver remove callback to void and
    update its users accordingly (Dawei Li).
 
  - Add general support for FFH address space type and implement the low-
    level part of it for ARM64 (Sudeep Holla).
 
  - Fix stale comments in the ACPI tables parsing code and make it print
    more messages related to MADT (Hanjun Guo, Huacai Chen).
 
  - Replace invocations of generic library functions with more kernel-
    specific counterparts in the ACPI sysfs interface (Christophe JAILLET,
    Xu Panda).
 
  - Print full name paths of ACPI power resource objects during
    enumeration (Kane Chen).
 
  - Eliminate a compiler warning regarding a missing function prototype
    in the ACPI power management code (Sudeep Holla).
 
  - Fix and clean up the ACPI processor driver (Rafael Wysocki, Li Zhong,
    Colin Ian King, Sudeep Holla).
 
  - Add quirk for the HP Pavilion Gaming 15-cx0041ur to the ACPI EC
    driver (Mia Kanashi).
 
  - Add some mew ACPI backlight handling quirks and update some existing
    ones (Hans de Goede).
 
  - Make the ACPI backlight driver prefer the native backlight control
    over vendor backlight control when possible (Hans de Goede).
 
  - Drop unsetting ACPI APEI driver data on remove (Uwe Kleine-König).
 
  - Use xchg_release() instead of cmpxchg() for updating new GHES cache
    slots (Ard Biesheuvel).
 
  - Clean up the ACPI APEI code (Sudeep Holla, Christophe JAILLET, Jay Lu).
 
  - Add new I2C device enumeration quirks for Medion Lifetab S10346 and
    Lenovo Yoga Tab 3 Pro (YT3-X90F) (Hans de Goede).
 
  - Make the ACPI battery driver notify user space about adding new
    battery hooks and removing the existing ones (Armin Wolf).
 
  - Modify the pfr_update and pfr_telemetry drivers to use ACPI_FREE()
    for freeing acpi_object structures to help diagnostics (Wang ShaoBo).
 
  - Make the ACPI fan driver use sysfs_emit_at() in its sysfs interface
    code (ye xingchen).
 
  - Fix the _FIF package extraction failure handling in the ACPI fan
    driver (Hanjun Guo).
 
  - Fix the PCC mailbox handling error code path (Huisong Li).
 
  - Avoid using PCC Opregions if there is no platform interrupt allocated
    for this purpose (Huisong Li).
 
  - Use sysfs_emit() instead of scnprintf() in the ACPI PAD driver and
    CPPC library (ye xingchen).
 
  - Fix some kernel-doc issues in the ACPI GSI processing code (Xiongfeng
    Wang).
 
  - Fix name memory leak in pnp_alloc_dev() (Yang Yingliang).
 
  - Do not disable PNP devices on suspend when they cannot be re-enabled
    on resume (Hans de Goede).
 
  - Clean up the ACPI thermal driver a bit (Rafael Wysocki).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmOXV10SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxuOwP/2zew6val2Jf7I/Yxf1iQLlRyGmhFnaH
 wpltJvBjlHjAUKnPQ/kLYK9fjuUY5HVgjOE03WpwhFUpmhftYTrSkhoVkJ1Mw9Zl
 RNOAEgCG484ThHiTIVp/dMPxrtfuqpdbamhWX3Q51IfXjGW8Vc/lDxIa3k/JQxyq
 ko8GFPCoebJrSCfuwaAf2+xSQaf6dq4jpL/rlIk+nYMMB9mQmXhNEhc+l97NaCe8
 MyCIGynyNbhGsIlwdHRvTp04EIe8h0Z1+Dyns7g/TrzHj3Aezy7QVZbn8sKdZWa1
 W/Ck9QST5tfpDWyr+hUXxUJjEn4Yy+GXjM2xON0EMx5q+JD9XsOpwWOVwTR7CS5s
 FwEd6I89SC8OZM86AgMtnGxygjpK24R/kGzHjhG15IQCsypc8Rvzoxl0L0YVoon/
 UTkE57GzNWVzu0pY/oXJc2aT7lVqFXMFZ6ft/zHnBRnQmrcIi+xgDO5ni5KxctFN
 TVFwbAMCuwVx6IOcVQCZM2g4aJw426KpUn19fKnXvPwR5UIufBaCzSKWMiYrtdXr
 O5BM8ElYuyKCWGYEE0GSMjZygyDpyY6ENLH7s7P1IEmFyigBzaaGBbKm108JJq4V
 eCWJYTAx8pAptsU/vfuMvEQ1ErfhZ3TTokA5Lv0uPf53VcAnWDb7EAbW6ZGMwFSI
 IaV6cv6ILoqO
 =GVzp
 -----END PGP SIGNATURE-----

Merge tag 'acpi-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and PNP updates from Rafael Wysocki:
 "These include new code (for instance, support for the FFH address
  space type and support for new firmware data structures in ACPICA),
  some new quirks (mostly related to backlight handling and I2C
  enumeration), a number of fixes and a fair amount of cleanups all
  over.

  Specifics:

   - Update the ACPICA code in the kernel to the 20221020 upstream
     version and fix a couple of issues in it:
      - Make acpi_ex_load_op() match upstream implementation (Rafael
        Wysocki)
      - Add support for loong_arch-specific APICs in MADT (Huacai Chen)
      - Add support for fixed PCIe wake event (Huacai Chen)
      - Add EBDA pointer sanity checks (Vit Kabele)
      - Avoid accessing VGA memory when EBDA < 1KiB (Vit Kabele)
      - Add CCEL table support to both compiler/disassembler (Kuppuswamy
        Sathyanarayanan)
      - Add a couple of new UUIDs to the known UUID list (Bob Moore)
      - Add support for FFH Opregion special context data (Sudeep
        Holla)
      - Improve warning message for "invalid ACPI name" (Bob Moore)
      - Add support for CXL 3.0 structures (CXIMS & RDPAS) in the CEDT
        table (Alison Schofield)
      - Prepare IORT support for revision E.e (Robin Murphy)
      - Finish support for the CDAT table (Bob Moore)
      - Fix error code path in acpi_ds_call_control_method() (Rafael
        Wysocki)
      - Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() (Li
        Zetao)
      - Update the version of the ACPICA code in the kernel (Bob Moore)

   - Use ZERO_PAGE(0) instead of empty_zero_page in the ACPI device
     enumeration code (Giulio Benetti)

   - Change the return type of the ACPI driver remove callback to void
     and update its users accordingly (Dawei Li)

   - Add general support for FFH address space type and implement the
     low- level part of it for ARM64 (Sudeep Holla)

   - Fix stale comments in the ACPI tables parsing code and make it
     print more messages related to MADT (Hanjun Guo, Huacai Chen)

   - Replace invocations of generic library functions with more kernel-
     specific counterparts in the ACPI sysfs interface (Christophe
     JAILLET, Xu Panda)

   - Print full name paths of ACPI power resource objects during
     enumeration (Kane Chen)

   - Eliminate a compiler warning regarding a missing function prototype
     in the ACPI power management code (Sudeep Holla)

   - Fix and clean up the ACPI processor driver (Rafael Wysocki, Li
     Zhong, Colin Ian King, Sudeep Holla)

   - Add quirk for the HP Pavilion Gaming 15-cx0041ur to the ACPI EC
     driver (Mia Kanashi)

   - Add some mew ACPI backlight handling quirks and update some
     existing ones (Hans de Goede)

   - Make the ACPI backlight driver prefer the native backlight control
     over vendor backlight control when possible (Hans de Goede)

   - Drop unsetting ACPI APEI driver data on remove (Uwe Kleine-König)

   - Use xchg_release() instead of cmpxchg() for updating new GHES cache
     slots (Ard Biesheuvel)

   - Clean up the ACPI APEI code (Sudeep Holla, Christophe JAILLET, Jay
     Lu)

   - Add new I2C device enumeration quirks for Medion Lifetab S10346 and
     Lenovo Yoga Tab 3 Pro (YT3-X90F) (Hans de Goede)

   - Make the ACPI battery driver notify user space about adding new
     battery hooks and removing the existing ones (Armin Wolf)

   - Modify the pfr_update and pfr_telemetry drivers to use ACPI_FREE()
     for freeing acpi_object structures to help diagnostics (Wang
     ShaoBo)

   - Make the ACPI fan driver use sysfs_emit_at() in its sysfs interface
     code (ye xingchen)

   - Fix the _FIF package extraction failure handling in the ACPI fan
     driver (Hanjun Guo)

   - Fix the PCC mailbox handling error code path (Huisong Li)

   - Avoid using PCC Opregions if there is no platform interrupt
     allocated for this purpose (Huisong Li)

   - Use sysfs_emit() instead of scnprintf() in the ACPI PAD driver and
     CPPC library (ye xingchen)

   - Fix some kernel-doc issues in the ACPI GSI processing code
     (Xiongfeng Wang)

   - Fix name memory leak in pnp_alloc_dev() (Yang Yingliang)

   - Do not disable PNP devices on suspend when they cannot be
     re-enabled on resume (Hans de Goede)

   - Clean up the ACPI thermal driver a bit (Rafael Wysocki)"

* tag 'acpi-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (67 commits)
  ACPI: x86: Add skip i2c clients quirk for Medion Lifetab S10346
  ACPI: APEI: EINJ: Refactor available_error_type_show()
  ACPI: APEI: EINJ: Fix formatting errors
  ACPI: processor: perflib: Adjust acpi_processor_notify_smm() return value
  ACPI: processor: perflib: Rearrange acpi_processor_notify_smm()
  ACPI: processor: perflib: Rearrange unregistration routine
  ACPI: processor: perflib: Drop redundant parentheses
  ACPI: processor: perflib: Adjust white space
  ACPI: processor: idle: Drop unnecessary statements and parens
  ACPI: thermal: Adjust critical.flags.valid check
  ACPI: fan: Convert to use sysfs_emit_at() API
  ACPICA: Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage()
  ACPI: battery: Call power_supply_changed() when adding hooks
  ACPI: use sysfs_emit() instead of scnprintf()
  ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Tab 3 Pro (YT3-X90F)
  ACPI: APEI: Remove a useless include
  PNP: Do not disable devices on suspend when they cannot be re-enabled on resume
  ACPI: processor: Silence missing prototype warnings
  ACPI: processor_idle: Silence missing prototype warnings
  ACPI: PM: Silence missing prototype warning
  ...
2022-12-12 13:38:17 -08:00
Rafael J. Wysocki
45494d77f2 Merge branches 'acpi-scan', 'acpi-bus', 'acpi-tables' and 'acpi-sysfs'
Merge ACPI changes related to device enumeration, device object
managenet, operation region handling, table parsing and sysfs
interface:

 - Use ZERO_PAGE(0) instead of empty_zero_page in the ACPI device
   enumeration code (Giulio Benetti).

 - Change the return type of the ACPI driver remove callback to void and
   update its users accordingly (Dawei Li).

 - Add general support for FFH address space type and implement the low-
   level part of it for ARM64 (Sudeep Holla).

 - Fix stale comments in the ACPI tables parsing code and make it print
   more messages related to MADT (Hanjun Guo, Huacai Chen).

 - Replace invocations of generic library functions with more kernel-
   specific counterparts in the ACPI sysfs interface (Christophe JAILLET,
   Xu Panda).

* acpi-scan:
  ACPI: scan: substitute empty_zero_page with helper ZERO_PAGE(0)

* acpi-bus:
  ACPI: FFH: Silence missing prototype warnings
  ACPI: make remove callback of ACPI driver void
  ACPI: bus: Fix the _OSC capability check for FFH OpRegion
  arm64: Add architecture specific ACPI FFH Opregion callbacks
  ACPI: Implement a generic FFH Opregion handler

* acpi-tables:
  ACPI: tables: Fix the stale comments for acpi_locate_initial_tables()
  ACPI: tables: Print CORE_PIC information when MADT is parsed

* acpi-sysfs:
  ACPI: sysfs: use sysfs_emit() to instead of scnprintf()
  ACPI: sysfs: Use kstrtobool() instead of strtobool()
2022-12-12 14:55:44 +01:00
Harshit Mogalapalli
8b997b2bb2 xen/privcmd: Fix a possible warning in privcmd_ioctl_mmap_resource()
As 'kdata.num' is user-controlled data, if user tries to allocate
memory larger than(>=) MAX_ORDER, then kcalloc() will fail, it
creates a stack trace and messes up dmesg with a warning.

Call trace:
-> privcmd_ioctl
--> privcmd_ioctl_mmap_resource

Add __GFP_NOWARN in order to avoid too large allocation warning.
This is detected by static analysis using smatch.

Fixes: 3ad0876554 ("xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20221126050745.778967-1-harshit.m.mogalapalli@oracle.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-12-05 13:54:29 +01:00
Oleksandr Tyshchenko
ef8ae384b4 xen/virtio: Handle PCI devices which Host controller is described in DT
Use the same "xen-grant-dma" device concept for the PCI devices
behind device-tree based PCI Host controller, but with one modification.
Unlike for platform devices, we cannot use generic IOMMU bindings
(iommus property), as we need to support more flexible configuration.
The problem is that PCI devices under the single PCI Host controller
may have the backends running in different Xen domains and thus have
different endpoints ID (backend domains ID).

Add ability to deal with generic PCI-IOMMU bindings (iommu-map/
iommu-map-mask properties) which allows us to describe relationship
between PCI devices and backend domains ID properly.

To avoid having to look up for the PCI Host bridge twice and reduce
the amount of checks pass an extra struct device_node *np to
xen_dt_grant_init_backend_domid().

So with current patch the code expects iommus property for the platform
devices and iommu-map/iommu-map-mask properties for PCI devices.

The example of generated by the toolstack iommu-map property
for two PCI devices 0000:00:01.0 and 0000:00:02.0 whose
backends are running in different Xen domains with IDs 1 and 2
respectively:
iommu-map = <0x08 0xfde9 0x01 0x08 0x10 0xfde9 0x02 0x08>;

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/20221025162004.8501-3-olekstysh@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-12-05 12:59:49 +01:00
Oleksandr Tyshchenko
035e3a4321 xen/virtio: Optimize the setup of "xen-grant-dma" devices
This is needed to avoid having to parse the same device-tree
several times for a given device.

For this to work we need to install the xen_virtio_restricted_mem_acc
callback in Arm's xen_guest_init() which is same callback as x86's
PV and HVM modes already use and remove the manual assignment in
xen_setup_dma_ops(). Also we need to split the code to initialize
backend_domid into a separate function.

Prior to current patch we parsed the device-tree three times:
1. xen_setup_dma_ops()->...->xen_is_dt_grant_dma_device()
2. xen_setup_dma_ops()->...->xen_dt_grant_init_backend_domid()
3. xen_virtio_mem_acc()->...->xen_is_dt_grant_dma_device()

With current patch we parse the device-tree only once in
xen_virtio_restricted_mem_acc()->...->xen_dt_grant_init_backend_domid()

Other benefits are:
- Not diverge from x86 when setting up Xen grant DMA ops
- Drop several global functions

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/20221025162004.8501-2-olekstysh@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-12-05 12:59:49 +01:00
Al Viro
de4eda9de2 use less confusing names for iov_iter direction initializers
READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25 13:01:55 -05:00
Al Viro
fc02f33787 [xen] fix "direction" argument of iov_iter_kvec()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25 13:01:21 -05:00
Dave Airlie
d47f958083 Linux 6.1-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmN6wAgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG0EYH/3/RO90NbrFItraN
 Lzr+d3VdbGjTu8xd1M+PRTmwh3zxLpB+Jwqr0T0A2gzL9B/D+AUPUJdrCVbv9DqS
 FLJAVqoeV20dNBAHSffOOLPsgCZ+Eu+LzlNN7Iqde0e8cyZICFMNktitui84Xm/i
 1NgFVgz9OZ6+aieYvUj3FrFq0p8GTIaC/oybDZrxYKcO8ZzKVMJ11swRw10wwq0g
 qOOECvV3w7wlQ8upQZkzFxItKFc7EexZI6R4elXeGSJJ9Hlc092dv/zsKB9dwV+k
 WcwkJrZRoezYXzgGBFxUcQtzi+ethjrPjuJuM1rYLUSIcfIW/0lkaSLgRoBu8D+I
 1GfXkXs=
 =gt6P
 -----END PGP SIGNATURE-----

Backmerge tag 'v6.1-rc6' into drm-next

Linux 6.1-rc6

This is needed for drm-misc-next and tegra.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-11-24 11:05:43 +10:00
Dawei Li
6c0eb5ba35 ACPI: make remove callback of ACPI driver void
For bus-based driver, device removal is implemented as:
1 device_remove()->
2   bus->remove()->
3     driver->remove()

Driver core needs no inform from callee(bus driver) about the
result of remove callback. In that case, commit fc7a6209d5
("bus: Make remove callback return void") forces bus_type::remove
be void-returned.

Now we have the situation that both 1 & 2 of calling chain are
void-returned, so it does not make much sense for 3(driver->remove)
to return non-void to its caller.

So the basic idea behind this change is making remove() callback of
any bus-based driver to be void-returned.

This change, for itself, is for device drivers based on acpi-bus.

Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Lee Jones <lee@kernel.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dawei Li <set_pte_at@outlook.com>
Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>  # for drivers/platform/surface/*
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23 19:11:22 +01:00
Linus Torvalds
cc675d22e4 xen: branch for v6.1-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCY3TN0wAKCRCAXGG7T9hj
 voAOAP4i3FjRj/ilXohox3F7iyPsRbFrGnayYcHRPeFF8UPz8QEAzyLP/FBGbmho
 sSuhcmb6r9foGKri7zyTKHIA4bkz4Qo=
 =/KaG
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Two trivial cleanups, and three simple fixes"

* tag 'for-linus-6.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/platform-pci: use define instead of literal number
  xen/platform-pci: add missing free_irq() in error path
  xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too
  xen/pcpu: fix possible memory leak in register_pcpu()
  x86/xen: Use kstrtobool() instead of strtobool()
2022-11-16 10:49:06 -08:00
Juergen Gross
4abb77fc55 xen/platform-pci: use define instead of literal number
Instead of "0x01" use the HVM_PARAM_CALLBACK_TYPE_PCI_INTX define from
the interface header in get_callback_via().

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-11-15 07:34:13 +01:00
ruanjinjie
c53717e1e3 xen/platform-pci: add missing free_irq() in error path
free_irq() is missing in case of error in platform_pci_probe(), fix that.

Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Link: https://lore.kernel.org/r/20221114112124.1965611-1-ruanjinjie@huawei.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-11-14 13:29:14 +01:00
Marek Marczykowski-Górecki
5e29500eba xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too
When Xen domain configures MSI-X, the usual approach is to enable MSI-X
together with masking all of them via the config space, then fill the
table and only then clear PCI_MSIX_FLAGS_MASKALL. Allow doing this via
QEMU running in a stub domain.

Previously, when changing PCI_MSIX_FLAGS_MASKALL was not allowed, the
whole write was aborted, preventing change to the PCI_MSIX_FLAGS_ENABLE
bit too.

Note the Xen hypervisor intercepts this write anyway, and may keep the
PCI_MSIX_FLAGS_MASKALL bit set if it wishes to. It will store the
guest-requested state and will apply it eventually.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20221114103110.1519413-1-marmarek@invisiblethingslab.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-11-14 13:29:10 +01:00
Yang Yingliang
da36a2a76b xen/pcpu: fix possible memory leak in register_pcpu()
In device_add(), dev_set_name() is called to allocate name, if it returns
error, the name need be freed. As comment of device_register() says, it
should use put_device() to give up the reference in the error path. So fix
this by calling put_device(), then the name can be freed in kobject_cleanup().

Fixes: f65c9bb3fb ("xen/pcpu: Xen physical cpus online/offline sys interface")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20221110152441.401630-1-yangyingliang@huawei.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-11-14 13:29:07 +01:00
Dave Airlie
b837d3db9a drm-misc-next for 6.2:
UAPI Changes:
   - Documentation for page-flip flags
 
 Cross-subsystem Changes:
   - dma-buf: Add unlocked variant of vmapping and attachment-mapping
     functions
 
 Core Changes:
   - atomic-helpers: CRTC primary plane test fixes
   - connector: TV API consistency improvements, cmdline parsing
     improvements
   - crtc-helpers: Introduce drm_crtc_helper_atomic_check() helper
   - edid: Fixes for HFVSDB parsing,
   - fourcc: Addition of the Vivante tiled modifier
   - makefile: Sort and reorganize the objects files
   - mode_config: Remove fb_base from drm_mode_config_funcs
   - sched: Add a module parameter to change the scheduling policy,
     refcounting fix for fences
   - tests: Sort the Kunit tests in the Makefile, improvements to the
     DP-MST tests
   - ttm: Remove unnecessary drm_mm_clean() call
 
 Driver Changes:
   - New driver: ofdrm
   - Move all drivers to a common dma-buf locking convention
   - bridge:
     - adv7533: Remove dynamic lane switching
     - it6505: Runtime PM support
     - ps8640: Handle AUX defer messages
     - tc358775: Drop soft-reset over I2C
   - ast: Atomic Gamma LUT Support, Convert to SHMEM, various
     improvements
   - lcdif: Support for YUV planes
   - mgag200: Fix PLL Setup on some revisions
   - udl: Modesetting improvements, hot-unplug support
   - vc4: Fix support for PAL-M
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCY1D3fQAKCRDj7w1vZxhR
 xZZgAPoCSfyU3+M3ALT0vQ/OpktE5tuwUBMac2Lxkqgx4dnQ0gEAnYeez0Dedod8
 HNdgBH7FTklYa/zT8n17SwHUOzJJ5gc=
 =YvuW
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2022-10-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 6.2:

UAPI Changes:
  - Documentation for page-flip flags

Cross-subsystem Changes:
  - dma-buf: Add unlocked variant of vmapping and attachment-mapping
    functions

Core Changes:
  - atomic-helpers: CRTC primary plane test fixes
  - connector: TV API consistency improvements, cmdline parsing
    improvements
  - crtc-helpers: Introduce drm_crtc_helper_atomic_check() helper
  - edid: Fixes for HFVSDB parsing,
  - fourcc: Addition of the Vivante tiled modifier
  - makefile: Sort and reorganize the objects files
  - mode_config: Remove fb_base from drm_mode_config_funcs
  - sched: Add a module parameter to change the scheduling policy,
    refcounting fix for fences
  - tests: Sort the Kunit tests in the Makefile, improvements to the
    DP-MST tests
  - ttm: Remove unnecessary drm_mm_clean() call

Driver Changes:
  - New driver: ofdrm
  - Move all drivers to a common dma-buf locking convention
  - bridge:
    - adv7533: Remove dynamic lane switching
    - it6505: Runtime PM support
    - ps8640: Handle AUX defer messages
    - tc358775: Drop soft-reset over I2C
  - ast: Atomic Gamma LUT Support, Convert to SHMEM, various
    improvements
  - lcdif: Support for YUV planes
  - mgag200: Fix PLL Setup on some revisions
  - udl: Modesetting improvements, hot-unplug support
  - vc4: Fix support for PAL-M

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20221020072405.g3o4hxuk75gmeumw@houat
2022-10-25 11:42:18 +10:00
Linus Torvalds
1d61754caa xen: branch for v6.1-rc2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCY1JhRwAKCRCAXGG7T9hj
 vtcFAQDEkdKa5lWFo6Lut8rraHPyjoCVdLV1J2KJo0vO8rgU+AEA1kuPMqQzpVez
 1xTkZrKnPvIo+Dv515oxDCmzTxGoDgA=
 =1fmO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Just two fixes for the new 'virtio with grants' feature"

* tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/virtio: Convert PAGE_SIZE/PAGE_SHIFT/PFN_UP to Xen counterparts
  xen/virtio: Handle cases when page offset > PAGE_SIZE properly
2022-10-21 14:43:09 -07:00
Maxime Ripard
a140a6a2d5
Merge drm/drm-next into drm-misc-next
Let's kick-off this release cycle.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2022-10-18 15:00:03 +02:00
Dmitry Osipenko
e841ad86e7 xen/gntdev: Prepare to dynamic dma-buf locking specification
Prepare gntdev driver to the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions.

Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017172229.42269-13-dmitry.osipenko@collabora.com
2022-10-18 01:21:47 +03:00