Commit Graph

5647 Commits

Author SHA1 Message Date
Linus Torvalds
382e391365 hyperv-next for v6.14
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmeTFQ4THHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXqMWB/4uHjnu50u+m00OwXAKQr6i92zh50BZ
 RQragd9s9C8tuUNwPDmS/ct2BNAhoy43KJ0ClegdZjKxT1Ys8cLv4Wr5CaGckqWq
 +WCHqTgt+cPe0vUofqahB5wiAZMsnBgzFkV/OfFwBx0wkub9y5T3qVq5KapYlaDI
 7Gftb+wg1AAsrdZ/HuLRy5ZVvkM/73rU2uoi8WXjr/T14E1krCFR/qirLd1OXo6Q
 Jb97qhnCt/N9JPwIq5/VnYWde5Mpqz6UgtA2rFLDXgNGz+h9/ND6ecWFHjZWNVdc
 AKWZTO5t+fRVBOSyahoyRoYSntPw3wlxyL7A2/54h6j4Dex7wLt6NQBj
 =empO
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - Introduce a new set of Hyper-V headers in include/hyperv and replace
   the old hyperv-tlfs.h with the new headers (Nuno Das Neves)

 - Fixes for the Hyper-V VTL mode (Roman Kisel)

 - Fixes for cpu mask usage in Hyper-V code (Michael Kelley)

 - Document the guest VM hibernation behaviour (Michael Kelley)

 - Miscellaneous fixes and cleanups (Jacob Pan, John Starks, Naman Jain)

* tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Documentation: hyperv: Add overview of guest VM hibernation
  hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id()
  hyperv: Do not overlap the hvcall IO areas in get_vtl()
  hyperv: Enable the hypercall output page for the VTL mode
  hv_balloon: Fallback to generic_online_page() for non-HV hot added mem
  Drivers: hv: vmbus: Log on missing offers if any
  Drivers: hv: vmbus: Wait for boot-time offers during boot and resume
  uio_hv_generic: Add a check for HV_NIC for send, receive buffers setup
  iommu/hyper-v: Don't assume cpu_possible_mask is dense
  Drivers: hv: Don't assume cpu_possible_mask is dense
  x86/hyperv: Don't assume cpu_possible_mask is dense
  hyperv: Remove the now unused hyperv-tlfs.h files
  hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h
  hyperv: Add new Hyper-V headers in include/hyperv
  hyperv: Clean up unnecessary #includes
  hyperv: Move hv_connection_id to hyperv-tlfs.h
2025-01-25 09:22:55 -08:00
Linus Torvalds
aa44198a6c iommufd 6.14 merge window pull
No major functionality this cycle:
 
 - iommufd part of the domain_alloc_paging_flags() conversion
 
 - Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers
 
 - Increase a timeout waiting for other threads to drop transient refcounts
   that syzkaller was hitting
 
 - Fix a UBSAN hit in iova_bitmap due to shift out of bounds
 
 - Add missing cleanup of fault events during FD shutdown, fixing a memory leak
 
 - Improve the fault delivery flow to have a smaller locking critical
   region that does not include copy_to_user()
 
 - Fix 32 bit ABI breakage due to missed implicit padding, and fix the
   stack memory leakage
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ5JgnAAKCRCFwuHvBreF
 YQWrAP9ItOsbeOxYIEVQK1E90HbMWVHF8RhWDPnpChgzyorjjQEA/OOou6uTAcVC
 fJybLk3T7JrutMmBFvVLsRg+yCJPlgE=
 =ldhT
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "No major functionality this cycle:

   - iommufd part of the domain_alloc_paging_flags() conversion

   - Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers

   - Increase a timeout waiting for other threads to drop transient
     refcounts that syzkaller was hitting

   - Fix a UBSAN hit in iova_bitmap due to shift out of bounds

   - Add missing cleanup of fault events during FD shutdown, fixing a
     memory leak

   - Improve the fault delivery flow to have a smaller locking critical
     region that does not include copy_to_user()

   - Fix 32 bit ABI breakage due to missed implicit padding, and fix the
     stack memory leakage"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Fix struct iommu_hwpt_pgfault init and padding
  iommufd/fault: Use a separate spinlock to protect fault->deliver list
  iommufd/fault: Destroy response and mutex in iommufd_fault_destroy()
  iommufd: Keep OBJ/IOCTL lists in an alphabetical order
  iommufd/iova_bitmap: Fix shift-out-of-bounds in iova_bitmap_offset_to_index()
  iommu: iommufd: fix WARNING in iommufd_device_unbind
  iommufd: Deal with IOMMU_HWPT_FAULT_ID_VALID in iommufd core
  iommufd/selftest: Remove domain_alloc_paging()
2025-01-24 12:04:35 -08:00
Linus Torvalds
f1c243fc78 IOMMU Updates for Linux v6.14
Including:
 
 	- Core changes:
 	  - PASID support for the blocked_domain.
 
 	- ARM-SMMU Updates:
 	  - SMMUv2:
 	    * Implement per-client prefetcher configuration on Qualcomm SoCs.
 	    * Support for the Adreno SMMU on Qualcomm's SDM670 SOC.
 	  - SMMUv3:
 	    * Pretty-printing of event records.
 	    * Drop the ->domain_alloc_paging implementation in favour of
 	      ->domain_alloc_paging_flags(flags==0).
 	  - IO-PGTable:
 	    * Generalisation of the page-table walker to enable external walkers
 	      (e.g. for debugging unexpected page-faults from the GPU).
 	    * Minor fix for handling concatenated PGDs at stage-2 with 16KiB pages.
 	  - Misc:
 	    * Clean-up device probing and replace the crufty probe-deferral hack
 	      with a more robust implementation of arm_smmu_get_by_fwnode().
 	    * Device-tree binding updates for a bunch of Qualcomm platforms.
 
 	- Intel VT-d Updates:
 	  - Remove domain_alloc_paging().
 	  - Remove capability audit code.
 	  - Draining PRQ in sva unbind path when FPD bit set.
 	  - Link cache tags of same iommu unit together.
 
 	- AMD-Vi Updates:
 	  - Use CMPXCHG128 to update DTE.
 	  - Cleanups of the domain_alloc_paging() path.
 
 	- RiscV IOMMU:
 	  - Platform MSI support.
 	  - Shutdown support.
 
 	- Rockchip IOMMU:
 	  - Add DT bindings for Rockchip RK3576.
 
 	- More smaller fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmeQsnQACgkQK/BELZcB
 GuOHug/8DDIuKlUHU2+3U2kKxMb8o1kDxGPkfKgXBaxxpprvY9DARRv7N4mnF6Vu
 Db8P7QihXfQKnZHXEiqHE7TlA5B+IVQd3Kz96P4sY3OlVWGZYqyKv2GEHyG5CjN/
 bay7bfgeo2EVEiAio6VToFFWTm+oxFZzhoYFIlAyZAuIQUp17gHXf7YyhUwk4rOz
 8g0XMH6uldidID6BVpArxHh/bN9MOTdHzkyhwPF3FL8E94ziX6rWILH9ADYxBn2o
 wqHR1STxv398k62toPpWb78c2RdANI8treDXsYpCyDF87dygdP+SA0qkK3G6kAVA
 /IiPothAj6oNm+Gvwd04tEkuqVVrVqrmWE3VXSps33Tk+apYtcmLCtdpwY/F93D1
 EZwTVqveBKk2hSWYVDlEyj9XKmZ9dYWDGvg2Fx844gltQHoHtEgYL+azCMU/ry7k
 3+KlkUqFZZxUDbVBbbDdMF+NpxyZTtfKsaLB8f5laOP1R8Os3+dC69Gw6bhoxaMc
 xfL3v245V5kWSRy+w02TyZGmZSzRx0FKUbFLKpLOvZD6pfx8t8oqJTG9FwN1KzpL
 lvcBpPB5AUeNJQpKcVSjs/ne8WiOqdABt7ge4E9J5TwrDI8sXk0mwJaPrlDgK1V/
 0xkkLmxWnqsu04CyDay+Uv3Bls/rRBpikR/iGt9P3BbZs7Cyryw=
 =Xei0
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:
 "Core changes:
   - PASID support for the blocked_domain

  ARM-SMMU Updates:
   - SMMUv2:
      - Implement per-client prefetcher configuration on Qualcomm SoCs
      - Support for the Adreno SMMU on Qualcomm's SDM670 SOC
   - SMMUv3:
      - Pretty-printing of event records
      - Drop the ->domain_alloc_paging implementation in favour of
        domain_alloc_paging_flags(flags==0)
   - IO-PGTable:
      - Generalisation of the page-table walker to enable external
        walkers (e.g. for debugging unexpected page-faults from the GPU)
      - Minor fix for handling concatenated PGDs at stage-2 with 16KiB
        pages
   - Misc:
      - Clean-up device probing and replace the crufty probe-deferral
        hack with a more robust implementation of
        arm_smmu_get_by_fwnode()
      - Device-tree binding updates for a bunch of Qualcomm platforms

  Intel VT-d Updates:
   - Remove domain_alloc_paging()
   - Remove capability audit code
   - Draining PRQ in sva unbind path when FPD bit set
   - Link cache tags of same iommu unit together

  AMD-Vi Updates:
   - Use CMPXCHG128 to update DTE
   - Cleanups of the domain_alloc_paging() path

  RiscV IOMMU:
   - Platform MSI support
   - Shutdown support

  Rockchip IOMMU:
   - Add DT bindings for Rockchip RK3576

  More smaller fixes and cleanups"

* tag 'iommu-updates-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (66 commits)
  iommu: Use str_enable_disable-like helpers
  iommu/amd: Fully decode all combinations of alloc_paging_flags
  iommu/amd: Move the nid to pdom_setup_pgtable()
  iommu/amd: Change amd_iommu_pgtable to use enum protection_domain_mode
  iommu/amd: Remove type argument from do_iommu_domain_alloc() and related
  iommu/amd: Remove dev == NULL checks
  iommu/amd: Remove domain_alloc()
  iommu/amd: Remove unused amd_iommu_domain_update()
  iommu/riscv: Fixup compile warning
  iommu/arm-smmu-v3: Add missing #include of linux/string_choices.h
  iommu/arm-smmu-v3: Use str_read_write helper w/ logs
  iommu/io-pgtable-arm: Add way to debug pgtable walk
  iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys
  iommu/io-pgtable-arm: Make pgtable walker more generic
  iommu/arm-smmu: Add ACTLR data and support for qcom_smmu_500
  iommu/arm-smmu: Introduce ACTLR custom prefetcher settings
  iommu/arm-smmu: Add support for PRR bit setup
  iommu/arm-smmu: Refactor qcom_smmu structure to include single pointer
  iommu/arm-smmu: Re-enable context caching in smmu reset operation
  iommu/vt-d: Link cache tags of same iommu unit together
  ...
2025-01-24 07:33:46 -08:00
Linus Torvalds
4c551165e7 Updates for the interrupt subsystem:
- Consolidation of the machine_kexec_mask_interrupts() by providing a
     generic implementation and replacing the copy & pasta orgy in the
     relevant architectures.
 
   - Prevent unconditional operations on interrupt chips during kexec
     shutdown, which can trigger warnings in certain cases when the
     underlying interrupt has been shut down before.
 
   - Make the enforcement of interrupt handling in interrupt context
     unconditionally available, so that it actually works for non x86
     related interrupt chips. The earlier enablement for ARM GIC chips set
     the required chip flag, but did not notice that the check was hidden
     behind a config switch which is not selected by ARM[64].
 
   - Decrapify the handling of deferred interrupt affinity setting. Some
     interrupt chips require that affinity changes are made from the context
     of handling an interrupt to avoid certain race conditions. For x86 this
     was the default, but with interrupt remapping this requirement was
     lifted and a flag was introduced which tells the core code that
     affinity changes can be done in any context. Unrestricted affinity
     changes are the default for the majority of interrupt chips. RISCV has
     the requirement to add the deferred mode to one of it's interrupt
     controllers, but with the original implementation this would require to
     add the any context flag to all other RISC-V interrupt chips. That's
     backwards, so reverse the logic and require that chips, which need the
     deferred mode have to be marked accordingly. That avoids chasing the
     'sane' chips and marking them.
 
   - Add multi-node support to the Loongarch AVEC interrupt controller
     driver.
 
   - The usual tiny cleanups, fixes and improvements all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmePkVITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRbQD/9bHVph/V9Ekl7JAX3aY4gG4JbRhOc7
 dp1VAcHRhktRfoTztYRbjsbMu2nvZ58GKA8bkOS2jHSF/m3PbkIJfOhwk0YdIAoa
 +kdy5yDgqCGfkqW43DN4Cr+CnzGjWMitw67tFp3fhwehMDpDjdt2L28IjtanSS0f
 hO6FV7o65MWeJwxk4Isb2/nvkO+X23Lrp6RrWS8SXBnF9FFXxiPIg/fiOPTizhCh
 1W/bSGxLLb9WwsVzmlGAKVFlXDij0QGaIUug2fdVZ63OsELXD7tJrLSPG133yk92
 ppIa0s6BT4IBsfM00us4hG15PkLuJmP3yWWcoquG0rP8Wq58VOXiN6+rcJIyvB+5
 mWceTH6IKfZGoRQKwXC7BxeBAIb147reiJtb06meq1/8ADIvzafiNy0c8x9i/UaV
 QiyhPVENjaGCGDomZmJQqN7Yb02Wge1k8InQnodDrHxZNl/bX/B1Z8Bxd0n6hPHg
 NSJXYif2AxgaddpohsdygqRDbT6SNyQdj7YjJFY5qAGJ3yFyJ4JB6WTqkWW4o1vH
 3FVqdAnJmejAmmYSkah0Hkem2T5QASQmTWb93PLxiV6q+d0NM8stWAujjyVdIV/B
 W4Uj9mQ20cz54TjLtxqX+A1k6KcqOWRgh1l2QbUlFsgsOP3V8yz47yqYdR9qMWlO
 9kNEjI3sw+G/IQ==
 =q4rj
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull interrupt subsystem updates from Thomas Gleixner:

 - Consolidate the machine_kexec_mask_interrupts() by providing a
   generic implementation and replacing the copy & pasta orgy in the
   relevant architectures.

 - Prevent unconditional operations on interrupt chips during kexec
   shutdown, which can trigger warnings in certain cases when the
   underlying interrupt has been shut down before.

 - Make the enforcement of interrupt handling in interrupt context
   unconditionally available, so that it actually works for non x86
   related interrupt chips. The earlier enablement for ARM GIC chips set
   the required chip flag, but did not notice that the check was hidden
   behind a config switch which is not selected by ARM[64].

 - Decrapify the handling of deferred interrupt affinity setting.

   Some interrupt chips require that affinity changes are made from the
   context of handling an interrupt to avoid certain race conditions.
   For x86 this was the default, but with interrupt remapping this
   requirement was lifted and a flag was introduced which tells the core
   code that affinity changes can be done in any context. Unrestricted
   affinity changes are the default for the majority of interrupt chips.

   RISCV has the requirement to add the deferred mode to one of it's
   interrupt controllers, but with the original implementation this
   would require to add the any context flag to all other RISC-V
   interrupt chips. That's backwards, so reverse the logic and require
   that chips, which need the deferred mode have to be marked
   accordingly. That avoids chasing the 'sane' chips and marking them.

 - Add multi-node support to the Loongarch AVEC interrupt controller
   driver.

 - The usual tiny cleanups, fixes and improvements all over the place.

* tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/generic_chip: Export irq_gc_mask_disable_and_ack_set()
  genirq/timings: Add kernel-doc for a function parameter
  genirq: Remove IRQ_MOVE_PCNTXT and related code
  x86/apic: Convert to IRQCHIP_MOVE_DEFERRED
  genirq: Provide IRQCHIP_MOVE_DEFERRED
  hexagon: Remove GENERIC_PENDING_IRQ leftover
  ARC: Remove GENERIC_PENDING_IRQ
  genirq: Remove handle_enforce_irqctx() wrapper
  genirq: Make handle_enforce_irqctx() unconditionally available
  irqchip/loongarch-avec: Add multi-nodes topology support
  irqchip/ts4800: Replace seq_printf() by seq_puts()
  irqchip/ti-sci-inta : Add module build support
  irqchip/ti-sci-intr: Add module build support
  irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic function
  irqchip: keystone: Use syscon_regmap_lookup_by_phandle_args
  genirq/kexec: Prevent redundant IRQ masking by checking state before shutdown
  kexec: Consolidate machine_kexec_mask_interrupts() implementation
  genirq: Reuse irq_thread_fn() for forced thread case
  genirq: Move irq_thread_fn() further up in the code
2025-01-21 13:51:07 -08:00
Nicolin Chen
e721f619e3 iommufd: Fix struct iommu_hwpt_pgfault init and padding
The iommu_hwpt_pgfault is used to report IO page fault data to userspace,
but iommufd_fault_fops_read was never zeroing its padding. This leaks the
content of the kernel stack memory to userspace.

Also, the iommufd uAPI requires explicit padding and use of __aligned_u64
to ensure ABI compatibility's with 32 bit.

pahole result, before:
struct iommu_hwpt_pgfault {
        __u32     flags;                /*     0     4 */
        __u32     dev_id;               /*     4     4 */
        __u32     pasid;                /*     8     4 */
        __u32     grpid;                /*    12     4 */
        __u32     perm;                 /*    16     4 */

        /* XXX 4 bytes hole, try to pack */

        __u64     addr;                 /*    24     8 */
        __u32     length;               /*    32     4 */
        __u32     cookie;               /*    36     4 */

        /* size: 40, cachelines: 1, members: 8 */
        /* sum members: 36, holes: 1, sum holes: 4 */
        /* last cacheline: 40 bytes */
};

pahole result, after:
struct iommu_hwpt_pgfault {
        __u32      flags;                /*     0     4 */
        __u32      dev_id;               /*     4     4 */
        __u32      pasid;                /*     8     4 */
        __u32      grpid;                /*    12     4 */
        __u32      perm;                 /*    16     4 */
        __u32      __reserved;           /*    20     4 */
        __u64      addr __attribute__((__aligned__(8))); /*    24     8 */
        __u32      length;               /*    32     4 */
        __u32      cookie;               /*    36     4 */

        /* size: 40, cachelines: 1, members: 9 */
        /* forced alignments: 1 */
        /* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));

Fixes: c714f15860 ("iommufd: Add fault and response message definitions")
Link: https://patch.msgid.link/r/20250120195051.2450-1-nicolinc@nvidia.com
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-21 13:55:49 -04:00
Nicolin Chen
3d49020a32 iommufd/fault: Use a separate spinlock to protect fault->deliver list
The fault->mutex serializes the fault read()/write() fops and the
iommufd_fault_auto_response_faults(), mainly for fault->response. Also, it
was conveniently used to fence the fault->deliver in poll() fop and
iommufd_fault_iopf_handler().

However, copy_from/to_user() may sleep if pagefaults are enabled. Thus,
they could take a long time to wait for user pages to swap in, blocking
iommufd_fault_iopf_handler() and its caller that is typically a shared IRQ
handler of an IOMMU driver, resulting in a potential global DOS.

Instead of reusing the mutex to protect the fault->deliver list, add a
separate spinlock, nested under the mutex, to do the job.
iommufd_fault_iopf_handler() would no longer be blocked by
copy_from/to_user().

Add a free_list in iommufd_auto_response_faults(), so the spinlock can
simply fence a fast list_for_each_entry_safe routine.

Provide two deliver list helpers for iommufd_fault_fops_read() to use:
 - Fetch the first iopf_group out of the fault->deliver list
 - Restore an iopf_group back to the head of the fault->deliver list

Lastly, move the mutex closer to the response in the fault structure,
and update its kdoc accordingly.

Fixes: 07838f7fd5 ("iommufd: Add iommufd fault object")
Link: https://patch.msgid.link/r/20250117192901.79491-1-nicolinc@nvidia.com
Cc: stable@vger.kernel.org
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-20 12:31:15 -04:00
Joerg Roedel
125f34e4c1 Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'qualcomm/msm', 'rockchip', 'riscv', 'core', 'intel/vt-d' and 'amd/amd-vi' into next 2025-01-17 09:02:35 +01:00
Krzysztof Kozlowski
54e7d90089 iommu: Use str_enable_disable-like helpers
Replace ternary (condition ? "enable" : "disable") syntax with helpers
from string_choices.h because:
1. Simple function call with one argument is easier to read.  Ternary
   operator has three arguments and with wrapping might lead to quite
   long code.
2. Is slightly shorter thus also easier to read.
3. It brings uniformity in the text - same string.
4. Allows deduping by the linker, which results in a smaller binary
   file.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Link: https://lore.kernel.org/r/20250114192642.912331-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17 09:00:37 +01:00
Jason Gunthorpe
082f1bcae8 iommu/amd: Fully decode all combinations of alloc_paging_flags
Currently AMD does not support
 IOMMU_HWPT_ALLOC_PASID | IOMMU_HWPT_ALLOC_DIRTY_TRACKING

It should be rejected. Instead it creates a V1 domain without dirty
tracking support.

Use a switch to fully decode the flags.

Fixes: ce2cd17546 ("iommu/amd: Enhance amd_iommu_domain_alloc_user()")
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17 08:59:32 +01:00
Jason Gunthorpe
5a081f7f42 iommu/amd: Move the nid to pdom_setup_pgtable()
The only thing that uses the nid is the io_pgtable code, and it should be
set before calling alloc_io_pgtable_ops() to ensure that the top levels
are allocated on the correct nid.

Since dev is never NULL now we can just do this trivially and remove the
other uses of nid. SVA and identity code paths never use it since they
don't use io_pgtable.

Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17 08:59:31 +01:00
Jason Gunthorpe
13b4ec7491 iommu/amd: Change amd_iommu_pgtable to use enum protection_domain_mode
Currently it uses enum io_pgtable_fmt which is from the io pagetable code
and most of the enum values are invalid. protection_domain_mode is
internal the driver and has the only two valid values.

Fix some signatures and variables to use the right type as well.

Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17 08:59:30 +01:00
Jason Gunthorpe
55b237dd7f iommu/amd: Remove type argument from do_iommu_domain_alloc() and related
do_iommu_domain_alloc() is only called from
amd_iommu_domain_alloc_paging_flags() so type is always
IOMMU_DOMAIN_UNMANAGED. Remove type and all the dead conditionals checking
it.

IOMMU_DOMAIN_IDENTITY checks are similarly obsolete as the conversion to
the global static identity domain removed those call paths.

The caller of protection_domain_alloc() should set the type, fix the miss
in the SVA code.

Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17 08:59:29 +01:00
Jason Gunthorpe
02bcd1a8b9 iommu/amd: Remove dev == NULL checks
This is no longer possible, amd_iommu_domain_alloc_paging_flags() is never
called with dev = NULL from the core code. Similarly
get_amd_iommu_from_dev() can never be NULL either.

Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17 08:59:29 +01:00
Jason Gunthorpe
f9b80f941e iommu/amd: Remove domain_alloc()
IOMMU drivers should not be sensitive to the domain type, a paging domain
should be created based only on the flags passed in, the same for all
callers.

AMD was using the domain_alloc() path to force VFIO into a v1 domain type,
because v1 gives higher performance. However now that
IOMMU_HWPT_ALLOC_PASID is present, and a NULL device is not possible,
domain_alloc_paging_flags() will do the right thing for VFIO.

When invoked from VFIO flags will be 0 and the amd_iommu_pgtable type of
domain will be selected. This is v1 by default unless the kernel command
line has overridden it to v2.

If the admin is forcing v2 assume they know what they are doing so force
it everywhere, including for VFIO.

Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17 08:59:28 +01:00
Alejandro Jimenez
1a684b099f iommu/amd: Remove unused amd_iommu_domain_update()
All the callers have been removed by the below commit, remove the
implementation and prototypes.

Fixes: 322d889ae7 ("iommu/amd: Remove amd_iommu_domain_update() from page table freeing")
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17 08:59:28 +01:00
Guo Ren
10c62c38b0 iommu/riscv: Fixup compile warning
When __BITS_PER_LONG == 32, size_t is defined as unsigned int rather
than unsigned long. Therefore, we should use size_t to avoid
type-checking errors.

Fixes: 488ffbf181 ("iommu/riscv: Paging domain support")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Tomasz Jeznach <tjeznach@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Link: https://lore.kernel.org/r/20250103024616.3359159-1-guoren@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17 08:58:06 +01:00
Nicolin Chen
3f4818ec13 iommufd/fault: Destroy response and mutex in iommufd_fault_destroy()
Both were missing in the initial patch.

Fixes: 07838f7fd5 ("iommufd: Add iommufd fault object")
Link: https://patch.msgid.link/r/bc8bb13e215af27e62ee51bdba3648dd4ed2dce3.1736923732.git.nicolinc@nvidia.com
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-16 16:40:33 -04:00
Thomas Gleixner
7d04319a05 x86/apic: Convert to IRQCHIP_MOVE_DEFERRED
Instead of marking individual interrupts as safe to be migrated in
arbitrary contexts, mark the interrupt chips, which require the interrupt
to be moved in actual interrupt context, with the new IRQCHIP_MOVE_DEFERRED
flag. This makes more sense because this is a per interrupt chip property
and not restricted to individual interrupts.

That flips the logic from the historical opt-out to a opt-in model. This is
simpler to handle for other architectures, which default to unrestricted
affinity setting. It also allows to cleanup the redundant core logic
significantly.

All interrupt chips, which belong to a top-level domain sitting directly on
top of the x86 vector domain are marked accordingly, unless the related
setup code marks the interrupts with IRQ_MOVE_PCNTXT, i.e. XEN.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/all/20241210103335.563277044@linutronix.de
2025-01-15 21:38:53 +01:00
Nicolin Chen
442003f3a8 iommufd: Keep OBJ/IOCTL lists in an alphabetical order
Reorder the existing OBJ/IOCTL lists.

Also run clang-format for the same coding style at line wrappings.

No functional change.

Link: https://patch.msgid.link/r/c5e6d6e0a0bb7abc92ad26937fde19c9426bee96.1736237481.git.nicolinc@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-14 15:26:46 -04:00
Qasim Ijaz
e24c155105 iommufd/iova_bitmap: Fix shift-out-of-bounds in iova_bitmap_offset_to_index()
Resolve a UBSAN shift-out-of-bounds issue in iova_bitmap_offset_to_index()
where shifting the constant "1" (of type int) by bitmap->mapped.pgshift
(an unsigned long value) could result in undefined behavior.

The constant "1" defaults to a 32-bit "int", and when "pgshift" exceeds
31 (e.g., pgshift = 63) the shift operation overflows, as the result
cannot be represented in a 32-bit type.

To resolve this, the constant is updated to "1UL", promoting it to an
unsigned long type to match the operand's type.

Fixes: 58ccf0190d ("vfio: Add an IOVA bitmap support")
Link: https://patch.msgid.link/r/20250113223820.10713-1-qasdev00@gmail.com
Reported-by: syzbot <syzbot+85992ace37d5b7b51635@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=85992ace37d5b7b51635
Signed-off-by: Qasim Ijaz <qasdev00@gmail.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-14 13:53:18 -04:00
Suraj Sonawane
d9df72c6ac iommu: iommufd: fix WARNING in iommufd_device_unbind
Fix an issue detected by syzbot:

WARNING in iommufd_device_unbind iommufd: Time out waiting for iommufd object to become free

Resolve a warning in iommufd_device_unbind caused by a timeout while
waiting for the shortterm_users reference count to reach zero. The
existing 10-second timeout is insufficient in some scenarios, resulting in
failures the above warning.

Increase the timeout in iommufd_object_dec_wait_shortterm from 10 seconds
to 60 seconds to allow sufficient time for the reference count to drop to
zero. This change prevents premature timeouts and reduces the likelihood
of warnings during iommufd_device_unbind.

Fixes: 6f9c4d8c46 ("iommufd: Do not UAF during iommufd_put_object()")
Link: https://patch.msgid.link/r/20241123195900.3176-1-surajsonawane0215@gmail.com
Reported-by: syzbot+c92878e123785b1fa2db@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c92878e123785b1fa2db
Tested-by: syzbot+c92878e123785b1fa2db@syzkaller.appspotmail.com
Signed-off-by: Suraj Sonawane <surajsonawane0215@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-14 13:50:05 -04:00
Will Deacon
1f3dc29d24 iommu/arm-smmu-v3: Add missing #include of linux/string_choices.h
Commit f2c77f6e41 ("iommu/arm-smmu-v3: Use str_read_write helper w/
logs") introduced a call to str_read_write() in the SMMUv3 driver but
without an explicit #include of <linux/string_choices.h>. This breaks
the build for custom configurations where CONFIG_ACPI=n:

drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:1909:4: error: call to
	undeclared function 'str_read_write'; ISO C99 and later do not
	support implicit function declarations [-Wimplicit-function-declaration]
 1909 |                         str_read_write(evt->read),
      |                         ^

Add the missing #include.

Link: https://lore.kernel.org/r/d07e82a4-2880-4ae3-961b-471bfa7ac6c4@samsung.com
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: f2c77f6e41 ("iommu/arm-smmu-v3: Use str_read_write helper w/ logs")
Signed-off-by: Will Deacon <will@kernel.org>
2025-01-10 13:28:56 +00:00
Michael Kelley
4f6b64f3d3 iommu/hyper-v: Don't assume cpu_possible_mask is dense
Current code gets the APIC IDs for CPUs numbered 255 and lower.
This code assumes cpu_possible_mask is dense, which is not true in
the general case per [1]. If cpu_possible_mask contains holes,
num_possible_cpus() is less than nr_cpu_ids, so some CPUs might get
skipped. Furthermore, getting the APIC ID of a CPU that isn't in
cpu_possible_mask is invalid.

However, the configurations that Hyper-V provides to guest VMs on x86
hardware, in combination with how x86 code assigns Linux CPU numbers,
*does* always produce a dense cpu_possible_mask. So the dense assumption
is not currently causing failures. But for robustness against future
changes in how cpu_possible_mask is populated, update the code to no
longer assume dense.

The correct approach is to determine the range to scan based on
nr_cpu_ids, and skip any CPUs that are not in the cpu_possible_mask.

[1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241003035333.49261-4-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20241003035333.49261-4-mhklinux@outlook.com>
2025-01-10 00:54:21 +00:00
Pranjal Shrivastava
f2c77f6e41 iommu/arm-smmu-v3: Use str_read_write helper w/ logs
Adopt the `str_read_write` helper in event logging as suggested by the
coccinelle tool.

Signed-off-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/all/20250107130053.GC6991@willie-the-truck/
Link: https://lore.kernel.org/r/20250107165100.1093357-1-praan@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-01-08 14:00:34 +00:00
Rob Clark
aff028a819 iommu/io-pgtable-arm: Add way to debug pgtable walk
Add an io-pgtable method to walk the pgtable returning the raw PTEs that
would be traversed for a given iova access.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241210165127.600817-4-robdclark@gmail.com
[will: Removed 'arm_lpae_io_pgtable_walk_data::level' per Mostafa]
Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07 15:44:20 +00:00
Rob Clark
d9e589e6ad iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys
Re-use the generic pgtable walk path.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241210165127.600817-3-robdclark@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07 15:42:23 +00:00
Rob Clark
821500d5c5 iommu/io-pgtable-arm: Make pgtable walker more generic
We can re-use this basic pgtable walk logic in a few places.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241210165127.600817-2-robdclark@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07 15:42:23 +00:00
Bibek Kumar Patro
3e35c3e725 iommu/arm-smmu: Add ACTLR data and support for qcom_smmu_500
Add ACTLR data table for qcom_smmu_500 including corresponding data
entry and set prefetch value by way of a list of compatible strings.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
Link: https://lore.kernel.org/r/20241212151402.159102-6-quic_bibekkum@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07 13:55:28 +00:00
Bibek Kumar Patro
9fe18d825a iommu/arm-smmu: Introduce ACTLR custom prefetcher settings
Currently in Qualcomm SoCs the default prefetch is set to 1 which allows
the TLB to fetch just the next page table. MMU-500 features ACTLR
register which is implementation defined and is used for Qualcomm SoCs
to have a custom prefetch setting enabling TLB to prefetch the next set
of page tables accordingly allowing for faster translations.

ACTLR value is unique for each SMR (Stream matching register) and stored
in a pre-populated table. This value is set to the register during
context bank initialisation.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
Link: https://lore.kernel.org/r/20241212151402.159102-5-quic_bibekkum@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07 13:55:28 +00:00
Bibek Kumar Patro
7f2ef1bfc7 iommu/arm-smmu: Add support for PRR bit setup
Add an adreno-smmu-priv interface for drm/msm to call into arm-smmu-qcom
and initiate the "Partially Resident Region" (PRR) bit setup or reset
sequence as per request.

This will be used by GPU to setup the PRR bit and related configuration
registers through adreno-smmu private interface instead of directly
poking the smmu hardware.

Suggested-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
Link: https://lore.kernel.org/r/20241212151402.159102-4-quic_bibekkum@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07 13:55:07 +00:00
Bibek Kumar Patro
445d7a8ed9 iommu/arm-smmu: Refactor qcom_smmu structure to include single pointer
qcom_smmu_match_data is static and constant so refactor qcom_smmu to
store single pointer to qcom_smmu_match_data instead of replicating
multiple child members of the same and handle the further dereferences
in the places that want them.

Suggested-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
Link: https://lore.kernel.org/r/20241212151402.159102-3-quic_bibekkum@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07 13:26:51 +00:00
Bibek Kumar Patro
ef4144b1b4 iommu/arm-smmu: Re-enable context caching in smmu reset operation
Default MMU-500 reset operation disables context caching in prefetch
buffer. It is however expected for context banks using the ACTLR
register to retain their prefetch value during reset and runtime
suspend.

Add config 'ARM_SMMU_MMU_500_CPRE_ERRATA' to gate this errata workaround
in default MMU-500 reset operation which defaults to 'Y' and provide
option to disable workaround for context caching in prefetch buffer as
and when needed.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
Link: https://lore.kernel.org/r/20241212151402.159102-2-quic_bibekkum@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07 13:26:28 +00:00
Zhenzhong Duan
acf5d49aaf iommu/vt-d: Link cache tags of same iommu unit together
Cache tag invalidation requests for a domain are accumulated until a
different iommu unit is found when traversing the cache_tags linked list.
But cache tags of same iommu unit can be distributed in the linked list,
this make batched flush less efficient. E.g., one device backed by iommu0
is attached to a domain in between two devices attaching backed by iommu1.

Group cache tags together for same iommu unit in cache_tag_assign() to
maximize the performance of batched flush.

Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/r/20241219054358.8654-1-zhenzhong.duan@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07 09:30:53 +01:00
Lu Baolu
cf08ca81d0 iommu/vt-d: Draining PRQ in sva unbind path when FPD bit set
When a device uses a PASID for SVA (Shared Virtual Address), it's possible
that the PASID entry is marked as non-present and FPD bit set before the
device flushes all ongoing DMA requests and removes the SVA domain. This
can occur when an exception happens and the process terminates before the
device driver stops DMA and calls the iommu driver to unbind the PASID.

There's no need to drain the PRQ in the mm release path. Instead, the PRQ
will be drained in the SVA unbind path. But in such case,
intel_pasid_tear_down_entry() only checks the presence of the pasid entry
and returns directly.

Add the code to clear the FPD bit and drain the PRQ.

Fixes: c43e1ccdeb ("iommu/vt-d: Drain PRQs when domain removed from RID")
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20241217024240.139615-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07 09:30:52 +01:00
Lu Baolu
c220629940 iommu/vt-d: Remove iommu cap audit
The capability audit code was introduced by commit <ad3d19029979>
"iommu/vt-d: Audit IOMMU Capabilities and add helper functions", aiming
to verify the consistency of capabilities across all IOMMUs for supported
features.

Nowadays, all the kAPIs of the iommu subsystem have evolved to be device
oriented, in preparation for supporting heterogeneous IOMMU architectures.
There is no longer a need to require capability consistence among IOMMUs
for any feature.

Remove the iommu cap audit code to make the driver align with the design
in the iommu core.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20241216071828.22962-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07 09:30:51 +01:00
Jason Gunthorpe
de1dda7e0b iommu/vt-d: Remove domain_alloc_paging()
This is duplicated by intel_iommu_domain_alloc_paging_flags(), just remove
it.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/0-v1-b101d00c5ee5+17645-vtd_paging_flags_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07 09:30:50 +01:00
Kees Bakker
60f030f741 iommu/vt-d: Avoid use of NULL after WARN_ON_ONCE
There is a WARN_ON_ONCE to catch an unlikely situation when
domain_remove_dev_pasid can't find the `pasid`. In case it nevertheless
happens we must avoid using a NULL pointer.

Signed-off-by: Kees Bakker <kees@ijzerbout.nl>
Link: https://lore.kernel.org/r/20241218201048.E544818E57E@bout3.ijzerbout.nl
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07 09:30:50 +01:00
Gao Shiyuan
5bb494d5cb iommu/amd: remove return value of amd_iommu_detect
The return value of amd_iommu_detect is not used, so remove it and
is consistent with other iommu detect functions.

Signed-off-by: Gao Shiyuan <gaoshiyuan@baidu.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250103165808.80939-1-gaoshiyuan@baidu.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-06 12:42:00 +01:00
Zhang Heng
afc0cbc6e2 iommu/msm: Use helper function devm_clk_get_prepared()
Since commit 7ef9651e97 ("clk: Provide new devm_clk helpers for prepared
and enabled clocks"), devm_clk_get() and clk_prepare() can now be replaced
by devm_clk_get_prepared() when driver prepares the clocks for the whole
lifetime of the device. Moreover, it is no longer necessary to unprepare
the clocks explicitly.

Signed-off-by: Zhang Heng <zhangheng@kylinos.cn>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20250103113059.463033-1-zhangheng@kylinos.cn
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-06 12:41:00 +01:00
Xu Lu
77a44196ab iommu/riscv: Add shutdown function for iommu driver
This commit supplies shutdown callback for iommu driver. The shutdown
callback resets necessary registers so that newly booted kernel can pass
riscv_iommu_init_check() after kexec. Also, the shutdown callback resets
iommu mode to bare instead of off so that new kernel can still use PCIE
devices even when CONFIG_RISCV_IOMMU is not enabled.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Link: https://lore.kernel.org/r/20250103093220.38106-3-luxu.kernel@bytedance.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-06 12:38:11 +01:00
Xu Lu
8d8d3752c0 iommu/riscv: Empty iommu queue before enabling it
Changing cqen/fqen/pqen from 0 to 1 sets the cqh/fqt/pqt registers to 0.
But the cqt/fqh/pqh registers are left unmodified. This commit resets
cqt/fqh/pqh registers to ensure corresponding queues are empty before
being enabled during initialization.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Link: https://lore.kernel.org/r/20250103093220.38106-2-luxu.kernel@bytedance.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-06 12:37:19 +01:00
Nicolin Chen
e94dc6ddda iommu/tegra241-cmdqv: Read SMMU IDR1.CMDQS instead of hardcoding
The hardware limitation "max=19" actually comes from SMMU Command Queue.
So, it'd be more natural for tegra241-cmdqv driver to read it out rather
than hardcoding it itself.

This is not an issue yet for a kernel on a baremetal system, but a guest
kernel setting the queue base/size in form of IPA/gPA might result in a
noncontiguous queue in the physical address space, if underlying physical
pages backing up the guest RAM aren't contiguous entirely: e.g. 2MB-page
backed guest RAM cannot guarantee a contiguous queue if it is 8MB (capped
to VCMDQ_LOG2SIZE_MAX=19). This might lead to command errors when HW does
linear-read from a noncontiguous queue memory.

Adding this extra IDR1.CMDQS cap (in the guest kernel) allows VMM to set
SMMU's IDR1.CMDQS=17 for the case mentioned above, so a guest-level queue
will be capped to maximum 2MB, ensuring a contiguous queue memory.

Fixes: a3799717b8 ("iommu/tegra241-cmdqv: Fix alignment failure at max_n_shift")
Reported-by: Ian Kalinowski <ikalinowski@nvidia.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/20241219051421.1850267-1-nicolinc@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-12-19 15:44:11 +00:00
Mostafa Saleh
b7b8a63055 iommu/io-pgtable-arm: Fix cfg reading in arm_lpae_concat_mandatory()
The newly introduced arm_lpae_concat_mandatory() function reads the
ias/oas fields from the 'io_pgtable_cfg' copy embedded inside the
'arm_lpae_io_pgtable' structure. However, this copy is not set until
later in alloc_io_pgtable_ops() after the alloc() function has been
called.

Use the address sizes passed in the 'io_pgtable_cfg' structure when
deciding whether or not to concatenate the PGD.

Fixes: 4dcac8407f ("iommu/io-pgtable-arm: Fix stage-2 concatenation with 16K")
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241215200412.561400-1-smostafa@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-12-19 15:41:27 +00:00
Yi Liu
647b7aad19 iommu: Remove the remove_dev_pasid op
The iommu drivers that supports PASID have supported attaching pasid to the
blocked_domain, hence remove the remove_dev_pasid op from the iommu_ops.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241204122928.11987-8-yi.l.liu@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-18 09:39:37 +01:00
Yi Liu
5f53638882 iommu/amd: Make the blocked domain support PASID
The blocked domain can be extended to park PASID of a device to be the
DMA blocking state. By this the remove_dev_pasid() op is dropped.

Remove PASID from old domain and device GCR3 table. No need to attach
PASID to the blocked domain as clearing PASID from GCR3 table will make
sure all DMAs for that PASID are blocked.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241204122928.11987-7-yi.l.liu@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-18 09:39:37 +01:00
Yi Liu
4f0bdab175 iommu/vt-d: Make the blocked domain support PASID
The blocked domain can be extended to park PASID of a device to be the
DMA blocking state. By this the remove_dev_pasid() op is dropped.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241204122928.11987-6-yi.l.liu@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-18 09:39:36 +01:00
Jason Gunthorpe
ef181762cb iommu/arm-smmu-v3: Make the blocked domain support PASID
The blocked domain is used to park RID to be blocking DMA state. This
can be extended to PASID as well. By this, the remove_dev_pasid() op
of ARM SMMUv3 can be dropped.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241204122928.11987-5-yi.l.liu@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-18 09:39:36 +01:00
Yi Liu
b18301b915 iommu: Detaching pasid by attaching to the blocked_domain
The iommu drivers are on the way to detach pasid by attaching to the blocked
domain. However, this cannot be done in one shot. During the transition, iommu
core would select between the remove_dev_pasid op and the blocked domain.

Suggested-by: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241204122928.11987-4-yi.l.liu@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-18 09:39:35 +01:00
Yi Liu
1fbf73425f iommu: Consolidate the ops->remove_dev_pasid usage into a helper
Add a wrapper for the ops->remove_dev_pasid, this consolidates the iommu_ops
fetching and callback invoking. It is also a preparation for starting the
transition from using remove_dev_pasid op to detach pasid to the way using
blocked_domain to detach pasid.

Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241204122928.11987-3-yi.l.liu@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-18 09:39:35 +01:00
Yi Liu
fb3de9f9b0 iommu: Prevent pasid attach if no ops->remove_dev_pasid
driver should implement both set_dev_pasid and remove_dev_pasid op, otherwise
it is a problem how to detach pasid. In reality, it is impossible that an
iommu driver implements set_dev_pasid() but no remove_dev_pasid() op. However,
it is better to check it.

Move the group check to be the first as dev_iommu_ops() may fail when there
is no valid group. Also take the chance to remove the dev_has_iommu() check
as it is duplicated to the group check.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20241204122928.11987-2-yi.l.liu@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-18 09:39:34 +01:00