mirror of
git://git.yoctoproject.org/linux-yocto.git
synced 2025-08-21 16:31:14 +02:00
master
6 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
8874d92b57 |
dmaengine updates for v6.12
New support: - Support for AMD Versal Gen 2 DMA IP - Rcar RZ/G3S SoC dma controller - Support for Intel Diamond Rapids and Granite Rapids-D dma controllers - Support for Freescale ls1021a-qdma controller - New driver for Loongson-1 APB DMA - New driver for AMD QDMA - Pl08x in LPC32XX router dma driver Updates: - Support for dpdma cyclic dma mode - XML conversion for marvell xor dma bindings - Dma clocks documentation for imx dma -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmbv/CAACgkQfBQHDyUj g0cEnRAAsv/rEDi47ku1dkbyOOfbhySj3/TyhOWs0eXPlJLrlMEC4b1i5QIP+Ldd ldzYIdlMLVbmFD+VfTOgqGY2qPgBTdahoCSaTH/GoiCV9OnKt2ImjoZ7KSWQjd90 Qtl8IQnuNBFMl3+DchU/EtGWA4+5dGM3y0p+TdIHxtf7z8Quos9XCp5QaTw8+4wg TMeQQKaoGjjm9aOBNwk9B9f3yqYB39ZmmqWfhII2395wrWulQYFsji6DnQhVcHrs EZ4w2lRIyqELszHSZ/8VPFfzyolsaDjTwC+1zZwVg8gFna2KOpZgRx1a16B+KZ7n w6BnMERLCTgkle9FQdhnzCeG8pZpk2YJT1OYLZHSH2XSHau9O7d9PSQ8mEwecDEz Iq3x0HfcZ5Cz2sansCwGTJQQpSvLqw74hybhhiFk6BQhEh3Bjq+7AjJvTQRGK2X8 w+v4e2hx7SpPCMIlXtdSf/1YhHil9897JS/nbAdYW2PflZhPSbDbH2c0hklab08K BdbriaAoT5XkQB0LKSDNHDcl+NxX0gph/geHpg2QD+JuJOydHR1El6nI2/d8Styj w07QKVCa/O46Bxz4AwQNIFOVL8JnygNTBjaJKof+6DX2/v4TikMcJYnVMj3i1fWB t/NsBzim1zliiGaZNiFoo3obzuzlYs2/yZ++dYZYY8oyLRGjJjM= =beF+ -----END PGP SIGNATURE----- Merge tag 'dmaengine-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "Unusually, more new driver and device support than updates. Couple of new device support, AMD, Rcar, Intel and New drivers in Freescale, Loonsoon, AMD and LPC32XX with DT conversion and mode updates etc. New support: - Support for AMD Versal Gen 2 DMA IP - Rcar RZ/G3S SoC dma controller - Support for Intel Diamond Rapids and Granite Rapids-D dma controllers - Support for Freescale ls1021a-qdma controller - New driver for Loongson-1 APB DMA - New driver for AMD QDMA - Pl08x in LPC32XX router dma driver Updates: - Support for dpdma cyclic dma mode - XML conversion for marvell xor dma bindings - Dma clocks documentation for imx dma" * tag 'dmaengine-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (24 commits) dmaengine: loongson1-apb-dma: Fix the build warning caused by the size of pdev_irqname dmaengine: Fix spelling mistakes dmaengine: Add dma router for pl08x in LPC32XX SoC dmaengine: fsl-edma: add edma src ID check at request channel dmaengine: fsl-edma: change to guard(mutex) within fsl_edma3_xlate() dmaengine: avoid non-constant format string dmaengine: imx-dma: Remove i.MX21 support dt-bindings: dma: fsl,imx-dma: Document the DMA clocks dmaengine: Loongson1: Add Loongson-1 APB DMA driver dt-bindings: dma: Add Loongson-1 APB DMA dmaengine: zynqmp_dma: Add support for AMD Versal Gen 2 DMA IP dt-bindings: dmaengine: zynqmp_dma: Add a new compatible string dmaengine: idxd: Add new DSA and IAA device IDs for Diamond Rapids platform dmaengine: idxd: Add a new DSA device ID for Granite Rapids-D platform dmaengine: ti: k3-udma: Remove unused declarations dmaengine: amd: qdma: Add AMD QDMA driver dmaengine: xilinx: dpdma: Add support for cyclic dma mode dma: ipu: Remove include/linux/dma/ipu-dma.h dt-bindings: dma: fsl-mxs-dma: Add compatible string "fsl,imx8qxp-dma-apbh" dt-bindings: fsl-qdma: allow compatible string fallback to fsl,ls1021a-qdma ... |
||
![]() |
bbdd4df35c |
dmaengine: idxd: Clean up cpumask and hotplug for perfmon
The idxd PMU is system-wide scope, which is supported by the generic perf_event subsystem now. Set the scope for the idxd PMU and remove all the cpumask and hotplug codes. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20240802151643.1691631-6-kan.liang@linux.intel.com |
||
![]() |
8bce5522a1 |
dmaengine: idxd: Convert comma to semicolon
Replace a comma between expression statements by a semicolon for more readability. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240711013436.2655373-1-nichen@iscas.ac.cn Signed-off-by: Vinod Koul <vkoul@kernel.org> |
||
![]() |
f221033f5c |
dmaengine: idxd: Fix oops during rmmod on single-CPU platforms
During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:
BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>
Fix the issue by preventing the migration of the perf context to an
invalid target.
Fixes:
|
||
![]() |
cae701b9cc |
dmaengine:idxd: Use local64_try_cmpxchg in perfmon_pmu_event_update
Use local64_try_cmpxchg instead of local64_cmpxchg (*ptr, old, new) == old in perfmon_pmu_event_update. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. No functional change intended. Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vinod Koul <vkoul@kernel.org> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Link: https://lore.kernel.org/r/20230703145346.5206-1-ubizjak@gmail.com Signed-off-by: Vinod Koul <vkoul@kernel.org> |
||
![]() |
81dd4d4d61 |
dmaengine: idxd: Add IDXD performance monitor support
Implement the IDXD performance monitor capability (named 'perfmon' in the DSA (Data Streaming Accelerator) spec [1]), which supports the collection of information about key events occurring during DSA and IAX (Intel Analytics Accelerator) device execution, to assist in performance tuning and debugging. The idxd perfmon support is implemented as part of the IDXD driver and interfaces with the Linux perf framework. It has several features in common with the existing uncore pmu support: - it does not support sampling - does not support per-thread counting However it also has some unique features not present in the core and uncore support: - all general-purpose counters are identical, thus no event constraints - operation is always system-wide While the core perf subsystem assumes that all counters are by default per-cpu, the uncore pmus are socket-scoped and use a cpu mask to restrict counting to one cpu from each socket. IDXD counters use a similar strategy but expand the scope even further; since IDXD counters are system-wide and can be read from any cpu, the IDXD perf driver picks a single cpu to do the work (with cpu hotplug notifiers to choose a different cpu if the chosen one is taken off-line). More specifically, the perf userspace tool by default opens a counter for each cpu for an event. However, if it finds a cpumask file associated with the pmu under sysfs, as is the case with the uncore pmus, it will open counters only on the cpus specified by the cpumask. Since perfmon only needs to open a single counter per event for a given IDXD device, the perfmon driver will create a sysfs cpumask file for the device and insert the first cpu of the system into it. When a user uses perf to open an event, perf will open a single counter on the cpu specified by the cpu mask. This amounts to the default system-wide rather than per-cpu counting mentioned previously for perfmon pmu events. In order to keep the cpu mask up-to-date, the driver implements cpu hotplug support for multiple devices, as IDXD usually enumerates and registers more than one idxd device. The perfmon driver implements basic perfmon hardware capability discovery and configuration, and is initialized by the IDXD driver's probe function. During initialization, the driver retrieves the total number of supported performance counters, the pmu ID, and the device type from idxd device, and registers itself under the Linux perf framework. The perf userspace tool can be used to monitor single or multiple events depending on the given configuration, as well as event groups, which are also supported by the perfmon driver. The user configures events using the perf tool command-line interface by specifying the event and corresponding event category, along with an optional set of filters that can be used to restrict counting to specific work queues, traffic classes, page and transfer sizes, and engines (See [1] for specifics). With the configuration specified by the user, the perf tool issues a system call passing that information to the kernel, which uses it to initialize the specified event(s). The event(s) are opened and started, and following termination of the perf command, they're stopped. At that point, the perfmon driver will read the latest count for the event(s), calculate the difference between the latest counter values and previously tracked counter values, and display the final incremental count as the event count for the cycle. An overflow handler registered on the IDXD irq path is used to account for counter overflows, which are signaled by an overflow interrupt. Below are a couple of examples of perf usage for monitoring DSA events. The following monitors all events in the 'engine' category. Becuuse no filters are specified, this captures all engine events for the workload, which in this case is 19 iterations of the work generated by the kernel dmatest module. Details describing the events can be found in Appendix D of [1], Performance Monitoring Events, but briefly they are: event 0x1: total input data processed, in 32-byte units event 0x2: total data written, in 32-byte units event 0x4: number of work descriptors that read the source event 0x8: number of work descriptors that write the destination event 0x10: number of work descriptors dispatched from batch descriptors event 0x20: number of work descriptors dispatched from work queues # perf stat -e dsa0/event=0x1,event_category=0x1/, dsa0/event=0x2,event_category=0x1/, dsa0/event=0x4,event_category=0x1/, dsa0/event=0x8,event_category=0x1/, dsa0/event=0x10,event_category=0x1/, dsa0/event=0x20,event_category=0x1/ modprobe dmatest channel=dma0chan0 timeout=2000 iterations=19 run=1 wait=1 Performance counter stats for 'system wide': 5,332 dsa0/event=0x1,event_category=0x1/ 5,327 dsa0/event=0x2,event_category=0x1/ 19 dsa0/event=0x4,event_category=0x1/ 19 dsa0/event=0x8,event_category=0x1/ 0 dsa0/event=0x10,event_category=0x1/ 19 dsa0/event=0x20,event_category=0x1/ 21.977436186 seconds time elapsed The command below illustrates filter usage with a simple example. It specifies that MEM_MOVE operations should be counted for the DSA device dsa0 (event 0x8 corresponds to the EV_MEM_MOVE event - Number of Memory Move Descriptors, which is part of event category 0x3 - Operations. The detailed category and event IDs are available in Appendix D, Performance Monitoring Events, of [1]). In addition to the event and event category, a number of filters are also specified (the detailed filter values are available in Chapter 6.4 (Filter Support) of [1]), which will restrict counting to only those events that meet all of the filter criteria. In this case, the filters specify that only MEM_MOVE operations that are serviced by work queue wq0 and specifically engine number engine0 and traffic class tc0 having sizes between 0 and 4k and page size of between 0 and 1G result in a counter hit; anything else will be filtered out and not appear in the final count. Note that filters are optional - any filter not specified is assumed to be all ones and will pass anything. # perf stat -e dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7, filter_eng=0x1,event=0x8,event_category=0x3/ modprobe dmatest channel=dma0chan0 timeout=2000 iterations=19 run=1 wait=1 Performance counter stats for 'system wide': 19 dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7, filter_eng=0x1,event=0x8,event_category=0x3/ 21.865914091 seconds time elapsed The output above reflects that the unspecified workload resulted in the counting of 19 MEM_MOVE operation events that met the filter criteria. [1]: https://software.intel.com/content/www/us/en/develop/download/intel-data-streaming-accelerator-preliminary-architecture-specification.html [ Based on work originally by Jing Lin. ] Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Link: https://lore.kernel.org/r/0c5080a7d541904c4ad42b848c76a1ce056ddac7.1619276133.git.zanussi@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> |