linux-yocto

mirror of git://git.yoctoproject.org/linux-yocto.git synced 2025-08-22 00:42:01 +02:00

Author	SHA1	Message	Date
Matthew Auld	a00b8f1aae	drm/xe: fix xe_device_mem_access_get() races It looks like there is at least one race here, given that the pm_runtime_suspended() check looks to return false if we are in the process of suspending the device (RPM_SUSPENDING vs RPM_SUSPENDED). We later also do xe_pm_runtime_get_if_active(), but since the device is suspending or has now suspended, this doesn't do anything either. Following from this we can potentially return from xe_device_mem_access_get() with the device suspended or about to be, leading to broken behaviour. Attempt to fix this by always grabbing the runtime ref when our internal ref transitions from 0 -> 1. The hard part is then dealing with the runtime_pm callbacks also calling xe_device_mem_access_get() and deadlocking, which the pm_runtime_suspended() check prevented. v2: - ct->lock looks to be primed with fs_reclaim, so holding that and then allocating memory will cause lockdep to complain. Now that we unconditionally grab the mem_access.lock around mem_access_{get,put}, we need to change the ordering wrt grabbing the ct->lock, since some of the runtime_pm routines can allocate memory (or at least that's what lockdep seems to suggest). Hopefully not a big deal. It might be that there were already issues with this, just that the atomics were "hiding" the potential issues. v3: - Use Thomas Hellström's idea with tracking the active task that is executing in the resume or suspend callback, in order to avoid recursive resume/suspend calls deadlocking on itself. - Split the ct->lock change. v4: - Add smp_mb() around accessing the pm_callback_task for extra safety. (Thomas Hellström) v5: - Clarify the kernel-doc for the mem_access.lock, given that it is quite strange in what it protects (data vs code). The real motivation is to aid lockdep. (Rodrigo Vivi) v6: - Split out the lock change. We still want this as a lockdep aid but only for the xe_device_mem_access_get() path. Sticking a lock on the put() looks to be a no-go, also the runtime_put() there is always async. - Now that the lock is gone move to atomics and rely on the pm code serialising multiple callers on the 0 -> 1 transition. - g2h_worker_func() looks to be the next issue, given that suspend-resume callbacks are using CT, so try to handle that. v7: - Add xe_device_mem_access_get_if_ongoing(), and use it in g2h_worker_func(). v8 (Anshuman): - Just always grab the rpm, instead of just on the 0 -> 1 transition, which is a lot clearer and simplifies the code quite a bit. v9: - Make sure we also adjust the CT fast-path with if-active. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/258 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Anshuman Gupta <anshuman.gupta@intel.com> Acked-by: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:37:35 -05:00
Francois Dugast	3e8e7ee6a3	drm/xe: Cleanup style warnings Reduce the number of warnings reported by checkpatch.pl from 118 to 48 by addressing those warnings types: LEADING_SPACE LINE_SPACING BRACES TRAILING_SEMICOLON CONSTANT_COMPARISON BLOCK_COMMENT_STYLE RETURN_VOID ONE_SEMICOLON SUSPECT_CODE_INDENT LINE_CONTINUATIONS UNNECESSARY_ELSE UNSPECIFIED_INT UNNECESSARY_INT MISORDERED_TYPE Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:37:31 -05:00
Matthew Auld	35c8a96439	drm/xe: handle TLB invalidations from CT fast-path In various test cases that put the system under a heavy load, we can sometimes see errors with missed TLB invalidations. In such cases we see the interrupt arrive for the invalidation from the GuC, however the actual processing of the completion is pushed onto a workqueue and handled with all the other CT stuff, which might take longer than expected. Since we expect TLB invalidations to complete within a reasonable amount of time (at most ~250ms), and they do seem pretty critical, allow handling directly from the CT fast-path. v2 (José): - Actually use the correct spinlock/unlock_irq, since pending_lock is grabbed from IRQ. v3: - Don't publish the TLB fence on the list until after we fully initialize it and successfully do the CT send. The list is now only protected by the spin_lock pending_lock and we can't hold that across the entire TLB send operation. v4 (Matt Brost): - Be careful with racing against fast CT path writing the seqno, before we have actually published the fence. References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/297 References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/320 References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/449 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:35:23 -05:00
Matthew Auld	0b688f9b28	drm/xe/ct: update g2h outstanding for CTB capture Looks to always be zero when inspecting the CTB dump. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:35:23 -05:00
Matthew Auld	a4d362bbed	drm/xe/ct: serialise fast_lock during CT disable The fast-path CT could be running as we enter a runtime-suspend or potentially a GT reset, however here we only use the ct->fast_lock and not the full ct->lock. Before disabling the CT, also serialise against the fast_lock to ensure any in-progress work finishes before we start nuking the CT related stuff. Once we disable ct->enabled and drop the lock, any new work should fail gracefully, and anything that was in progress should be finished. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:35:23 -05:00
Matthew Auld	dad33831d8	drm/xe/ct: hold fast_lock when reserving space for g2h Reserving and checking for space on the g2h side relies on the fast_lock, and not the CT lock since we need to release space from the fast CT path. Make sure we hold it when checking for space and reserving it. The main concern is calling __g2h_release_space() as we are reserving something and since the info.space and info.g2h_outstanding operations are not atomic we can get some nonsense values back. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:35:22 -05:00
Matthew Auld	c4bbc32e09	drm/xe: hold mem_access.ref for CT fast-path Just checking xe_device_mem_access_ongoing() is not enough, we also need to hold the reference otherwise the ref can transition from 1 -> 0 as we enter g2h_read(), leading to warnings. While we can't do a full rpm sync in the IRQ, we can keep the device awake if the ref is non-zero. Introduce a new helper for this and set it to work for the CT fast-path. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:35:22 -05:00
Alan Previn	c7fac450dd	drm/xe/guc: Fix h2g_write usage of GUC_CTB_MSG_MAX_LEN In the ABI header, GUC_CTB_MSG_MIN_LEN is '1' because GUC_CTB_HDR_LEN is 1. This aligns with H2G/G2H CTB specification where all command formats are defined in units of dwords so that '1' is a dword. Accordingly, GUC_CTB_MSG_MAX_LEN is 256-1 (i.e. 255 dwords). However, h2g_write was incorrectly assuming that GUC_CTB_MSG_MAX_LEN was in bytes. Fix this. v3: Fix nit on #define location. (Matt) v2: By correctly treating GUC_CTB_MSG_MAX_LEN as dwords, it causes a local array to consume 4x the stack size. Rework the function to avoid consuming stack even if the action size is large. (Matt) Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:35:07 -05:00
Lucas De Marchi	90738d8665	drm/xe/guc: Fix typo s/enabled/enable/ Fix the log message when it fails to enable CT. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230611222447.2837573-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:34:14 -05:00
Matthew Brost	f1a5a9bf14	drm/xe/guc: Read HXG fields from DW1 of G2H response The HXG fields are DW1 not DW0, fix this. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:35:20 -05:00
Matt Roper	876611c2b7	drm/xe: Memory allocations are tile-based, not GT-based Since memory and address spaces are a tile concept rather than a GT concept, we need to plumb tile-based handling through lots of memory-related code. Note that one remaining shortcoming here that will need to be addressed before media GT support can be re-enabled is that although the address space is shared between a tile's GTs, each GT caches the PTEs independently in their own TLB and thus TLB invalidation should be handled at the GT level. v2: - Fix kunit test build. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-13-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:34:14 -05:00
Matthew Auld	565ce72e1c	drm/xe: don't allocate under ct->lock Seems to be a sensitive lock, where ct->lock looks to be primed with fs_reclaim, so holding that and then allocating memory will cause lockdep to complain. We need to change the ordering wrt grabbing the ct->lock and potentially grabbing the runtime_pm, since some of the runtime_pm routines can allocate memory (or at least that's what lockdep seems to suggest). Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:34:09 -05:00
Rodrigo Vivi	513260dfd1	drm/xe: Convert GuC CT print to snapshot capture and print. The goal is to allow for a snapshot capture to be taken at the time of the crash, while the print out can happen at a later time through the exposed devcoredump virtual device. v2: Handle memory allocation failures. (Matthew) Do not use GFP_ATOMIC on cases like debugfs prints. (Matthew) v3: checkpatch fixes v4: Do not use atomic in the g2h_worker_func (Matthew) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>	2023-12-19 18:33:52 -05:00
Rodrigo Vivi	a7ca8157ec	drm/xe: Extract non mapped regions out of GuC CTB into its own struct. No functional change here. The goal is to have a clear split between the mapped portions of the CTB and the static information, so we can easily capture snapshots that will be used for later read out with the devcoredump infrastructure. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>	2023-12-19 18:33:52 -05:00
Niranjana Vishwanathapura	2988cf02ee	drm/xe: Fix memory use after free The wait_event_timeout() on g2h_fence.wq, which is declared on the stack, can return before wake_up() gets called, resulting in a stack out-of-bounds access when wake_up() accesses g2h_fence.wq. Do not declare the g2h_fence-related wait_queue_head_t on the stack. Fixes the below KASAN BUG and associated kernel crashes. BUG: KASAN: stack-out-of-bounds in do_raw_spin_lock+0x6f/0x1e0 Read of size 4 at addr ffff88826252f4ac by task kworker/u128:5/467 CPU: 25 PID: 467 Comm: kworker/u128:5 Tainted: G U 6.3.0-rc4-xe #1 Workqueue: events_unbound g2h_worker_func [xe] Call Trace: <TASK> dump_stack_lvl+0x64/0xb0 print_report+0xc2/0x600 kasan_report+0x96/0xc0 do_raw_spin_lock+0x6f/0x1e0 _raw_spin_lock_irqsave+0x47/0x60 __wake_up_common_lock+0xc0/0x150 dequeue_one_g2h+0x20f/0x6a0 [xe] g2h_worker_func+0xa9/0x180 [xe] process_one_work+0x527/0x990 worker_thread+0x2d1/0x640 kthread+0x174/0x1b0 ret_from_fork+0x29/0x50 </TASK> Tested-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Bruce Chang <yu.bruce.chang@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:31:40 -05:00
Lucas De Marchi	ea9f879d03	drm/xe: Sort includes Sort includes and split them in blocks: 1) .h corresponding to the .c. Example: xe_bb.c should have `#include "xe_bb.h"` first. 2) #include <linux/...> 3) #include <drm/...> 4) local includes 5) i915 includes This is accomplished by running `clang-format --style=file -i --sort-includes drivers/gpu/drm/xe/*.[ch]` and ignoring all the changes after the includes. There are also some manual tweaks to split the blocks. v2: Also sort includes in headers Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:29:20 -05:00
Matthew Brost	a9351846d9	drm/xe: Break of TLB invalidation into its own file TLB invalidation is used by more than USM (page faults) so break this code out into its own file. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:27:45 -05:00
Matthew Brost	5b64366087	drm/xe: Don't process TLB invalidation done in CT fast-path We can't currently do this because the TLB invalidation done handler expects seqnos to be received in order; with the fast-path, a TLB invalidation done could pass one being processed in the slow-path in an extreme corner case. Remove TLB invalidation done from the fast-path for now, and re-enable it in a follow-up once the TLB invalidation done handler can deal with out-of-order seqnos. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:27:45 -05:00
Matthew Brost	f900725af8	drm/xe/guc: s/xe_guc_send_mmio/xe_guc_mmio_send Now aligns with the xe_guc_ct_send naming. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Philippe Lecluse <philippe.lecluse1@gmail.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-12 14:06:00 -05:00
Matthew Brost	dd08ebf6c3	drm/xe: Introduce a new DRM driver for Intel GPUs Xe is a new driver for Intel GPUs that supports both integrated and discrete platforms starting with Tiger Lake (first Intel Xe Architecture). The code is at a stage where it is already functional and has experimental support for multiple platforms starting from Tiger Lake, with initial support implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan drivers), as well as in NEO (for OpenCL and Level0). The new Xe driver leverages a lot from i915. As for display, the intent is to share the display code with the i915 driver so that there is maximum reuse there. But it is not added in this patch. This initial work is a collaboration of many people and unfortunately the big squashed patch won't fully honor the proper credits. But let's get some git quick stats so we can at least try to preserve some of the credits: Co-developed-by: Matthew Brost <matthew.brost@intel.com> Co-developed-by: Matthew Auld <matthew.auld@intel.com> Co-developed-by: Matt Roper <matthew.d.roper@intel.com> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Co-developed-by: Francois Dugast <francois.dugast@intel.com> Co-developed-by: Lucas De Marchi <lucas.demarchi@intel.com> Co-developed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Co-developed-by: Philippe Lecluse <philippe.lecluse@intel.com> Co-developed-by: Nirmoy Das <nirmoy.das@intel.com> Co-developed-by: Jani Nikula <jani.nikula@intel.com> Co-developed-by: José Roberto de Souza <jose.souza@intel.com> Co-developed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Co-developed-by: Dave Airlie <airlied@redhat.com> Co-developed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Co-developed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Co-developed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com>	2023-12-12 14:05:48 -05:00

70 Commits