This reverts commit 74e1006430.
This causes a regression in the workload selection.
A more extensive fix is being worked on.
For now, revert.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3618
Fixes: 74e1006430 ("drm/amd/pm: correct the workload setting")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is a strapping issue on NBIO 7.7.0 that can lead to spurious PME
events while in the D0 state.
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20241112161142.28974-1-mario.limonciello@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 447a54a0f7)
Cc: stable@vger.kernel.org
That is just a waste of time on APUs.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3704
Fixes: 216c1282dd ("drm/amdgpu: use GTT only as fallback for VRAM|GTT")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e8fc090d32)
Cc: stable@vger.kernel.org
The coherency flags can only be determined when the BO is locked and that
in turn is only guaranteed when the mapping is validated.
Fix the check, move the resource check into the function and add an assert
that the BO is locked.
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: d1a372af1c ("drm/amdgpu: Set MTYPE in PTE based on BO flags")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1b4ca8546f)
Cc: stable@vger.kernel.org
Currently, the pp_dpm_mclk values are reported in descending order
on SMU IP v14.0.0/1/4. Adjust to ascending order for consistency
with other clock interfaces.
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d4be16ccfd)
Cc: stable@vger.kernel.org
H264 supports 4096x4096 starting from Polaris.
HEVC also supports 4096x4096, with VCN 3 and newer 8192x4352
is supported.
Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 69e9a9e65b)
Cc: stable@vger.kernel.org
At some point, the IEEE ID identification for the replay check in the
AMD EDID was added. However, this check causes the following
out-of-bounds issues when using KASAN:
[ 27.804016] BUG: KASAN: slab-out-of-bounds in amdgpu_dm_update_freesync_caps+0xefa/0x17a0 [amdgpu]
[ 27.804788] Read of size 1 at addr ffff8881647fdb00 by task systemd-udevd/383
...
[ 27.821207] Memory state around the buggy address:
[ 27.821215] ffff8881647fda00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 27.821224] ffff8881647fda80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 27.821234] >ffff8881647fdb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 27.821243] ^
[ 27.821250] ffff8881647fdb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 27.821259] ffff8881647fdc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 27.821268] ==================================================================
This is caused because the ID extraction happens outside of the range of
the edid lenght. This commit addresses this issue by considering the
amd_vsdb_block size.
Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b7e381b1cc)
Cc: stable@vger.kernel.org
If the nominal VBlank is too small, optimizing for stutter can cause
the prefetch bandwidth to increase drasticaly, resulting in higher
clock and power requirements. Only optimize if it is >3x the stutter
latency.
Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 003215f962)
Cc: stable@vger.kernel.org
[Why]
In the case where a dml allocation fails for any reason, the
current state's dml contexts would no longer be valid. Then
subsequent calls dc_state_copy_internal would shallow copy
invalid memory and if the new state was released, a double
free would occur.
[How]
Reset dml pointers in new_state to NULL and avoid invalid
pointer
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Ryan Seto <ryanseto@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bcafdc6152)
Cc: stable@vger.kernel.org
[Why]
In certain use case such as KDE login screen, there will be no atomic
commit while do the frame update.
If the Panel Replay enabled, it will cause the screen not updated and
looks like system hang.
[How]
Delay few atomic commits before enabled the Panel Replay just like PSR.
Fixes: be64336307 ("drm/amd/display: Re-enable panel replay feature")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3686
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3682
Tested-By: Corey Hickey <bugfood-c@fatooh.org>
Tested-By: James Courtier-Dutton <james.dutton@gmail.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ca628f0edd)
Cc: stable@vger.kernel.org # 6.11+
Panel Replay feature may also use the same variable with PSR.
Change the variable name and make it not specify for PSR.
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c7fafb7a46)
Cc: stable@vger.kernel.org # 6.11+
Avoid a possible buffer overflow if size is larger than 4K.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f5d873f582)
Cc: stable@vger.kernel.org
Users should not be able to run these.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7ba9395430)
Cc: stable@vger.kernel.org
Regular users shouldn't have read access.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c0cfd2e652)
Cc: stable@vger.kernel.org
For DPX mode, the number of memory partitions supported should be less
than or equal to 2.
Fixes: 1589c82a10 ("drm/amdgpu: Check memory ranges for valid xcp mode")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 990c4f5807)
Cc: stable@vger.kernel.org
Correct the workload setting in order not to mix the setting
with the end user. Update the workload mask accordingly.
v2: changes as below:
1. the end user can not erase the workload from driver except default workload.
2. always shows the real highest priority workoad to the end user.
3. the real workload mask is combined with driver workload mask and end user workload mask.
v3: apply this to the other ASICs as well.
v4: simplify the code
v5: refine the code based on the review comments.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8cc438be5d)
Cc: stable@vger.kernel.org # 6.11.x
always pick the pptable from IFWI on smu v14.0.2/3
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 136ce12bd5)
Cc: stable@vger.kernel.org # 6.11.x
acpi_evaluate_object() may return AE_NOT_FOUND (failure), which
would result in dereferencing buffer.pointer (obj) while being NULL.
Although this case may be unrealistic for the current code, it is
still better to protect against possible bugs.
Bail out also when status is AE_NOT_FOUND.
This fixes 1 FORWARD_NULL issue reported by Coverity
Report: CID 1600951: Null pointer dereferences (FORWARD_NULL)
Signed-off-by: Antonio Quartulli <antonio@mandelbit.com>
Fixes: c9b7c809b8 ("drm/amd: Guard against bad data for ATIF ACPI method")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 91c9e221fe)
Cc: stable@vger.kernel.org
An upstream bug report suggests that there are production dGPUs that are
older than DCN401 but still have a umc_info in VBIOS tables with the
same version as expected for a DCN401 product. Hence, reading this
tables should be guarded with a version check.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3678
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2551b4a321)
Fixes: 00c391102a ("drm/amd/display: Add misc DC changes for DCN401")
Cc: stable@vger.kernel.org # 6.11.x
[Why]
During boot up and resume the DC layer will reset the panel
brightness to fix a flicker issue.
It will cause the dm->actual_brightness is not the current panel
brightness level. (the dm->brightness is the correct panel level)
[How]
Set the backlight level after do the set mode.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Fixes: d9e865826c ("drm/amd/display: Simplify brightness initialization")
Reported-by: Mark Herbert <mark.herbert42@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3655
Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7875afafba)
Cc: stable@vger.kernel.org
The following 3 commits landed in parallel:
commit d7d2688bf4 ("drm/amd/pm: update workload mask after the setting")
commit 7a1613e47e ("drm/amdgpu/smu13: always apply the powersave optimization")
commit 7c210ca5a2 ("drm/amdgpu: handle default profile on on devices without fullscreen 3D")
While everything is set correctly, this caused the profile to be
reported incorrectly because both the powersave and fullscreen3d bits
were set in the mask and when the driver prints the profile, it looks
for the first bit set.
Fixes: d7d2688bf4 ("drm/amd/pm: update workload mask after the setting")
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ecfe9b2376)
Cc: stable@vger.kernel.org
KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:
[ 33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[ 33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G W 6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[ 33.861816] Tainted: [W]=WARN
[ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[ 33.861822] Call Trace:
[ 33.861826] <TASK>
[ 33.861829] dump_stack_lvl+0x66/0x90
[ 33.861838] print_report+0xce/0x620
[ 33.861853] kasan_report+0xda/0x110
[ 33.862794] kasan_check_range+0xfd/0x1a0
[ 33.862799] __asan_memset+0x23/0x40
[ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.867135] dev_attr_show+0x43/0xc0
[ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0
[ 33.867155] seq_read_iter+0x3f8/0x1140
[ 33.867173] vfs_read+0x76c/0xc50
[ 33.867198] ksys_read+0xfb/0x1d0
[ 33.867214] do_syscall_64+0x90/0x160
...
[ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[ 33.867358] kasan_save_stack+0x33/0x50
[ 33.867364] kasan_save_track+0x17/0x60
[ 33.867367] __kasan_kmalloc+0x87/0x90
[ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu]
[ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[ 33.869608] local_pci_probe+0xda/0x180
[ 33.869614] pci_device_probe+0x43f/0x6b0
Empirically we can confirm that the former allocates 152 bytes for the
table, while the latter memsets the 168 large block.
Root cause appears that when GPU metrics tables for v2_4 parts were added
it was not considered to enlarge the table to fit.
The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.
v2:
* Drop impossible v3_0 case. (Mario)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 41cec40bc9 ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Wenyou Yang <WenYou.Yang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0880f58f96)
Cc: stable@vger.kernel.org # v6.6+
This reverts
commit 9dad21f910 ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35")
[why & how]
The offending commit exposes a hang with lid close/open behavior.
Both issues seem to be related to ODM 2:1 mode switching, so there
is another issue generic to that sequence that needs to be
investigated.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 68bf95317e)
Cc: stable@vger.kernel.org
Some devices do not support fullscreen 3D.
v2: Make the check generic.
Fixes: ec1aab7816 ("drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Kenneth Feng <kenneth.feng@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
(cherry picked from commit 1cdd67510e)
There is random data corruption caused by const fill, this is caused by
write compression mode not correctly configured.
So correct compression mode for const fill.
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 75400f8d6e)
Cc: stable@vger.kernel.org # 6.11.x
[Why&How]
Disabling P-State support on full updates for DCN401 results in
introducing additional communication with SMU. A UCLK hard min message
to SMU takes 4 seconds to go through, which was due to DCN not allowing
pstate switch, which was caused by incorrect value for TTU watermark
before blanking the HUBP prior to DPG on for servicing the test request.
Fix the issue temporarily by disallowing pstate changes for compliance
test while test request handler is reworked for a proper fix.
Fixes: 67ea53a4bd ("drm/amd/display: Disable DCN401 UCLK P-State support on full updates")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8a79f7cdbb)
Cc: stable@vger.kernel.org
[Why&How]
vblank immediate disable currently does not work for all asics. On
DCN401, the vblank interrupts never stop coming, and hence we never
get a chance to trigger idle optimizations.
Add a workaround to enable immediate disable only on APUs for now. This
adds a 2-frame delay for triggering idle optimization, which is a
negligible overhead.
Fixes: 58a261bfc9 ("drm/amd/display: use a more lax vblank enable policy for older ASICs")
Fixes: e45b6716de ("drm/amd/display: use a more lax vblank enable policy for DCN35+")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9b47278cec)
Cc: stable@vger.kernel.org
disable deep sleep during the compute workload for the
potential performance loss on smu v14.0.2/3
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7d9af459f4)
update overdrive function on smu v14.0.2/3
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit dcf822fca5)
update the driver-fw interface file for smu v14.0.2/3
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0642c95efb)
If a BIOS provides bad data in response to an ATIF method call
this causes a NULL pointer dereference in the caller.
```
? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1))
? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434)
? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2))
? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1))
? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642)
? exc_page_fault (arch/x86/mm/fault.c:1542)
? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu
```
It has been encountered on at least one system, so guard for it.
Fixes: d38ceaf99e ("drm/amdgpu: add core driver (v4)")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c9b7c809b8)
Cc: stable@vger.kernel.org
Process device data pdd->vram_usage is read by rocm-smi via sysfs, this
is currently missing the svm_bo usage accounting, so "rocm-smi
--showpids" per process VRAM usage report is incorrect.
Add pdd->vram_usage accounting when svm_bo allocation and release,
change to atomic64_t type because it is updated outside process mutex
now.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 98c0b0efcc)
With Unified MES enabled in gfx12, need separate event log buffer for the
2 MES pipes to avoid data overwrite.
Signed-off-by: Michael Chen <michael.chen@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 144df260f3)
Cc: stable@vger.kernel.org # 6.11.x
Before this patch, if multiple BO_HANDLES chunks were submitted,
the error -EINVAL would be correctly set but could be overwritten
by the return value from amdgpu_cs_p1_bo_handles(). This patch
ensures that if there are multiple BO_HANDLES, we stop.
Fixes: fec5f8e8c6 ("drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit")
Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 40f2cd9882)
Cc: stable@vger.kernel.org
It should be enabled on both bare metal and VFs.
Fixes: e189be9b2e ("drm/amdgpu: Add enforce_isolation sysfs attribute")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Cc: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
(cherry picked from commit dc8847b054)
Since, two suspend-resume cycles are required to enter hibernate and,
since we only need to enable idle optimizations in the first cycle
(which is pretty much equivalent to s2idle). We can check in_s0ix, to
prevent the system from entering idle optimizations before it actually
enters hibernate (from display's perspective). Also, call
dc_set_power_state() before dc_allow_idle_optimizations(), since it's
safer to do so because dc_set_power_state() writes to DMUB.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2fe79508d9)
Cc: stable@vger.kernel.org # 6.10+
[Why]
Since the surface/stream update flags aren't cleared after applying
updates, those same updates may be applied again in a future call to
update surfaces/streams for surfaces/streams that aren't actually part
of that update (i.e. applying an update for one surface/stream can
trigger unintended programming on a different surface/stream).
For example, when an update results in a call to
program_front_end_for_ctx, that function may call program_pipe on all
pipes. If there are surface update flags that were never cleared on the
surface some pipe is attached to, then the same update will be
programmed again.
[How]
Clear the surface and stream update flags after applying the updates.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3441
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3616
Cc: Melissa Wen <mwen@igalia.com>
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Josip Pavic <Josip.Pavic@amd.com>
Signed-off-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7671f62c10)
Cc: stable@vger.kernel.org
Partially revert
commit 0ca9f757a0 ("drm/amd/pm: powerplay: Add `__counted_by` attribute for flexible arrays")
The count attribute for these arrays does not get set until
after the arrays are allocated and populated leading to false
UBSAN warnings.
Fixes: 0ca9f757a0 ("drm/amd/pm: powerplay: Add `__counted_by` attribute for flexible arrays")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3662
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8a5ae927b6)
Cc: stable@vger.kernel.org
Only creating a new reference for each process instead of each VM.
Fixes: 9a1c1339ab ("drm/amdkfd: Run restore_workers on freezable WQs")
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5fa4362894)
Cc: stable@vger.kernel.org
atomic:
- Use correct type when reading damage rectangles
display:
- Fix kernel docs
dp-mst:
- Fix DSC decompression detection
hdmi:
- Fix infoframe size
sched:
- Update maintainers
- Fix race condition whne queueing up jobs
- Fix locking in drm_sched_entity_modify_sched()
- Fix pointer deref if entity queue changes
sysfb:
- Disable sysfb if framebuffer parent device is unknown
amdgpu:
- DML2 fix
- DSC fix
- Dispclk fix
- eDP HDR fix
- IPS fix
- TBT fix
i915:
- One fix for bitwise and logical "and" mixup in PM code
xe:
- Restore pci state on resume
- Fix locking on submission, queue and vm
- Fix UAF on queue destruction
- Fix resource release on freq init error path
- Use rw_semaphore to reduce contention on ASID->VM lookup
- Fix steering for media on Xe2_HPM
- Tuning updates to Xe2
- Resume TDR after GT reset to prevent jobs running forever
- Move id allocation to avoid userspace using a guessed number
to trigger UAF
- Fix OA stream close preventing pbatch buffers to complete
- Fix NPD when migrating memory on LNL
- Fix memory leak when aborting binds
panthor:
- Fix locking
- Set FOP_UNSIGNED_OFFSET in fops instance
- Acquire lock in panthor_vm_prepare_map_op_ctx()
- Avoid uninitialized variable in tick_ctx_cleanup()
- Do not block scheduler queue if work is pending
- Do not add write fences to the shared BOs
vbox:
- Fix VLA handling
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmb/YwgACgkQDHTzWXnE
hr6VbxAAoq9FYTAdRPWzfG1HYpG96UyTh+IT6lz1bk/Hblxhi7oRdfmRy/bVPQYh
vj+Q2xrnyS6JYhyfeDT2nU75tD3gvR1V/qSamxXS7c1nrqcb431DaMzuSQ5ST6MZ
jrmob2TxlbXDDw70dxtiGCmSu0a9QInbelEamJQySKOdun0Il5C0LRZIBMpicDGc
7Y3eSpCIwgTSU6bnApGyOchppvzptiqBWGmhoIuACMOgXI8eaLUPqbROKEHlPe5g
JIG603rRK7cf+on/KEwvgrd2ZO59fJZvmwFrM5yY5bOsDCwTIJ6mHhOmutUNQvmd
G5n6ZFnVxlBRSVWCAqPRBgA405s/0wi2IQprilaPCu2qAXToBXAUpIHuuat5I/8b
BVwurVRAGV/GSeg7E51H3o8cu/fcQr4aGNW4Ul6fS1G123ZuUISpcUp9IEnqG7nB
5PSnHapadb5Pu+7kwhbWUD4kONp16oEacZPhymlN+74Q6X3v8UVlK/YSyD6wq4fj
2s4TBWUmXmNxztNEkgJhyJORYQhZeBaD0PtPq8kSzMUCFj3Q7Wf0bAODhbnmdCw1
iPgxRd9+38IpfW621AROJUoTcyCaLtlSUvHWgFfska5CYnbtbKuNPBW6baQxeEHe
Rhns01dZhNTIbKNEw37cfOf2DqcjpmRm4cVJj4xjZawxWYlNldk=
=wcmz
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2024-10-04' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes, xe and amdgpu lead the way, with panthor, and few core
components getting various fixes. Nothing seems too out of the
ordinary.
atomic:
- Use correct type when reading damage rectangles
display:
- Fix kernel docs
dp-mst:
- Fix DSC decompression detection
hdmi:
- Fix infoframe size
sched:
- Update maintainers
- Fix race condition whne queueing up jobs
- Fix locking in drm_sched_entity_modify_sched()
- Fix pointer deref if entity queue changes
sysfb:
- Disable sysfb if framebuffer parent device is unknown
amdgpu:
- DML2 fix
- DSC fix
- Dispclk fix
- eDP HDR fix
- IPS fix
- TBT fix
i915:
- One fix for bitwise and logical "and" mixup in PM code
xe:
- Restore pci state on resume
- Fix locking on submission, queue and vm
- Fix UAF on queue destruction
- Fix resource release on freq init error path
- Use rw_semaphore to reduce contention on ASID->VM lookup
- Fix steering for media on Xe2_HPM
- Tuning updates to Xe2
- Resume TDR after GT reset to prevent jobs running forever
- Move id allocation to avoid userspace using a guessed number to
trigger UAF
- Fix OA stream close preventing pbatch buffers to complete
- Fix NPD when migrating memory on LNL
- Fix memory leak when aborting binds
panthor:
- Fix locking
- Set FOP_UNSIGNED_OFFSET in fops instance
- Acquire lock in panthor_vm_prepare_map_op_ctx()
- Avoid uninitialized variable in tick_ctx_cleanup()
- Do not block scheduler queue if work is pending
- Do not add write fences to the shared BOs
vbox:
- Fix VLA handling"
* tag 'drm-fixes-2024-10-04' of https://gitlab.freedesktop.org/drm/kernel: (41 commits)
drm/xe: Fix memory leak when aborting binds
drm/xe: Prevent null pointer access in xe_migrate_copy
drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream close
drm/xe/queue: move xa_alloc to prevent UAF
drm/xe/vm: move xa_alloc to prevent UAF
drm/xe: Clean up VM / exec queue file lock usage.
drm/xe: Resume TDR after GT reset
drm/xe/xe2: Add performance tuning for L3 cache flushing
drm/xe/xe2: Extend performance tuning to media GT
drm/xe/mcr: Use Xe2_LPM steering tables for Xe2_HPM
drm/xe: Use helper for ASID -> VM in GPU faults and access counters
drm/xe: Convert to USM lock to rwsem
drm/xe: use devm_add_action_or_reset() helper
drm/xe: fix UAF around queue destruction
drm/xe/guc_submit: add missing locking in wedged_fini
drm/xe: Restore pci state upon resume
drm/amd/display: Fix system hang while resume with TBT monitor
drm/amd/display: Enable idle workqueue for more IPS modes
drm/amd/display: Add HDR workaround for specific eDP
drm/amd/display: avoid set dispclk to 0
...