linux-yocto

mirror of git://git.yoctoproject.org/linux-yocto.git synced 2025-08-22 00:42:01 +02:00

Author	SHA1	Message	Date
Jesse Zhang	aa47fe8d35	drm/amdkfd: Fix resource leak in criu restore queue To avoid memory leaks, release q_extra_data when exiting the restore queue. v2: Correct the proto (Alex) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-10 13:44:06 -04:00
Leo Li	578aab4ecc	drm/amd/display: Do not reset planes based on crtc zpos_changed [Why] drm_normalize_zpos will set the crtc_state->zpos_changed to 1 if any of it's assigned planes changes zpos, or is removed/added from it. To have amdgpu_dm request a plane reset on this is too broad. For example, if only the cursor plane was moved from one crtc to another, the crtc's zpos_changed will be set to true. But that does not mean that the underlying primary plane requires a reset. [How] Narrow it down so that only the plane that has a change in zpos will require a reset. As a future TODO, we can further optimize this by only requiring a reset on z-order change. Z-order is different from z-pos, since a zpos change doesn't necessarily mean the z-ordering changed, and DC should only require a reset if the z-ordering changed. For example, the following zpos update does not change z-ordering: Plane A: zpos 2 -> 3 Plane B: zpos 1 -> 2 => Plane A is still on top of plane B: no reset needed Whereas this one does change z-ordering: Plane A: zpos 2 -> 1 Plane B: zpos 1 -> 2 => Plane A changed from on top, to below plane B: reset needed Fixes: `38e0c3df6d` ("drm/amd/display: Move PRIMARY plane zpos higher") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3569 Signed-off-by: Leo Li <sunpeng.li@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-10 11:52:23 -04:00
Jani Nikula	0df8ef6e1b	drm/amdgpu: drop redundant W=1 warnings from Makefile Since commit `a61ddb4393` ("drm: enable (most) W=1 warnings by default across the subsystem"), most of the extra warnings in the driver Makefile are redundant. Remove them. Note that -Wmissing-declarations and -Wmissing-prototypes are always enabled by default in scripts/Makefile.extrawarn. Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:17 -04:00
Christian König	7ccde2e6c0	drm/amdgpu: revert "use CPU for page table update if SDMA is unavailable" That is clearly not something we should do upstream. The SDMA is mandatory for the driver to work correctly. We could do this for emulation and bringup, but in those cases the engineer should probably enabled CPU based updates manually. This reverts commit `62eefd10ac`. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:06 -04:00
Dan Carpenter	27f9dcb9cc	drm/amdgpu/mes11: Indent an if statment Indent the "break" statement one more tab. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:05 -04:00
Philip Yang	663b0f1e14	drm/amdkfd: Document and define SVM events message macro Document how to use SMI system management interface to enable and receive SVM events. Document SVM event triggers. Define SVM events message string format macro that could be used by user mode for sscanf to parse the event. Add it to uAPI header file to make it obvious that is changing uAPI in future. No functional changes. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:05 -04:00
Hawking Zhang	7eafe7a730	drm/amdkfd: Select reset method for poison handling Driver mode-2 is only supported by relative new smc firmware. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:05 -04:00
Jonathan Kim	101025e94b	drm/amdkfd: fix missed queue reset on queue destroy If a queue is being destroyed but causes a HWS hang on removal, the KFD may issue an unnecessary gpu reset if the destroyed queue can be fixed by a queue reset. This is because the queue has been removed from the KFD's queue list prior to the preemption action on destroy so the reset call will fail to match the HQD PQ reset information against the KFD's queue record to do the actual reset. To fix this, deactivate the queue prior to preemption since it's being destroyed anyways and remove the queue from the KFD's queue list after preemption. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:05 -04:00
Ramesh Errabolu	01be2b62c0	drm/amdgpu: Surface svm_default_granularity, a RW module parameter Enables users to update SVM's default granularity, used in buffer migration and handling of recoverable page faults. Param value is set in terms of log(numPages(buffer)), e.g. 9 for a 2 MIB buffer Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:05 -04:00
Jesse Zhang	e8397d327e	drm/amdgpu: fix queue reset issue by mmio Initialize the queue type before resetting the queue using mmio. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:54:54 -04:00
Srinivasan Shanmugam	e146a7ab21	drm/amd/display: Add kdoc entry for 'program_isharp_1dlut' in 'dpp401_dscl_program_isharp' Added a descriptor for the 'program_isharp_1dlut' parameter, which is a flag used to determine whether to program the isharp 1D LUT. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/../display/dc/dpp/dcn401/dcn401_dpp_dscl.c:963: warning: Function parameter or struct member 'program_isharp_1dlut' not described in 'dpp401_dscl_program_isharp' Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:43:37 -04:00
Srinivasan Shanmugam	559a285816	drm/amdgpu: Replace 'amdgpu_job_submit_direct' with 'drm_sched_entity' in cleaner shader This commit replaces the use of amdgpu_job_submit_direct which submits the job to the ring directly, with drm_sched_entity in the cleaner shader job submission process. The change allows the GPU scheduler to manage the cleaner shader job. - The job is then submitted to the GPU using the drm_sched_entity_push_job function, which allows the GPU scheduler to manage the job. This change improves the reliability of the cleaner shader job submission process by leveraging the capabilities of the GPU scheduler. Fixes: `d361ad5d2f` ("drm/amdgpu: Add sysfs interface for running cleaner shader") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:42:33 -04:00
Srinivasan Shanmugam	2578487ebe	drm/amdgpu/: Add missing kdoc entry in amdgpu_vm_handle_fault function This commit adds a description for the 'ts' parameter in the amdgpu_vm_handle_fault function's comment block. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:2781: warning: Function parameter or struct member 'ts' not described in 'amdgpu_vm_handle_fault' Cc: Xiaogang.Chen <Xiaogang.Chen@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408251419.vgZHg3GV-lkp@intel.com/ Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:42:05 -04:00
Qili Lu	f5a972dfe3	drm/amd/display: fix dccg root clock optimization related hang [Why] enable dpp rcg before we disable dppclk in hw_init cause system hang/reboot [How] we remove dccg rcg related code from init into a separate function and call it after we init pipe Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Qili Lu <qili.lu@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:41:07 -04:00
Nicholas Susanto	5359d5bc97	drm/amd/display: Refactor dccg35_get_other_enabled_symclk_fe [Why] Function used to check the number of FEs connected to the current BE. This was then used to determine if the symclk could be disabled, if all FEs were disconnected. However, the function would skip over the primary FE and return 0 when the primary FE was still connected. This caused black screens on driver disable with an MST daisy chain hooked up. [How] Refactor the function to correctly return the number of FEs connected to the input BE. Also, rename it for clarity. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Nicholas Susanto <Nicholas.Susanto@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:40:55 -04:00
Lijo Lazar	4481df364d	drm/amdgpu: Normalize reg offsets on JPEG v4.0.3 On VFs and SOCs with GC 9.4.4, VCN RRMT is disabled. Only local register offsets should be used on JPEG v4.0.3 as they cannot handle remote access to other AIDs. Since only local offsets are used, the special write to MCM_ADDR register is no longer needed. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:40:47 -04:00
Tobias Jakobi	0607a50c00	drm/amd/display: Avoid race between dcn35_set_drr() and dc_state_destruct() dc_state_destruct() nulls the resource context of the DC state. The pipe context passed to dcn35_set_drr() is a member of this resource context. If dc_state_destruct() is called parallel to the IRQ processing (which calls dcn35_set_drr() at some point), we can end up using already nulled function callback fields of struct stream_resource. The logic in dcn35_set_drr() already tries to avoid this, by checking tg against NULL. But if the nulling happens exactly after the NULL check and before the next access, then we get a race. Avoid this by copying tg first to a local variable, and then use this variable for all the operations. This should work, as long as nobody frees the resource pool where the timing generators live. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3142 Fixes: `06ad7e1642` ("drm/amd/display: Destroy DC context while keeping DML and DML2") Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:40:09 -04:00
Tobias Jakobi	a3cc326a43	drm/amd/display: Avoid race between dcn10_set_drr() and dc_state_destruct() dc_state_destruct() nulls the resource context of the DC state. The pipe context passed to dcn10_set_drr() is a member of this resource context. If dc_state_destruct() is called parallel to the IRQ processing (which calls dcn10_set_drr() at some point), we can end up using already nulled function callback fields of struct stream_resource. The logic in dcn10_set_drr() already tries to avoid this, by checking tg against NULL. But if the nulling happens exactly after the NULL check and before the next access, then we get a race. Avoid this by copying tg first to a local variable, and then use this variable for all the operations. This should work, as long as nobody frees the resource pool where the timing generators live. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3142 Fixes: `06ad7e1642` ("drm/amd/display: Destroy DC context while keeping DML and DML2") Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Tested-by: Raoul van Rüschen <raoul.van.rueschen@gmail.com> Tested-by: Christopher Snowhill <chris@kode54.net> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Tested-by: Sefa Eyeoglu <contact@scrumplex.net> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:39:02 -04:00
Li Zetao	760e3c8b32	drm/amdgpu: use clamp() in amdgpu_vm_adjust_size() When it needs to get a value within a certain interval, using clamp() makes the code easier to understand than min(max()). Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:57 -04:00
Li Zetao	6fbbb660b1	drm/amd: use clamp() in amdgpu_pll_get_fb_ref_div() When it needs to get a value within a certain interval, using clamp() makes the code easier to understand than min(max()). Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:53 -04:00
Peng Liu	2c7795e245	drm/amdgpu: enable gfxoff quirk on HP 705G4 Enabling gfxoff quirk results in perfectly usable graphical user interface on HP 705G4 DM with R5 2400G. Without the quirk, X server is completely unusable as every few seconds there is gpu reset due to ring gfx timeout. Signed-off-by: Peng Liu <liupeng01@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:45 -04:00
Peng Liu	0126c0ae11	drm/amdgpu: add raven1 gfxoff quirk Fix screen corruption with openkylin. Link: https://bbs.openkylin.top/t/topic/171497 Signed-off-by: Peng Liu <liupeng01@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:33 -04:00
Colin Ian King	7b17e8f3a0	drm/amd/display: Fix spelling mistake "recompte" -> "recompute" There is a spelling mistake in a DRM_DEBUG_DRIVER message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:27 -04:00
David Belanger	4e9fadacdd	drm/amdkfd: Add cache line size info Populate cache line size info in topology based on information from IP discovery table. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Sreekant Somasekharan <Sreekant.Somasekharan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:18 -04:00
Srinivasan Shanmugam	4c4e9cb58d	drm/amd/display: Add missing kdoc entry for 'bs_coeffs_updated' in dpp401_dscl_program_isharp This commit addresses a missing kdoc for the 'bs_coeffs_updated' parameter in the 'dpp401_dscl_program_isharp' function. The 'bs_coeffs_updated' is a flag indicating whether the Blur and Scale Coefficients have been updated. The 'dpp401_dscl_program_isharp' function is responsible for programming the isharp, which includes setting the isharp filter, noise gain, and blur and scale coefficients. If the 'bs_coeffs_updated' flag is set to true, the function updates the blur and scale coefficients. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/../display/dc/dpp/dcn401/dcn401_dpp_dscl.c:961: warning: Function parameter or struct member 'bs_coeffs_updated' not described in 'dpp401_dscl_program_isharp' Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Tom Chung <chiahsuan.chung@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:08 -04:00
Lang Yu	4453808d9e	drm/amdgpu: fix invalid fence handling in amdgpu_vm_tlb_flush CPU based update doesn't produce a fence, handle such cases properly. Fixes: `d8a3f0a034` ("drm/amdgpu: implement TLB flush fence") Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:36:50 -04:00
Christian König	4da5a95bf1	drm/amdgpu: re-work VM syncing Rework how VM operations synchronize to submissions. Provide an amdgpu_sync container to the backends instead of an reservation object and fill in the amdgpu_sync object in the higher layers of the code. No intended functional change, just prepares for upcomming changes. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:36:24 -04:00
Alex Deucher	1a8d845470	Revert "drm/amdgpu: align pp_power_profile_mode with kernel docs" This reverts commit `8f614469de`. This breaks some manual setting of the profile mode in certain cases. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3600 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `7a19955764`) Cc: stable@vger.kernel.org	2024-09-05 14:46:39 -04:00
Alex Deucher	7a19955764	Revert "drm/amdgpu: align pp_power_profile_mode with kernel docs" This reverts commit `bbb05f8a9c`. This breaks some manual setting of the profile mode in certain cases. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3600 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-05 14:27:41 -04:00
Dillon Varone	38e3285dbd	drm/amd/display: Block timing sync for different signals in PMO PMO assumes that like timings can be synchronized, but DC only allows this if the signal types match. Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `29d3d6af43`) Cc: stable@vger.kernel.org	2024-09-02 13:13:43 -04:00
Leo Li	53c3685f53	drm/amd/display: Lock DC and exit IPS when changing backlight Backlight updates require aux and/or register access. Therefore, driver needs to disallow IPS beforehand. So, acquire the dc lock before calling into dc to update backlight - we should be doing this regardless of IPS. Then, while the lock is held, disallow IPS before calling into dc, then allow IPS afterwards (if it was previously allowed). Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `988fe28626`) Cc: stable@vger.kernel.org # 6.10+	2024-09-02 13:12:13 -04:00
Alex Deucher	4de34b0478	drm/amdgpu: always allocate cleared VRAM for GEM allocations This adds allocation latency, but aligns better with user expectations. The latency should improve with the drm buddy clearing patches that Arun has been working on. In addition this fixes the high CPU spikes seen when doing wipe on release. v2: always set AMDGPU_GEM_CREATE_VRAM_CLEARED (Christian) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3528 Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Acked-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian König <christian.koenig@amd.com> (cherry picked from commit `6c0a7c3c69`) Cc: stable@vger.kernel.org # 6.10.x	2024-09-02 13:08:51 -04:00
Jack Xiao	34c36a77f4	drm/amdgpu/mes: add mes mapping legacy queue switch For mes11 old firmware has issue to map legacy queue, add a flag to switch mes to map legacy queue. Fixes: `f9d8c5c785` ("drm/amdgpu/gfx: enable mes to map legacy queue support") Reported-by: Andrew Worsley <amworsley@gmail.com> Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `52491d97aa`)	2024-09-02 13:05:39 -04:00
Leo Li	65444581a4	drm/amd/display: Determine IPS mode by ASIC and PMFW versions [Why] DCN IPS interoperates with other system idle power features, such as Zstates. On DCN35, there is a known issue where system Z8 + DCN IPS2 causes a hard hang. We observe this on systems where the SBIOS allows Z8. Though there is a SBIOS fix, there's no guarantee that users will get it any time soon, or even install it. A workaround is needed to prevent this from rearing its head in the wild. [How] For DCN35, check the pmfw version to determine whether the SBIOS has the fix. If not, set IPS1+RCG as the deepest possible state in all cases except for s0ix and display off (DPMS). Otherwise, enable all IPS Signed-off-by: Leo Li <sunpeng.li@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `28d43d0895`) Cc: stable@vger.kernel.org	2024-09-02 13:05:09 -04:00
Alex Deucher	ead60e9c4e	drm/amdgpu/gfx10: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:40 -04:00
Alex Deucher	3f2d35c325	drm/amdgpu/gfx11: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:38 -04:00
Alex Deucher	21818f39be	drm/amdgpu/gfx12: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:34 -04:00
Alex Deucher	f8eee864ba	drm/amdgpu/gfx12: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:31 -04:00
Alex Deucher	01d05521f7	drm/amdgpu/gfx11: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:28 -04:00
Alex Deucher	bcee4c3f89	drm/amdgpu/gfx10: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:24 -04:00
Alex Deucher	1a1995b1dc	drm/amdgpu/gfx12: per queue reset only on bare metal It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:22 -04:00
Alex Deucher	01163079e1	drm/amdgpu/gfx11: per queue reset only on bare metal It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:20 -04:00
Alex Deucher	4d5ddfa4b1	drm/amdgpu/gfx10: per queue reset only on bare metal It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:16 -04:00
Jiadong Zhu	178ad0e280	drm/amdgpu/mes11: implement mmio queue reset for gfx11 Implement queue reset for graphic and compute queue. v2: use amdgpu_gfx_rlc funcs to enter/exit safe mode. v3: use gfx_v11_0_request_gfx_index_mutex() v4: fix mutex handling Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:13 -04:00
Jiadong Zhu	01b4ae38e5	drm/amdgpu/mes: implement amdgpu_mes_reset_hw_queue_mmio The reset_queue api could be used from kfd or kgd. v2: add use_mmio parameter for mes_reset_legacy_queue. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:07 -04:00
Jiadong Zhu	8b2429a13f	drm/amdgpu/mes: modify mes api for mmio queue reset Add me/pipe/queue parameters for queue reset input. v2: fix build (Alex) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:03 -04:00
Alex Deucher	8fe4fde381	drm/amdgpu/gfx12: fallback to driver reset compute queue directly Since the MES FW resets kernel compute queue always failed, this may caused by the KIQ failed to process unmap KCQ. So, before MES FW work properly that will fallback to driver executes dequeue and resets SPI directly. Besides, rework the ring reset function and make the busy ring type reset in each function respectively. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:01 -04:00
Alex Deucher	2480599890	drm/amdgpu/gfx12: add ring reset callbacks Add ring reset callbacks for gfx and compute. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:58 -04:00
Alex Deucher	d1f2144321	drm/amdgpu/gfx10: rework reset sequence To match other GFX IPs. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:55 -04:00
Jiadong Zhu	097af47d3c	drm/amdgpu/gfx10: wait for reset done before remap There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. v2: fix KIQ locking (Alex) v3: fix KIQ locking harder (Jessie) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:47 -04:00
Jiadong Zhu	2f3806f781	drm/amdgpu/gfx10: remap queue after reset successfully Kiq command unmap_queues only does the dequeueing action. We have to map the queue back with clean mqd. v2: fix up error handling (Alex) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:44 -04:00
Alex Deucher	1741281a15	drm/amdgpu/gfx10: add ring reset callbacks Add ring reset callbacks for gfx and compute. v2: fix gfx handling v3: wait for KIQ to complete Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:38 -04:00
Jiadong Zhu	a10c93931b	drm/amdgpu/gfx11: wait for reset done before remap There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:34 -04:00
Alex Deucher	7d8e9e65f2	drm/amdgpu/gfx11: rename gfx_v11_0_gfx_init_queue() Rename to gfx_v11_0_kgq_init_queue() to better align with the other naming in the file. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:31 -04:00
Prike Liang	072b441478	drm/amdgpu/gfx11: fallback to driver reset compute queue directly (v2) Since the MES FW resets kernel compute queue always failed, this may caused by the KIQ failed to process unmap KCQ. So, before MES FW work properly that will fallback to driver executes dequeue and resets SPI directly. Besides, rework the ring reset function and make the busy ring type reset in each function respectively. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:21 -04:00
Aric Cyr	f2ea269bd2	drm/amd/display: 3.2.299 This version brings along the following: - DCN35 fixes - DML2 fixes - IPS fixes - ODM fixes - Miscellaneous cleanups - MST fixes - SPL fixes Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:09 -04:00
Hansen Dsouza	9888773753	drm/amd/display: Fix flickering caused by dccg Always allow un-gating. Follow legacy workaround for repeated dppclk dto updates Reviewed-by: Muhammad Ahmed <ahmed.ahmed@amd.com> Signed-off-by: Hansen Dsouza <Hansen.Dsouza@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:39:53 -04:00
Dillon Varone	29d3d6af43	drm/amd/display: Block timing sync for different signals in PMO PMO assumes that like timings can be synchronized, but DC only allows this if the signal types match. Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:39:14 -04:00
Gabe Teeger	fc5da5c00c	drm/amd/display: fix graphics hang in multi-display mst case [what] Graphics hang observed with 3 displays connected to DP2.0 mst dock. [why] There's a mismatch in dml and dc between the assignments of hpo link encoders. [how] Add a new array in dml that tracks the current mapping of HPO stream encoders to HPO link encoders in dc. Reviewed-by: Sung joon Kim <sungjoon.kim@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Gabe Teeger <Gabe.Teeger@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:38:30 -04:00
Relja Vojvodic	efaf15752d	drm/amd/display: Add sharpness control interface - Add interface for controlling shapness level input into DCN. - Update SPL to support custom sharpness values. - Add support for different sharpness values depending on YUV/RGB content. Reviewed-by: Samson Tam <samson.tam@amd.com> Signed-off-by: Relja Vojvodic <Relja.Vojvodic@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:38:23 -04:00
Dillon Varone	6e84109447	Revert "drm/amd/display: Wait for all pending cleared before full update" This reverts commit `f0b7dcf258`. It is causing graphics hangs. Reviewed-by: Martin Leung <martin.leung@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:37:29 -04:00
Samson Tam	8a060e9c17	drm/amd/display: disable sharpness if HDR Multiplier is too large [Why] Certain profiles have higher HDR multiplier than SDR boost max which is not currently supported [How] Disable sharpness for these profiles Fixes: `1b0ce903fe` ("drm/amd/display: add improvements for text display and HDR DWM and MPO") Reviewed-by: Martin Leung <martin.leung@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:35:32 -04:00
Meenakshikumar Somasundaram	c24538c4aa	drm/amd/display: Add dpia debug option to control power management [Why] To provide option to dpia control power management [How] By adding disable_usb4_pm_support bit field in dpia_debug option to control dpia power management Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:35:23 -04:00
Alex Deucher	b3e9bfd866	drm/amdgpu/gfx11: add ring reset callbacks Add ring reset callbacks for gfx and compute. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:35:12 -04:00
Samson Tam	0ba3cb8e7c	drm/amd/display: re-enable Dynamic ODM policy [Why] Previous disable ODM policy due to underflow issue with sharpener. Issue is resolved after updating sharpening policy to apply to both windowed and fullscreen video [How] Remove sharpness check disabling Dynamic ODM policy Reviewed-by: Martin Leung <martin.leung@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:35:06 -04:00
Leo Li	988fe28626	drm/amd/display: Lock DC and exit IPS when changing backlight Backlight updates require aux and/or register access. Therefore, driver needs to disallow IPS beforehand. So, acquire the dc lock before calling into dc to update backlight - we should be doing this regardless of IPS. Then, while the lock is held, disallow IPS before calling into dc, then allow IPS afterwards (if it was previously allowed). Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:34:11 -04:00
Daniel Sa	c66db9e9a0	drm/amd/display: only trigger BIOS related assert for older ASICs [Why] Some asserts are always hit on startup/Pnp when they should only be used to indicate when something has gone wrong. [How] Ignore result of getting function from bios cmd table for newer asics. Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Daniel Sa <Daniel.Sa@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:34:02 -04:00
Nicholas Susanto	6f4835f9df	drm/amd/display: Fix DCN35 set min dispclk logic [Why] Setting min dispclk to 50Mhz outside clock lowering function causes unnecessary calls to SMU to lower dispclk and causes dentist hangs when there is no stream on the pipes. [How] Move the set minimum dispclk logic inside the lowering dispclk if statement. Fixes: `2344413205` ("DCN35 set min dispclk to 50Mhz") Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Nicholas Susanto <Nicholas.Susanto@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:32:42 -04:00
Prike Liang	ad17b124c3	drm/amdgpu/gfx9.4.3: Implement compute pipe reset Implement the compute pipe reset, and the driver will fallback to pipe reset when queue reset fails. The pipe reset only deactivates the queue which is scheduled in the pipe, and meanwhile the MEC pipe will be reset to the firmware _start pointer. So, it seems pipe reset will cost more cycles than the queue reset; therefore, the driver tries to recover by doing queue reset first. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:32:32 -04:00
Christian Brauner	641bb4394f	fs: move FMODE_UNSIGNED_OFFSET to fop_flags This is another flag that is statically set and doesn't need to use up an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit. (1) mem_open() used from proc_mem_operations (2) adi_open() used from adi_fops (3) drm_open_helper(): (3.1) accel_open() used from DRM_ACCEL_FOPS (3.2) drm_open() used from (3.2.1) amdgpu_driver_kms_fops (3.2.2) psb_gem_fops (3.2.3) i915_driver_fops (3.2.4) nouveau_driver_fops (3.2.5) panthor_drm_driver_fops (3.2.6) radeon_driver_kms_fops (3.2.7) tegra_drm_fops (3.2.8) vmwgfx_driver_fops (3.2.9) xe_driver_fops (3.2.10) DRM_GEM_FOPS (3.2.11) DEFINE_DRM_GEM_DMA_FOPS (4) struct memdev sets fmode flags based on type of device opened. For devices using struct mem_fops unsigned offset is used. Mark all these file operations as FOP_UNSIGNED_OFFSET and add asserts into the open helper to ensure that the flag is always set. Link: https://lore.kernel.org/r/20240809-work-fop_unsigned-v1-1-658e054d893e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-30 08:22:36 +02:00
Dave Airlie	4f7d8da5e3	drm-misc-next for v6.12: UAPI Changes: devfs: - support device numbers up to MINORBITS limit Core Changes: ci: - increase job timeout devfs: - use XArray for minor ids displayport: - mst: GUID improvements docs: - add fixes and cleanups panic: - optionally display QR code Driver Changes: amdgpu: - faster vblank disabling - GUID improvements gm12u320 - convert to struct drm_edid host1x: - fix syncpoint IRQ during resume - use iommu_paging_domain_alloc() imx: - ipuv3: convert to struct drm_edid omapdrm: - improve error handling panel: - add support for BOE TV101WUM-LL2 plus DT bindings - novatek-nt35950: improve error handling - nv3051d: improve error handling - panel-edp: add support for BOE NE140WUM-N6G; revert support for SDC ATNA45AF01 - visionox-vtdr6130: improve error handling; use devm_regulator_bulk_get_const() renesas: - rz-du: add support for RZ/G2UL plus DT bindings sti: - convert to struct drm_edid tegra: - gr3d: improve PM domain handling - convert to struct drm_edid -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmbQiVAACgkQaA3BHVML eiNgVggAqN5f9i0Rbk5tTfasBBSq0qiNE3X7mHDFQsAY+iGRJUzuYlIjjATSunsB HnqcA0aoT3CaBpl1drRTg1wCWRXBZrnAG2mgVa/eGBjrSH2i2d9IgxcNT2XvQkI5 K4Ac2Ulpr+57d8nHmeEjztQusD2vaDtNH7b6pU2wNmZkiqUCbzcLn9GuL9OF8tSh 6EApiPExbBASQeV0+xVt7mbtasclzFf8wukQXlK8zlWDeHTTTRibBwRy1txyqdG3 qnBCabVTorgah81vBezXegrro4yQ1ITo5A1ZTYYJroA70mqMN5cm5kYasIb1zqXP f/xXGLB/a96bV9zqEFhWGInlEqGthA== =1fX2 -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2024-08-29' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: devfs: - support device numbers up to MINORBITS limit Core Changes: ci: - increase job timeout devfs: - use XArray for minor ids displayport: - mst: GUID improvements docs: - add fixes and cleanups panic: - optionally display QR code Driver Changes: amdgpu: - faster vblank disabling - GUID improvements gm12u320 - convert to struct drm_edid host1x: - fix syncpoint IRQ during resume - use iommu_paging_domain_alloc() imx: - ipuv3: convert to struct drm_edid omapdrm: - improve error handling panel: - add support for BOE TV101WUM-LL2 plus DT bindings - novatek-nt35950: improve error handling - nv3051d: improve error handling - panel-edp: add support for BOE NE140WUM-N6G; revert support for SDC ATNA45AF01 - visionox-vtdr6130: improve error handling; use devm_regulator_bulk_get_const() renesas: - rz-du: add support for RZ/G2UL plus DT bindings sti: - convert to struct drm_edid tegra: - gr3d: improve PM domain handling - convert to struct drm_edid Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240829144654.GA145538@linux.fritz.box	2024-08-30 13:40:38 +10:00
Alex Deucher	6c0a7c3c69	drm/amdgpu: always allocate cleared VRAM for GEM allocations This adds allocation latency, but aligns better with user expectations. The latency should improve with the drm buddy clearing patches that Arun has been working on. In addition this fixes the high CPU spikes seen when doing wipe on release. v2: always set AMDGPU_GEM_CREATE_VRAM_CLEARED (Christian) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3528 Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Acked-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian König <christian.koenig@amd.com>	2024-08-29 14:56:37 -04:00
Jack Xiao	52491d97aa	drm/amdgpu/mes: add mes mapping legacy queue switch For mes11 old firmware has issue to map legacy queue, add a flag to switch mes to map legacy queue. Fixes: `f9d8c5c785` ("drm/amdgpu/gfx: enable mes to map legacy queue support") Reported-by: Andrew Worsley <amworsley@gmail.com> Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:41:02 -04:00
Yifan Zhang	96316211eb	drm/amdkfd: Don't drain ih1 for APU ih1 is not initialized for APUs. Don't drain it or NULL pointer error will be triggered. Fixes: `6ef29715ac` ("drm/amdkfd: Change kfd/svm page fault drain handling") Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:40:00 -04:00
Alex Deucher	1125f95cd2	drm/amdgpu/gfx12: return early in preempt_ib() When MES is enabled KIQ is not available. Return an error when someone uses the debugfs preempt test interface in that case. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:57 -04:00
Alex Deucher	1e487c9173	drm/amdgpu/gfx11: return early in preempt_ib() When MES is enabled KIQ is not available. Return an error when someone uses the debugfs preempt test interface in that case. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:48 -04:00
Leo Li	28d43d0895	drm/amd/display: Determine IPS mode by ASIC and PMFW versions [Why] DCN IPS interoperates with other system idle power features, such as Zstates. On DCN35, there is a known issue where system Z8 + DCN IPS2 causes a hard hang. We observe this on systems where the SBIOS allows Z8. Though there is a SBIOS fix, there's no guarantee that users will get it any time soon, or even install it. A workaround is needed to prevent this from rearing its head in the wild. [How] For DCN35, check the pmfw version to determine whether the SBIOS has the fix. If not, set IPS1+RCG as the deepest possible state in all cases except for s0ix and display off (DPMS). Otherwise, enable all IPS Signed-off-by: Leo Li <sunpeng.li@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:37 -04:00
Sunil Khatri	30e8f4c2bd	drm/amdgpu: Move the dumping log out of for loop log message "Dumping IP State Completed" needs to be logged only once when state dumping is complete. Hence moving it out of the for loop. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Trigger Huang <Trigger.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:19 -04:00
Victor Zhao	af76ca8e18	drm/amd/amdgpu: move drain_workqueue before shutdown is set [background] when unloading amdgpu driver right after running a workload, drain_workqueue is causing "Fence fallback timer expired on ring sdma0.0". Under sriov, this issue will cause sriov full access timeout and a reset happening. move drain_workqueue before shutdown is set to allow ih process and before enter full access under sriov to avoid full access time cost. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:07 -04:00
Trigger Huang	c67db6a6a6	drm/amdgpu: Do core dump immediately when job tmo Do the coredump immediately after a job timeout to get a closer representation of GPU's error status. V2: This will skip printing vram_lost as the GPU reset is not happened yet (Alex) V3: Unconditionally call the core dump as we care about all the reset functions(soft-recovery and queue reset and full adapter reset, Alex) V4: Do the dump after adev->job_hang = true (Sunil) Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Acked-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:00 -04:00
Trigger Huang	6122f5c72e	drm/amdgpu: skip printing vram_lost if needed The vm lost status can only be obtained after a GPU reset occurs, but sometimes a dev core dump can be happened before GPU reset. So a new argument is added to tell the dev core dump implementation whether to skip printing the vram_lost status in the dump. And this patch is also trying to decouple the core dump function from the GPU reset function, by replacing the argument amdgpu_reset_context with amdgpu_job to specify the context for core dump. V2: Inform user if VRAM lost check is skipped so users don't assume VRAM wasn't lost (Alex) Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:38:53 -04:00
Alex Deucher	7c1a2d8aba	drm/amdgpu/gfx9: put queue resets behind a debug option Pending extended validation. Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:38:48 -04:00
Alex Deucher	a9b67c036c	drm/amdgpu: add experimental resets debug flag Add this flag to enable experimental resets for testing before they are fully validated. Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:38:36 -04:00
Fangzhi Zuo	7745a1dee0	drm/amdgpu/display: Fix a mistake in revert commit [why] It is to fix in try_disable_dsc() due to misrevert of commit `338567d176` ("drm/amd/display: Fix MST BW calculation Regression") [How] Fix restoring minimum compression bw by 'max_kbps', instead of native bw 'stream_kbps' Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:35:52 -04:00
Jani Nikula	b71ccff68e	drm/amd/display: switch to guid_gen() to generate valid GUIDs Instead of just smashing jiffies into a GUID, use guid_gen() to generate RFC 4122 compliant GUIDs. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240812122312.1567046-3-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2024-08-29 11:22:43 +03:00
Jani Nikula	33929707b8	drm/mst: switch to guid_t type for GUID The kernel has a guid_t type for GUIDs. Switch to using it, but avoid any functional changes here. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240812122312.1567046-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2024-08-29 11:21:25 +03:00
Candice Li	849f0d5880	drm/amd/pm: Drop unsupported features on smu v14_0_2 Drop unsupported features on smu v14_0_2. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `3376f922bf`)	2024-08-28 10:07:37 -04:00
Lijo Lazar	badfdc6211	drm/amd/pm: Add support for new P2S table revision Add p2s table support for a new revision of SMUv13.0.6. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `010cc730ac`)	2024-08-28 10:06:21 -04:00
Likun Gao	6d5064c379	drm/amdgpu: support for gc_info table v1.3 Add gc_info table v1.3 for IP discovery. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `875ff9a7ee`)	2024-08-28 10:05:54 -04:00
Ma Ke	3b9a33235c	drm/amd/display: avoid using null object of framebuffer Instead of using state->fb->obj[0] directly, get object from framebuffer by calling drm_gem_fb_get_obj() and return error code when object is null to avoid using null object of framebuffer. Fixes: `5d945cbcd4` ("drm/amd/display: Create a file dedicated to planes") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `73dd0ad9e5`)	2024-08-28 10:05:46 -04:00
Alex Deucher	959fc102ff	drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `40318a2406`)	2024-08-28 10:04:53 -04:00
Kenneth Feng	37a45fb8db	drm/amd/pm: update message interface for smu v14.0.2/3 update message interface for smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `01bfabc2d1`)	2024-08-28 10:04:19 -04:00
Alex Deucher	d420c857d8	drm/amdgpu/swsmu: always force a state reprogram on init Always reprogram the hardware state on init. This ensures the PMFW state is explicitly programmed and we are not relying on the default PMFW state. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3131 Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c50fe289ed`) Cc: stable@vger.kernel.org	2024-08-28 10:03:03 -04:00
Alex Deucher	948f279dc4	drm/amdgpu/smu13.0.7: print index for profiles Print the index for the profiles. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3543 Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b86a6a57b8`)	2024-08-28 10:01:07 -04:00
Alex Deucher	8f614469de	drm/amdgpu: align pp_power_profile_mode with kernel docs The kernel doc says you need to select manual mode to adjust this, but the code only allows you to adjust it when manual mode is not selected. Remove the manual mode check. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `bbb05f8a9c`) Cc: stable@vger.kernel.org	2024-08-28 10:00:25 -04:00
Alex Deucher	c50fe289ed	drm/amdgpu/swsmu: always force a state reprogram on init Always reprogram the hardware state on init. This ensures the PMFW state is explicitly programmed and we are not relying on the default PMFW state. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3131 Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:56:10 -04:00
Zaeem Mohamed	e45a3933bb	drm/amdgpu/display: remove unnecessary TODO spl_os_types.h Remove unnecessary TODO from spl_os_types.h Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:56:02 -04:00
Zaeem Mohamed	ff95eabe57	drm/amdgpu/display: SPDX copyright for spl_os_types.h Use appropriate SPDX copyright for spl_os_types.h Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:55:56 -04:00
Fangzhi Zuo	3715112c1b	drm/amd/display: Add DSC Debug Log Add DSC log in each critical routines to facilitate debugging. Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:55:49 -04:00
Aric Cyr	38d6f7e27d	drm/amd/display: 3.2.298 This version brings along the following fixes: - Fix MS/MP mismatches in dml21 for dcn401 - Resolved Coverity issues - Add back quality EASF and ISHARP and dc dependency changes - Add sharpness support for windowed YUV420 video - Add improvements for text display and HDR DWM and MPO - Fix Synaptics Cascaded Panamera DSC Determination - Allocate DCN35 clock table transfer buffers in GART - Add Replay Low Refresh Rate parameters in dc type Acked-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:55:37 -04:00
Samson Tam	469a486541	drm/amd/display: add sharpness support for windowed YUV420 video [Why] Previous only applied sharpness for fullscreen YUV420 video. [How] Remove fullscrene restriction and apply sharpness for windowed YUV420 video as well. Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:55:27 -04:00
Samson Tam	1b0ce903fe	drm/amd/display: add improvements for text display and HDR DWM and MPO [Why] Tune settings for improved text display. Handle differences between DWM and MPO in HDR path. [How] Update sharpener LBA table. Use HDR multiplier to calculate scalar matrix coefficients for HDR RGB MPO path. Update unit tests. Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:55:20 -04:00
Dennis Chan	b4148dc2fa	drm/amd/display: Add Replay Low Refresh Rate parameters in dc type. Why: To supported Low Refresh Rate panel for Replay Feature, Adding some parameters to record Low Refresh Rate information. Reviewed-by: Robin Chen <robin.chen@amd.com> Signed-off-by: Dennis Chan <dennis.chan@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:55:02 -04:00
Samson Tam	6efc0ab3b0	drm/amd/display: add back quality EASF and ISHARP and dc dependency changes [Why] Addressed previous issues with quality changes and new issues due to rolling back quality changes. [How] This reverts commit `f9e6759888`, fixes merge conflicts, and fixed some formatting errors. Store current sharpness level for each pregen table to minimize calculating sharpness table every time. Disable dynamic ODM when sharpness is enabled. Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:53:58 -04:00
Nicholas Kazlauskas	9793a4a6e5	drm/amd/display: Notify DMCUB of D0/D3 state [Why] We want to avoid arming the HPD timer in firmware when preparing for S0i3 entry when DC is considered in D3. [How] Notify DMCUB of the power state transitions so it can decide to arm the HPD timer for idle in DCN35 only in D0. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:53:51 -04:00
Fangzhi Zuo	4437936c6b	drm/amd/display: Fix Synaptics Cascaded Panamera DSC Determination Synaptics Cascaded Panamera topology needs to unconditionally acquire root aux for dsc decoding. Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:53:27 -04:00
ChunTao Tso	e565b6b0b5	drm/amd/display: Retry Replay residency [Why] Because sometime DMUB GPINT will time out, it will cause we return 0 as residency number. [How] Retry to avoid this happened. Reviewed-by: Robin Chen <robin.chen@amd.com> Signed-off-by: ChunTao Tso <ChunTao.Tso@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:53:14 -04:00
Nicholas Kazlauskas	6692982582	drm/amd/display: Allocate DCN35 clock table transfer buffers in GART [Why] Request from PMFW to use GART for clock table transfer tables as framebuffer is being deprecated on APU. [How] Switch over to GART via the allocation flag. Reviewed-by: Sung joon Kim <sungjoon.kim@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:52:44 -04:00
Aurabindo Pillai	87d23164d8	drm/amd/display: do not set traslate_by_source for DCN401 cursor translate_by_source need not be set for DCN401 onwards since cursor cursor composition comes after scaler in the hardware pipeline. Hence offset calculation has been reworked, and this setting is not necessary to be enabled anymore. Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:52:23 -04:00
Daniel Sa	6dcc304f85	drm/amd/display: Resolve Coverity Issues [WHY] Remove coverity issues that were originally ignored. [HOW] Ran coverity locally on driver, used output report to find existing coverity issues, resolved them Reviewed-by: Nicholas Choi <nicholas.choi@amd.com> Signed-off-by: Daniel Sa <Daniel.Sa@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:52:15 -04:00
Dillon Varone	949237a34d	drm/amd/display: Fix MS/MP mismatches in dml21 for dcn401 [WHY] Prefetch calculations did not guarantee that bandwidth required in mode support was less than mode programming which can cause failures. [HOW] Fix bandwidth calculations to assume fixed times for OTO schedule, and choose which schedule to use based on time to fetch pixel data. Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:52:07 -04:00
Alvin Lee	f0b7dcf258	drm/amd/display: Wait for all pending cleared before full update [Description] Before every full update we must wait for all pending updates to be cleared - this is particularly important for minimal transitions because if we don't wait for pending cleared, it will be as if there was no minimal transition at all. In OTG we must read 3 different status registers for pending cleared, one specifically for OTG updates, one specifically for OPTC updates, and the last for surface related updates Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:52:01 -04:00
Ahmed, Muhammad	5d666496c2	drm/amd/display: guard write a 0 post_divider value to HW [why] post_divider_value should not be 0. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Ahmed, Muhammad <Ahmed.Ahmed@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:51:53 -04:00
Alvin Lee	3d054c4076	drm/amd/display: Don't skip clock updates in overclocking [Description] Skipping clock updates is not a hard requirement for overclocking and only an optimization. Remove the skip as this can cause issues for FAMS transitions during the overclock sequence. If FAMS is enabled we must disable UCLK switch on any full update (which requires update clocks to be called). Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:51:44 -04:00
Leo Li	a08d75927f	drm/amd: Introduce additional IPS debug flags [Why] Idle power states (IPS) describe levels of power-gating within DCN. DM and DC is responsible for ensuring that we are out of IPS before any DCN programming happens. Any DCN programming while we're in IPS leads to undefined behavior (mostly hangs). Because IPS intersects with all display features, the ability to disable IPS by default while ironing out the known issues is desired. However, disabing it completely will cause important features such as s0ix entry to fail. Therefore, more granular IPS debug flags are desired. [How] Extend the dc debug mask bits to include the available list of IPS debug flags. All the flags should work as documented, with the exception of IPS_DISABLE_DYNAMIC. It requires dm changes which will be done in later changes. v2: enable docs and fix docstring format Signed-off-by: Leo Li <sunpeng.li@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:51:32 -04:00
Alex Deucher	b86a6a57b8	drm/amdgpu/smu13.0.7: print index for profiles Print the index for the profiles. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3543 Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:51:23 -04:00
Alex Deucher	b932d5ad92	drm/amdgpu/swsmu: fix ordering for setting workload_mask No change in functionality for the current code, but we need to set the index properly before changing it if we ever use a non-0 index. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:51:17 -04:00
Alex Deucher	bbb05f8a9c	drm/amdgpu: align pp_power_profile_mode with kernel docs The kernel doc says you need to select manual mode to adjust this, but the code only allows you to adjust it when manual mode is not selected. Remove the manual mode check. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-27 17:51:06 -04:00
Daniel Vetter	e55ef65510	amd-drm-next-6.12-2024-08-26: amdgpu: - SDMA devcoredump support - DCN 4.0.1 updates - DC SUBVP fixes - Refactor OPP in DC - Refactor MMHUBBUB in DC - DC DML 2.1 updates - DC FAMS2 updates - RAS updates - GFX12 updates - VCN 4.0.3 updates - JPEG 4.0.3 updates - Enable wave kill (soft recovery) for compute queues - Clean up CP error interrupt handling - Enable CP bad opcode interrupts - VCN 4.x fixes - VCN 5.x fixes - GPU reset fixes - Fix vbios embedded EDID size handling - SMU 14.x updates - Misc code cleanups and spelling fixes - VCN devcoredump support - ISP MFD i2c support - DC vblank fixes - GFX 12 fixes - PSR fixes - Convert vbios embedded EDID to drm_edid - DCN 3.5 updates - DMCUB updates - Cursor fixes - Overdrive support for SMU 14.x - GFX CP padding optimizations - DCC fixes - DSC fixes - Preliminary per queue reset infrastructure - Initial per queue reset support for GFX 9 - Initial per queue reset support for GFX 7, 8 - DCN 3.2 fixes - DP MST fixes - SR-IOV fixes - GFX 9.4.3/4 devcoredump support - Add process isolation framework - Enable process isolation support for GFX 9.4.3/4 - Take IOMMU remapping into account for P2P DMA checks amdkfd: - CRIU fixes - Improved input validation for user queues - HMM fix - Enable process isolation support for GFX 9.4.3/4 - Initial per queue reset support for GFX 9 - Allow users to target recommended SDMA engines radeon: - remove .load and drm_dev_alloc - Fix vbios embedded EDID size handling - Convert vbios embedded EDID to drm_edid - Use GEM references instead of TTM - r100 cp init cleanup - Fix potential overflows in evergreen CS offset tracking UAPI: - KFD support for targetting queues on recommended SDMA engines Proposed userspace: `2f588a2406` `eb30a5bbc7` drm/buddy: - Add start address support for trim function -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZszhcQAKCRC93/aFa7yZ 2M4ZAQD+xgIJkQ9HISQeqER5GblnfrorARd32yP/BH0c+JbGUAD9H/BIB41teZ80 vw2WTx+4TyB39awgvtpDH8iEQdkcSAE= =717w -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.12-2024-08-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.12-2024-08-26: amdgpu: - SDMA devcoredump support - DCN 4.0.1 updates - DC SUBVP fixes - Refactor OPP in DC - Refactor MMHUBBUB in DC - DC DML 2.1 updates - DC FAMS2 updates - RAS updates - GFX12 updates - VCN 4.0.3 updates - JPEG 4.0.3 updates - Enable wave kill (soft recovery) for compute queues - Clean up CP error interrupt handling - Enable CP bad opcode interrupts - VCN 4.x fixes - VCN 5.x fixes - GPU reset fixes - Fix vbios embedded EDID size handling - SMU 14.x updates - Misc code cleanups and spelling fixes - VCN devcoredump support - ISP MFD i2c support - DC vblank fixes - GFX 12 fixes - PSR fixes - Convert vbios embedded EDID to drm_edid - DCN 3.5 updates - DMCUB updates - Cursor fixes - Overdrive support for SMU 14.x - GFX CP padding optimizations - DCC fixes - DSC fixes - Preliminary per queue reset infrastructure - Initial per queue reset support for GFX 9 - Initial per queue reset support for GFX 7, 8 - DCN 3.2 fixes - DP MST fixes - SR-IOV fixes - GFX 9.4.3/4 devcoredump support - Add process isolation framework - Enable process isolation support for GFX 9.4.3/4 - Take IOMMU remapping into account for P2P DMA checks amdkfd: - CRIU fixes - Improved input validation for user queues - HMM fix - Enable process isolation support for GFX 9.4.3/4 - Initial per queue reset support for GFX 9 - Allow users to target recommended SDMA engines radeon: - remove .load and drm_dev_alloc - Fix vbios embedded EDID size handling - Convert vbios embedded EDID to drm_edid - Use GEM references instead of TTM - r100 cp init cleanup - Fix potential overflows in evergreen CS offset tracking UAPI: - KFD support for targetting queues on recommended SDMA engines Proposed userspace: `2f588a2406` `eb30a5bbc7` drm/buddy: - Add start address support for trim function From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826201528.55307-1-alexander.deucher@amd.com	2024-08-27 14:33:12 +02:00
Daniel Vetter	4461e9e5c3	Linux 6.11-rc5 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmbK2B8eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGFwkH/10QpUgzIfbFKbF+ 5hwcvaqS5myxWwJ4PjN0eR1qGE6RzVO0Tb24+TVql+7pxu+iWm1kYgC3+/T5xJsP ECAszdmPWSco1xaHrh2y3PyCJjaBiqFbIxdjPp7odjDpG9qarbcty8YpWs44u/gd RDXzHUuScEShBhEt0ZhvE1pIDL8jJ8JL3yqOMZ+XaDxtJbjaHw4GHp8efxlBWc8N jZKIVJi22q5NWG5T0tGtPWwzCm0ewA/JNMTEvE9leoSoAgO85NZ0ivxMC76q/tbj BrYk5KnzfhJs4b/n/KtIwWaLTgLyXKGqHMaMq8sbXtp410aUdgnRJO2cl3fI+1vc vxQfAfk= =RemI -----END PGP SIGNATURE----- Merge v6.11-rc5 into drm-next amdgpu pr conconflicts due to patches cherry-picked to -fixes, I might as well catch up with a backmerge and handle them all. Plus both misc and intel maintainers asked for a backmerge anyway. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2024-08-27 14:09:45 +02:00
Hamza Mahfooz	58a261bfc9	drm/amd/display: use a more lax vblank enable policy for older ASICs Ideally, we want to drop the legacy vblank enable for older ASICs. This should be possible now, since we can now specify how many frames we need to wait before disabling vblanking instead of being forced to either choose between no delay (which can still be buggy) and drm_vblank_offdelay (which is much longer by default than is required on AMD hardware). Suggested-by: Leo Li <sunpeng.li@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240822161856.174600-4-hamza.mahfooz@amd.com	2024-08-23 11:53:51 -04:00
Hamza Mahfooz	e45b6716de	drm/amd/display: use a more lax vblank enable policy for DCN35+ Ideally, we want to enable immediate vblank disable, when possible and we should be able to do so on DCN35+, if PSR isn't supported by a given CRTC. Suggested-by: Leo Li <sunpeng.li@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240822161856.174600-3-hamza.mahfooz@amd.com	2024-08-23 11:53:51 -04:00
Hamza Mahfooz	537ef0f888	drm/amd/display: use new vblank enable policy for DCN35+ Hook up drm_crtc_vblank_on_config() in amdgpu_dm. So, that we can enable PSR and other static screen optimizations more quickly, while avoiding stuttering issues that are accompanied by the following dmesg error: [drm:dc_dmub_srv_wait_idle [amdgpu]] ERROR Error waiting for DMUB idle: status=3 This also allows us to mimic how vblanking is handled by the Windows amdgpu driver. Specifically, we wait two idle frames before disabling the vblank timer there. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240822161856.174600-2-hamza.mahfooz@amd.com	2024-08-23 11:53:51 -04:00
Candice Li	3376f922bf	drm/amd/pm: Drop unsupported features on smu v14_0_2 Drop unsupported features on smu v14_0_2. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:55:22 -04:00
Xiaogang Chen	6ef29715ac	drm/amdkfd: Change kfd/svm page fault drain handling When app unmap vm ranges(munmap) kfd/svm starts drain pending page fault and not handle any incoming pages fault of this process until a deferred work item got executed by default system wq. The time period of "not handle page fault" can be long and is unpredicable. That is advese to kfd performance on page faults recovery. This patch uses time stamp of incoming page fault to decide to drop or recover page fault. When app unmap vm ranges kfd records each gpu device's ih ring current time stamp. These time stamps are used at kfd page fault recovery routine. Any page fault happened on unmapped ranges after unmap events is application bug that accesses vm range after unmap. It is not driver work to cover that. By using time stamp of page fault do not need drain page faults at deferred work. So, the time period that kfd does not handle page faults is reduced and can be controlled. Signed-off-by: Xiaogang.Chen <Xiaogang.Chen@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:55:13 -04:00
Lijo Lazar	010cc730ac	drm/amd/pm: Add support for new P2S table revision Add p2s table support for a new revision of SMUv13.0.6. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:55:03 -04:00
Likun Gao	875ff9a7ee	drm/amdgpu: support for gc_info table v1.3 Add gc_info table v1.3 for IP discovery. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:54:57 -04:00
Ma Ke	73dd0ad9e5	drm/amd/display: avoid using null object of framebuffer Instead of using state->fb->obj[0] directly, get object from framebuffer by calling drm_gem_fb_get_obj() and return error code when object is null to avoid using null object of framebuffer. Fixes: `5d945cbcd4` ("drm/amd/display: Create a file dedicated to planes") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:54:11 -04:00
Yang Wang	4416377ae1	drm/amdgpu: add list empty check to avoid null pointer issue Add list empty check to avoid null pointer issues in some corner cases. - list_for_each_entry_safe() Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:45 -04:00
Jinjie Ruan	2845f51223	drm/amd/display: Make dcn401_dsc_funcs static The sparse tool complains as follows: drivers/gpu/drm/amd/amdgpu/../display/dc/dsc/dcn401/dcn401_dsc.c:30:24: warning: symbol 'dcn401_dsc_funcs' was not declared. Should it be static? This symbol is not used outside of dcn401_dsc.c, so marks it static. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:42 -04:00
Jinjie Ruan	570867ef90	drm/amd/display: Make dcn35_hubp_funcs static The sparse tool complains as follows: drivers/gpu/drm/amd/amdgpu/../display/dc/hubp/dcn35/dcn35_hubp.c:191:19: warning: symbol 'dcn35_hubp_funcs' was not declared. Should it be static? This symbol is not used outside of dcn35_hubp.c, so marks it static. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:39 -04:00
Jinjie Ruan	0e405395e0	drm/amd/display: Make core_dcn4_ip_caps_base static The sparse tool complains as follows: drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4.c:12:28: warning: symbol 'core_dcn4_ip_caps_base' was not declared. Should it be static? This symbol is not used outside of dcn35_hubp.c, so marks it static. And do not want to change it, so mark it const. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:36 -04:00
Jinjie Ruan	988bfa0bc6	drm/amd/display: Make core_dcn4_g6_temp_read_blackout_table static The sparse tool complains as follows: drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c:6853:56: warning: symbol 'core_dcn4_g6_temp_read_blackout_table' was not declared. Should it be static? This symbol is not used outside of dml2_core_dcn4_calcs.c, so marks it static. And not want to change it, so mark it const. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:31 -04:00
Alex Deucher	40318a2406	drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:25 -04:00
Hawking Zhang	b05d6476ae	drm/amdgpu: Retire query_utcl2_poison_status callback Driver switches to interrupt source id to identify utcl2 poison event. polling interface is not needed. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:16 -04:00
Rahul Jain	75f0efbc4b	drm/amdgpu: Take IOMMU remapping into account for p2p checks when trying to enable p2p the amdgpu_device_is_peer_accessible() checks the condition where address_mask overlaps the aper_base and hence returns 0, due to which the p2p disables for this platform IOMMU should remap the BAR addresses so the device can access them. Hence check if peer_adev is remapping DMA v5: (Felix, Alex) - fixing comment as per Alex feedback - refactor code as per Felix v4: (Alex) - fix the comment and description v3: - remove iommu_remap variable v2: (Alex) - Fix as per review comments - add new function amdgpu_device_check_iommu_remap to check if iommu remap Signed-off-by: Rahul Jain <Rahul.Jain@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:05 -04:00
Kenneth Feng	01bfabc2d1	drm/amd/pm: update message interface for smu v14.0.2/3 update message interface for smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:52:59 -04:00
Hawking Zhang	e28604d833	drm/amdkfd: Drop poison hanlding from gfx v10 Not supported. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:52:51 -04:00
Hawking Zhang	db6341a916	drm/amdkfd: Check int source id for utcl2 poison event Traditional utcl2 fault_status polling does not work in SRIOV environment. The polling of fault status register from guest side will be dropped by hardware. Driver should switch to check utcl2 interrupt source id to identify utcl2 poison event. It is set to 1 when poisoned data interrupts are signaled. v2: drop the unused local variable (Tao) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:52:33 -04:00
Alex Deucher	9cead81eff	drm/amdgpu: fix eGPU hotplug regression The driver needs to wait for the on board firmware to finish its initialization before probing the card. Commit `959056982a` ("drm/amdgpu: Fix discovery initialization failure during pci rescan") switched from using msleep() to using usleep_range() which seems to have caused init failures on some navi1x boards. Switch back to msleep(). Fixes: `959056982a` ("drm/amdgpu: Fix discovery initialization failure during pci rescan") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Ma Jun <Jun.Ma2@amd.com> (cherry picked from commit `c69b07f7bb`) Cc: stable@vger.kernel.org # 6.10.x	2024-08-20 23:07:11 -04:00
Candice Li	c99769bcea	drm/amdgpu: Validate TA binary size Add TA binary size validation to avoid OOB write. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c0a04e3570`) Cc: stable@vger.kernel.org	2024-08-20 23:04:17 -04:00
Alex Deucher	e3e4bf58ba	drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1 The workaround seems to cause stability issues on other SDMA 5.2.x IPs. Fixes: `a03ebf1163` ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556 Acked-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `2dc3851ef7`) Cc: stable@vger.kernel.org	2024-08-20 22:51:37 -04:00
Yang Wang	0b43312902	drm/amdgpu: fixing rlc firmware loading failure issue Skip rlc firmware validation to ignore firmware header size mismatch issues. This restores the workaround added in commit `849e133c97` ("drm/amdgpu: Fix the null pointer when load rlc firmware") Fixes: `3af2c80ae2` ("drm/amdgpu: refine gfx10 firmware loading") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3551 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `89ec85d16e`)	2024-08-20 22:51:31 -04:00
Alex Deucher	88c511dea1	drm/amd/gfx11: move the gfx mutex into the caller Otherwise we can fail to drop the software mutex when we fail to take the hardware mutex. Fixes: `76acba7b7f` ("drm/amdgpu/gfx11: add a mutex for the gfx semaphore") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:14 -04:00
Tim Huang	186fb12e7a	drm/amd/pm: ensure the fw_info is not null before using it This resolves the dereference null return value warning reported by Coverity. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:14 -04:00
Victor Zhao	bf2bc61638	drm/amd/amdgpu: allow use kiq to do hdp flush under sriov when use cpu to do page table update under sriov runtime, since mmio access is blocked, kiq has to be used to flush hdp. change WREG32_NO_KIQ to WREG32 to allow kiq. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:14 -04:00
Alex Deucher	c69b07f7bb	drm/amdgpu: fix eGPU hotplug regression The driver needs to wait for the on board firmware to finish its initialization before probing the card. Commit `959056982a` ("drm/amdgpu: Fix discovery initialization failure during pci rescan") switched from using msleep() to using usleep_range() which seems to have caused init failures on some navi1x boards. Switch back to msleep(). Fixes: `959056982a` ("drm/amdgpu: Fix discovery initialization failure during pci rescan") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Ma Jun <Jun.Ma2@amd.com>	2024-08-20 22:14:14 -04:00
Martin Leung	e389eefe34	drm/amd/display: Promote DC to 3.2.297 - Various DML 2.1 fixes - Fix module unload - Fix construct_phy with MXM connector - Support UHBR10 link rate on eDP - Revert updated DCCG wrappers Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Martin Leung <Martin.Leung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:14 -04:00
Austin Zheng	d07722e1fc	drm/amd/display: DML2.1 Reintegration for Various Fixes [Why and How] DML2.1 reintegration for several fixes and updates to the DML code. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Roman Li <roman.li@amd Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:14 -04:00
Tim Huang	20b5a8f9f4	drm/amd/display: fix double free issue during amdgpu module unload Flexible endpoints use DIGs from available inflexible endpoints, so only the encoders of inflexible links need to be freed. Otherwise, a double free issue may occur when unloading the amdgpu module. [ 279.190523] RIP: 0010:__slab_free+0x152/0x2f0 [ 279.190577] Call Trace: [ 279.190580] <TASK> [ 279.190582] ? show_regs+0x69/0x80 [ 279.190590] ? die+0x3b/0x90 [ 279.190595] ? do_trap+0xc8/0xe0 [ 279.190601] ? do_error_trap+0x73/0xa0 [ 279.190605] ? __slab_free+0x152/0x2f0 [ 279.190609] ? exc_invalid_op+0x56/0x70 [ 279.190616] ? __slab_free+0x152/0x2f0 [ 279.190642] ? asm_exc_invalid_op+0x1f/0x30 [ 279.190648] ? dcn10_link_encoder_destroy+0x19/0x30 [amdgpu] [ 279.191096] ? __slab_free+0x152/0x2f0 [ 279.191102] ? dcn10_link_encoder_destroy+0x19/0x30 [amdgpu] [ 279.191469] kfree+0x260/0x2b0 [ 279.191474] dcn10_link_encoder_destroy+0x19/0x30 [amdgpu] [ 279.191821] link_destroy+0xd7/0x130 [amdgpu] [ 279.192248] dc_destruct+0x90/0x270 [amdgpu] [ 279.192666] dc_destroy+0x19/0x40 [amdgpu] [ 279.193020] amdgpu_dm_fini+0x16e/0x200 [amdgpu] [ 279.193432] dm_hw_fini+0x26/0x40 [amdgpu] [ 279.193795] amdgpu_device_fini_hw+0x24c/0x400 [amdgpu] [ 279.194108] amdgpu_driver_unload_kms+0x4f/0x70 [amdgpu] [ 279.194436] amdgpu_pci_remove+0x40/0x80 [amdgpu] [ 279.194632] pci_device_remove+0x3a/0xa0 [ 279.194638] device_remove+0x40/0x70 [ 279.194642] device_release_driver_internal+0x1ad/0x210 [ 279.194647] driver_detach+0x4e/0xa0 [ 279.194650] bus_remove_driver+0x6f/0xf0 [ 279.194653] driver_unregister+0x33/0x60 [ 279.194657] pci_unregister_driver+0x44/0x90 [ 279.194662] amdgpu_exit+0x19/0x1f0 [amdgpu] [ 279.194939] __do_sys_delete_module.isra.0+0x198/0x2f0 [ 279.194946] __x64_sys_delete_module+0x16/0x20 [ 279.194950] do_syscall_64+0x58/0x120 [ 279.194954] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 279.194980] </TASK> Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:14 -04:00
Nicholas Susanto	2344413205	drm/amd/display: DCN35 set min dispclk to 50Mhz [Why] Causes hard hangs when resuming after display off on extended/duplicate modes [How] Set the min dispclk to 50Mhz for DCN35 Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Nicholas Susanto <Nicholas.Susanto@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Ilya Bakoulin	ec9e2e7acc	drm/amd/display: Fix construct_phy with MXM connector [Why/How] The call to construct_phy will fail in cases where connector type is MXM, and the dc_link won't be properly created/initialized. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Sung Joon Kim	f327189389	drm/amd/display: Support UHBR10 link rate on eDP [why] Supporting UHBR10 link rate on eDP leverages the existing DP2.0 code but need to add some small adjustments in code. [how] Acknowledge the given DPCD caps for UHBR10 link rate support and allow DP2.0 programming sequence and link training for eDP. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Sung Joon Kim <Sungjoon.Kim@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Nevenko Stupar	272e6aab14	drm/amd/display: Hardware cursor changes color when switched to software cursor [Why & How] DCN4 Cursor has separate degamma block and should always do Cursor degamma for Cursor color modes. Reviewed-by: Chris Park <chris.park@amd.com> Signed-off-by: Nevenko Stupar <Nevenko.Stupar@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Michael Strauss	4e9e50b6ae	drm/amd/display: Allow UHBR Interop With eDP Supported Link Rates Table [WHY] eDP 2.0 is introducing support for UHBR link rates, however current eDP ILR link optimization does not account for UHBR capabilities. Either UHBR capabilities will be provided via the same 128b/132b rate DPCD caps that are currently used on DP2.1, or Table 4-13 may be updated to include UHBR rates. [HOW] Add extra Supported Link Rates table translations for UHBR10/13.5/20. Update eDP link setting optimization search to be aware of 128b/132b DPCD rate caps in order to unblock UHBR on panels with Supported Link Rates table. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Nicholas Susanto	7c9cb6d1bf	drm/amd/display: Remove redundant check in DCN35 hwseq Removing redundant condition. Reviewed-by: Hansen Dsouza <Hansen.Dsouza@amd.com> Signed-off-by: Nicholas Susanto <Nicholas.Susanto@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Aurabindo Pillai	8783a18409	drm/amd/display: remove an extraneous call for checking dchub clock when removing the amdgpu module and reinserting it, a call trace is triggered: [ 334.230602] RIP: 0010:hubbub2_get_dchub_ref_freq+0xbb/0xe0 [amdgpu] [ 334.230807] Code: 25 28 00 00 00 75 3c 48 8d 65 f0 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 45 31 db e9 55 a1 ca de <0f> 0b eb c6 0f 0b eb c2 d1 eb 8d 83 c0 63 ff ff 3d 20 4e 00 00 76 [ 334.230809] RSP: 0018:ffffbc8b823fb540 EFLAGS: 00010246 [ 334.230811] RAX: 0000000000001000 RBX: 00000000000186a0 RCX: 0000000000000000 [ 334.230812] RDX: ffffbc8b823fb544 RSI: 0000000000000000 RDI: 0000000000000000 [ 334.230813] RBP: ffffbc8b823fb560 R08: 0000000000000000 R09: 0000000000000000 [ 334.230814] R10: 0000000000000000 R11: 000000000000000f R12: ffff9e644f1f2bb0 [ 334.230815] R13: ffff9e6451361300 R14: 0000000000000000 R15: ffff9e6452c00000 [ 334.230816] FS: 00007af7c8519000(0000) GS:ffff9e737dd00000(0000) knlGS:0000000000000000 [ 334.230817] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 334.230818] CR2: 0000703576b9cbd0 CR3: 00000001095a2000 CR4: 0000000000750ee0 [ 334.230819] PKRU: 55555554 [ 334.230820] Call Trace: [ 334.230822] <TASK> [ 334.230824] ? show_regs+0x6d/0x80 [ 334.230828] ? __warn+0x89/0x160 [ 334.230832] ? hubbub2_get_dchub_ref_freq+0xbb/0xe0 [amdgpu] [ 334.231024] ? report_bug+0x17e/0x1b0 [ 334.231028] ? handle_bug+0x46/0x90 [ 334.231030] ? exc_invalid_op+0x18/0x80 [ 334.231032] ? asm_exc_invalid_op+0x1b/0x20 [ 334.231036] ? hubbub2_get_dchub_ref_freq+0xbb/0xe0 [amdgpu] [ 334.231217] dc_create_resource_pool+0xfd/0x320 [amdgpu] [ 334.231408] dc_create+0x256/0x700 [amdgpu] [ 334.231588] ? srso_alias_return_thunk+0x5/0x7f [ 334.231590] ? dmi_matches+0xa0/0x230 [ 334.231594] amdgpu_dm_init+0x28c/0x25f0 [amdgpu] [ 334.231791] ? prb_read_valid+0x1c/0x30 [ 334.231795] ? __irq_work_queue_local+0x43/0xf0 [ 334.231798] ? srso_alias_return_thunk+0x5/0x7f [ 334.231800] ? irq_work_queue+0x2f/0x70 [ 334.231802] ? srso_alias_return_thunk+0x5/0x7f [ 334.231803] ? __wake_up_klogd.part.0+0x40/0x70 [ 334.231805] ? srso_alias_return_thunk+0x5/0x7f [ 334.231807] ? vprintk_emit+0xd9/0x210 [ 334.231809] ? set_dev_info+0x130/0x1c0 [ 334.231812] ? srso_alias_return_thunk+0x5/0x7f [ 334.231813] ? dev_printk_emit+0xa1/0xe0 [ 334.231819] dm_hw_init+0x14/0x30 [amdgpu] [ 334.231993] amdgpu_device_init+0x23c7/0x2fc0 [amdgpu] [ 334.232134] ? pci_read_config_word+0x25/0x50 [ 334.232139] amdgpu_driver_load_kms+0x1a/0xd0 [amdgpu] [ 334.232284] amdgpu_pci_probe+0x1f9/0x620 [amdgpu] On DCN401, get_dchub_ref_freq() hook is called before init_hw() hook. Hence, it is expected to trigger an assert. Remove the extraneous call to get_dchub_ref_freq() to suppress the call trace Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Michael Strauss	9de60462cd	drm/amd/display: Update HPO I/O When Handling Link Retrain Automation Request [WHY] Previous multi-display HPO fix moved where HPO I/O enable/disable is performed. The codepath now taken to enable/disable HPO I/O is not used for compliance test automation, meaning that if a compliance box being driven at a DP1 rate requests retrain at UHBR, HPO I/O will remain off if it was previously off. [HOW] Explicitly update HPO I/O after allocating encoders for test request. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Hansen Dsouza	18ac82c26d	Revert "drm/amd/display: Update to using new dccg callbacks" [Why] Revert updated DCCG wrappers due to regression [How] This reverts commit `680458d41a`. Reviewed-by: Chris Park <chris.park@amd.com> Signed-off-by: Hansen Dsouza <Hansen.Dsouza@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Candice Li	c0a04e3570	drm/amdgpu: Validate TA binary size Add TA binary size validation to avoid OOB write. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Mukul Joshi	eb067d65c3	drm/amdkfd: Update BadOpcode Interrupt handling with MES Based on the recommendation of MEC FW, update BadOpcode interrupt handling by unmapping all queues, removing the queue that got the interrupt from scheduling and remapping rest of the queues back when using MES scheduler. This is done to prevent the case where unmapping of the bad queue can fail thereby causing a GPU reset. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Mukul Joshi	9a16042f02	drm/amdkfd: Update queue unmap after VM fault with MES MEC FW expects MES to unmap all queues when a VM fault is observed on a queue and then resumed once the affected process is terminated. Use the MES Suspend and Resume APIs to achieve this. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Mukul Joshi	ccf8ef6b75	drm/amdgpu: Implement MES Suspend and Resume APIs for GFX11 Add implementation for MES Suspend and Resume APIs to unmap/map all queues for GFX11. Support for GFX12 will be added when the corresponding firmware support is in place. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:04 -04:00
Amber Lin	87758a0ef1	drm/amdkfd: Enable processes isolation on gfx9 When amdgpu enable enforce_isolation, KFD enables single-process mode in HWS and sets exec_cleaner_shader bit in MAP_PROCESS. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:08:07 -04:00
Srinivasan Shanmugam	f846250b8a	drm/amdgpu/gfx_v9_4_3: Apply Isolation Enforcement to GFX & Compute rings This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v9_4_3 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. The commit also includes a check for the type of the ring. If the type of the ring is `AMDGPU_RING_TYPE_COMPUTE`, the `xcp_id` of the `enforce_isolation` structure in the `gfx` structure of the `amdgpu_device` is set to the `xcp_id` of the ring. This ensures that the correct `xcp_id` is used when enforcing isolation on compute rings. The `xcp_id` is an identifier for an XCP partition, and different rings can be associated with different XCP partitions. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	2024-08-20 22:08:02 -04:00
Srinivasan Shanmugam	b710dbe55d	drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v9_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-20 22:07:58 -04:00
Srinivasan Shanmugam	afefd6f245	drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization This commit introduces the Enforce Isolation Handler designed to enforce shader isolation on AMD GPUs, which helps to prevent data leakage between different processes. The handler counts the number of emitted fences for each GFX and compute ring. If there are any fences, it schedules the `enforce_isolation_work` to be run after a delay of `GFX_SLICE_PERIOD`. If there are no fences, it signals the Kernel Fusion Driver (KFD) to resume the runqueue. The function is synchronized using the `enforce_isolation_mutex`. This commit also introduces a reference count mechanism (kfd_sch_req_count) to keep track of the number of requests to enable the KFD scheduler. When a request to enable the KFD scheduler is made, the reference count is decremented. When the reference count reaches zero, a delayed work is scheduled to enforce isolation after a delay of GFX_SLICE_PERIOD. When a request to disable the KFD scheduler is made, the function first checks if the reference count is zero. If it is, it cancels the delayed work for enforcing isolation and checks if the KFD scheduler is active. If the KFD scheduler is active, it sends a request to stop the KFD scheduler and sets the KFD scheduler state to inactive. Then, it increments the reference count. The function is synchronized using the kfd_sch_mutex to ensure that the KFD scheduler state and reference count are updated atomically. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:35 -04:00
Amber Lin	234eebe161	drm/amdkfd: APIs to stop/start KFD scheduling Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling. When amdgpu_amdkfd_stop_sched is called, KFD will unmap queues from runlist. If users send ioctls to KFD to create queues, they'll be added but those queues won't be mapped to runlist (so not scheduled) until amdgpu_amdkfd_start_sched is called. v2: fix build (Alex) Signed-off-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:28 -04:00
Srinivasan Shanmugam	b1f49ff9cb	drm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware This commit extends the cleaner shader feature to support GFX9.4.4 hardware. The cleaner shader feature is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). This operation needs to be performed in isolation, while no other tasks should be running on the GPU at the same time. Previously, the cleaner shader feature was implemented for GFX9.4.3 hardware. This commit adds support for GFX9.4.4 hardware by allowing the cleaner shader to be used with this hardware version. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:23 -04:00
Srinivasan Shanmugam	335288315a	drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3 This commit adds the cleaner shader microcode for GFX9.4.3 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_9_4_3_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps improve GPU performance and resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. v2: fix copyright date (Alex) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:16 -04:00
Srinivasan Shanmugam	d4c3815495	drm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3 hardware The patch modifies the gfx_v9_4_3_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v9_4_3_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v9_4_3 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v9_4_3_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:11 -04:00
Srinivasan Shanmugam	c2e70d307f	drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware The patch modifies the gfx_v9_0_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v9_0_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v9_0 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v9_0_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:07 -04:00
Srinivasan Shanmugam	22ff907d4f	drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader execution This commit adds the PACKET3_RUN_CLEANER_SHADER definition. This packet is a command packet used to instruct the GPU to execute the cleaner shader. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. The PACKET3_RUN_CLEANER_SHADER packet is used to trigger the execution of the cleaner shader on the GPU. The packet consists of a header followed by a RESERVED field, which is programmed to zero. When the GPU receives this packet, it fetches and executes the cleaner shader instructions from the location specified in the packet. The cleaner shader feature helps to enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:02 -04:00
Srinivasan Shanmugam	d361ad5d2f	drm/amdgpu: Add sysfs interface for running cleaner shader This patch adds a new sysfs interface for running the cleaner shader on AMD GPUs. The cleaner shader is used to clear GPU memory before it's reused, which can help prevent data leakage between different processes. The new sysfs file is write-only and is named `run_cleaner_shader`. Write the number of the partition to this file to trigger the cleaner shader on that partition. There is only one partition on GPUs which do not support partitioning. Changes made in this patch: - Added `amdgpu_set_run_cleaner_shader` function to handle writes to the `run_cleaner_shader` sysfs file. - Added `run_cleaner_shader` to the list of device attributes in `amdgpu_device_attrs`. - Updated `default_attr_update` to handle `run_cleaner_shader`. - Added `AMDGPU_DEVICE_ATTR_WO` macro to create write-only device attributes. v2: fix error handling (Alex) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	2024-08-20 22:06:56 -04:00
Srinivasan Shanmugam	e189be9b2e	drm/amdgpu: Add enforce_isolation sysfs attribute This commit adds a new sysfs attribute 'enforce_isolation' to control the 'enforce_isolation' setting per GPU. The attribute can be read and written, and accepts values 0 (disabled) and 1 (enabled). When 'enforce_isolation' is enabled, reserved VMIDs are allocated for each ring. When it's disabled, the reserved VMIDs are freed. The set function locks a mutex before changing the 'enforce_isolation' flag and the VMIDs, and unlocks it afterwards. This ensures that these operations are atomic and prevents race conditions and other concurrency issues. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:06:52 -04:00
Srinivasan Shanmugam	dba1a6cfc3	drm/amdgpu: Enforce isolation as part of the job This patch adds a new parameter 'enforce_isolation' to the amdgpu_job structure. This parameter is used to determine whether shader isolation should be enforced for a job. The enforce_isolation parameter is then stored in the amdgpu_job structure and used when flushing the VM. The enforce_isolation field of the amdgpu_job structure is set directly after the job is allocated This change allows more fine-grained control over shader isolation, making it possible to enforce isolation on a per-job basis rather than globally. This can be useful in scenarios where only certain jobs require isolation. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-20 22:06:43 -04:00
Victor Skvortsov	19cff16559	drm/amdgpu: abort KIQ waits when there is a pending reset Stop waiting for the KIQ to return back when there is a reset pending. It's quite likely that the KIQ will never response. Signed-off-by: Koenig Christian <Christian.Koenig@amd.com> Suggested-by: Lazar Lijo <Lijo.Lazar@amd.com> Tested-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:27:50 -04:00
Srinivasan Shanmugam	9659520419	drm/amdgpu: Make enforce_isolation setting per GPU This commit makes enforce_isolation setting to be per GPU and per partition by adding the enforce_isolation array to the adev structure. The adev variable is set based on the global enforce_isolation module parameter during device initialization. In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU is used to determine whether to enforce isolation between graphics and compute processes on that GPU. In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU and partition is used to determine whether to enforce isolation between graphics and compute processes on that GPU and partition. This allows the enforce_isolation setting to be controlled individually for each GPU and each partition, which is useful in a system with multiple GPUs and partitions where different isolation settings might be desired for different GPUs and partitions. v2: fix loop in amdgpu_vmid_mgr_init() (Alex) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-16 14:27:45 -04:00
Alex Deucher	ee7a846ea2	drm/amdgpu: Emit cleaner shader at end of IB submission This commit introduces the emission of a cleaner shader at the end of the IB submission process. This is achieved by adding a new function pointer, `emit_cleaner_shader`, to the `amdgpu_ring_funcs` structure. If the `emit_cleaner_shader` function is set in the ring functions, it is called during the VM flush process. The cleaner shader is only emitted if the `enable_cleaner_shader` flag is set in the `amdgpu_device` structure. This allows the cleaner shader emission to be controlled on a per-device basis. By emitting a cleaner shader at the end of the IB submission, we can ensure that the VM state is properly cleaned up after each submission. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-16 14:27:42 -04:00
Srinivasan Shanmugam	aec773a1fb	drm/amdgpu: Add infrastructure for Cleaner Shader feature The cleaner shader is used by the CP firmware to clean LDS and GPRs between processes on the CUs. This adds an internal API for GFX IP code to allocate and initialize the cleaner shader. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-16 14:27:34 -04:00
Alex Deucher	f49280ffd2	drm/amdgpu: handle enforce isolation on non-0 gfxhub Some chips have more than one gfxhub so check if we are a gfxhub rather than just gfxhub 0. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:27:28 -04:00
Alex Deucher	2dc3851ef7	drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1 The workaround seems to cause stability issues on other SDMA 5.2.x IPs. Fixes: `a03ebf1163` ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556 Acked-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:27:04 -04:00
Sunil Khatri	1a2103d685	drm/amdgpu: add vcn ip dump support for vcn_v2_6 Add support for logging the registers in devcoredump buffer for vcn_v2_6. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:57 -04:00
Sunil Khatri	bc62abe1b9	drm/amdgpu: add print support for vcn_v2_5 ip dump Add support for logging the registers in devcoredump buffer for vcn_v2_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:52 -04:00
Sunil Khatri	0eea81ee2e	drm/amdgpu: add vcn_v2_5 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v2_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:47 -04:00
Sunil Khatri	b910cacb4e	drm/amdgpu: add print support for vcn_v2_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v2_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:41 -04:00
Sunil Khatri	2239aaa204	drm/amdgpu: add vcn_v2_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v2_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:36 -04:00
Sunil Khatri	ef9f3b5fd9	drm/amdgpu: add print support for vcn_v1_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v1_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:31 -04:00
Sunil Khatri	837cc7f1bf	drm/amdgpu: add vcn_v1_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v1_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:25 -04:00
Sunil Khatri	439c3b124e	drm/amdgpu: add print support for vcn_v4_0_5 ip dump Add support for logging the registers in devcoredump buffer for vcn_v4_0_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:19 -04:00
Sunil Khatri	3a50a51d04	drm/amdgpu: add print support for vcn_v4_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v4_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:12 -04:00
Sunil Khatri	dc57edda81	drm/amdgpu: add print support for vcn_v4_0_3 ip dump Add support for logging the registers in devcoredump buffer for vcn_v4_0_3. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:06 -04:00
Sunil Khatri	46553db49c	drm/amdgpu: add vcn_v4_0_5 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v4_0_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:59 -04:00
Sunil Khatri	9d87dac3f9	drm/amdgpu: add vcn_v4_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v4_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:53 -04:00
Sunil Khatri	8962915044	drm/amdgpu: add vcn_v4_0_3 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v4_0_3. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:30 -04:00
Sunil Khatri	f3c958ab85	drm/amdgpu: add print support for vcn_v5_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v5_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:12 -04:00
Alex Deucher	32aada4d0a	drm/amdgpu/mes12: add API for user queue reset Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:09 -04:00
Alex Deucher	d4f1fde734	drm/amdgpu/mes11: add API for user queue reset Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:06 -04:00
Alex Deucher	5b7a59de48	drm/amdgpu/mes: add API for user queue reset Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:02 -04:00
Alex Deucher	478efcb90b	drm/amdgpu/gfx11: export gfx_v11_0_request_gfx_index_mutex() It will be used by the queue reset code. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:56 -04:00

... 2 3 4 5 6 ...

31433 Commits