This reverts commit 74e1006430.
This causes a regression in the workload selection.
A more extensive fix is being worked on.
For now, revert.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3618
Fixes: 74e1006430 ("drm/amd/pm: correct the workload setting")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Correct the workload setting in order not to mix the setting
with the end user. Update the workload mask accordingly.
v2: changes as below:
1. the end user can not erase the workload from driver except default workload.
2. always shows the real highest priority workoad to the end user.
3. the real workload mask is combined with driver workload mask and end user workload mask.
v3: apply this to the other ASICs as well.
v4: simplify the code
v5: refine the code based on the review comments.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8cc438be5d49b8326b2fcade0bdb7e6a97df9e0b)
Cc: stable@vger.kernel.org # 6.11.x
Why:
If the reg mmMP1_SMN_C2PMSG_90 is being written to during amdgpu driver
load or driver unload, subsequent amdgpu driver load will fail at
smu_hw_init. The default of mmMP1_SMN_C2PMSG_90 register at a clean
environment is 0x1 and if value differs from expected, amdgpu driver
load will fail.
How to fix:
Ignore the initial value in smu response register before the first smu
message is sent,if smc in SMU_FW_INIT state, just proceed further to
send the message. If register holds an unexpected value after smu message
was sent set, smc_state to SMU_FW_HANG state and no further smu messages
will be sent.
v2:
Set SMU_FW_INIT state at the start of smu hw_init/resume.
Check smc_fw_state before sending smu message if in hang state skip
sending message.
Set SMU_FW_HANG only in case unexpected value is detected
Signed-off-by: Danijel Slivka <danijel.slivka@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
SMU firmware has not supported MALL PG.
[How]
Disable MALL PG and make it always on until SMU firmware is ready.
Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove unused callback to set PLPD policy and its implementation from
arcturus, aldebaran and SMUv13.0.6 SOCs.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI
PLPD mode. Also, xgmi_plpd_policy sysfs node is not used by any client.
Remove that as well.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add support to set XGMI PLPD policy levels through 'pm_policy/xgmi_plpd'
sysfs node.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add support to set/get information about different DPM policies. The
support is only available on SOCs which use swsmu architecture.
A DPM policy type may be defined with different levels. For example, a
policy may be defined to select Pstate preference and then later a
pstate preference may be chosen.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add support for MACO flag checking.
MACO mode only works if BACO is supported.
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use a unified and more explicit name get_bamaco_support
to replace is_baco_support and get_asic_baco_capability
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
fix the high voltage issue after unload on smu 13.0.10
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
implement smu send rma reason function for smu v13.0.6
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 5f38ac54e6.
This causes issues with rebooting and the 7800XT.
Cc: Kenneth Feng <kenneth.feng@amd.com>
Cc: stable@vger.kernel.org
Fixes: 5f38ac54e6 ("drm/amd/pm: fix the high voltage and temperature issue")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3062
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fulfill the SMU13.0.0 support for Wifi RFI mitigation feature.
--
v10->v11:
- downgrade the prompt level on message failure(Lijo)
v13:
- Fix the format issue (IIpo Jarvinen)
- Move function smu_v13_0_0_set_wbrf_exclusion_ranges to
smu_v13_0.c as a generic code for later use (IIpo Jarvinen)
Co-developed-by: Evan Quan <quanliangl@hotmail.com>
Signed-off-by: Evan Quan <quanliangl@hotmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To protect PMFW from being overloaded.
Signed-off-by: Evan Quan <quanliangl@hotmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With WBRF feature supported, as a driver responding to the frequencies,
amdgpu driver is able to do shadow pstate switching to mitigate possible
interference(between its (G-)DDR memory clocks and local radio module
frequency bands used by Wifi 6/6e/7).
--
v1->v2:
- update the prompt for feature support(Lijo)
v8->v9:
- update parameter document for smu_wbrf_event_handler(Simon)
v9->v10:
v10->v11:
- correct the logics for wbrf range sorting(Lijo)
v13:
- Fix the format issue (IIpo Jarvinen)
Signed-off-by: Evan Quan <quanliangl@hotmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add those data structures to support Wifi RFI mitigation feature.
Signed-off-by: Evan Quan <quanliangl@hotmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmV2PMQeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhgQIAKSdjtpkv+IeDwph
umLHxDlCTT8nl+TAhEYuxSdMiHt1i7+zyF0snnHhGthBRnBZBf+PDdkQOmqtjLZj
e591VGY7tdNCAVUwmRRaI82JFz9Pl1k8DXL3f+CE1+MfbpsirAecoIAL0rsvg+4f
ICKSAOBZonL7GT7IGrsbxgqGKCLC+2PcgNT4pADBdjtbC1nmVSLK21dtWr6i4xsn
CT9MxAnej1+EOpreyHhNZUVqfQy0/04x4OY0u/EOXU3PgpfGTrAzkyfwzcd/eit0
x59TZ0owDVXfvvdXUUAC0zG1th2ek5LKNtL+C4rKOoAbq7GZoQfjN5M/Ug+yykwE
aqm6CJ8=
=tZ+R
-----END PGP SIGNATURE-----
Backmerge tag 'v6.7-rc5' into drm-next
Linux 6.7-rc5
Alex requested this for some amdkfd work relying on the symbols exports.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Increment the driver if version and add new mems to the mertics table.
Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The smu needs to get the rlc power down message to sync the rlc state
with smu, the rlc state updating message need to be sent at while smu
begin suspend sequence , otherwise SMU will crash while RLC state is not
notified by driver, and rlc state probally changed after that
notification, so it needs to notify rlc state to smu at the end of the
suspend sequence in amdgpu_device_suspend() that can make sure the rlc
state is correctly set to SMU.
[ 101.000590] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[ 101.000598] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[ 110.838026] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[ 110.838035] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features.
[ 110.838039] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
[ 110.838040] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
[ 110.884394] PM: suspend of devices aborted after 21213.620 msecs
[ 110.884402] PM: start suspend of devices aborted after 21213.882 msecs
[ 110.884405] PM: Some devices failed to suspend, or early wake event detected
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The smu needs to get the rlc power down message to sync the rlc state
with smu, the rlc state updating message need to be sent at while smu
begin suspend sequence , otherwise SMU will crash while RLC state is not
notified by driver, and rlc state probally changed after that
notification, so it needs to notify rlc state to smu at the end of the
suspend sequence in amdgpu_device_suspend() that can make sure the rlc
state is correctly set to SMU.
[ 101.000590] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[ 101.000598] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[ 110.838026] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[ 110.838035] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features.
[ 110.838039] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
[ 110.838040] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
[ 110.884394] PM: suspend of devices aborted after 21213.620 msecs
[ 110.884402] PM: start suspend of devices aborted after 21213.882 msecs
[ 110.884405] PM: Some devices failed to suspend, or early wake event detected
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add API support to fetch a snapshot of power management metrics from PMFW.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
fix the high voltage and temperature issue after the driver is unloaded on smu 13.0.0,
smu 13.0.7 and smu 13.0.10
v2 - fix the code format and make sure it is used on the unload case only.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Support for getting power1_cap_min value on smu13 and smu11.
For other Asics, we still use 0 as the default value.
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Power up/down UMSCH by SMU.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Acked-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add ppt callback to power up/down UMSCH.
v2: squash in updates (Alex)
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add initial swSMU support for smu 14 series ASIC.
v2: squash in build fixes and updates (Li Ma)
fix warnings (Alex)
v3: squash in updates (Alex)
v4: squash in updates (Alex)
v5: squash in avg/current power updates (Alex)
Signed-off-by: Li Ma <li.ma@amd.com>
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The matching values for `pcie_gen_cap` and `pcie_width_cap` when
fetched from powerplay tables are 1 byte, so narrow the arguments
to match to ensure min() and max() comparisons without casts.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Replace with set_plpd_mode uniformly for places to use.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add the interface to change xgmi per-link power down policy.
v2: split from sysfs interface code and miscellaneous updates
v3: check against XGMI_PLPD_DEFAULT/XGMI_PLPD_OPTIMIZED and
pass PPSMC param
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add enum pp_xgmi_plpd_mode to describe PLPD policies.
v2: move the enum from amdgpu_smu.h to kgd_pp_interface.h
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
- There is a DPM issue where if DC is not present,
FCLK will stay at low level.
We need to send a SMU message to configure the DPM
- Reuse smu_v13_0_notify_display_change() for this purpose
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the following errors reported by checkpatch:
ERROR: open brace '{' following enum go on the same line
ERROR: open brace '{' following struct go on the same line
Signed-off-by: Ran Sun <sunran001@208suo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some GPUs have been overloading average power values and input power
values. To disambiguate these, introduce a new
`AMDGPU_PP_SENSOR_GPU_INPUT_POWER` and the GPUs that share input
power update to use this instead of average power.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2746
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
An intentional delay is added on soft ctf triggered. Then there will
be a double check for the GPU temperature before taking further
action. This can avoid unintended shutdown due to temperature
momentary fluctuation.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
added a sysfs interface for thermal throttling, then userspace
can get/update thermal limit
Signed-off-by: Kun Liu <Kun.Liu2@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the SW CTF limit from existing register
when there's a fan failure detected via SMU interrupt.
Signed-off-by: lyndonli <Lyndon.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
enable mode1 reset and prioritize debug port on smu_v13_0_10
as a more reliable message processing
v2 - move mode1 reset callback to smu_v13_0_0_ppt.c
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add debugfs interface to log GFXOFF statistics:
- Read amdgpu_gfxoff_count to get the total GFXOFF entry count at the
time of query since system power-up
- Write 1 to amdgpu_gfxoff_residency to start logging, and 0 to stop.
Read it to get average GFXOFF residency % multiplied by 100
during the last logging interval.
Both features are designed to be keep the values persistent between
suspends.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
support BAMACO reset on smu_v13_0_7, take BAMACO as a subset of BACO
for the low latency, and it only happens on specific platforms.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
So we can eventaully use them in the common smu code for
accessing the SMU mailboxes without needing a lot of
per asic logic in the common code.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GC v11_0_1 asic needs to issue the EnableGfxImu message after start IMU.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Modify the common smu13 code and add a new smu
13.0.0 ppt file to handle the smu 13.0.0 specific
configuration.
v2: squash in typo fix in profile name
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to relay on this way to get the raw PPTable when
SCPM feature is enabled.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With SCPM enabled, pptable cannot be uploaded to SMU directly.
The transferring has to be via PSP.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This will allow us to use the generic *_get_metrics_data functions for
ASICs that support unique_id
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>