linux-yocto/arch
Babu Moger c42f651a76 x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID
[ Upstream commit 15292f1b4c ]

Users can create as many monitoring groups as the number of RMIDs supported
by the hardware. However, on AMD systems, only a limited number of RMIDs
are guaranteed to be actively tracked by the hardware. RMIDs that exceed
this limit are placed in an "Unavailable" state.

When a bandwidth counter is read for such an RMID, the hardware sets
MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked
again, the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable
remains set on the first read after tracking restarts and is clear on all
subsequent reads as long as the RMID is tracked.
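
A minimal sketch of how a reader of MSR_IA32_QM_CTR might separate the
status bits from the count. The macro and function names are illustrative
and are not the kernel's own. Bit 63 (Error) is not mentioned in this
message; it is included here on the assumption that the register follows
the usual architectural layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; bit 62 is the Unavailable bit described above. */
#define QM_CTR_UNAVAILABLE (UINT64_C(1) << 62)
/* Assumption: bit 63 is the Error bit of the same register. */
#define QM_CTR_ERROR       (UINT64_C(1) << 63)

/* True when the hardware reports that the RMID is not currently tracked. */
static bool qm_ctr_unavailable(uint64_t msr_val)
{
	return (msr_val & QM_CTR_UNAVAILABLE) != 0;
}

/* Counter payload with both status bits masked off. */
static uint64_t qm_ctr_count(uint64_t msr_val)
{
	return msr_val & ~(QM_CTR_ERROR | QM_CTR_UNAVAILABLE);
}
```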

resctrl miscounts the bandwidth events after an RMID transitions from the
"Unavailable" state back to being tracked. This happens because when the
hardware starts counting again after resetting the counter to zero, resctrl
in turn compares the new count against the counter value stored from the
previous time the RMID was tracked.

This results in resctrl computing an event value that either undercounts
(when the new counter is more than the stored counter) or reflects a mistaken
overflow (when the new counter is less than the stored counter).

Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to
zero whenever the RMID is in the "Unavailable" state to ensure accurate
counting after the RMID resets to zero when it starts to be tracked again.
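
The idea can be sketched as a small user-space model. This is not the
kernel implementation: struct mbm_state_model, model_read() and
wrapped_delta() are hypothetical names, the -1 return value stands in for
whatever error the real read path reports, and only prev_msr corresponds
to a field named in this message.

```c
#include <stdint.h>

#define QM_CTR_UNAVAILABLE (UINT64_C(1) << 62)

/* Hypothetical per-RMID state; only prev_msr is named in the text above. */
struct mbm_state_model {
	uint64_t prev_msr;	/* last raw counter value that was read */
	uint64_t chunks;	/* running total of counted chunks */
};

/* Delta between two readings of a counter that is 'width' bits wide
 * (1 <= width <= 64), treating cur < prev as a wrap-around. */
static uint64_t wrapped_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
	unsigned int shift = 64 - width;

	return ((cur << shift) - (prev << shift)) >> shift;
}

/* Returns 0 and updates *total on success, -1 when the RMID is Unavailable. */
static int model_read(struct mbm_state_model *st, uint64_t msr_val,
		      unsigned int width, uint64_t *total)
{
	if (msr_val & QM_CTR_UNAVAILABLE) {
		/* The counter restarts from zero once tracking resumes,
		 * so forget the stale reading. */
		st->prev_msr = 0;
		return -1;
	}

	st->chunks += wrapped_delta(st->prev_msr, msr_val, width);
	st->prev_msr = msr_val;
	*total = st->chunks;
	return 0;
}
```

With the reset in place, the first successful read after an "Unavailable"
read is compared against zero, so the model adds only what was counted
since tracking resumed.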

Example scenario that results in mistaken overflow
==================================================
1. The resctrl filesystem is mounted, and a task is assigned to a
   monitoring group.

   $mount -t resctrl resctrl /sys/fs/resctrl
   $mkdir /sys/fs/resctrl/mon_groups/test1/
   $echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks

   $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   21323            <- Total bytes on domain 0
   "Unavailable"    <- Total bytes on domain 1

   Task is running on domain 0. Counter on domain 1 is "Unavailable".

2. The task runs on domain 0 for a while and then moves to domain 1. The
   counter starts incrementing on domain 1.

   $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   7345357          <- Total bytes on domain 0
   4545             <- Total bytes on domain 1

3. At some point, the RMID in domain 0 transitions to the "Unavailable"
   state because the task is no longer executing in that domain.

   $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   "Unavailable"    <- Total bytes on domain 0
   434341           <- Total bytes on domain 1

4. Since the task continues to migrate between domains, it may eventually
   return to domain 0.

   $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   17592178699059   <- Overflow on domain 0
   3232332          <- Total bytes on domain 1

In this case, the RMID on domain 0 transitions from the "Unavailable" state
to the active state. The hardware begins tracking the RMID again, counting
from 0, and sets MSR_IA32_QM_CTR.Unavailable (bit 62) on the first read of
the counter.

Subsequent reads succeed but return a value smaller than the previously
saved MSR value (7345357). Consequently, resctrl's overflow logic is
triggered: it compares the previous value (7345357) with the new, smaller
value and incorrectly interprets the difference as a counter overflow,
adding a large delta.

In reality, this is a false positive: the counter did not overflow but was
simply reset when the RMID transitioned from "Unavailable" back to active
state.
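
The size of the bogus delta can be worked through with a short sketch. The
44-bit counter width and the new reading of 0 are assumptions made for
illustration and are not stated in this message; wrapped_delta() and the
two helper functions are hypothetical names. Under those assumptions the
wrapped difference is 2^44 - 7345357 = 17592178699059, which matches the
figure shown for domain 0 in step 4.

```c
#include <stdint.h>

/* Delta between two readings of a counter that is 'width' bits wide
 * (1 <= width <= 64), treating cur < prev as a wrap-around. */
static uint64_t wrapped_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
	unsigned int shift = 64 - width;

	return ((cur << shift) - (prev << shift)) >> shift;
}

/* Stale stored value from step 2, new reading assumed to be 0. */
static uint64_t delta_with_stale_prev(void)
{
	return wrapped_delta(UINT64_C(7345357), 0, 44);	/* 17592178699059 */
}

/* Same reading once the stored value has been reset to zero. */
static uint64_t delta_with_reset_prev(void)
{
	return wrapped_delta(0, 0, 44);			/* 0 */
}
```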

Here is the text from APM [1] available from [2].

"In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the
first QM_CTR read when it begins tracking an RMID that it was not
previously tracking. The U bit will be zero for all subsequent reads from
that RMID while it is still tracked by the hardware. Therefore, a QM_CTR
read with the U bit set when that RMID is in use by a processor can be
considered 0 when calculating the difference with a subsequent read."

[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
    Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory
    Bandwidth (MBM).

  [ bp: Split commit message into smaller paragraph chunks for better
    consumption. ]

Fixes: 4d05bf71f1 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org # needs adjustments for <= v6.17
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-23 16:24:39 +02:00
alpha arch: copy_thread: pass clone_flags as u64 2025-10-15 12:03:13 +02:00
arc arch: copy_thread: pass clone_flags as u64 2025-10-15 12:03:13 +02:00
arm ARM: OMAP2+: pm33xx-core: ix device node reference leaks in amx3_idle_init 2025-10-19 16:37:18 +02:00
arm64 arm64: errata: Apply workarounds for Neoverse-V3AE 2025-10-23 16:24:38 +02:00
csky arch: copy_thread: pass clone_flags as u64 2025-10-15 12:03:13 +02:00
hexagon arch: copy_thread: pass clone_flags as u64 2025-10-15 12:03:13 +02:00
loongarch LoongArch: Init acpi_gbl_use_global_lock to false 2025-10-19 16:37:08 +02:00
m68k arch: copy_thread: pass clone_flags as u64 2025-10-15 12:03:13 +02:00
microblaze arch: copy_thread: pass clone_flags as u64 2025-10-15 12:03:13 +02:00
mips arch: copy_thread: pass clone_flags as u64 2025-10-15 12:03:13 +02:00
nios2 arch: copy_thread: pass clone_flags as u64 2025-10-15 12:03:13 +02:00
openrisc arch: copy_thread: pass clone_flags as u64 2025-10-15 12:03:13 +02:00
parisc parisc: Remove spurious if statement from raw_copy_from_user() 2025-10-19 16:37:29 +02:00
powerpc powerpc/fadump: skip parameter area allocation when fadump is disabled 2025-10-23 16:24:32 +02:00
riscv riscv: kprobes: Fix probe address validation 2025-10-23 16:24:32 +02:00
s390 s390: Add -Wno-pointer-sign to KBUILD_CFLAGS_DECOMPRESSOR 2025-10-19 16:37:37 +02:00
sh arch: copy_thread: pass clone_flags as u64 2025-10-15 12:03:13 +02:00
sparc sparc: fix error handling in scan_one_device() 2025-10-19 16:37:31 +02:00
um arch: copy_thread: pass clone_flags as u64 2025-10-15 12:03:13 +02:00
x86 x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID 2025-10-23 16:24:39 +02:00
xtensa xtensa: simdisk: add input size check in proc_write_simdisk 2025-10-19 16:37:31 +02:00
.gitignore
Kconfig rust: cfi: only 64-bit arm and x86 support CFI_CLANG 2025-10-23 16:24:21 +02:00