linux-yocto/Documentation
Shiju Jose be9b359e05 cxl/edac: Add CXL memory device soft PPR control feature
Post Package Repair (PPR) maintenance operations may be supported by CXL
devices that implement the CXL.mem protocol. A PPR maintenance operation
requests the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support PPR features
may implement PPR maintenance operations. DRAM components may support two
types of PPR: hard PPR (hPPR), for a permanent row repair, and soft PPR
(sPPR), for a temporary row repair. Soft PPR is much faster than hPPR,
but the repair is lost with a power cycle.

During the execution of a PPR maintenance operation, a CXL memory device:
- May or may not retain data
- May or may not be able to process CXL.mem requests correctly, including
  the ones that target the DPA involved in the repair.

These CXL memory device capabilities are specified by Restriction Flags
in the sPPR Feature and hPPR Feature.

A soft PPR maintenance operation may be executed at runtime if data is
retained and CXL.mem requests are correctly processed. For CXL devices with
DRAM components, an hPPR maintenance operation may be executed only at boot,
because data typically may not be retained during an hPPR maintenance
operation.
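
The runtime condition above can be sketched in a few lines of Python. This is only an illustration: the class and field names are hypothetical stand-ins for the two capabilities described in this message, not the actual Restriction Flags layout from the CXL spec or any kernel structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PprRestrictions:
    """Hypothetical view of the two capabilities described above."""
    data_retained: bool           # media keeps its data during the repair
    mem_requests_processed: bool  # CXL.mem requests are handled correctly


def may_run_at_runtime(restrictions: PprRestrictions) -> bool:
    """A repair is a runtime candidate only if both conditions hold."""
    return restrictions.data_retained and restrictions.mem_requests_processed
```

A device that reports either condition as false would, under this sketch, leave the repair for boot time.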

When a CXL device identifies an error on a memory component, the device
may inform the host about the need for a PPR maintenance operation by using
an Event Record in which the Maintenance Needed flag is set. The Event Record
specifies the DPA that should be repaired. A CXL device may not keep track
of the requests that have already been sent, and the information on which
DPA should be repaired may be lost upon a power cycle.
The userspace tool requests a maintenance operation if the number of
corrected errors reported on a CXL.mem media exceeds an error threshold.
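
The threshold policy can be sketched as below. This is a minimal illustration, not the behavior of any specific tool. The class name, the choice to count per key (for example per DPA), and the threshold value are all assumptions.

```python
from collections import Counter


class CorrectedErrorTracker:
    """Hypothetical counter of corrected errors per key (e.g. per DPA)."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, key: int) -> bool:
        """Count one corrected error for `key`.

        Returns True once the count exceeds the threshold, meaning a
        maintenance operation should be requested.
        """
        self.counts[key] += 1
        return self.counts[key] > self.threshold
```

With a threshold of 2, the third corrected error reported for the same key is the first one that triggers a request.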

CXL spec 3.2 section 8.2.10.7.1.2 describes the device's sPPR (soft PPR)
maintenance operation and section 8.2.10.7.1.3 describes the device's
hPPR (hard PPR) maintenance operation feature.

CXL spec 3.2 section 8.2.10.7.2.1 describes the sPPR feature discovery and
configuration.

CXL spec 3.2 section 8.2.10.7.2.2 describes the hPPR feature discovery and
configuration.

Add support for controlling the CXL memory device soft PPR (sPPR) feature.
Register with the EDAC driver, which gets the memory repair attr descriptors
from the EDAC memory repair driver and exposes sysfs repair control
attributes for PPR to userspace. For example, the CXL PPR control for the
CXL mem0 device is exposed in /sys/bus/edac/devices/cxl_mem0/mem_repairX/
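
A userspace tool could locate these directories as in the sketch below. The path layout is the one given above. The helper name and the `root` parameter (included so the function can be tried against a scratch directory) are assumptions, and the attribute files inside each directory are not shown here.

```python
from pathlib import Path


def find_repair_dirs(memdev: str, root: str = "/sys/bus/edac/devices"):
    """Return the mem_repairX directories for a CXL memdev such as "mem0".

    Hypothetical helper: it only lists directories and does not read or
    write any repair attribute.
    """
    base = Path(root) / f"cxl_{memdev}"
    return sorted(p for p in base.glob("mem_repair*") if p.is_dir())
```

For example, `find_repair_dirs("mem0")` looks under /sys/bus/edac/devices/cxl_mem0/ and returns whichever mem_repairX instances exist there.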

Add checks to ensure the memory to be repaired is offline and originates
from a CXL DRAM or CXL gen_media error record reported in the current boot,
before requesting a PPR operation on the device.
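
The two checks can be summarized with the following sketch. It is an illustration of the stated conditions only. The kernel implementation is in C and is not reproduced here, and the function and parameter names are hypothetical.

```python
def may_request_ppr(dpa: int, memory_online: bool, recorded_dpas: set) -> bool:
    """Decide whether a PPR request for `dpa` passes the stated checks.

    recorded_dpas: DPAs taken from CXL DRAM or general media error
    records reported in the current boot (assumed to be collected
    elsewhere).
    """
    if memory_online:
        # The memory to be repaired must be offline.
        return False
    # The DPA must originate from an error record seen in this boot.
    return dpa in recorded_dpas
```

A request for a DPA that never appeared in an error record during the current boot is rejected even when the memory is offline.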

Note: Tested with QEMU patch for CXL PPR feature.
https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m70b2b010f43f7f4a6f9acee5ec9008498bf292c3

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20250521124749.817-9-shiju.jose@huawei.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-05-23 13:25:06 -07:00
ABI docs: ABI: replace mcroce@microsoft.com with new Meta address 2025-04-17 20:10:07 -07:00
accel
accounting
admin-guide xfs: remove duplicate Zoned Filesystems sections in admin-guide 2025-04-22 16:05:24 +02:00
arch OpenRISC updates for 6.15 2025-04-26 09:01:13 -07:00
block Documentation: ublk: remove dead footnote 2025-03-31 07:06:22 -06:00
bpf bpf: Add namespace to BPF internal symbols 2025-04-25 09:21:23 -07:00
cdrom
core-api - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
cpu-freq
crypto crypto: remove obsolete 'comp' compression API 2025-03-21 17:39:06 +08:00
dev-tools Kbuild updates for v6.15 2025-04-05 15:46:50 -07:00
devicetree Char/Misc driver fixes for 6.15-rc4 2025-04-25 10:30:40 -07:00
doc-guide
driver-api cxl for v6.15 2025-04-02 20:04:43 -07:00
edac cxl/edac: Add CXL memory device soft PPR control feature 2025-05-23 13:25:06 -07:00
fault-injection
fb
features mseal sysmap: add arch-support txt 2025-04-01 15:17:17 -07:00
filesystems A few more miscellaneous ext4 bug fixes and cleanups including some 2025-04-13 07:15:50 -07:00
firmware_class
firmware-guide
fpga
gpu Core Changes: 2025-03-14 17:02:11 +10:00
hid
hwmon hwmon: add driver for HTU31 2025-03-18 08:03:40 -07:00
i2c
iio Char/Misc/IIO driver updates for 6.15-rc1 2025-04-01 11:26:08 -07:00
images
infiniband docs: infiniband: document the UCAP API 2025-03-09 13:13:02 -04:00
input Documentation: input: Add section pertaining to polled input devices 2025-02-21 13:29:53 -07:00
isdn
kbuild kbuild: make all file references relative to source root 2025-03-22 23:50:58 +09:00
kernel-hacking
leds
litmus-tests
livepatch docs: livepatch: move text out of code block 2025-03-04 16:01:29 +01:00
locking hwspinlock: Remove unused hwspin_lock_get_id() 2025-03-21 17:12:04 -05:00
maintainer
mhi
misc-devices
mm - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
netlabel
netlink netlink: specs: rt-neigh: prefix struct nfmsg members with ndm 2025-04-16 18:09:42 -07:00
networking net: hold instance lock during NETDEV_CHANGE 2025-04-07 11:13:39 -07:00
nvdimm
nvme Documentation: typo fixes 2025-02-18 14:01:22 -07:00
PCI PCI: endpoint: Remove unused devm_pci_epc_destroy() 2025-03-08 14:47:31 +00:00
pcmcia
peci
power Documentation: PM: runtime: Fix a reference to pm_runtime_autosuspend() 2025-04-15 19:23:58 +02:00
process Devicetree for v6.15: 2025-03-29 11:23:16 -07:00
RCU - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
rust ARM and clkdev updates for 6.15-rc1 2025-04-03 12:21:44 -07:00
scheduler Scheduler updates for v6.15: 2025-03-24 21:28:12 -07:00
scsi
security Hi, 2025-03-28 12:42:53 -07:00
sound sound updates for 6.15-rc1 2025-03-26 09:41:55 -07:00
sphinx docs: automarkup: drop legacy support 2025-02-18 13:42:46 -07:00
sphinx-static
spi
staging
sunrpc/xdr
target
tee
timers
tools Documentation/rv: Add sched pages to the indices 2025-03-27 12:02:38 -04:00
trace tracing/timers: Rename the hrtimer_init event to hrtimer_setup 2025-04-05 10:30:17 +02:00
translations OpenRISC updates for 6.15 2025-04-26 09:01:13 -07:00
usb USB/Thunderbolt update for 6.15-rc1 2025-04-02 18:23:31 -07:00
userspace-api mseal: fix typo and style in documentation 2025-04-11 17:32:35 -07:00
virt Documentation: kvm: remove KVM_CAP_MIPS_TE 2025-04-04 06:32:17 -04:00
w1
watchdog
wmi platform/x86: msi-wmi-platform: Workaround a ACPI firmware bug 2025-04-16 11:15:22 +03:00
.gitignore
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py
docutils.conf
index.rst
Kconfig
Makefile
memory-barriers.txt
SubmittingPatches
subsystem-apis.rst Documentation/EDAC: Fix warning document isn't included in any toctree 2025-04-01 22:26:47 +02:00