Go to file
Shiju Jose be9b359e05 cxl/edac: Add CXL memory device soft PPR control feature
Post Package Repair (PPR) maintenance operations may be supported by CXL
devices that implement CXL.mem protocol. A PPR maintenance operation
requests the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support PPR features
may implement PPR Maintenance operations. DRAM components may support two
types of PPR, hard PPR (hPPR), for a permanent row repair, and Soft PPR
(sPPR), for a temporary row repair. Soft PPR is much faster than hPPR,
but the repair is lost with a power cycle.

During the execution of a PPR Maintenance operation, a CXL memory device:
- May or may not retain data
- May or may not be able to process CXL.mem requests correctly, including
the ones that target the DPA involved in the repair.
These CXL Memory Device capabilities are specified by Restriction Flags
in the sPPR Feature and hPPR Feature.

Soft PPR maintenance operation may be executed at runtime, if data is
retained and CXL.mem requests are correctly processed. For CXL devices with
DRAM components, hPPR maintenance operation may be executed only at boot
because typically data may not be retained with hPPR maintenance operation.

When a CXL device identifies error on a memory component, the device
may inform the host about the need for a PPR maintenance operation by using
an Event Record, where the Maintenance Needed flag is set. The Event Record
specifies the DPA that should be repaired. A CXL device may not keep track
of the requests that have already been sent and the information on which
DPA should be repaired may be lost upon power cycle.
The userspace tool requests for maintenance operation if the number of
corrected error reported on a CXL.mem media exceeds error threshold.

CXL spec 3.2 section 8.2.10.7.1.2 describes the device's sPPR (soft PPR)
maintenance operation and section 8.2.10.7.1.3 describes the device's
hPPR (hard PPR) maintenance operation feature.

CXL spec 3.2 section 8.2.10.7.2.1 describes the sPPR feature discovery and
configuration.

CXL spec 3.2 section 8.2.10.7.2.2 describes the hPPR feature discovery and
configuration.

Add support for controlling CXL memory device soft PPR (sPPR) feature.
Register with EDAC driver, which gets the memory repair attr descriptors
from the EDAC memory repair driver and exposes sysfs repair control
attributes for PRR to the userspace. For example CXL PPR control for the
CXL mem0 device is exposed in /sys/bus/edac/devices/cxl_mem0/mem_repairX/

Add checks to ensure the memory to be repaired is offline and originates
from a CXL DRAM or CXL gen_media error record reported in the current boot,
before requesting a PPR operation on the device.

Note: Tested with QEMU patch for CXL PPR feature.
https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m70b2b010f43f7f4a6f9acee5ec9008498bf292c3

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20250521124749.817-9-shiju.jose@huawei.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-05-23 13:25:06 -07:00
arch Misc fixes: 2025-04-26 09:45:54 -07:00
block block-6.15-20250424 2025-04-25 11:34:39 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto crypto: scomp - Fix off-by-one bug when calculating last page 2025-04-23 09:32:57 +08:00
Documentation cxl/edac: Add CXL memory device soft PPR control feature 2025-05-23 13:25:06 -07:00
drivers cxl/edac: Add CXL memory device soft PPR control feature 2025-05-23 13:25:06 -07:00
fs vfs-6.15-rc4.fixes 2025-04-25 15:57:21 -07:00
include cxl/edac: Add CXL memory device memory sparing control feature 2025-05-23 13:24:53 -07:00
init Kconfig: switch CONFIG_SYSFS_SYCALL default to n 2025-04-15 10:28:35 +02:00
io_uring io_uring: fix 'sync' handling of io_fallback_tw() 2025-04-24 10:32:43 -06:00
ipc treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
kernel Fix sporadic crashes in dequeue_entities() due to ... bad math. 2025-04-26 09:23:20 -07:00
lib hardening fixes for v6.15-rc3 2025-04-18 13:20:20 -07:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
mm mm/migrate: fix sleep in atomic for large folios and buffer heads 2025-04-22 18:16:08 +02:00
net nfsd-6.15 fixes: 2025-04-26 10:43:03 -07:00
rust Driver core fixes for 6.15-rc4 2025-04-25 10:02:59 -07:00
samples samples/bpf: Fix compilation failure for samples/bpf on LoongArch Fedora 2025-04-25 09:32:02 -07:00
scripts Fix mis-uses of 'cc-option' for warning disablement 2025-04-23 10:08:29 -07:00
security Landlock fix for v6.15-rc4 2025-04-24 12:59:05 -07:00
sound ASoC: Fixes for v6.15 2025-04-11 15:51:19 +02:00
tools cxl/edac: Add CXL memory device patrol scrub control feature 2025-05-23 13:24:09 -07:00
usr kbuild: hdrcheck: fix cross build with clang 2025-03-05 04:06:45 +09:00
virt ARM: 2025-04-08 13:47:55 -07:00
.clang-format clang-format: Update the ForEachMacros list for v6.15-rc1 2025-04-13 11:03:59 +02:00
.clippy.toml rust: give Clippy the minimum supported Rust version 2025-01-10 00:17:25 +01:00
.cocciconfig
.editorconfig
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes
.gitignore kbuild: Create intermediate vmlinux build with relocations preserved 2025-03-17 00:29:50 +09:00
.mailmap sound fixes for 6.15-rc3 2025-04-17 10:14:51 -07:00
.rustfmt.toml
COPYING
CREDITS MAINTAINERS: update SLAB ALLOCATOR maintainers 2025-04-17 20:10:06 -07:00
Kbuild drm: ensure drm headers are self-contained and pass kernel-doc 2025-02-12 10:44:43 +02:00
Kconfig io_uring: Rename KConfig to Kconfig 2025-02-19 14:53:27 -07:00
MAINTAINERS pci-v6.15-fixes-3 2025-04-26 13:02:36 -07:00
Makefile Linux 6.15-rc4 2025-04-27 15:19:23 -07:00
README

Linux kernel

There are several guides for kernel developers and users. These guides can be rendered in a number of formats, like HTML and PDF. Please read Documentation/admin-guide/README.rst first.

In order to build the documentation, use make htmldocs or make pdfdocs. The formatted documentation can also be read online at:

https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory, several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about the problems which may result by upgrading your kernel.