
This is a simpler, gentler variant of memory_failure() for soft page offlining controlled from user space. It doesn't kill anything, just tries to invalidate and if that doesn't work migrate the page away. This is useful for predictive failure analysis, where a page has a high rate of corrected errors, but hasn't gone bad yet. Instead it can be offlined early and avoided. The offlining is controlled from sysfs, including a new generic entry point for hard page offlining for symmetry too. We use the page isolate facility to prevent re-allocation race. Normally this is only used by memory hotplug. To avoid races with memory allocation I am using lock_system_sleep(). This avoids the situation where memory hotplug is about to isolate a page range and then hwpoison undoes that work. This is a big hammer currently, but the simplest solution currently. When the page is not free or LRU we try to free pages from slab and other caches. The slab freeing is currently quite dumb and does not try to focus on the specific slab cache which might own the page. This could be potentially improved later. Thanks to Fengguang Wu and Haicheng Li for some fixes. [Added fix from Andrew Morton to adapt to new migrate_pages prototype] Signed-off-by: Andi Kleen <ak@linux.intel.com>
1.6 KiB
What: /sys/devices/system/memory/soft_offline_page Date: Sep 2009 KernelVersion: 2.6.33 Contact: andi@firstfloor.org Description: Soft-offline the memory page containing the physical address written into this file. Input is a hex number specifying the physical address of the page. The kernel will then attempt to soft-offline it, by moving the contents elsewhere or dropping it if possible. The kernel will then be placed on the bad page list and never be reused.
The offlining is done in kernel specific granuality.
Normally it's the base page size of the kernel, but
this might change.
The page must be still accessible, not poisoned. The
kernel will never kill anything for this, but rather
fail the offline. Return value is the size of the
number, or a error when the offlining failed. Reading
the file is not allowed.
What: /sys/devices/system/memory/hard_offline_page Date: Sep 2009 KernelVersion: 2.6.33 Contact: andi@firstfloor.org Description: Hard-offline the memory page containing the physical address written into this file. Input is a hex number specifying the physical address of the page. The kernel will then attempt to hard-offline the page, by trying to drop the page or killing any owner or triggering IO errors if needed. Note this may kill any processes owning the page. The kernel will avoid to access this page assuming it's poisoned by the hardware.
The offlining is done in kernel specific granuality.
Normally it's the base page size of the kernel, but
this might change.
Return value is the size of the number, or a error when
the offlining failed.
Reading the file is not allowed.