linux-yocto/Documentation
Joshua Hahn e341f9c3c8 mm/mempolicy: Weighted Interleave Auto-tuning
On machines with multiple memory nodes, interleaving page allocations
across nodes allows for better utilization of each node's bandwidth. 
Previous work by Gregory Price [1] introduced weighted interleave, which
allowed for pages to be allocated across nodes according to user-set
ratios.

Ideally, these weights should be proportional to each node's bandwidth, so
that under bandwidth pressure each node operates at its maximal efficient
bandwidth, which prevents latency from increasing exponentially.

Previously, the default weights for weighted interleave were all 1, which
made the policy equivalent to the (unweighted) interleave mempolicy: it
walks the nodes in round-robin fashion and ignores bandwidth information.
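
The equivalence described above can be illustrated with a small userspace
model. This is only a sketch of the allocation order, assuming each node
receives its full weight in consecutive pages before the next node is
visited; the function names are hypothetical and nothing here is kernel
code.

```python
from itertools import islice


def weighted_interleave(weights):
    """Yield node IDs forever; each round gives node n `weights[n]` pages in a row."""
    if not weights or any(w < 1 for w in weights.values()):
        raise ValueError("every node needs a weight of at least 1")
    while True:
        for node in sorted(weights):
            for _ in range(weights[node]):
                yield node


def first_pages(weights, count):
    """Return the nodes chosen for the first `count` page allocations."""
    return list(islice(weighted_interleave(weights), count))


# All-ones weights degenerate to plain round-robin interleave.
print(first_pages({0: 1, 1: 1}, 6))   # [0, 1, 0, 1, 0, 1]
# A 3:1 ratio places three pages on node 0 for every page on node 1.
print(first_pages({0: 3, 1: 1}, 8))   # [0, 0, 0, 1, 0, 0, 0, 1]
```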

This patch has two main goals.  First, it makes weighted interleave easier
to use for users who wish to relieve bandwidth pressure on systems whose
nodes have varying bandwidth (e.g. CXL).  By providing a set of "real"
default weights that work out of the box, it lets users who lack the
capability (or the desire) to experiment to find the optimal weights for
their system still take advantage of bandwidth-informed weighted
interleave.

Second, it allows weighted interleave to adjust dynamically to hotplugged
memory with new bandwidth information.  Instead of requiring users to
update node weights manually every time bandwidth information is added or
removed, the kernel provides a new set of default weights for weighted
interleave to use whenever the bandwidth information changes.

To meet these goals, this patch introduces an auto-configuration mode for
the interleave weights that provides a reasonable set of default weights,
calculated using bandwidth data reported by the system.  In auto mode,
weights are dynamically adjusted to match the currently reported bandwidth
information, including in response to hotplug events.
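
One way to picture how bandwidth data could be turned into small integer
weights is sketched below. This is an illustration only, not the patch's
actual calculation: the function name, the `max_weight` cap of 32, and the
bandwidth figures are all assumptions made for the example.

```python
from functools import reduce
from math import gcd


def default_weights(bandwidth_by_node, max_weight=32):
    """Map per-node bandwidth to small integer weights in the same ratio.

    `max_weight` is an illustrative cap on the scaled weights, not a value
    taken from the patch.
    """
    total = sum(bandwidth_by_node.values())
    if total <= 0:
        raise ValueError("no bandwidth information available")
    # Scale each node's share of the total bandwidth, never dropping below 1.
    scaled = {
        node: max(1, bw * max_weight // total)
        for node, bw in bandwidth_by_node.items()
    }
    # Divide by the common factor so the ratio uses the smallest integers.
    divisor = reduce(gcd, scaled.values())
    return {node: weight // divisor for node, weight in scaled.items()}


# Hypothetical figures: node 0 has three times the bandwidth of node 1.
print(default_weights({0: 300, 1: 100}))   # {0: 3, 1: 1}
```

Keeping the weights small matters because a node receives its whole weight
in consecutive pages, so a 3:1 ratio interleaves more finely than 24:8.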

This patch still supports users manually writing weights into the nodeN
sysfs interface by entering manual mode.  When a user enters manual mode,
the system stops dynamically updating any of the node weights, even
during hotplug events that shift the optimal weight distribution.

A new sysfs interface "auto" is introduced, which allows users to switch
between the auto (writing 1 or Y) and manual (writing 0 or N) modes.  The
system also automatically enters manual mode when a nodeN interface is
manually written to.
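
The mode-switching rules above can be modelled in a few lines. This is a
hypothetical in-memory model, not kernel code: the class and method names
are invented for illustration, and restoring the latest defaults when auto
mode is re-enabled is an assumption about the intended behaviour.

```python
class InterleaveWeights:
    """Toy model of the auto/manual switch for weighted interleave."""

    def __init__(self, defaults):
        self.auto = True
        self.defaults = dict(defaults)   # latest bandwidth-derived weights
        self.weights = dict(defaults)    # weights currently in effect

    def write_auto(self, text):
        """Model a write to the "auto" file: 1/Y selects auto, 0/N manual."""
        value = text.strip().upper()
        if value in ("1", "Y"):
            self.auto = True
            self.weights = dict(self.defaults)   # assumed: defaults reapplied
        elif value in ("0", "N"):
            self.auto = False
        else:
            raise ValueError("expected 1/Y or 0/N")

    def write_node(self, node, weight):
        """Model a write to nodeN: it also switches the system to manual mode."""
        if weight < 1:
            raise ValueError("writing 0 is invalid")
        self.auto = False
        self.weights[node] = weight

    def on_bandwidth_change(self, new_defaults):
        """Model a hotplug event; weights only follow it in auto mode."""
        self.defaults = dict(new_defaults)
        if self.auto:
            self.weights = dict(new_defaults)


table = InterleaveWeights({0: 3, 1: 1})
table.write_node(1, 2)                       # implicit switch to manual mode
table.on_bandwidth_change({0: 2, 1: 1})      # ignored while in manual mode
print(table.auto, table.weights)             # False {0: 3, 1: 2}
table.write_auto("1")                        # back to auto mode
print(table.auto, table.weights)             # True {0: 2, 1: 1}
```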

There is one functional change that this patch makes to the existing
weighted_interleave ABI: previously, writing 0 directly to a nodeN
interface was said to reset the weight to the system default.  Before this
patch, the default for all weights was 1, which meant that writing 0 and
writing 1 were functionally equivalent.  With this patch, writing 0 is
invalid.
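
A userspace helper can reject a zero weight before it ever reaches the
kernel. The sketch below assumes the interface lives under
/sys/kernel/mm/mempolicy/weighted_interleave/ and that weights fit in one
byte (1 to 255); both are assumptions not stated in this message, and the
function names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the weighted interleave interface.
WI_DIR = Path("/sys/kernel/mm/mempolicy/weighted_interleave")


def set_node_weight(node, weight, base=WI_DIR):
    """Write a manual weight to nodeN; this also selects manual mode."""
    # 0 is invalid after this patch; the upper bound of 255 is an assumption.
    if not 1 <= weight <= 255:
        raise ValueError(f"weight must be between 1 and 255, got {weight}")
    (Path(base) / f"node{node}").write_text(str(weight))


def set_auto_mode(enabled, base=WI_DIR):
    """Write 1 (auto) or 0 (manual) to the "auto" file."""
    (Path(base) / "auto").write_text("1" if enabled else "0")
```

On a real system these calls need root privileges, for example
`set_node_weight(1, 2)` followed later by `set_auto_mode(True)` to hand
control back to the kernel. The `base` parameter exists only so the
helpers can be exercised against a scratch directory.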

Link: https://lkml.kernel.org/r/20250520141236.2987309-1-joshua.hahnjy@gmail.com
[joshua.hahnjy@gmail.com: wordsmithing changes, simplification, fixes]
  Link: https://lkml.kernel.org/r/20250511025840.2410154-1-joshua.hahnjy@gmail.com
[joshua.hahnjy@gmail.com: remove auto_kobj_attr field from struct sysfs_wi_group]
  Link: https://lkml.kernel.org/r/20250512142511.3959833-1-joshua.hahnjy@gmail.com
https://lore.kernel.org/linux-mm/20240202170238.90004-1-gregory.price@memverge.com/ [1]
Link: https://lkml.kernel.org/r/20250505182328.4148265-1-joshua.hahnjy@gmail.com
Co-developed-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Suggested-by: Yunjeong Mun <yunjeong.mun@sk.com>
Suggested-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: Ying Huang <ying.huang@linux.alibaba.com>
Suggested-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>
Reviewed-by: Honggyu Kim <honggyu.kim@sk.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21 09:55:15 -07:00
..
ABI mm/mempolicy: Weighted Interleave Auto-tuning 2025-05-21 09:55:15 -07:00
accel
accounting
admin-guide Documentation: add documentation for KHO 2025-05-12 23:50:42 -07:00
arch OpenRISC updates for 6.15 2025-04-26 09:01:13 -07:00
block Documentation: ublk: remove dead footnote 2025-03-31 07:06:22 -06:00
bpf bpf: Add namespace to BPF internal symbols 2025-04-25 09:21:23 -07:00
cdrom
core-api Documentation: KHO: add memblock bindings 2025-05-12 23:50:43 -07:00
cpu-freq
crypto crypto: remove obsolete 'comp' compression API 2025-03-21 17:39:06 +08:00
dev-tools Kbuild updates for v6.15 2025-04-05 15:46:50 -07:00
devicetree Input updates for v6.15-rc5 2025-05-11 10:29:29 -07:00
doc-guide
driver-api cxl for v6.15 2025-04-02 20:04:43 -07:00
edac
fault-injection
fb
features mseal sysmap: add arch-support txt 2025-04-01 15:17:17 -07:00
filesystems A few more miscellaneous ext4 bug fixes and cleanups including some 2025-04-13 07:15:50 -07:00
firmware_class
firmware-guide
fpga
gpu Core Changes: 2025-03-14 17:02:11 +10:00
hid
hwmon hwmon: add driver for HTU31 2025-03-18 08:03:40 -07:00
i2c
iio Char/Misc/IIO driver updates for 6.15-rc1 2025-04-01 11:26:08 -07:00
images
infiniband docs: infiniband: document the UCAP API 2025-03-09 13:13:02 -04:00
input
isdn
kbuild kbuild: make all file references relative to source root 2025-03-22 23:50:58 +09:00
kernel-hacking
leds
litmus-tests
livepatch docs: livepatch: move text out of code block 2025-03-04 16:01:29 +01:00
locking hwspinlock: Remove unused hwspin_lock_get_id() 2025-03-21 17:12:04 -05:00
maintainer
mhi
misc-devices
mm docs/mm/damon/design: fix spelling mistake 2025-05-12 23:50:49 -07:00
netlabel
netlink netlink: specs: ethtool: Remove UAPI duplication of phy-upstream enum 2025-04-28 15:49:47 -07:00
networking Update Christoph's Email address and make it consistent 2025-05-12 23:50:31 -07:00
nvdimm
nvme
PCI PCI: endpoint: Remove unused devm_pci_epc_destroy() 2025-03-08 14:47:31 +00:00
pcmcia
peci
power Documentation: PM: runtime: Fix a reference to pm_runtime_autosuspend() 2025-04-15 19:23:58 +02:00
process Devicetree for v6.15: 2025-03-29 11:23:16 -07:00
RCU - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
rust ARM and clkdev updates for 6.15-rc1 2025-04-03 12:21:44 -07:00
scheduler Scheduler updates for v6.15: 2025-03-24 21:28:12 -07:00
scsi
security Hi, 2025-03-28 12:42:53 -07:00
sound sound updates for 6.15-rc1 2025-03-26 09:41:55 -07:00
sphinx
sphinx-static
spi
staging
sunrpc/xdr
target
tee
timers
tools Documentation/rv: Add sched pages to the indices 2025-03-27 12:02:38 -04:00
trace tracing/timers: Rename the hrtimer_init event to hrtimer_setup 2025-04-05 10:30:17 +02:00
translations OpenRISC updates for 6.15 2025-04-26 09:01:13 -07:00
usb USB/Thunderbolt update for 6.15-rc1 2025-04-02 18:23:31 -07:00
userspace-api mseal: fix typo and style in documentation 2025-04-11 17:32:35 -07:00
virt Documentation: kvm: remove KVM_CAP_MIPS_TE 2025-04-04 06:32:17 -04:00
w1
watchdog
wmi platform/x86: msi-wmi-platform: Workaround a ACPI firmware bug 2025-04-16 11:15:22 +03:00
.gitignore
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py
docutils.conf
index.rst
Kconfig
Makefile
memory-barriers.txt
SubmittingPatches
subsystem-apis.rst Documentation/EDAC: Fix warning document isn't included in any toctree 2025-04-01 22:26:47 +02:00