linux-yocto/kernel/sched
kuyo chang 0caba66f00 sched/deadline: Fix dl_server runtime calculation formula
[ Upstream commit fc975cfb36 ]

In our testing with a 6.12-based kernel on a big.LITTLE system, we were
seeing instances of RT tasks being blocked from running on the LITTLE
cpus for multiple seconds, apparently by the dl_server. This far
exceeds the default configured runtime of 50ms per second.

This is because the fair dl_server's runtime accounting is scaled by
the frequency and capacity of the cpu.

Consider the following case on a big.LITTLE system:
Assume the runtime is 50,000,000 ns, and the frequency and capacity
scale-invariance factors (fixed point, where 1024 represents 1.0) are:
  Frequency scale-invariance: 100
  Capacity scale-invariance: 50
First, frequency scale-invariance scales the runtime to
  50,000,000 * 100 >> 10 = 4,882,812
Then capacity scale-invariance scales it further to
  4,882,812 * 50 >> 10 = 238,418
So 50,000,000 ns of runtime is accounted as only 238,418 ns.

This smaller "accounted runtime" value is what ends up being
subtracted from the fair server's runtime for the current period.
Thus after 50ms of real time, we've only accounted ~238us against the
fair server's runtime. The roughly 209:1 ratio in this example means
that on the smaller cpu the fair server is allowed to keep running,
blocking RT tasks, for over 10 seconds before it exhausts its supposed
50ms of runtime. On other hardware configurations it can be even
worse.

For the fair deadline_server, to prevent realtime tasks from being
unexpectedly delayed, we really do want to use fixed time, and not
scaled time for smaller capacity/frequency cpus. So remove the scaling
from the fair server's accounting to fix this.

Fixes: a110a81c52 ("sched/deadline: Deferrable dl server")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: John Stultz <jstultz@google.com>
Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Acked-by: John Stultz <jstultz@google.com>
Tested-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/r/20250702021440.2594736-1-kuyo.chang@mediatek.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17 18:37:04 +02:00
autogroup.c sched_ext: Fix incorrect autogroup migration detection 2025-02-21 14:01:36 +01:00
autogroup.h
build_policy.c sched_ext: Disallow loading BPF scheduler if isolcpus= domain isolation is in effect 2024-07-08 09:30:13 -10:00
build_utility.c
clock.c
completion.c
core_sched.c
core.c sched/core: Fix migrate_swap() vs. hotplug 2025-07-17 18:37:03 +02:00
cpuacct.c
cpudeadline.c
cpudeadline.h
cpufreq_schedutil.c cpufreq/sched: Explicitly synchronize limits_changed flag handling 2025-04-25 10:47:52 +02:00
cpufreq.c
cpupri.c
cpupri.h
cputime.c sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-07-29 12:22:32 +02:00
deadline.c sched/deadline: Fix dl_server runtime calculation formula 2025-07-17 18:37:04 +02:00
debug.c sched/fair: Add new cfs_rq.h_nr_runnable 2025-07-10 16:04:57 +02:00
ext.c sched_ext: Make scx_group_set_weight() always update tg->scx.weight 2025-07-10 16:05:05 +02:00
ext.h sched_ext, sched/core: Don't call scx_group_set_weight() prematurely from sched_create_group() 2025-06-27 11:11:38 +01:00
fair.c sched/fair: Fixup wake_up_sync() vs DELAYED_DEQUEUE 2025-07-10 16:04:57 +02:00
features.h sched/fair: Untangle NEXT_BUDDY and pick_next_task() 2025-02-08 09:56:53 +01:00
idle.c sched_ext: idle: Refresh idle masks during idle-to-idle transitions 2025-01-17 13:40:49 +01:00
isolation.c
loadavg.c
Makefile
membarrier.c
pelt.c sched/fair: Rename h_nr_running into h_nr_queued 2025-07-10 16:04:56 +02:00
pelt.h sched: Move update_other_load_avgs() to kernel/sched/pelt.c 2024-09-11 20:00:21 -10:00
psi.c sched: psi: fix bogus pressure spikes from aggregation race 2024-10-03 16:03:16 -07:00
rt.c sched: Add put_prev_task(.next) 2024-09-03 15:26:32 +02:00
sched-pelt.h
sched.h sched/fair: Add new cfs_rq.h_nr_runnable 2025-07-10 16:04:57 +02:00
smp.h
stats.c profiling: remove profile=sleep support 2024-08-04 13:36:28 -07:00
stats.h psi: Fix race when task wakes up before psi_sched_switch() adjusts flags 2025-02-08 09:56:55 +01:00
stop_task.c sched: Add put_prev_task(.next) 2024-09-03 15:26:32 +02:00
swait.c
syscalls.c sched: Fix race between yield_to() and try_to_wake_up() 2025-02-08 09:56:54 +01:00
topology.c sched/fair: Fair server interface 2024-07-29 12:22:36 +02:00
wait_bit.c
wait.c