linux-yocto/arch/loongarch
Kanglong Wang e94cdb9fb2 LoongArch: Optimize module load time by optimizing PLT/GOT counting
[ Upstream commit 63dbd8fb2af3a89466538599a9acb2d11ef65c06 ]

When CONFIG_KASAN, CONFIG_PREEMPT_VOLUNTARY_BUILD and
CONFIG_PREEMPT_VOLUNTARY are all enabled at the same time, a soft
deadlock occurs. The relevant logs are as follows:

rcu: INFO: rcu_sched self-detected stall on CPU
...
Call Trace:
[<900000000024f9e4>] show_stack+0x5c/0x180
[<90000000002482f4>] dump_stack_lvl+0x94/0xbc
[<9000000000224544>] rcu_dump_cpu_stacks+0x1fc/0x280
[<900000000037ac80>] rcu_sched_clock_irq+0x720/0xf88
[<9000000000396c34>] update_process_times+0xb4/0x150
[<90000000003b2474>] tick_nohz_handler+0xf4/0x250
[<9000000000397e28>] __hrtimer_run_queues+0x1d0/0x428
[<9000000000399b2c>] hrtimer_interrupt+0x214/0x538
[<9000000000253634>] constant_timer_interrupt+0x64/0x80
[<9000000000349938>] __handle_irq_event_percpu+0x78/0x1a0
[<9000000000349a78>] handle_irq_event_percpu+0x18/0x88
[<9000000000354c00>] handle_percpu_irq+0x90/0xf0
[<9000000000348c74>] handle_irq_desc+0x94/0xb8
[<9000000001012b28>] handle_cpu_irq+0x68/0xa0
[<9000000001def8c0>] handle_loongarch_irq+0x30/0x48
[<9000000001def958>] do_vint+0x80/0xd0
[<9000000000268a0c>] kasan_mem_to_shadow.part.0+0x2c/0x2a0
[<90000000006344f4>] __asan_load8+0x4c/0x120
[<900000000025c0d0>] module_frob_arch_sections+0x5c8/0x6b8
[<90000000003895f0>] load_module+0x9e0/0x2958
[<900000000038b770>] __do_sys_init_module+0x208/0x2d0
[<9000000001df0c34>] do_syscall+0x94/0x190
[<900000000024d6fc>] handle_syscall+0xbc/0x158

Analysis shows that loading the amdgpu module is slow enough to occupy
the CPU for a long time, which in turn leads to the soft deadlock.

When loading a module, module_frob_arch_sections() tries to figure out
the number of PLTs/GOTs that will be needed to handle all the RELAs. It
calls count_max_entries() to search the out-of-order relocation data,
and that counting algorithm has O(n^2) complexity.

To make it faster, we sort the relocation list by info and addend. That
way, to check for a duplicate relocation, it just needs to compare with
the previous entry. This reduces the complexity of the algorithm to
O(n log n), as done in commit d4e0340919 ("arm64/module: Optimize module
load time by optimizing PLT counting"). This gives a significant
reduction in module load time for modules with a large number of
relocations.
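
The sort-then-compare idea described above can be sketched in user-space C
as follows. This is only an illustration, not the code from the patch:
struct rela is a simplified stand-in for Elf64_Rela, cmp_rela() and
count_unique_relas() are hypothetical names, and libc qsort() is used here
where kernel code would use its own helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for Elf64_Rela (assumption: only the two fields
 * used for duplicate detection are kept). */
struct rela {
	uint64_t r_info;   /* symbol index and relocation type */
	int64_t  r_addend; /* constant addend */
};

/* Order relocations by info first, then by addend. */
static int cmp_rela(const void *a, const void *b)
{
	const struct rela *x = a, *y = b;

	if (x->r_info != y->r_info)
		return x->r_info < y->r_info ? -1 : 1;
	if (x->r_addend != y->r_addend)
		return x->r_addend < y->r_addend ? -1 : 1;
	return 0;
}

/*
 * Hypothetical helper: sorts the array in place, then counts distinct
 * (info, addend) pairs. After sorting, duplicates are adjacent, so each
 * entry only needs to be compared with the one before it.
 */
static size_t count_unique_relas(struct rela *relas, size_t num)
{
	size_t i, count = 0;

	qsort(relas, num, sizeof(*relas), cmp_rela);
	for (i = 0; i < num; i++) {
		if (i == 0 || cmp_rela(&relas[i], &relas[i - 1]) != 0)
			count++;
	}
	return count;
}
```

The sort costs O(n log n) comparisons and the counting pass is a single
linear scan, in contrast to rescanning all earlier entries for every
relocation.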

After applying this patch, the soft deadlock problem has been solved,
and the kernel starts normally without "Call Trace".

Using the default configuration to test some modules, the results are as
follows:

Module              Size
ip_tables           36K
fat                 143K
radeon              2.5MB
amdgpu              16MB

Without this patch:
Module              Module load time (ms)    Count(PLTs/GOTs)
ip_tables           18                       59/6
fat                 0                        162/14
radeon              54                       1221/84
amdgpu              1411                     4525/1098

With this patch:
Module              Module load time (ms)    Count(PLTs/GOTs)
ip_tables           18                       59/6
fat                 0                        162/14
radeon              22                       1221/84
amdgpu              45                       4525/1098

Fixes: fcdfe9d22b ("LoongArch: Add ELF and module support")
Signed-off-by: Kanglong Wang <wangkanglong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28 16:31:14 +02:00
..
boot LoongArch: Fix GMAC's phy-mode definitions in dts 2024-06-03 15:45:53 +08:00
configs mm: z3fold: deprecate CONFIG_Z3FOLD 2024-09-17 01:07:00 -07:00
crypto move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
include LoongArch: Avoid using $r0/$r1 as "mask" for csrxchg 2025-06-27 11:11:37 +01:00
kernel LoongArch: Optimize module load time by optimizing PLT/GOT counting 2025-08-28 16:31:14 +02:00
kvm LoongArch: KVM: Make function kvm_own_lbt() robust 2025-08-28 16:31:02 +02:00
lib LoongArch: csum: Fix OoB access in IP checksum code for negative lengths 2025-02-21 14:01:17 +01:00
mm LoongArch: Fix panic caused by NULL-PMD in huge_pte_offset() 2025-06-27 11:11:37 +01:00
net LoongArch: BPF: Fix jump offset calculation in tailcall 2025-08-20 18:30:14 +02:00
pci LoongArch: Fix memleak in pci_acpi_scan_root() 2024-09-24 15:32:20 +08:00
power LoongArch: Save and restore CSR.CNTC for hibernation 2025-05-22 14:29:44 +02:00
vdso LoongArch: Fix build failure with GCC 15 (-std=gnu23) 2024-12-05 14:02:29 +01:00
Kbuild
Kconfig LoongArch: Select ARCH_USE_MEMTEST 2025-05-02 07:59:04 +02:00
Kconfig.debug LoongArch: Only allow OBJTOOL & ORC unwinder if toolchain supports -mthin-add-sub 2024-06-21 10:18:40 +08:00
Makefile LoongArch: Explicitly specify code model in Makefile 2024-12-05 14:02:45 +01:00