linux-yocto/arch/x86/include/asm/nmi.h
Waiman Long 60750801ca x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
[ Upstream commit fe37c699ae3eed6e02ee55fbf5cb9ceb7fcfd76c ]

Depending on the type of panics, it was found that the
__register_nmi_handler() function can be called in NMI context from
nmi_shootdown_cpus() leading to a lockdep splat:

  WARNING: inconsistent lock state
  inconsistent {INITIAL USE} -> {IN-NMI} usage.

   lock(&nmi_desc[0].lock);
   <Interrupt>
     lock(&nmi_desc[0].lock);

  Call Trace:
    _raw_spin_lock_irqsave
    __register_nmi_handler
    nmi_shootdown_cpus
    kdump_nmi_shootdown_cpus
    native_machine_crash_shutdown
    __crash_kexec

In this particular case, the following panic message was printed before:

  Kernel panic - not syncing: Fatal hardware error!

This message seemed to be given out from __ghes_panic() running in
NMI context.

The __register_nmi_handler() function which takes the nmi_desc lock
with irq disabled shouldn't be called from NMI context as this can
lead to deadlock.

The nmi_shootdown_cpus() function can only be invoked once. After the
first invocation, all other CPUs should be stuck in the newly added
crash_nmi_callback() and cannot respond to a second NMI.

Fix it by adding a new emergency NMI handler to the nmi_desc
structure and provide a new set_emergency_nmi_handler() helper to set
crash_nmi_callback() in any context. The new emergency handler will
preempt other handlers in the linked list. That will eliminate the need
to take any lock and serve the panic in NMI use case.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250206191844.131700-1-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04 14:40:11 +02:00

69 lines
1.5 KiB
C

/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_NMI_H
#define _ASM_X86_NMI_H
#include <linux/irq_work.h>
#include <linux/pm.h>
#include <asm/irq.h>
#include <asm/io.h>
#ifdef CONFIG_X86_LOCAL_APIC
extern int reserve_perfctr_nmi(unsigned int);
extern void release_perfctr_nmi(unsigned int);
extern int reserve_evntsel_nmi(unsigned int);
extern void release_evntsel_nmi(unsigned int);
struct ctl_table;
extern int proc_nmi_enabled(struct ctl_table *, int ,
void __user *, size_t *, loff_t *);
extern int unknown_nmi_panic;
#endif /* CONFIG_X86_LOCAL_APIC */
#define NMI_FLAG_FIRST 1
enum {
NMI_LOCAL=0,
NMI_UNKNOWN,
NMI_SERR,
NMI_IO_CHECK,
NMI_MAX
};
#define NMI_DONE 0
#define NMI_HANDLED 1
typedef int (*nmi_handler_t)(unsigned int, struct pt_regs *);
struct nmiaction {
struct list_head list;
nmi_handler_t handler;
u64 max_duration;
unsigned long flags;
const char *name;
};
#define register_nmi_handler(t, fn, fg, n, init...) \
({ \
static struct nmiaction init fn##_na = { \
.list = LIST_HEAD_INIT(fn##_na.list), \
.handler = (fn), \
.name = (n), \
.flags = (fg), \
}; \
__register_nmi_handler((t), &fn##_na); \
})
int __register_nmi_handler(unsigned int, struct nmiaction *);
void unregister_nmi_handler(unsigned int, const char *);
void set_emergency_nmi_handler(unsigned int type, nmi_handler_t handler);
void stop_nmi(void);
void restart_nmi(void);
void local_touch_nmi(void);
#endif /* _ASM_X86_NMI_H */