linux-yocto/net/core
Jakub Kicinski 32a8746bd7 netpoll: prevent hanging NAPI when netcons gets enabled
[ Upstream commit 2da4def0f4 ]

Paolo spotted hangs in NIPA running driver tests against virtio.
The tests hang in virtnet_close() -> virtnet_napi_tx_disable().

The problem is only reproducible when running several of our tests
in sequence (I used TEST_PROGS="xdp.py ping.py netcons_basic.sh \
netpoll_basic.py stats.py"). Initial suspicion was that this is
a simple case of double-disable of NAPI, but instrumenting the
code reveals:

 Deadlocked on NAPI ffff888007cd82c0 (virtnet_poll_tx):
   state: 0x37, disabled: false, owner: 0, listed: false, weight: 64

The NAPI was not in fact disabled, owner is 0 (rather than -1),
so the NAPI "thinks" it's scheduled for CPU 0 but it's not listed
(!list_empty(&n->poll_list) => false). It seems odd that normal NAPI
processing would wedge itself like this.
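The raw `state: 0x37` value is easier to read once it is split into named bits. The sketch below assumes the bit order of the `NAPI_STATE_*` enum in `include/linux/netdevice.h` of recent (5.11+) kernels. The table is hardcoded, not read from kernel headers, so check it against the tree being debugged. Under that assumption, 0x37 decodes to `SCHED|MISSED|DISABLE|LISTED|NO_BUSY_POLL`. SCHED is still set, a disable is pending, and NPSVC is clear. The `listed: false` field in the instrumentation output is a separate thing: per the text above, it reports `poll_list` membership, not the LISTED state bit.

```c
#include <assert.h>
#include <string.h>

/* ASSUMPTION: bit order mirrors the NAPI_STATE_* enum in
 * include/linux/netdevice.h of recent kernels; verify for your tree. */
static const char *const napi_state_names[] = {
	"SCHED",            /* bit 0: poll is scheduled */
	"MISSED",           /* bit 1: reschedule was requested */
	"DISABLE",          /* bit 2: disable pending */
	"NPSVC",            /* bit 3: netpoll is servicing this NAPI */
	"LISTED",           /* bit 4: added to the device's NAPI list */
	"NO_BUSY_POLL",     /* bit 5: excluded from busy polling */
	"IN_BUSY_POLL",     /* bit 6 */
	"PREFER_BUSY_POLL", /* bit 7 */
	"THREADED",         /* bit 8 */
	"SCHED_THREADED",   /* bit 9 */
};

#define NAPI_NSTATES (sizeof(napi_state_names) / sizeof(napi_state_names[0]))

/* Write the names of the bits set in 'state' into 'buf' as "A|B|C".
 * Bits beyond the table are ignored. Returns buf. */
static char *napi_state_decode(unsigned long state, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t bit = 0; bit < NAPI_NSTATES; bit++) {
		if (!(state & (1UL << bit)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, napi_state_names[bit], len - strlen(buf) - 1);
	}
	return buf;
}
```

For example, `napi_state_decode(0x37, buf, sizeof(buf))` with a 128-byte buffer yields `"SCHED|MISSED|DISABLE|LISTED|NO_BUSY_POLL"`.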

Better suspicion is that netpoll gets enabled while NAPI is polling,
and also grabs the NAPI instance. This confuses napi_complete_done():

  [netpoll]                                   [normal NAPI]
                                        napi_poll()
                                          have = netpoll_poll_lock()
                                            rcu_access_pointer(dev->npinfo)
                                              return NULL # no netpoll
                                          __napi_poll()
                                            ->poll(->weight)
  poll_napi()
    cmpxchg(->poll_owner, -1, cpu)
      poll_one_napi()
        set_bit(NAPI_STATE_NPSVC, ->state)
                                              napi_complete_done()
                                                if (NAPIF_STATE_NPSVC)
                                                  return false
                                           # exit without clearing SCHED

This feels very unlikely, but perhaps virtio has some interactions
with the hypervisor in the NAPI ->poll that make the race window
larger?
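The interleaving in the diagram can be replayed as a single-threaded model. This is a hypothetical sketch, not kernel code. `struct model_napi`, `model_complete_done()` and `model_race()` are illustrative names. Only the SCHED and NPSVC bits and a boolean stand-in for `dev->npinfo` are modeled, and the owner handling beyond the initial claim is left out. The point it shows is that once netpoll sets NPSVC during a normal poll, the simplified completion bails out and nothing else clears SCHED.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: only the bits this race touches. */
#define MODEL_SCHED (1u << 0)
#define MODEL_NPSVC (1u << 3)

struct model_napi {
	unsigned int state;
	int poll_owner;     /* -1 when nobody owns it */
	bool on_poll_list;  /* stands in for !list_empty(&n->poll_list) */
};

/* Simplified napi_complete_done(): refuse to complete while netpoll
 * is servicing the NAPI, which leaves SCHED set. */
static bool model_complete_done(struct model_napi *n)
{
	if (n->state & MODEL_NPSVC)
		return false;
	n->state &= ~MODEL_SCHED;
	return true;
}

/* Replay the diagram. Returns true if the NAPI ends up wedged:
 * SCHED set, but not on the poll list. */
static bool model_race(bool netpoll_enabled_mid_poll)
{
	bool npinfo = false;  /* rcu_access_pointer(dev->npinfo) != NULL */
	struct model_napi n = { MODEL_SCHED, -1, false };
	const int cpu = 0;

	/* [normal NAPI] napi_poll(): the netpoll lock sees no npinfo */
	bool have = npinfo;
	assert(!have);
	/* __napi_poll() -> ->poll(->weight) is now running... */

	if (netpoll_enabled_mid_poll) {
		npinfo = true;            /* netconsole gets enabled */
		/* [netpoll] poll_napi(): owner is free, claim it */
		if (n.poll_owner == -1) {
			n.poll_owner = cpu;
			n.state |= MODEL_NPSVC;   /* poll_one_napi() */
		}
	}

	/* [normal NAPI] ->poll() finishes under budget */
	model_complete_done(&n);

	if (netpoll_enabled_mid_poll)
		n.state &= ~MODEL_NPSVC;  /* netpoll is done; SCHED untouched */

	return (n.state & MODEL_SCHED) && !n.on_poll_list;
}
```

`model_race(false)` returns false because the poll completes normally. `model_race(true)` returns true, which matches the wedged state from the instrumentation: scheduled, not listed, NPSVC clear.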

The best I could do to prove the theory was to add, and trigger, this
warning in napi_poll() (just before netpoll_poll_unlock()):

      WARN_ONCE(!have && rcu_access_pointer(n->dev->npinfo) &&
                napi_is_scheduled(n) && list_empty(&n->poll_list),
                "NAPI race with netpoll %px", n);

If this warning hits, the next virtnet_close() will hang.
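The next close hangs because disabling a NAPI means waiting for SCHED to clear, and a wedged NAPI that is off the poll list is never polled again. The sketch below models that wait with a bounded loop so that it reports the hang instead of spinning. It is a deliberately simplified, hypothetical model. `model_disable()` and its parameters are illustrative names, and the kernel's actual `napi_disable()` also handles NPSVC and other bits that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_SCHED   (1u << 0)
#define MODEL_DISABLE (1u << 2)

/* Hypothetical disable-style wait: mark DISABLE, then wait until SCHED
 * is clear and take it. In this model only a pending poll finishing via
 * completion clears SCHED. 'max_spins' bounds the loop; returning false
 * means the real wait would never end. */
static bool model_disable(unsigned int *state,
			  bool pending_poll_will_complete, int max_spins)
{
	*state |= MODEL_DISABLE;
	for (int i = 0; i < max_spins; i++) {
		if (!(*state & MODEL_SCHED)) {
			*state |= MODEL_SCHED;  /* disabled NAPI holds SCHED */
			return true;
		}
		if (pending_poll_will_complete)
			*state &= ~MODEL_SCHED; /* the poll completes */
	}
	return false;  /* wedged: nothing will ever clear SCHED */
}
```

Starting from `MODEL_SCHED` with no poll pending, `model_disable()` gives up after `max_spins` iterations. In the kernel, the equivalent wait has no bound, which is the hang observed in virtnet_close() -> virtnet_napi_tx_disable().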

This patch survived 30 test iterations without a hang (without it
the longest clean run was around 10). Credit for triggering this
goes to Breno's recent netconsole tests.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/c5a93ed1-9abe-4880-a3bb-8d1678018b1d@redhat.com
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20250726010846.1105875-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28 16:22:37 +02:00
bpf_sk_storage.c
datagram.c
datagram.h
dev_addr_lists.c
dev_ioctl.c net: dev: Convert sa_data to flexible array in struct sockaddr 2024-03-01 13:16:50 +01:00
dev.c net: openvswitch: fix race on port output 2025-05-02 07:41:07 +02:00
devlink.c
drop_monitor.c drop_monitor: fix incorrect initialization order 2025-03-13 12:47:34 +01:00
dst_cache.c
dst.c net: do not delay dst_entries_add() in dst_release() 2024-11-17 14:59:38 +01:00
failover.c
fib_notifier.c
fib_rules.c
filter.c bpf: Check flow_dissector ctx accesses are aligned 2025-08-28 16:22:35 +02:00
flow_dissector.c flow_dissector: Fix port range key handling in BPF conversion 2025-03-13 12:47:29 +01:00
flow_offload.c net: extract port range fields from fl_flow_key 2025-03-13 12:47:29 +01:00
gen_estimator.c
gen_stats.c
gro_cells.c
hwbm.c
link_watch.c net: linkwatch: use system_unbound_wq 2024-08-19 05:41:11 +02:00
lwt_bpf.c
lwtunnel.c
Makefile bpf: Clean up sockmap related Kconfigs 2025-06-27 11:04:09 +01:00
neighbour.c net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES 2025-04-10 14:30:53 +02:00
net_namespace.c net: defer final 'struct net' free in netns dismantle 2025-05-02 07:41:08 +02:00
net-procfs.c
net-sysfs.c ethtool: check device is present when getting link settings 2024-09-04 13:17:46 +02:00
net-sysfs.h
net-traces.c
netclassid_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2024-09-12 11:06:42 +02:00
netevent.c
netpoll.c netpoll: prevent hanging NAPI when netcons gets enabled 2025-08-28 16:22:37 +02:00
netprio_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2024-09-12 11:06:42 +02:00
page_pool.c page_pool: avoid infinite loop to schedule delayed worker 2025-05-02 07:40:48 +02:00
pktgen.c net: pktgen: fix access outside of user given buffer in pktgen_thread_write() 2025-06-04 14:37:04 +02:00
ptp_classifier.c
request_sock.c tcp: make sure init the accept_queue's spinlocks once 2024-02-23 08:41:55 +01:00
rtnetlink.c rtnetlink: Allocate vfinfo size for VF GUIDs when supported 2025-04-10 14:30:59 +02:00
scm.c io_uring/unix: drop usage of io_uring socket 2024-03-26 18:21:45 -04:00
secure_seq.c
skbuff.c ipvs: Always clear ipvs_property flag in skb_scrub_packet() 2025-03-13 12:47:31 +01:00
skmsg.c bpf: Clean up sockmap related Kconfigs 2025-06-27 11:04:09 +01:00
sock_destructor.h inet: inet_defrag: prevent sk release while still in use 2024-10-17 15:07:37 +02:00
sock_diag.c sock_diag: annotate data-races around sock_diag_handlers[family] 2024-03-26 18:21:49 -04:00
sock_map.c bpf: Clean up sockmap related Kconfigs 2025-06-27 11:04:09 +01:00
sock_reuseport.c
sock.c sock: Correct error checking condition for (assign|release)_proto_idx() 2025-06-27 11:04:19 +01:00
stream.c
sysctl_net_core.c net: let net.core.dev_weight always be non-zero 2025-03-13 12:46:49 +01:00
timestamping.c
tso.c
utils.c net: Fix checksum update for ILA adj-transport 2025-06-27 11:04:24 +01:00
xdp.c xdp: fix invalid wait context of page_pool_destroy() 2024-08-19 05:40:48 +02:00