linux-yocto/net/sched
Eric Dumazet 0345552a65 net_sched: limit try_bulk_dequeue_skb() batches
After commit 100dfa74cad9 ("inet: dev_queue_xmit() llist adoption")
I started seeing many qdisc requeues on IDPF under high TX workload.

$ tc -s qd sh dev eth1 handle 1: ; sleep 1; tc -s qd sh dev eth1 handle 1:
qdisc mq 1: root
 Sent 43534617319319 bytes 268186451819 pkt (dropped 0, overlimits 0 requeues 3532840114)
 backlog 1056Kb 6675p requeues 3532840114
qdisc mq 1: root
 Sent 43554665866695 bytes 268309964788 pkt (dropped 0, overlimits 0 requeues 3537737653)
 backlog 781164b 4822p requeues 3537737653

This is caused by try_bulk_dequeue_skb() being only limited by BQL budget.

perf record -C120-239 -e qdisc:qdisc_dequeue sleep 1 ; perf script
...
 netperf 75332 [146]  2711.138269: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1292 skbaddr=0xff378005a1e9f200
 netperf 75332 [146]  2711.138953: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1213 skbaddr=0xff378004d607a500
 netperf 75330 [144]  2711.139631: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1233 skbaddr=0xff3780046be20100
 netperf 75333 [147]  2711.140356: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1093 skbaddr=0xff37800514845b00
 netperf 75337 [151]  2711.141037: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1353 skbaddr=0xff37800460753300
 netperf 75337 [151]  2711.141877: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1367 skbaddr=0xff378004e72c7b00
 netperf 75330 [144]  2711.142643: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1202 skbaddr=0xff3780045bd60000
...

This is bad because :

1) Large batches hold one victim cpu for a very long time.

2) Driver often hit their own TX ring limit (all slots are used).

3) We call dev_requeue_skb()

4) Requeues are using a FIFO (q->gso_skb), breaking qdisc ability to
   implement FQ or priority scheduling.

5) dequeue_skb() gets packets from q->gso_skb one skb at a time
   with no xmit_more support. This is causing many spinlock games
   between the qdisc and the device driver.

Requeues were supposed to be very rare, lets keep them this way.

Limit batch sizes to /proc/sys/net/core/dev_weight (default 64) as
__qdisc_run() was designed to use.

Fixes: 5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/20251109161215.2574081-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11 17:56:50 -08:00
..
act_api.c net_sched: act: remove tcfa_qstats 2025-09-02 15:52:24 -07:00
act_bpf.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
act_connmark.c net: sched: act_connmark: initialize struct tc_ife to fix kernel leak 2025-11-11 15:00:08 +01:00
act_csum.c net_sched: add back BH safety to tcf_lock 2025-09-02 15:51:45 -07:00
act_ct.c net_sched: add back BH safety to tcf_lock 2025-09-02 15:51:45 -07:00
act_ctinfo.c net_sched: add back BH safety to tcf_lock 2025-09-02 15:51:45 -07:00
act_gact.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
act_gate.c net/sched: Switch to use hrtimer_setup() 2025-02-18 10:35:44 +01:00
act_ife.c net: sched: act_ife: initialize struct tc_ife to fix KMSAN kernel-infoleak 2025-11-11 15:00:08 +01:00
act_meta_mark.c
act_meta_skbprio.c
act_meta_skbtcindex.c
act_mirred.c net/sched: act_mirred: Move the recursion counter struct netdev_xmit 2025-05-15 15:23:31 +02:00
act_mpls.c net_sched: add back BH safety to tcf_lock 2025-09-02 15:51:45 -07:00
act_nat.c net_sched: add back BH safety to tcf_lock 2025-09-02 15:51:45 -07:00
act_pedit.c net_sched: add back BH safety to tcf_lock 2025-09-02 15:51:45 -07:00
act_police.c net_sched: act_police: use RCU in tcf_police_dump() 2025-07-11 16:01:17 -07:00
act_sample.c net: sched: act_sample: add action cookie to sample 2024-07-05 17:45:47 -07:00
act_simple.c net/sched: Remove redundant memset(0) call in reset_policy() 2025-08-12 17:13:29 -07:00
act_skbedit.c net_sched: add back BH safety to tcf_lock 2025-09-02 15:51:45 -07:00
act_skbmod.c net_sched: add back BH safety to tcf_lock 2025-09-02 15:51:45 -07:00
act_tunnel_key.c net_sched: add back BH safety to tcf_lock 2025-09-02 15:51:45 -07:00
act_vlan.c net_sched: add back BH safety to tcf_lock 2025-09-02 15:51:45 -07:00
bpf_qdisc.c bpf: Clean up individual BTF_ID code 2025-07-16 18:34:42 -07:00
cls_api.c tc: Ensure we have enough buffer space when sending filter netlink notifications 2025-04-08 13:57:49 +02:00
cls_basic.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
cls_bpf.c net: sched: refine software bypass handling in tc_run 2025-01-20 09:21:27 +00:00
cls_cgroup.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
cls_flow.c treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
cls_flower.c net: fix geneve_opt length integer overflow 2025-04-03 15:47:35 -07:00
cls_fw.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
cls_matchall.c net: sched: refine software bypass handling in tc_run 2025-01-20 09:21:27 +00:00
cls_route.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
cls_u32.c net: sched: refine software bypass handling in tc_run 2025-01-20 09:21:27 +00:00
em_canid.c net: fill in MODULE_DESCRIPTION()s for net/sched 2024-02-09 14:12:02 -08:00
em_cmp.c move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
em_ipset.c
em_ipt.c
em_meta.c net: dismiss sk_forward_alloc_get() 2025-02-19 19:05:28 -08:00
em_nbyte.c net: fill in MODULE_DESCRIPTION()s for net/sched 2024-02-09 14:12:02 -08:00
em_text.c net/sched: replace strncpy with strscpy 2025-06-23 16:56:51 -07:00
em_u32.c net: fill in MODULE_DESCRIPTION()s for net/sched 2024-02-09 14:12:02 -08:00
ematch.c net_sched: reject TCF_EM_SIMPLE case for complex ematch module 2022-12-19 09:43:18 +00:00
Kconfig sched: Add enqueue/dequeue of dualpi2 qdisc 2025-07-23 17:52:07 -07:00
Makefile sched: Add enqueue/dequeue of dualpi2 qdisc 2025-07-23 17:52:07 -07:00
sch_api.c net/sched: Abort __tc_modify_qdisc if parent is a clsact/ingress qdisc 2025-11-10 16:57:56 -08:00
sch_blackhole.c
sch_cake.c net/sched: Make cake_enqueue return NET_XMIT_CN when past buffer_limit 2025-08-20 19:27:08 -07:00
sch_cbs.c net/sched: cbs: Fix integer overflow in cbs_set_port_rate() 2024-10-15 18:25:47 -07:00
sch_choke.c net: sched: fix ordering of qlen adjustment 2024-12-04 12:54:22 +00:00
sch_codel.c net/sched: Fix backlog accounting in qdisc_dequeue_internal 2025-08-14 17:52:29 -07:00
sch_drr.c net_sched: drr: Fix double list add in class with netem as child qdisc 2025-04-28 15:55:06 -07:00
sch_dualpi2.c net/sched: sch_dualpi2: Run prob update timer in softirq to avoid deadlock 2025-08-19 17:49:01 -07:00
sch_etf.c net_sched: sch_tfs: implement lockless etf_dump() 2024-04-19 11:34:07 +01:00
sch_ets.c net/sched: ets: use old 'nbands' while purging unused classes 2025-08-13 18:11:48 -07:00
sch_fifo.c pfifo_tail_enqueue: Drop new packet when sch->limit == 0 2025-02-05 18:13:58 -08:00
sch_fq_codel.c net/sched: Fix backlog accounting in qdisc_dequeue_internal 2025-08-14 17:52:29 -07:00
sch_fq_pie.c net/sched: Fix backlog accounting in qdisc_dequeue_internal 2025-08-14 17:52:29 -07:00
sch_fq.c net/sched: Fix backlog accounting in qdisc_dequeue_internal 2025-08-14 17:52:29 -07:00
sch_frag.c net/sched: Use nested-BH locking for sch_frag_data_storage 2025-05-15 15:23:31 +02:00
sch_generic.c net_sched: limit try_bulk_dequeue_skb() batches 2025-11-11 17:56:50 -08:00
sch_gred.c sched: address a potential NULL pointer dereference in the GRED scheduler. 2025-03-06 16:35:14 -08:00
sch_hfsc.c net/sched: sch_qfq: Fix null-deref in agg_dequeue 2025-07-10 11:08:35 +02:00
sch_hhf.c net/sched: Fix backlog accounting in qdisc_dequeue_internal 2025-08-14 17:52:29 -07:00
sch_htb.c net/sched: Remove unnecessary WARNING condition for empty child qdisc in htb_activate 2025-08-20 19:27:08 -07:00
sch_ingress.c bpf: Fix too early release of tcx_entry 2024-07-08 14:07:31 -07:00
sch_mq.c net: sched: add rcu annotations around qdisc->qdisc_sleeping 2023-06-07 10:25:39 +01:00
sch_mqprio_lib.c net: sched: Fill in missing MODULE_DESCRIPTION for qdiscs 2023-11-01 21:49:09 -07:00
sch_mqprio_lib.h net/sched: mqprio: allow per-TC user input of FP adminStatus 2023-04-13 22:22:10 -07:00
sch_mqprio.c net/sched: mqprio: fix stack out-of-bounds write in tc entry parsing 2025-08-04 17:22:20 -07:00
sch_multiq.c net: sched: sch_multiq: fix possible OOB write in multiq_tune() 2024-06-05 10:50:19 +01:00
sch_netem.c net/sched: Restrict conditions for adding duplicating netems to qdisc tree 2025-07-11 15:50:21 -07:00
sch_pie.c net/sched: Fix backlog accounting in qdisc_dequeue_internal 2025-08-14 17:52:29 -07:00
sch_plug.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
sch_prio.c net_sched: prio: fix a race in prio_tune() 2025-06-12 08:05:49 -07:00
sch_qfq.c net/sched: sch_qfq: Avoid triggering might_sleep in atomic context in qfq_delete_class 2025-07-22 11:48:34 +02:00
sch_red.c Including fixes from bluetooth and wireless. 2025-06-12 09:50:36 -07:00
sch_sfb.c net/sched: Add drop reasons for AQM-based qdiscs 2024-12-17 13:27:29 +01:00
sch_sfq.c Including fixes from bluetooth and wireless. 2025-06-12 09:50:36 -07:00
sch_skbprio.c net_sched: skbprio: Remove overly strict queue assertions 2025-04-02 16:03:32 -07:00
sch_taprio.c net/sched: taprio: enforce minimum value for picos_per_byte 2025-08-01 15:15:28 -07:00
sch_tbf.c net_sched: tbf: fix a race in tbf_change() 2025-06-12 08:05:50 -07:00
sch_teql.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00