linux-yocto/net/xfrm
Cosmin Ratiu 25e4700489 xfrm_output: Force software GSO only in tunnel mode
[ Upstream commit 0aae2867aa6067f73d066bc98385e23c8454a1d7 ]

The cited commit fixed a software GSO bug with VXLAN + IPSec in tunnel
mode. Unfortunately, it is slightly broader than necessary, as it also
severely affects performance for Geneve + IPSec transport mode over a
device capable of both HW GSO and IPSec crypto offload. In this case,
xfrm_output unnecessarily triggers software GSO instead of letting the
HW do it. In simple iperf3 tests over Geneve + IPSec transport mode over
a back-to-back pair of NICs with MTU 1500, the performance was observed
to be up to 6x worse when doing software GSO compared to leaving it to
the hardware.

This commit makes xfrm_output trigger software GSO in crypto offload
cases only for already encapsulated packets in tunnel mode. Without it,
the inner tunnel's skb->inner_network_header would be overwritten,
breaking software GSO for that packet later if the device turns out not
to be capable of HW GSO.

Taking a closer look at the conditions for the original bug, to better
understand the reasons for this change:
- vxlan_build_skb -> iptunnel_handle_offloads sets inner_protocol and
  inner network header.
- then, udp_tunnel_xmit_skb -> ip_tunnel_xmit adds outer transport and
  network headers.
- later in the xmit path, xfrm_output -> xfrm_outer_mode_output ->
  xfrm4_prepare_output -> xfrm4_tunnel_encap_add overwrites the inner
  network header with the one set in ip_tunnel_xmit before adding the
  second outer header.
- __dev_queue_xmit -> validate_xmit_skb checks whether GSO segmentation
  needs to happen based on dev features. In the original bug, the hw
  couldn't segment the packets, so skb_gso_segment was invoked.
- deep in the .gso_segment callback machinery, __skb_udp_tunnel_segment
  tries to use the wrong inner network header, expecting the one set in
  iptunnel_handle_offloads but getting the one set by xfrm instead.
- a bit later, ipv6_gso_segment accesses the wrong memory based on that
  wrong inner network header.
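
The header bookkeeping in the steps above can be sketched as a small
userspace model. This is a simplification, not kernel code: struct
toy_skb, toy_encap and the offsets 200/150/130 are hypothetical and only
stand in for the network and inner network header offsets of a real skb.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two sk_buff offsets relevant here. */
struct toy_skb {
    size_t network_header;        /* current outermost network header */
    size_t inner_network_header;  /* header GSO treats as the inner one */
};

/*
 * Models one encapsulation step: remember the current network header
 * as "inner", then point network_header at the newly pushed outer one.
 */
static void toy_encap(struct toy_skb *skb, size_t new_outer_off)
{
    skb->inner_network_header = skb->network_header;
    skb->network_header = new_outer_off;
}

/* Replays VXLAN encap followed by xfrm tunnel-mode encap. */
static size_t toy_replay_bug(void)
{
    struct toy_skb skb = { .network_header = 200, .inner_network_header = 0 };

    toy_encap(&skb, 150);  /* tunnel driver: inner = 200, outer = 150 */
    assert(skb.inner_network_header == 200);

    toy_encap(&skb, 130);  /* xfrm tunnel mode: inner is overwritten */
    return skb.inner_network_header;
}
```

After the second step the recorded inner offset is the tunnel's outer
header, not the one the UDP tunnel segmentation code expects.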

With the new change, the original bug (or similar ones) cannot happen
again, as xfrm will now trigger software GSO before applying a tunnel.
This concern doesn't exist in packet offload mode, when the HW adds
encapsulation headers. For the non-offloaded packets (crypto in SW),
software GSO is still done unconditionally in the else branch.
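
The resulting decision can be summarized with a minimal sketch. It is a
userspace model of the behavior described in this message, not the code
in xfrm_output.c: struct toy_pkt, its fields and toy_force_software_gso
are hypothetical names, and packet offload mode is assumed to have been
handled before this point.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_mode { TOY_MODE_TRANSPORT, TOY_MODE_TUNNEL };

/* Hypothetical summary of the packet and state properties involved. */
struct toy_pkt {
    bool is_gso;              /* packet still needs segmentation */
    bool crypto_offload_ok;   /* device can do IPSec crypto offload */
    bool inner_protocol_set;  /* packet is already encapsulated */
    enum toy_mode mode;       /* xfrm state mode */
};

/* Returns true when xfrm output should segment in software. */
static bool toy_force_software_gso(const struct toy_pkt *p)
{
    if (!p->is_gso)
        return false;
    /* Crypto done in software: always segment in software first. */
    if (!p->crypto_offload_ok)
        return true;
    /* Crypto offload: only already encapsulated packets in tunnel mode. */
    return p->inner_protocol_set && p->mode == TOY_MODE_TUNNEL;
}

/* Checks the two cases this message discusses. */
static void toy_self_check(void)
{
    struct toy_pkt geneve_transport = { true, true, true, TOY_MODE_TRANSPORT };
    struct toy_pkt vxlan_tunnel = { true, true, true, TOY_MODE_TUNNEL };

    assert(!toy_force_software_gso(&geneve_transport));
    assert(toy_force_software_gso(&vxlan_tunnel));
}
```

In this model the Geneve + transport mode case is left to the hardware,
while the VXLAN + tunnel mode case from the original bug is still
segmented in software.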

Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Fixes: a204aef9fd ("xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-28 22:03:25 +01:00
espintcp.c net: move netdev_max_backlog to net_hotdata 2024-03-07 21:12:42 -08:00
Kconfig ipsec: Select CRYPTO_AEAD 2023-10-01 16:28:14 +08:00
Makefile xfrm: support sending NAT keepalives in ESP in UDP states 2024-06-26 13:22:42 +02:00
xfrm_algo.c net: fill in MODULE_DESCRIPTION()s for xfrm 2024-02-09 14:12:01 -08:00
xfrm_compat.c xfrm: Add support for per cpu xfrm state handling. 2025-02-08 09:58:00 +01:00
xfrm_device.c xfrm: extract dst lookup parameters into a struct 2024-09-23 07:02:07 +02:00
xfrm_hash.c
xfrm_hash.h xfrm: add state hashtable keyed by seq 2021-05-14 13:52:01 +02:00
xfrm_inout.h xfrm: move xfrm4_extract_header to common helper 2020-05-06 09:40:08 +02:00
xfrm_input.c xfrm: Add an inbound percpu state cache. 2025-02-08 09:58:00 +01:00
xfrm_interface_bpf.c bpf: treewide: Annotate BPF kfuncs in BTF 2024-01-31 20:40:56 -08:00
xfrm_interface_core.c netdev_features: convert NETIF_F_LLTX to dev->lltx 2024-09-03 11:36:43 +02:00
xfrm_ipcomp.c net: introduce and use skb_frag_fill_page_desc() 2023-05-13 19:47:56 +01:00
xfrm_nat_keepalive.c xfrm: support sending NAT keepalives in ESP in UDP states 2024-06-26 13:22:42 +02:00
xfrm_output.c xfrm_output: Force software GSO only in tunnel mode 2025-03-28 22:03:25 +01:00
xfrm_policy.c xfrm: Cache used outbound xfrm states at the policy. 2025-02-08 09:58:00 +01:00
xfrm_proc.c xfrm: Add dir validation to "in" data path lookup 2024-05-01 10:06:27 +02:00
xfrm_replay.c xfrm: replay: Fix the update of replay_esn->oseq_hi for GSO 2025-02-08 09:58:00 +01:00
xfrm_state_bpf.c bpf: treewide: Annotate BPF kfuncs in BTF 2024-01-31 20:40:56 -08:00
xfrm_state.c xfrm: Fix acquire state insertion. 2025-02-08 09:58:18 +01:00
xfrm_sysctl.c net: Remove ctl_table sentinel elements from several networking subsystems 2024-05-03 13:29:42 +01:00
xfrm_user.c xfrm: Add error handling when nla_put_u32() returns an error 2025-02-08 09:58:18 +01:00