linux-yocto/net/core
Paul Chaignon 355b052633 bpf: Scrub packet on bpf_redirect_peer
[ Upstream commit c4327229948879814229b46aa26a750718888503 ]

When bpf_redirect_peer is used to redirect packets to a device in
another network namespace, the skb isn't scrubbed. That can lead skb
information from one namespace to be "misused" in another namespace.

As one example, this is causing Cilium to drop traffic when using
bpf_redirect_peer to redirect packets that just went through IPsec
decryption to a container namespace. The following pwru trace shows (1)
the packet path from the host's XFRM layer to the container's XFRM
layer where it's dropped and (2) the number of active skb extensions at
each function.

    NETNS       MARK  IFACE  TUPLE                                FUNC
    4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  xfrm_rcv_cb
                             .active_extensions = (__u8)2,
    4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  xfrm4_rcv_cb
                             .active_extensions = (__u8)2,
    4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  gro_cells_receive
                             .active_extensions = (__u8)2,
    [...]
    4026533547  0     eth0   10.244.3.124:35473->10.244.2.158:53  skb_do_redirect
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  ip_rcv
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  ip_rcv_core
                             .active_extensions = (__u8)2,
    [...]
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  udp_queue_rcv_one_skb
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  __xfrm_policy_check
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  __xfrm_decode_session
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  security_xfrm_decode_session
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  kfree_skb_reason(SKB_DROP_REASON_XFRM_POLICY)
                             .active_extensions = (__u8)2,

In this case, there are no XFRM policies in the container's network
namespace so the drop is unexpected. When we decrypt the IPsec packet,
the XFRM state used for decryption is set in the skb extensions. This
information is preserved across the netns switch. When we reach the
XFRM policy check in the container's netns, __xfrm_policy_check drops
the packet with LINUX_MIB_XFRMINNOPOLS because a (container-side) XFRM
policy can't be found that matches the (host-side) XFRM state used for
decryption.

This patch fixes this by scrubbing the packet when using
bpf_redirect_peer, as is done on typical netns switches via veth
devices except skb->mark and skb->tstamp are not zeroed.

Fixes: 9aa1206e8f ("bpf: Add redirect_peer helper")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/1728ead5e0fe45e7a6542c36bd4e3ca07a73b7d6.1746460653.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18 08:24:05 +02:00
..
bpf_sk_storage.c bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing 2023-07-27 10:07:56 -07:00
datagram.c net: fix rc7's __skb_datagram_iter() 2024-07-18 13:21:13 +02:00
dev_addr_lists_test.c kunit: Use KUNIT_EXPECT_MEMEQ macro 2022-10-27 02:40:14 -06:00
dev_addr_lists.c net: extract a few internals from netdevice.h 2022-04-07 20:32:09 -07:00
dev_ioctl.c net: omit ndo_hwtstamp_get() call when possible in dev_set_hwtstamp_phylib() 2023-08-06 13:25:10 +01:00
dev.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2025-05-09 09:43:57 +02:00
dev.h net: fix removing a namespace with conflicting altnames 2024-01-31 16:19:01 -08:00
drop_monitor.c drop_monitor: fix incorrect initialization order 2025-02-27 04:10:51 -08:00
dst_cache.c ipv6: introduce dst_rt6_info() helper 2024-12-14 19:59:35 +01:00
dst.c net: decrease cached dst counters in dst_release 2025-04-10 14:37:40 +02:00
failover.c net: failover: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf 2022-12-12 15:18:25 -08:00
fib_notifier.c
fib_rules.c
filter.c bpf: Scrub packet on bpf_redirect_peer 2025-05-18 08:24:05 +02:00
flow_dissector.c flow_dissector: Fix port range key handling in BPF conversion 2025-02-27 04:10:49 -08:00
flow_offload.c tc: flower: Enable offload support IPSEC SPI field. 2023-08-02 10:09:32 +01:00
gen_estimator.c net: use unrcu_pointer() helper 2024-12-09 10:32:10 +01:00
gen_stats.c net: Remove the obsolte u64_stats_fetch_*_irq() users (net). 2022-10-28 20:13:54 -07:00
gro_cells.c net: drop the weight argument from netif_napi_add 2022-09-28 18:57:14 -07:00
gro.c net: Clear old fragment checksum value in napi_reuse_skb 2025-03-07 16:45:41 +01:00
gso.c net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
hwbm.c
link_watch.c ipvlan: Fix use-after-free in ipvlan_get_iflink(). 2025-01-17 13:36:13 +01:00
lwt_bpf.c lwt: Fix return values of BPF xmit ops 2023-08-18 16:05:26 +02:00
lwtunnel.c net: lwtunnel: disable BHs when required 2025-05-02 07:50:43 +02:00
Makefile net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
neighbour.c net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES 2025-03-28 21:59:53 +01:00
net_namespace.c net: add exit_batch_rtnl() method 2025-01-23 17:21:10 +01:00
net-procfs.c net-sysfs: display two backlog queue len separately 2023-03-22 12:03:52 +01:00
net-sysfs.c ethtool: check device is present when getting link settings 2024-09-04 13:28:26 +02:00
net-sysfs.h
net-traces.c udp6: add a missing call into udp_fail_queue_rcv_skb tracepoint 2023-07-07 09:16:52 +01:00
netclassid_cgroup.c core: Variable type completion 2022-08-31 09:40:34 +01:00
netdev-genl-gen.c net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
netdev-genl-gen.h net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
netdev-genl.c netdev-genl: use struct genl_info for reply construction 2023-08-15 15:01:03 -07:00
netevent.c
netpoll.c netpoll: hold rcu read lock in __netpoll_send_skb() 2025-03-22 12:50:38 -07:00
netprio_cgroup.c
of_net.c net: Explicitly include correct DT includes 2023-07-27 20:33:16 -07:00
page_pool.c page_pool: avoid infinite loop to schedule delayed worker 2025-04-25 10:45:14 +02:00
pktgen.c pktgen: Avoid out-of-bounds access in get_imix_entries 2025-01-23 17:21:10 +01:00
ptp_classifier.c
request_sock.c tcp: make sure init the accept_queue's spinlocks once 2024-01-31 16:19:00 -08:00
rtnetlink.c rtnetlink: Allocate vfinfo size for VF GUIDs when supported 2025-04-10 14:37:35 +02:00
scm.c io_uring/unix: drop usage of io_uring socket 2024-03-26 18:19:09 -04:00
secure_seq.c tcp: Fix data-races around sysctl knobs related to SYN option. 2022-07-20 10:14:49 +01:00
selftests.c net: selftests: initialize TCP header and skb payload with zero 2025-05-02 07:50:46 +02:00
skbuff.c ipvs: Always clear ipvs_property flag in skb_scrub_packet() 2025-03-07 16:45:40 +01:00
skmsg.c bpf: Fix wrong copied_seq calculation 2025-02-27 04:10:50 -08:00
sock_destructor.h
sock_diag.c net: use unrcu_pointer() helper 2024-12-09 10:32:10 +01:00
sock_map.c bpf: Disable non stream socket for strparser 2025-02-27 04:10:50 -08:00
sock_reuseport.c soreuseport: Fix socket selection for SO_INCOMING_CPU. 2022-10-25 11:35:16 +02:00
sock.c net: restrict SO_REUSEPORT to inet sockets 2025-01-09 13:32:02 +01:00
stream.c net: Return error from sk_stream_wait_connect() if sk_wait_event() fails 2024-01-01 12:42:30 +00:00
sysctl_net_core.c net: set the minimum for net_hotdata.netdev_budget_usecs 2025-03-07 16:45:38 +01:00
timestamping.c
tso.c net: tso: inline tso_count_descs() 2022-12-12 15:04:39 -08:00
utils.c net: core: inet[46]_pton strlen len types 2022-11-01 21:14:39 -07:00
xdp.c xdp: fix invalid wait context of page_pool_destroy() 2024-08-03 08:53:44 +02:00