linux-yocto/tools/include/uapi
Paul Chaignon 2516299184 bpf: Fix L4 csum update on IPv6 in CHECKSUM_COMPLETE
commit ead7f9b8de upstream.

In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
things, update the L4 checksum after reverse SNATing IPv6 packets. That
use case is however not currently supported and leads to invalid
skb->csum values in some cases. This patch adds support for IPv6 address
changes in bpf_l4_csum_update via a new flag.

When calling bpf_l4_csum_replace in Cilium, it ends up calling
inet_proto_csum_replace_by_diff:

    1:  void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
    2:                                       __wsum diff, bool pseudohdr)
    3:  {
    4:      if (skb->ip_summed != CHECKSUM_PARTIAL) {
    5:          csum_replace_by_diff(sum, diff);
    6:          if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
    7:              skb->csum = ~csum_sub(diff, skb->csum);
    8:      } else if (pseudohdr) {
    9:          *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
    10:     }
    11: }

The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
updated one of the IPv6 addresses. The helper now updates the L4 header
checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.

For an IPv6 packet, the updates of the IPv6 address and of the L4
checksum will cancel each other. The checksums are set such that
computing a checksum over the packet including its checksum will result
in a sum of 0. So the same is true here when we update the L4 checksum
on line 5. We'll update it as to cancel the previous IPv6 address
update. Hence skb->csum should remain untouched in this case.

The same bug doesn't affect IPv4 packets because, in that case, three
fields are updated: the IPv4 address, the IP checksum, and the L4
checksum. The change to the IPv4 address and one of the checksums still
cancel each other in skb->csum, but we're left with one checksum update
and should therefore update skb->csum accordingly. That's exactly what
inet_proto_csum_replace_by_diff does.

This special case for IPv6 L4 checksums is also described atop
inet_proto_csum_replace16, the function we should be using in this case.

This patch introduces a new bpf_l4_csum_replace flag, BPF_F_IPV6,
to indicate that we're updating the L4 checksum of an IPv6 packet. When
the flag is set, inet_proto_csum_replace_by_diff will skip the
skb->csum update.

Fixes: 7d672345ed ("bpf: add generic bpf_csum_diff helper")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/96a6bc3a443e6f0b21ff7b7834000e17fb549e05.1748509484.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27 11:11:40 +01:00
..
asm asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch 2023-06-22 17:04:36 +02:00
asm-generic tools/include: Sync uapi/asm-generic/unistd.h with the kernel sources 2024-08-07 10:58:51 -07:00
drm tools/include: Sync uapi/drm/i915_drm.h with the kernel sources 2024-08-06 12:30:31 -07:00
linux bpf: Fix L4 csum update on IPv6 in CHECKSUM_COMPLETE 2025-06-27 11:11:40 +01:00
README perf tools: Add tools/include/uapi/README 2024-08-06 12:30:08 -07:00

Why we want a copy of kernel headers in tools?

There used to be no copies, with tools/ code using kernel headers directly. From time to time tools/perf/ broke due to legitimate kernel hacking. At some point Linus complained about such direct usage. Then we adopted the current model.

The way these headers are used in perf are not restricted to just including them to compile something.

There are sometimes used in scripts that convert defines into string tables, etc, so some change may break one of these scripts, or new MSRs may use some different #define pattern, etc.

E.g.:

$ ls -1 tools/perf/trace/beauty/*.sh | head -5 tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/fsmount.sh $ $ tools/perf/trace/beauty/fadvise.sh static const char *fadvise_advices[] = { [0] = "NORMAL", [1] = "RANDOM", [2] = "SEQUENTIAL", [3] = "WILLNEED", [4] = "DONTNEED", [5] = "NOREUSE", }; $

The tools/perf/check-headers.sh script, part of the tools/ build process, points out changes in the original files.

So its important not to touch the copies in tools/ when doing changes in the original kernel headers, that will be done later, when check-headers.sh inform about the change to the perf tools hackers.

Another explanation from Ingo Molnar: It's better than all the alternatives we tried so far:

  • Symbolic links and direct #includes: this was the original approach but was pushed back on from the kernel side, when tooling modified the headers and broke them accidentally for kernel builds.

  • Duplicate self-defined ABI headers like glibc: double the maintenance burden, double the chance for mistakes, plus there's no tech-driven notification mechanism to look at new kernel side changes.

What we are doing now is a third option:

  • A software-enforced copy-on-write mechanism of kernel headers to tooling, driven by non-fatal warnings on the tooling side build when kernel headers get modified:

    Warning: Kernel ABI header differences: diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h ...

    The tooling policy is to always pick up the kernel side headers as-is, and integate them into the tooling build. The warnings above serve as a notification to tooling maintainers that there's changes on the kernel side.

We've been using this for many years now, and it might seem hacky, but works surprisingly well.