Commit Graph

43453 Commits

Author SHA1 Message Date
Christophe Leroy
ecb8bd70d5 selftests: vDSO: build tests with O2 optimization
Without -O2, the generated code for testing chacha function is awful.
GCC even implements rol32() as a function of 20 instructions instead of
just using the rotlwi instruction.

	~# time ./vdso_test_chacha
	TAP version 13
	1..1
	ok 1 chacha: PASS
	real    0m 37.16s
	user    0m 36.89s
	sys     0m 0.26s

Several other selftests directory add -O2, and the kernel is also
always built with optimisation active. Do the same for vDSO selftests.

With this patch the time is reduced by approximately 15%.

	~# time ./vdso_test_chacha
	TAP version 13
	1..1
	ok 1 chacha: PASS
	real    0m 32.09s
	user    0m 31.86s
	sys     0m 0.22s

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 17:28:35 +02:00
Xi Ruoyao
18efd0b10e LoongArch: vDSO: Wire up getrandom() vDSO implementation
Hook up the generic vDSO implementation to the LoongArch vDSO data page
by providing the required __arch_chacha20_blocks_nostack,
__arch_get_k_vdso_rng_data, and getrandom_syscall implementations. Also
wire up the selftests.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Acked-by: Huacai Chen <chenhuacai@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 17:28:35 +02:00
Jason A. Donenfeld
67a121ac8f selftests: vDSO: fix cross build for getrandom and chacha tests
Unlike the check for the standalone x86 test, the check for building the
vDSO getrandom and chacaha tests looks at the architecture for the host
rather than the architecture for the target when deciding if they should
be built. Since the chacha test includes some assembler code this means
that cross building with x86 as either the target or host is broken.

There's also some additional complications, where ARCH can legitimately
be either x86_64 or x86, but the source code we need to compile lives in
a directory path containing arch/x86. The standard SRCARCH variable
handles that. And actually, all these variables and proper substitutions
are already described in tools/scripts/Makefile.arch, so just include
that to handle it.

Similarly, ARCH=x86 can actually describe ARCH=x86_64,
just with CONFIG_64BIT, so we can't rely on ARCH for selecting
non-32-bit tests. For that, check against $(ARCH)$(CONFIG_X86_32). This
won't help for people manually running this inside the vDSO selftest
directory (which isn't really supported anyway and has problems on
various archs), but it should work for builds of the kselftests, where
the CONFIG_* variables are defined. On x86_64 machines,
$(ARCH)$(CONFIG_X86_32) will evaluate to x86. On arm64 machines, it will
evaluate to arm64. On 32-bit x86 machines, it will evaluate to x86y,
which won't match the filter list.

Reported-by: Mark Brown <broonie@kernel.org>
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 17:28:35 +02:00
Jason A. Donenfeld
7fe5b3e4e7 selftests: vDSO: open code basic chacha instead of linking to libsodium
Linking to libsodium makes building this test annoying in cross
compilation environments and is just way too much. Since this is just a
basic correctness test, simply open code a simple, unoptimized, dumb
chacha, rather than linking to libsodium.

This also fixes a correctness issue on big endian systems. The kernel's
random.c doesn't bother doing a le32_to_cpu operation on the random
bytes that are passed as the key, and consequently neither does
vgetrandom-chacha.S. However, libsodium's chacha _does_ do this, since
it takes the key as an array of bytes. This meant that the test was
broken on big endian systems, which this commit rectifies.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 17:28:35 +02:00
Jason A. Donenfeld
6fd13b282f random: vDSO: move prototype of arch chacha function to vdso/getrandom.h
Having the prototype for __arch_chacha20_blocks_nostack in
arch/x86/include/asm/vdso/getrandom.h meant that the prototype and large
doc comment were cloned by every architecture, which has been causing
unnecessary churn. Instead move it into include/vdso/getrandom.h, where
it can be shared by all archs implementing it.

As a side bonus, this then lets us use that prototype in the
vdso_test_chacha self test, to ensure that it matches the source, and
indeed doing so turned up some inconsistencies, which are rectified
here.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 17:28:35 +02:00
Jason A. Donenfeld
2aec90036d selftests: vDSO: ensure vgetrandom works in a time namespace
After verifying that vDSO getrandom does work, which ensures that the
RNG is initialized, test to see if it also works inside of a time
namespace. This is important to test, because the vvar pages get
swizzled around there. If the arch code isn't careful, the RNG will
appear uninitialized inside of a time namespace.

Because broken code makes the RNG appear uninitialized, test that
everything works by issuing a call to vgetrandom from a fork in a time
namespace, and use ptrace to ensure that the actual syscall getrandom
doesn't get called. If it doesn't get called, then the test succeeds.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 17:28:35 +02:00
Jakub Kicinski
3b7dc7000e bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZuH9UQAKCRDbK58LschI
 g0/zAP99WOcCBp1M/jSTUOba230+eiol7l5RirDEA6wu7TqY2QEAuvMG0KfCCpTI
 I0WqStrK1QMbhwKPodJC1k+17jArKgw=
 =jfMU
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-09-11

We've added 12 non-merge commits during the last 16 day(s) which contain
a total of 20 files changed, 228 insertions(+), 30 deletions(-).

There's a minor merge conflict in drivers/net/netkit.c:
  00d066a4d4 ("netdev_features: convert NETIF_F_LLTX to dev->lltx")
  d966087948 ("netkit: Disable netpoll support")

The main changes are:

1) Enable bpf_dynptr_from_skb for tp_btf such that this can be used
   to easily parse skbs in BPF programs attached to tracepoints,
   from Philo Lu.

2) Add a cond_resched() point in BPF's sock_hash_free() as there have
   been several syzbot soft lockup reports recently, from Eric Dumazet.

3) Fix xsk_buff_can_alloc() to account for queue_empty_descs which
   got noticed when zero copy ice driver started to use it,
   from Maciej Fijalkowski.

4) Move the xdp:xdp_cpumap_kthread tracepoint before cpumap pushes skbs
   up via netif_receive_skb_list() to better measure latencies,
   from Daniel Xu.

5) Follow-up to disable netpoll support from netkit, from Daniel Borkmann.

6) Improve xsk selftests to not assume a fixed MAX_SKB_FRAGS of 17 but
   instead gather the actual value via /proc/sys/net/core/max_skb_frags,
   also from Maciej Fijalkowski.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  sock_map: Add a cond_resched() in sock_hash_free()
  selftests/bpf: Expand skb dynptr selftests for tp_btf
  bpf: Allow bpf_dynptr_from_skb() for tp_btf
  tcp: Use skb__nullable in trace_tcp_send_reset
  selftests/bpf: Add test for __nullable suffix in tp_btf
  bpf: Support __nullable argument suffix for tp_btf
  bpf, cpumap: Move xdp:xdp_cpumap_kthread tracepoint before rcv
  selftests/xsk: Read current MAX_SKB_FRAGS from sysctl knob
  xsk: Bump xsk_queue::queue_empty_descs in xp_can_alloc()
  tcp_bpf: Remove an unused parameter for bpf_tcp_ingress()
  bpf, sockmap: Correct spelling skmsg.c
  netkit: Disable netpoll support

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================

Link: https://patch.msgid.link/20240911211525.13834-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12 20:22:44 -07:00
Ihor Solodrai
ea02a94687 libbpf: Add bpf_object__token_fd accessor
Add a LIBBPF_API function to retrieve the token_fd from a bpf_object.

Without this accessor, if user needs a token FD they have to get it
manually via bpf_token_create, even though a token might have been
already created by bpf_object__load.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240913001858.3345583-1-ihor.solodrai@pm.me
2024-09-12 19:07:13 -07:00
Willem de Bruijn
e874be276e selftests/net: packetdrill: import tcp/slow_start
Same import process as previous tests.

Also add CONFIG_NET_SCH_FQ to config, as one test uses that.

Same test process as previous tests. Both with and without debug mode.
Recording the steps once:

make mrproper
vng --build \
        --config tools/testing/selftests/net/packetdrill/config \
        --config kernel/configs/debug.config
vng -v --run . --user root --cpus 4 -- \
	make -C tools/testing/selftests TARGETS=net/packetdrill run_tests

Link: https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style#how-to-build
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240912005317.1253001-4-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12 19:04:38 -07:00
Willem de Bruijn
1e42f73fd3 selftests/net: packetdrill: import tcp/zerocopy
Same as initial tests, import verbatim from
github.com/google/packetdrill, aside from:

- update `source ./defaults.sh` path to adjust for flat dir
- add SPDX headers
- remove author statements if any
- drop blank lines at EOF (new)

Also import set_sysctls.py, which many scripts depend on to set
sysctls and then restore them later. This is no longer strictly needed
for namespacified sysctl. But not all sysctls are namespacified, and
doesn't hurt if they are.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240912005317.1253001-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12 19:04:37 -07:00
Willem de Bruijn
cded7e0479 selftests/net: packetdrill: run in netns and expand config
Run packetdrill tests inside netns.
They may change system settings, such as sysctl.

Also expand config with a few more needed CONFIGs.

Link: https://lore.kernel.org/netdev/20240910152640.429920be@kernel.org/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240912005317.1253001-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12 19:04:37 -07:00
Jakub Kicinski
46ae4d0a48 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts (sort of) and no adjacent changes.

This merge reverts commit b3c9e65eb2 ("net: hsr: remove seqnr_lock")
from net, as it was superseded by
commit 430d67bdcb ("net: hsr: Use the seqnr lock for frames received via interlink port.")
in net-next.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12 17:11:24 -07:00
Linus Torvalds
5abfdfd402 There is a recently notified BT regression with no fix yet. I
*think* such fix will not land in the next week.
 
 Including fixes from netfilter.
 
 Current release - regressions:
 
   - core: tighten bad gso csum offset check in virtio_net_hdr
 
   - netfilter: move nf flowtable bpf initialization in nf_flow_table_module_init()
 
   - eth: ice: stop calling pci_disable_device() as we use pcim
 
   - eth: fou: fix null-ptr-deref in GRO.
 
 Current release - new code bugs:
 
   - hsr: prevent NULL pointer dereference in hsr_proxy_announce()
 
 Previous releases - regressions:
 
   - hsr: remove seqnr_lock
 
   - netfilter: nft_socket: fix sk refcount leaks
 
   - mptcp: pm: fix uaf in __timer_delete_sync
 
   - phy: dp83822: fix NULL pointer dereference on DP83825 devices
 
   - eth: revert "virtio_net: rx enable premapped mode by default"
 
   - eth: octeontx2-af: Modify SMQ flush sequence to drop packets
 
 Previous releases - always broken:
 
   - eth: mlx5: fix bridge mode operations when there are no VFs
 
   - eth: igb: Always call igb_xdp_ring_update_tail() under Tx lock
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmbi8isSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkzNsQALiaGTGqOcFwVwchlWOfAuheiDeKtxM6
 LihsxBFFi+s7p5p75yL+ko3mpxz8ZxO3joPNevh+wBy7cOXgEuikPbeNokUsHjeG
 ofmz2B9+CHpf8PL0PgE+oyAi8kTZyn81oDrVLereBJqT50hKXjWbbip/s8niTJaY
 tXsYiJPZgvTFdkkJjV6INrHRWAse/tXP1o+KqIkbuEw8aerlxjOaZ1dubtuzcj3C
 rAwXNU9r6ojqmQLovfUx3Gw/RsMwAGxra0Ni9AdUxiKtqKD7r0dnfNvjUI76HWcf
 S62zMwjKS+DZ/cqiTnsqTIK7HN4gQ/R+W+C7RWlVFhu5CQoX7zL+4pkpbJn3ULEM
 mjFKnBBPoU0ET5hMOXY79iAkGIDiQSpMz5fNi85S0drPLG6ooJxIBBLgFdDspG2f
 7TASuV5Dne3Toh9YMm4mZtgWZTCR85yF8Dzy8kA6/bTNtvVQmMMQh7RYW10SrB1I
 ntDYGQcoxzPVzU42gWaDwa5ubDz6XTxLM6vsweK9mbjm9U1R1GEd6cx2cGvt88oD
 gIgLcrP7szImDzdASb0Ce3kEdKc/g0xMften10MOjPFHJGcehvawwgvRJK8Oz720
 g6Xa+WBwfGF5QrljWVk7V9sGiSK1ssAut81VBO+lBCEFwI8iMzjbjTC9kzCvO6nj
 ZECYg5JZa0XR
 =tbcX
 -----END PGP SIGNATURE-----

Merge tag 'net-6.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  There is a recently notified BT regression with no fix yet. I do not
  think a fix will land in the next week.

  Current release - regressions:

   - core: tighten bad gso csum offset check in virtio_net_hdr

   - netfilter: move nf flowtable bpf initialization in
     nf_flow_table_module_init()

   - eth: ice: stop calling pci_disable_device() as we use pcim

   - eth: fou: fix null-ptr-deref in GRO.

  Current release - new code bugs:

   - hsr: prevent NULL pointer dereference in hsr_proxy_announce()

  Previous releases - regressions:

   - hsr: remove seqnr_lock

   - netfilter: nft_socket: fix sk refcount leaks

   - mptcp: pm: fix uaf in __timer_delete_sync

   - phy: dp83822: fix NULL pointer dereference on DP83825 devices

   - eth: revert "virtio_net: rx enable premapped mode by default"

   - eth: octeontx2-af: Modify SMQ flush sequence to drop packets

  Previous releases - always broken:

   - eth: mlx5: fix bridge mode operations when there are no VFs

   - eth: igb: Always call igb_xdp_ring_update_tail() under Tx lock"

* tag 'net-6.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
  net: netfilter: move nf flowtable bpf initialization in nf_flow_table_module_init()
  net: tighten bad gso csum offset check in virtio_net_hdr
  netlink: specs: mptcp: fix port endianness
  net: dpaa: Pad packets to ETH_ZLEN
  mptcp: pm: Fix uaf in __timer_delete_sync
  net: libwx: fix number of Rx and Tx descriptors
  net: dsa: felix: ignore pending status of TAS module when it's disabled
  net: hsr: prevent NULL pointer dereference in hsr_proxy_announce()
  selftests: mptcp: include net_helper.sh file
  selftests: mptcp: include lib.sh file
  selftests: mptcp: join: restrict fullmesh endp on 1st sf
  netfilter: nft_socket: make cgroupsv2 matching work with namespaces
  netfilter: nft_socket: fix sk refcount leaks
  MAINTAINERS: Add ethtool pse-pd to PSE NETWORK DRIVER
  dt-bindings: net: tja11xx: fix the broken binding
  selftests: net: csum: Fix checksums for packets with non-zero padding
  net: phy: dp83822: Fix NULL pointer dereference on DP83825 devices
  virtio_net: disable premapped mode by default
  Revert "virtio_net: big mode skip the unmap check"
  Revert "virtio_net: rx remove premapped failover code"
  ...
2024-09-12 12:45:24 -07:00
Brendan Jackman
e4835f1da4 kunit: tool: Build compile_commands.json
compile_commands.json is used by clangd[1] to provide code navigation
and completion functionality to editors. See [2] for an example
configuration that includes this functionality for VSCode.

It can currently be built manually when using kunit.py, by running:

  ./scripts/clang-tools/gen_compile_commands.py -d .kunit

With this change however, it's built automatically so you don't need to
manually keep it up to date.

Unlike the manual approach, having make build the compile_commands.json
means that it appears in the build output tree instead of at the root of
the source tree, so you'll need to add --compile-commands-dir=.kunit to
your clangd args for it to be found. This might turn out to be pretty
annoying, I'm not sure yet. If so maybe we can later add some hackery to
kunit.py to work around it.

[1] https://clangd.llvm.org/
[2] https://github.com/FlorentRevest/linux-kernel-vscode

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-09-12 09:52:36 -06:00
Dave Jiang
8d8081cecf cxl: Move mailbox related bits to the same context
Create a new 'struct cxl_mailbox' and move all mailbox related bits to
it. This allows isolation of all CXL mailbox data in order to export
some of the calls to external kernel callers and avoid exporting of CXL
driver specific bits such has device states. The allocation of
'struct cxl_mailbox' is also split out with cxl_mailbox_init() so the
mailbox can be created independently.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/20240905223711.1990186-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-12 08:38:01 -07:00
Will Deacon
2ef52ca02c Merge branch 'for-next/selftests' into for-next/core
* for-next/selftests:
  kselftest/arm64: Fix build warnings for ptrace
  kselftest/arm64: Actually test SME vector length changes via sigreturn
  kselftest/arm64: signal: fix/refactor SVE vector length enumeration
2024-09-12 13:43:57 +01:00
Mark Brown
f10d52087c
spi: Merge up fixes
A patch for Qualcomm depends on some fixes.
2024-09-12 12:38:44 +01:00
Marc Zyngier
f77e63e274 Merge branch kvm-arm64/selftests-6.12 into kvmarm-master/next
* kvm-arm64/selftests-6.12:
  : .
  : KVM/arm64 selftest updates for 6.12
  :
  : - Check for a bunch of timer emulation corner cases (COlton Lewis)
  : .
  KVM: arm64: selftests: Add arch_timer_edge_cases selftest
  KVM: arm64: selftests: Ensure pending interrupts are handled in arch_timer test

Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-12 08:37:20 +01:00
Mina Almasry
d0caf9876a netdev: add dmabuf introspection
Add dmabuf information to page_pool stats:

$ ./cli.py --spec ../netlink/specs/netdev.yaml --dump page-pool-get
...
 {'dmabuf': 10,
  'id': 456,
  'ifindex': 3,
  'inflight': 1023,
  'inflight-mem': 4190208},
 {'dmabuf': 10,
  'id': 455,
  'ifindex': 3,
  'inflight': 1023,
  'inflight-mem': 4190208},
 {'dmabuf': 10,
  'id': 454,
  'ifindex': 3,
  'inflight': 1023,
  'inflight-mem': 4190208},
 {'dmabuf': 10,
  'id': 453,
  'ifindex': 3,
  'inflight': 1023,
  'inflight-mem': 4190208},
 {'dmabuf': 10,
  'id': 452,
  'ifindex': 3,
  'inflight': 1023,
  'inflight-mem': 4190208},
 {'dmabuf': 10,
  'id': 451,
  'ifindex': 3,
  'inflight': 1023,
  'inflight-mem': 4190208},
 {'dmabuf': 10,
  'id': 450,
  'ifindex': 3,
  'inflight': 1023,
  'inflight-mem': 4190208},
 {'dmabuf': 10,
  'id': 449,
  'ifindex': 3,
  'inflight': 1023,
  'inflight-mem': 4190208},

And queue stats:

$ ./cli.py --spec ../netlink/specs/netdev.yaml --dump queue-get
...
{'dmabuf': 10, 'id': 8, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 9, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 10, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 11, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 12, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 13, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 14, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 15, 'ifindex': 3, 'type': 'rx'},

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-14-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11 20:44:32 -07:00
Mina Almasry
85585b4bc8 selftests: add ncdevmem, netcat for devmem TCP
ncdevmem is a devmem TCP netcat. It works similarly to netcat, but it
sends and receives data using the devmem TCP APIs. It uses udmabuf as
the dmabuf provider. It is compatible with a regular netcat running on
a peer, or a ncdevmem running on a peer.

In addition to normal netcat support, ncdevmem has a validation mode,
where it sends a specific pattern and validates this pattern on the
receiver side to ensure data integrity.

Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20240910171458.219195-13-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11 20:44:32 -07:00
Mina Almasry
3efd7ab46d net: netdev netlink api to bind dma-buf to a net device
API takes the dma-buf fd as input, and binds it to the netdevice. The
user can specify the rx queues to bind the dma-buf to.

Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-3-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11 20:44:31 -07:00
Matthieu Baerts (NGI0)
c66c08e51b selftests: mptcp: include net_helper.sh file
Similar to the previous commit, the net_helper.sh file from the parent
directory is used by the MPTCP selftests and it needs to be present when
running the tests.

This file then needs to be listed in the Makefile to be included when
exporting or installing the tests, e.g. with:

  make -C tools/testing/selftests \
          TARGETS=net/mptcp \
          install INSTALL_PATH=$KSFT_INSTALL_PATH

  cd $KSFT_INSTALL_PATH
  ./run_kselftest.sh -c net/mptcp

Fixes: 1af3bc912e ("selftests: mptcp: lib: use wait_local_port_listen helper")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-3-8f124aa9156d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11 15:18:20 -07:00
Matthieu Baerts (NGI0)
1a5a2d19e8 selftests: mptcp: include lib.sh file
The lib.sh file from the parent directory is used by the MPTCP selftests
and it needs to be present when running the tests.

This file then needs to be listed in the Makefile to be included when
exporting or installing the tests, e.g. with:

  make -C tools/testing/selftests \
          TARGETS=net/mptcp \
          install INSTALL_PATH=$KSFT_INSTALL_PATH

  cd $KSFT_INSTALL_PATH
  ./run_kselftest.sh -c net/mptcp

Fixes: f265d3119a ("selftests: mptcp: lib: use setup/cleanup_ns helpers")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-2-8f124aa9156d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11 15:18:19 -07:00
Matthieu Baerts (NGI0)
49ac6f05ac selftests: mptcp: join: restrict fullmesh endp on 1st sf
A new endpoint using the IP of the initial subflow has been recently
added to increase the code coverage. But it breaks the test when using
old kernels not having commit 86e39e0448 ("mptcp: keep track of local
endpoint still available for each msk"), e.g. on v5.15.

Similar to commit d4c81bbb86 ("selftests: mptcp: join: support local
endpoint being tracked or not"), it is possible to add the new endpoint
conditionally, by checking if "mptcp_pm_subflow_check_next" is present
in kallsyms: this is not directly linked to the commit introducing this
symbol but for the parent one which is linked anyway. So we can know in
advance what will be the expected behaviour, and add the new endpoint
only when it makes sense to do so.

Fixes: 4878f9f842 ("selftests: mptcp: join: validate fullmesh endp on 1st sf")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-1-8f124aa9156d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11 15:18:19 -07:00
Arnaldo Carvalho de Melo
1de5b5dcb8 perf trace: Mark the 'head' arg in the set_robust_list syscall as coming from user space
With that it uses the generic BTF based pretty printer:

This one we need to think about, not being acquainted with this syscall,
should we _traverse_ that list somehow? Would that be useful?

  root@number:~# perf trace -e set_robust_list sleep 1
       0.000 ( 0.004 ms): sleep/1206493 set_robust_list(head: (struct robust_list_head){.list = (struct robust_list){.next = (struct robust_list *)0x7f48a9a02a20,},.futex_offset = (long int)-32,}, len: 24) =
  root@number:~#

strace prints the default integer args:

  root@number:~# strace -e set_robust_list sleep 1
  set_robust_list(0x7efd99559a20, 24)     = 0
  +++ exited with 0 +++
  root@number:~#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org
Link: https://lore.kernel.org/lkml/ZuH6MquMraBvODRp@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 17:25:45 -03:00
Tao Chen
7eab3a58ac bpf/selftests: Check errno when percpu map value size exceeds
This test case checks the errno message when percpu map value size
exceeds PCPU_MIN_UNIT_SIZE.

root@debian:~# ./test_maps
...
test_map_percpu_stats_hash_of_maps:PASS
test_map_percpu_stats_map_value_size:PASS
test_sk_storage_map:PASS

Signed-off-by: Jinke Han <jinkehan@didiglobal.com>
Signed-off-by: Tao Chen <chen.dylane@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240910144111.1464912-3-chen.dylane@gmail.com
2024-09-11 13:22:45 -07:00
Arnaldo Carvalho de Melo
0c1019e346 perf trace: Mark the 'rseq' arg in the rseq syscall as coming from user space
With that it uses the generic BTF based pretty printer:

  root@number:~# grep -w rseq /sys/kernel/tracing/events/syscalls/sys_enter_rseq/format
  	field:struct rseq * rseq;	offset:16;	size:8;	signed:0;
  print fmt: "rseq: 0x%08lx, rseq_len: 0x%08lx, flags: 0x%08lx, sig: 0x%08lx", ((unsigned long)(REC->rseq)), ((unsigned long)(REC->rseq_len)), ((unsigned long)(REC->flags)), ((unsigned long)(REC->sig))
  root@number:~#

Before:

  root@number:~# perf trace -e rseq
       0.000 ( 0.017 ms): Isolated Web C/1195452 rseq(rseq: 0x7ff0ecfe6fe0, rseq_len: 32, sig: 1392848979)             = 0
      74.018 ( 0.006 ms): :1195453/1195453 rseq(rseq: 0x7f2af20fffe0, rseq_len: 32, sig: 1392848979)             = 0
    1817.220 ( 0.009 ms): Isolated Web C/1195454 rseq(rseq: 0x7f5c9ec7dfe0, rseq_len: 32, sig: 1392848979)             = 0
    2515.526 ( 0.034 ms): :1195455/1195455 rseq(rseq: 0x7f61503fffe0, rseq_len: 32, sig: 1392848979)             = 0
  ^Croot@number:~#

After:

  root@number:~# perf trace -e rseq
       0.000 ( 0.019 ms): Isolated Web C/1197258 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)4,.cpu_id = (__u32)4,.mm_cid = (__u32)5,}, rseq_len: 32, sig: 1392848979) = 0
    1663.835 ( 0.019 ms): Isolated Web C/1197259 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)24,.cpu_id = (__u32)24,.mm_cid = (__u32)2,}, rseq_len: 32, sig: 1392848979) = 0
    4750.444 ( 0.018 ms): Isolated Web C/1197260 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)8,.cpu_id = (__u32)8,.mm_cid = (__u32)4,}, rseq_len: 32, sig: 1392848979) = 0
    4994.132 ( 0.018 ms): Isolated Web C/1197261 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)10,.cpu_id = (__u32)10,.mm_cid = (__u32)1,}, rseq_len: 32, sig: 1392848979) = 0
    4997.578 ( 0.011 ms): Isolated Web C/1197263 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)16,.cpu_id = (__u32)16,.mm_cid = (__u32)4,}, rseq_len: 32, sig: 1392848979) = 0
    4997.462 ( 0.014 ms): Isolated Web C/1197262 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)17,.cpu_id = (__u32)17,.mm_cid = (__u32)3,}, rseq_len: 32, sig: 1392848979) = 0
  ^Croot@number:~#

We'll probably need to come up with some way for using the BTF info to
synthesize a test that then gets used and captures the output of the
'perf trace' output to check if the arguments are the ones synthesized,
randomically, for now, lets make do manually:

  root@number:~# cat ~acme/c/rseq.c
  #include <sys/syscall.h>     /* Definition of SYS_* constants */
  #include <linux/rseq.h>
  #include <errno.h>
  #include <string.h>
  #include <unistd.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Provide own rseq stub because glibc doesn't */
  __attribute__((weak))
  int sys_rseq(struct rseq *rseq, __u32 rseq_len, int flags, __u32 sig)
  {
  	return syscall(SYS_rseq, rseq, rseq_len, flags, sig);
  }

  int main(int argc, char *argv[])
  {
  	struct rseq rseq = {
  		.cpu_id_start = 12,
  		.cpu_id = 34,
  		.rseq_cs = 56,
  		.flags = 78,
  		.node_id = 90,
  		.mm_cid = 12,
  	};
  	int err = sys_rseq(&rseq, sizeof(rseq), 98765, 0xdeadbeaf);

  	printf("sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, %d, 0) = %d (%s)\n", sizeof(rseq), err, strerror(errno));
  	return err;
  }
  root@number:~# perf trace -e rseq ~acme/c/rseq
  sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument)
       0.000 ( 0.003 ms): rseq/1200640 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979)            =
       0.064 ( 0.001 ms): rseq/1200640 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)12,.cpu_id = (__u32)34,.rseq_cs = (__u64)56,.flags = (__u32)78,.node_id = (__u32)90,.mm_cid = (__u32)12,}, rseq_len: 32, flags: 98765, sig: 3735928495) = -1 EINVAL (Invalid argument)
  root@number:~#root@number:~# cat ~acme/c/rseq.c
  #include <sys/syscall.h>     /* Definition of SYS_* constants */
  #include <linux/rseq.h>
  #include <errno.h>
  #include <string.h>
  #include <unistd.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Provide own rseq stub because glibc doesn't */
  __attribute__((weak))
  int sys_rseq(struct rseq *rseq, __u32 rseq_len, int flags, __u32 sig)
  {
  	return syscall(SYS_rseq, rseq, rseq_len, flags, sig);
  }

  int main(int argc, char *argv[])
  {
  	struct rseq rseq = {
  		.cpu_id_start = 12,
  		.cpu_id = 34,
  		.rseq_cs = 56,
  		.flags = 78,
  		.node_id = 90,
  		.mm_cid = 12,
  	};
  	int err = sys_rseq(&rseq, sizeof(rseq), 98765, 0xdeadbeaf);

  	printf("sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, %d, 0) = %d (%s)\n", sizeof(rseq), err, strerror(errno));
  	return err;
  }
  root@number:~# perf trace -e rseq ~acme/c/rseq
  sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument)
       0.000 ( 0.003 ms): rseq/1200640 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979)            =
       0.064 ( 0.001 ms): rseq/1200640 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)12,.cpu_id = (__u32)34,.rseq_cs = (__u64)56,.flags = (__u32)78,.node_id = (__u32)90,.mm_cid = (__u32)12,}, rseq_len: 32, flags: 98765, sig: 3735928495) = -1 EINVAL (Invalid argument)
  root@number:~#

Interesting, glibc seems to be using rseq here, as in addition to the
totally fake one this test case uses, we have this one, around these
other syscalls:

     0.175 ( 0.001 ms): rseq/1201095 set_tid_address(tidptr: 0x7f6def759a10)                               = 1201095 (rseq)
     0.177 ( 0.001 ms): rseq/1201095 set_robust_list(head: 0x7f6def759a20, len: 24)                        = 0
     0.178 ( 0.001 ms): rseq/1201095 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979)            =
     0.231 ( 0.005 ms): rseq/1201095 mprotect(start: 0x7f6def93f000, len: 16384, prot: READ)               = 0
     0.238 ( 0.003 ms): rseq/1201095 mprotect(start: 0x403000, len: 4096, prot: READ)                      = 0
     0.244 ( 0.004 ms): rseq/1201095 mprotect(start: 0x7f6def99c000, len: 8192, prot: READ)

Matches strace (well, not really as the strace in fedora:40 doesn't know
about rseq, printing just integer values in hex):

  set_robust_list(0x7fbc6acc7a20, 24)     = 0
  rseq(0x7fbc6acc8060, 0x20, 0, 0x53053053) = 0
  mprotect(0x7fbc6aead000, 16384, PROT_READ) = 0
  mprotect(0x403000, 4096, PROT_READ)     = 0
  mprotect(0x7fbc6af0a000, 8192, PROT_READ) = 0
  prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
  munmap(0x7fbc6aebd000, 81563)           = 0
  rseq(0x7fff15bb9920, 0x20, 0x181cd, 0xdeadbeaf) = -1 EINVAL (Invalid argument)
  fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x9), ...}) = 0
  getrandom("\xd0\x34\x97\x17\x61\xc2\x2b\x10", 8, GRND_NONBLOCK) = 8
  brk(NULL)                               = 0x18ff4000
  brk(0x19015000)                         = 0x19015000
  write(1, "sys_rseq({ .cpu_id_start = 12, ."..., 136sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument)
  ) = 136
  exit_group(-1)                          = ?
  +++ exited with 255 +++
  root@number:~#

And also the focus for the v6.13 should be to have a better, strace
like BTF pretty printer as one of the outputs we can get from the libbpf
BTF dumper.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuH2K1LLt1pIDkbd@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 17:05:23 -03:00
Yonghong Song
2897b1e2a2 selftests/bpf: Fix arena_atomics failure due to llvm change
llvm change [1] made a change such that __sync_fetch_and_{and,or,xor}()
will generate atomic_fetch_*() insns even if the return value is not used.
This is a deliberate choice to make sure barrier semantics are preserved
from source code to asm insn.

But the change in [1] caused arena_atomics selftest failure.

  test_arena_atomics:PASS:arena atomics skeleton open 0 nsec
  libbpf: prog 'and': BPF program load failed: Permission denied
  libbpf: prog 'and': -- BEGIN PROG LOAD LOG --
  arg#0 reference type('UNKNOWN ') size cannot be determined: -22
  0: R1=ctx() R10=fp0
  ; if (pid != (bpf_get_current_pid_tgid() >> 32)) @ arena_atomics.c:87
  0: (18) r1 = 0xffffc90000064000       ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4)
  2: (61) r6 = *(u32 *)(r1 +0)          ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4) R6_w=scalar(smin=0,smax=umax=0xffffffff,v
ar_off=(0x0; 0xffffffff))
  3: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
  4: (77) r0 >>= 32                     ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  5: (5d) if r0 != r6 goto pc+11        ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0x)
  ; __sync_fetch_and_and(&and64_value, 0x011ull << 32); @ arena_atomics.c:91
  6: (18) r1 = 0x100000000060           ; R1_w=scalar()
  8: (bf) r1 = addr_space_cast(r1, 0, 1)        ; R1_w=arena
  9: (18) r2 = 0x1100000000             ; R2_w=0x1100000000
  11: (db) r2 = atomic64_fetch_and((u64 *)(r1 +0), r2)
  BPF_ATOMIC stores into R1 arena is not allowed
  processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
  -- END PROG LOAD LOG --
  libbpf: prog 'and': failed to load: -13
  libbpf: failed to load object 'arena_atomics'
  libbpf: failed to load BPF skeleton 'arena_atomics': -13
  test_arena_atomics:FAIL:arena atomics skeleton load unexpected error: -13 (errno 13)
  #3       arena_atomics:FAIL

The reason of the failure is due to [2] where atomic{64,}_fetch_{and,or,xor}() are not
allowed by arena addresses.

Version 2 of the patch fixed the issue by using inline asm ([3]). But further discussion
suggested to find a way from source to generate locked insn which is more user
friendly. So in not-merged llvm patch ([4]), if relax memory ordering is used and
the return value is not used, locked insn could be generated.

So with llvm patch [4] to compile the bpf selftest, the following code
  __c11_atomic_fetch_and(&and64_value, 0x011ull << 32, memory_order_relaxed);
is able to generate locked insn, hence fixing the selftest failure.

  [1] https://github.com/llvm/llvm-project/pull/106494
  [2] d503a04f8b ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")
  [3] https://lore.kernel.org/bpf/20240803025928.4184433-1-yonghong.song@linux.dev/
  [4] https://github.com/llvm/llvm-project/pull/107343

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240909223431.1666305-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-11 10:07:10 -07:00
Rafael J. Wysocki
0a06811d66 Merge branches 'pm-sleep', 'pm-opp' and 'pm-tools'
Merge updates related to system sleep, operating performance points
(OPP) updates, and PM tooling updates for 6.12-rc1:

 - Remove unused stub for saveable_highmem_page() and remove deprecated
   macros from power management documentation (Andy Shevchenko).

 - Use ysfs_emit() and sysfs_emit_at() in "show" functions in the PM
   sysfs interface (Xueqin Luo).

 - Update the maintainers information for the operating-points-v2-ti-cpu DT
   binding (Dhruva Gole).

 - Drop unnecessary of_match_ptr() from ti-opp-supply (Rob Herring).

 - Update directory handling and installation process in the pm-graph
   Makefile and add .gitignore to ignore sleepgraph.py artifacts to
   pm-graph (Amit Vadhavana, Yo-Jung Lin).

 - Make cpupower display residency value in idle-info (Aboorva
   Devarajan).

 - Add missing powercap_set_enabled() stub function to cpupower (John
   B. Wyatt IV).

 - Add SWIG support to cpupower (John B. Wyatt IV).

* pm-sleep:
  PM: hibernate: Remove unused stub for saveable_highmem_page()
  Documentation: PM: Discourage use of deprecated macros
  PM: sleep: Use sysfs_emit() and sysfs_emit_at() in "show" functions
  PM: hibernate: Use sysfs_emit() and sysfs_emit_at() in "show" functions

* pm-opp:
  dt-bindings: opp: operating-points-v2-ti-cpu: Update maintainers
  opp: ti: Drop unnecessary of_match_ptr()

* pm-tools:
  pm:cpupower: Add error warning when SWIG is not installed
  MAINTAINERS: Add Maintainers for SWIG Python bindings
  pm:cpupower: Include test_raw_pylibcpupower.py
  pm:cpupower: Add SWIG bindings files for libcpupower
  pm:cpupower: Add missing powercap_set_enabled() stub function
  pm-graph: Update directory handling and installation process in Makefile
  pm-graph: Make git ignore sleepgraph.py artifacts
  tools/cpupower: display residency value in idle-info
2024-09-11 19:02:23 +02:00
Andrii Nakryiko
3c217a1820 selftests/bpf: add build ID tests
Add a new set of tests validating behavior of capturing stack traces
with build ID. We extend uprobe_multi target binary with ability to
trigger uprobe (so that we can capture stack traces from it), but also
we allow to force build ID data to be either resident or non-resident in
memory (see also a comment about quirks of MADV_PAGEOUT).

That way we can validate that in non-sleepable context we won't get
build ID (as expected), but with sleepable uprobes we will get that
build ID regardless of it being physically present in memory.

Also, we add a small add-on linker script which reorders
.note.gnu.build-id section and puts it after (big) .text section,
putting build ID data outside of the very first page of ELF file. This
will test all the relaxations we did in build ID parsing logic in kernel
thanks to freader abstraction.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240829174232.3133883-11-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-11 09:58:31 -07:00
Vincent Donnefort
75d7ff9aa0 selftests/ring-buffer: Handle meta-page bigger than the system
Handle the case where the meta-page content is bigger than the system
page-size. This prepares the ground for extending features covered by
the meta-page.

Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-kselftest@vger.kernel.org
Link: https://lore.kernel.org/20240910162335.2993310-3-vdonnefort@google.com
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-09-11 12:25:12 -04:00
Vincent Donnefort
21ff365b5c selftests/ring-buffer: Verify the entire meta-page padding
Improve the ring-buffer meta-page test coverage by checking for the
entire padding region to be 0 instead of just looking at the first 4
bytes.

Cc: linux-kselftest@vger.kernel.org
Link: https://lore.kernel.org/20240910162335.2993310-2-vdonnefort@google.com
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-09-11 12:25:02 -04:00
Kan Liang
edf3ce0ed3 perf env: Find correct branch counter info on hybrid
No event is printed in the "Branch Counter" column on hybrid machines.

For example,

  $ perf record -e "{cpu_core/branch-instructions/pp,cpu_core/branches/}:S" -j any,counter
  $ perf report --total-cycles

  # Branch counter abbr list:
  # cpu_core/branch-instructions/pp = A
  # cpu_core/branches/ = B
  # '-' No event occurs
  # '+' Event occurrences may be lost due to branch counter saturated
  #
  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles  Branch Counter
  # ...............  ..............  ...........  ..........  ..............
            44.54%          727.1K        0.00%           1   |+   |+   |
            36.31%          592.7K        0.00%           2   |+   |+   |
            17.83%          291.1K        0.00%           1   |+   |+   |

The branch counter information (br_cntr_width and br_cntr_nr) in the
perf_env is retrieved from the CPU_PMU_CAPS. However, the CPU_PMU_CAPS
is not available on hybrid machines. Without the width information, the
number of occurrences of an event cannot be calculated.

For a hybrid machine, the caps information should be retrieved from the
PMU_CAPS, and stored in the perf_env->pmu_caps.

Add a perf_env__find_br_cntr_info() to return the correct branch counter
information from the corresponding fields.

Committer notes:

While testing I couldn't s ee those "Branch counter" columns enabled by
pressing 'B' on the TUI, after reporting it to the list Kan explained
the situation:

<quote Kan Liang>
For a hybrid client, the "Branch Counter" feature is only supported
starting from the just released Lunar Lake. Perf falls back to only
"ANY" on your Raptor Lake.

The "The branch counter is not available" message is expected.

Here is the 'perf evlist' result from my Lunar Lake machine,

  # perf evlist -v
  cpu_core/branch-instructions/pp: type: 4 (cpu_core), size: 136, config: 0xc4 (branch-instructions), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|READ|PERIOD|BRANCH_STACK|IDENTIFIER, read_format: ID|GROUP|LOST, disabled: 1, freq: 1, enable_on_exec: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, branch_sample_type: ANY|COUNTERS
  #
</quote>

Fixes: 6f9d8d1de2 ("perf script: Add branch counters")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240909184201.553519-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 13:08:46 -03:00
Kan Liang
9953807c9e perf evlist: Print hint for group
An event group is a critical relationship. There is a -g option that can
display the relationship. But it's hard for a user to know when should
this option be applied.

If there is an event group in the perf record, print a hint to suggest
the user apply the -g to display the group information.

With the patch,

  $ perf record -e "{cycles,instructions},instructions" sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.024 MB perf.data (4 samples) ]
  $

  $ perf evlist
  cycles
  instructions
  instructions
  # Tip: use 'perf evlist -g' to show group information

  $ perf evlist -g
  {cycles,instructions}
  instructions
  $

Committer testing:

So for a perf.data file _with_ a group:

  root@number:~# perf evlist -g
  {cpu_core/branch-instructions/pp,cpu_core/branches/}
  dummy:u
  root@number:~# perf evlist
  cpu_core/branch-instructions/pp
  cpu_core/branches/
  dummy:u
  # Tip: use 'perf evlist -g' to show group information
  root@number:~#

Then for something _without_ a group, no hint:

  root@number:~# perf record ls
  <SNIP>
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.035 MB perf.data (7 samples) ]
  root@number:~# perf evlist
  cpu_atom/cycles/P
  cpu_core/cycles/P
  dummy:u
  root@number:~#

No suggestion, good.

Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Closes: https://lore.kernel.org/lkml/ZttgvduaKsVn1r4p@x1/
Link: https://lore.kernel.org/r/20240908202847.176280-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 13:08:45 -03:00
Sam James
eb9b9a6f5a tools: Drop nonsensical -O6
-O6 is very much not-a-thing. Really, this should've been dropped
entirely in 49b3cd306e ("tools: Set the maximum optimization level
according to the compiler being used") instead of just passing it for
not-Clang.

Just collapse it down to -O3, instead of "-O6 unless Clang, in which case
-O3".

GCC interprets > -O3 as -O3. It doesn't even interpret > -O3 as -Ofast,
which is a good thing, given -Ofast has specific (non-)requirements for
code built using it. So, this does nothing except look a bit daft.

Remove the silliness and also save a few lines in the Makefiles accordingly.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com>
Signed-off-by: Sam James <sam@gentoo.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/4f01524fa4ea91c7146a41e26ceaf9dae4c127e4.1725821201.git.sam@gentoo.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 13:08:36 -03:00
Philo Lu
83dff60171 selftests/bpf: Expand skb dynptr selftests for tp_btf
Add 3 test cases for skb dynptr used in tp_btf:
- test_dynptr_skb_tp_btf: use skb dynptr in tp_btf and make sure it is
  read-only.
- skb_invalid_ctx_fentry/skb_invalid_ctx_fexit: bpf_dynptr_from_skb
  should fail in fentry/fexit.

In test_dynptr_skb_tp_btf, to trigger the tracepoint in kfree_skb,
test_pkt_access is used for its test_run, as in kfree_skb.c. Because the
test process is different from others, a new setup type is defined,
i.e., SETUP_SKB_PROG_TP.

The result is like:
$ ./test_progs -t 'dynptr/test_dynptr_skb_tp_btf'
  #84/14   dynptr/test_dynptr_skb_tp_btf:OK
  #84      dynptr:OK
  #127     kfunc_dynptr_param:OK
  Summary: 2/1 PASSED, 0 SKIPPED, 0 FAILED

$ ./test_progs -t 'dynptr/skb_invalid_ctx_f'
  #84/85   dynptr/skb_invalid_ctx_fentry:OK
  #84/86   dynptr/skb_invalid_ctx_fexit:OK
  #84      dynptr:OK
  #127     kfunc_dynptr_param:OK
  Summary: 2/2 PASSED, 0 SKIPPED, 0 FAILED

Also fix two coding style nits (change spaces to tabs).

Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240911033719.91468-6-lulie@linux.alibaba.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11 08:57:54 -07:00
Philo Lu
2060f07f86 selftests/bpf: Add test for __nullable suffix in tp_btf
Add a tracepoint with __nullable suffix in bpf_testmod, and add cases
for it:

$ ./test_progs -t "tp_btf_nullable"
 #406/1   tp_btf_nullable/handle_tp_btf_nullable_bare1:OK
 #406/2   tp_btf_nullable/handle_tp_btf_nullable_bare2:OK
 #406     tp_btf_nullable:OK
 Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240911033719.91468-3-lulie@linux.alibaba.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11 08:56:42 -07:00
zhang jiao
a0474b8d59 selftests: kselftest: Use strerror() on nolibc
Nolibc gained an implementation of strerror() recently.
Use it and drop the ifndef.

Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com>
Acked-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-09-11 09:52:33 -06:00
Ian Rogers
89c0a55e55 perf pmu: To info add event_type_desc
All PMU events are assumed to be "Kernel PMU event", however, this
isn't true for fake PMUs and won't be true with the addition of more
software PMUs. Make the PMU's type description name configurable -
largely for printing callbacks.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240907050830.6752-5-irogers@google.com
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:29:20 -03:00
Ian Rogers
f08cc25843 perf evsel: Add accessor for tool_event
Currently tool events use a dedicated variable within the evsel. Later
changes will move this to the unused struct perf_event_attr config for
these events. Add an accessor to allow the later change to be well
typed and avoid changing all uses.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240907050830.6752-4-irogers@google.com
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:28:27 -03:00
Ian Rogers
925320737a perf pmus: Fake PMU clean up
Rather than passing a fake PMU around, just pass that the fake PMU
should be used - true when doing testing. Move the fake PMU into
pmus.[ch] and try to abstract the PMU's properties in pmu.c, ie so
there is less "if fake_pmu" in non-PMU code. Give the fake PMU a made
up type number.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240907050830.6752-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:27:42 -03:00
Ian Rogers
d3d5c1a00f perf list: Avoid potential out of bounds memory read
If a desc string is 0 length then -1 will be out of bounds, add a
check.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240907050830.6752-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:26:27 -03:00
Andrew Kreimer
4ae354d73a perf help: Fix a typo ("bellow")
Fix a typo in comments.

Reported-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: https://lore.kernel.org/r/20240907131006.18510-1-algonell@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:24:12 -03:00
Maciej Fijalkowski
d41905b3bb selftests/xsk: Read current MAX_SKB_FRAGS from sysctl knob
Currently, xskxceiver assumes that MAX_SKB_FRAGS value is always 17
which is not true - since the introduction of BIG TCP this can now take
any value between 17 to 45 via CONFIG_MAX_SKB_FRAGS.

Adjust the TOO_MANY_FRAGS test case to read the currently configured
MAX_SKB_FRAGS value by reading it from /proc/sys/net/core/max_skb_frags.
If running system does not provide that sysctl file then let us try
running the test with a default value.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20240910124129.289874-1-maciej.fijalkowski@intel.com
2024-09-11 15:48:35 +02:00
Changbin Du
74298dd8ac perf ftrace: Detect whether ftrace is enabled on system
To make error messages more accurate, this change detects whether ftrace is
enabled on system by checking trace file "set_ftrace_pid".

Before:

  # perf ftrace
  failed to reset ftrace
  #

After:

  # perf ftrace
  ftrace is not supported on this system
  #

Committer testing:

Doing it in an unprivileged toolbox container on Fedora 40:

Before:

  acme@number:~/git/perf-tools-next$ toolbox enter perf
  ⬢[acme@toolbox perf-tools-next]$ sudo su -
  ⬢[root@toolbox ~]# ~acme/bin/perf ftrace
  failed to reset ftrace
  ⬢[root@toolbox ~]#

After this patch:

  ⬢[root@toolbox ~]# ~acme/bin/perf ftrace
  ftrace is not supported on this system
  ⬢[root@toolbox ~]#

Maybe we could check if we are in such as situation, inside an
unprivileged container, and provide a HINT line?

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Changbin Du <changbin.du@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240911100126.900779-1-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:35 -03:00
Arnaldo Carvalho de Melo
83420d5f58 perf test shell probe_vfs_getname: Remove extraneous '=' from probe line number regex
Thomas reported the vfs_getname perf tests failing on s/390, it seems it
was just to some extraneous '=' somehow getting into the regexp, remove
it, now:

  root@x1:~# perf test getname
   91: Add vfs_getname probe to get syscall args filenames             : Ok
   93: Use vfs_getname probe to get syscall args filenames             : FAILED!
  126: Check open filename arg using perf trace + vfs_getname          : Ok
  root@x1:~#

Second one remains a mistery, have to take some time to nail it down.

Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>,
Link: https://lore.kernel.org/lkml/1d7f3b7b-9edc-4d90-955c-9345428563f1@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:34 -03:00
Arnaldo Carvalho de Melo
9327f0ecad perf build: Require at least clang 16.0.6 to build BPF skeletons
Howard reported problems using perf features that use BPF:

  perf $ clang -v
  Debian clang version 15.0.6
  Target: x86_64-pc-linux-gnu
  Thread model: posix
  InstalledDir: /bin
  Found candidate GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
  Selected GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
  Candidate multilib: .;@m64
  Selected multilib: .;@m64
  perf $ ./perf trace -e write --max-events=1
  libbpf: prog 'sys_enter_rename': BPF program load failed: Permission denied
  libbpf: prog 'sys_enter_rename': -- BEGIN PROG LOAD LOG --
  0: R1=ctx() R10=fp0

But it works with:

  perf $ clang -v
  Debian clang version 16.0.6 (15~deb12u1)
  Target: x86_64-pc-linux-gnu
  Thread model: posix
  InstalledDir: /bin
  Found candidate GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
  Selected GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
  Candidate multilib: .;@m64
  Selected multilib: .;@m64
  perf $ ./perf trace -e write --max-events=1
       0.000 ( 0.009 ms): gmain/1448 write(fd: 4, buf: \1\0\0\0\0\0\0\0, count: 8)                         = 8 (kworker/0:0-eve)
  perf $

So lets make that the required version, if you happen to have a slightly
older version where this work, please report so that we can adjust the
minimum required version.

Reported-by: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuGL9ROeTV2uXoSp@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:34 -03:00
Arnaldo Carvalho de Melo
4c1af9bf97 perf trace: If a syscall arg is marked as 'const', assume it is coming _from_ userspace
We need to decide where to copy syscall arg contents, if at the
syscalls:sys_entry hook, meaning is something that is coming from
user to kernel space, or if it is a response, i.e. if it is something
the _kernel_ is filling in and thus going to userspace.

Since we have 'const' used in those syscalls, and unsure about this
being consistent, doing:

  root@number:~# echo $(grep const /sys/kernel/tracing/events/syscalls/sys_enter_*/format  | grep struct | cut -c47- | cut -d'/' -f1)
  clock_nanosleep clock_settime epoll_pwait2 futex io_pgetevents landlock_create_ruleset listmount mq_getsetattr mq_notify mq_timedreceive mq_timedsend preadv2 preadv prlimit64 process_madvise process_vm_readv process_vm_readv process_vm_writev process_vm_writev pwritev2 pwritev readv rt_sigaction rt_sigtimedwait semtimedop statmount timerfd_settime timer_settime vmsplice writev
  root@number:~#

Seems to indicate that we can use that for the ones that have the
'const' to mark it as coming from user space, do it.

Most notable/frequent syscall that now gets BTF pretty printed in a
system wide 'perf trace' session is:

  root@number:~# perf trace
     21.160 (         ): MediaSu~isor #/1028597 futex(uaddr: 0x7f49e1dfe964, op: WAIT_BITSET|PRIVATE_FLAG, utime: (struct __kernel_timespec){.tv_sec = (__kernel_time64_t)50290,.tv_nsec = (long long int)810362837,}, val3: MATCH_ANY) ...
      21.166 ( 0.000 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa00, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
      21.169 ( 0.001 ms): RemVidChild/6995 sendmsg(fd: 25<socket:[78915]>, msg: 0x7f49e9af9da0, flags: DONTWAIT) = 280
      21.172 ( 0.289 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa58, op: WAIT_BITSET|PRIVATE_FLAG|CLOCK_REALTIME, val3: MATCH_ANY) = 0
      21.463 ( 0.000 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa00, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
      21.467 ( 0.001 ms): RemVidChild/6995 futex(uaddr: 0x7f49e28bb964, op: WAKE|PRIVATE_FLAG, val: 1)           = 1
      21.160 ( 0.314 ms): MediaSu~isor #/1028597  ... [continued]: futex())                                            = 0
      21.469 (         ): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa5c, op: WAIT_BITSET|PRIVATE_FLAG|CLOCK_REALTIME, val3: MATCH_ANY) ...
      21.475 ( 0.000 ms): MediaSu~isor #/1028597 futex(uaddr: 0x7f49d0223040, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
      21.478 ( 0.001 ms): MediaSu~isor #/1028597 futex(uaddr: 0x7f49e26ac964, op: WAKE|PRIVATE_FLAG, val: 1)           = 1
  ^Croot@number:~#
  root@number:~# cat /sys/kernel/tracing/events/syscalls/sys_enter_futex/format
  name: sys_enter_futex
  ID: 454
  format:
  	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
  	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
  	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
  	field:int common_pid;	offset:4;	size:4;	signed:1;

  	field:int __syscall_nr;	offset:8;	size:4;	signed:1;
  	field:u32 * uaddr;	offset:16;	size:8;	signed:0;
  	field:int op;	offset:24;	size:8;	signed:0;
  	field:u32 val;	offset:32;	size:8;	signed:0;
  	field:const struct __kernel_timespec * utime;	offset:40;	size:8;	signed:0;
  	field:u32 * uaddr2;	offset:48;	size:8;	signed:0;
  	field:u32 val3;	offset:56;	size:8;	signed:0;

  print fmt: "uaddr: 0x%08lx, op: 0x%08lx, val: 0x%08lx, utime: 0x%08lx, uaddr2: 0x%08lx, val3: 0x%08lx", ((unsigned long)(REC->uaddr)), ((unsigned long)(REC->op)), ((unsigned long)(REC->val)), ((unsigned long)(REC->utime)), ((unsigned long)(REC->uaddr2)), ((unsigned long)(REC->val3))
  root@number:~#

Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fWnuQrrBoTn6Rrn6vM_xQ2fCoc9i-AitD7abTcNi-4o1Q@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:34 -03:00
Yang Li
e37b315c17 perf parse-events: Remove duplicated include in parse-events.c
The header files parse-events.h is included twice in parse-events.c,
so one inclusion of each can be removed.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10822
Link: https://lore.kernel.org/r/20240910005522.35994-1-yang.lee@linux.alibaba.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:27 -03:00
Jakub Kicinski
ea403549da ipsec-next-2024-09-10
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmbf6xAACgkQrB3Eaf9P
 W7eZQA/9HuHTWBg0V43QDT1rjNnKult+uBKYpKrh045outqMs+cU8bsww5ZuIAKx
 ktN66OCE67d7XeFttb9UAJUPqQ98RjwjVUOpjRJ5iRDtj2bmn/5VGSYuH7zx5so0
 msFs5gkomo2ZZNjcMOSrDVGUoCdlHh1og5L2KN/FgztSA1smDdUBQOWNm1peezbI
 eJFt2Q6KCNfzwPthmQte0dmDnK5gWPducereSx03tMuSyUmPML1zrzOFXBXSg09e
 dAlDTxbAXZDrXS4Ii0y/FEM2Ugkjg9FXbE1kvM0i05GIc/SGnEBGEcdW5YbmRhOL
 4JlLnpiLTmKTaIZ0GdpADv7XZMga6R01AalSPsJz+H7aNAHTKkK+SzQY4YXRucZy
 SsASM39oRLzo9Bm4ZZ773Nw83cxBgO/ZixK4KVvCZI/1ftD+9zn72eqk+CeveSeE
 ChaXGuWpRdfAOsgozFJNFx/ffK5qzxFKkIeN9KN0QYV/XJuZJ7nD6eQkH9ydgvTI
 4cexY+cs4wgfdi9dDkVHPVhCR7mRlfi5r/VL8rtWWnWzR07okKF4rW6dgvx33m60
 9MmF1/EdD2uh3CLcBMjNg6qXdC07VeDpFLqWs+utJvSHMuI43uE4FkRQui/J6T9N
 RX7zzkFBsPvPpm5GHLx2u/wvnzX1co1Rk9xzbC+J6FEPlm2/0vI=
 =ErGl
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2024-09-10

1) Remove an unneeded WARN_ON on packet offload.
   From Patrisious Haddad.

2) Add a copy from skb_seq_state to buffer function.
   This is needed for the upcomming IPTFS patchset.
   From Christian Hopps.

3) Spelling fix in xfrm.h.
   From Simon Horman.

4) Speed up xfrm policy insertions.
   From Florian Westphal.

5) Add and revert a patch to support xfrm interfaces
   for packet offload. This patch was just half cooked.

6) Extend usage of the new xfrm_policy_is_dead_or_sk helper.
   From Florian Westphal.

7) Update comments on sdb and xfrm_policy.
   From Florian Westphal.

8) Fix a null pointer dereference in the new policy insertion
   code From Florian Westphal.

9) Fix an uninitialized variable in the new policy insertion
   code. From Nathan Chancellor.

* tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()
  xfrm: policy: fix null dereference
  Revert "xfrm: add SA information to the offloaded packet"
  xfrm: minor update to sdb and xfrm_policy comments
  xfrm: policy: use recently added helper in more places
  xfrm: add SA information to the offloaded packet
  xfrm: policy: remove remaining use of inexact list
  xfrm: switch migrate to xfrm_policy_lookup_bytype
  xfrm: policy: don't iterate inexact policies twice at insert time
  selftests: add xfrm policy insertion speed test script
  xfrm: Correct spelling in xfrm.h
  net: add copy from skb_seq_state to buffer function
  xfrm: Remove documentation WARN_ON to limit return values for offloaded SA
====================

Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10 19:00:47 -07:00
Jason Xing
fffe8efd68 net-timestamp: add selftests for SOF_TIMESTAMPING_OPT_RX_FILTER
Test a few possible cases where we use SOF_TIMESTAMPING_OPT_RX_FILTER
with software or hardware report/generation flag.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240909015612.3856-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10 16:55:23 -07:00
Sean Anderson
e8a63d473b selftests: net: csum: Fix checksums for packets with non-zero padding
Padding is not included in UDP and TCP checksums. Therefore, reduce the
length of the checksummed data to include only the data in the IP
payload. This fixes spurious reported checksum failures like

rx: pkt: sport=33000 len=26 csum=0xc850 verify=0xf9fe
pkt: bad csum

Technically it is possible for there to be trailing bytes after the UDP
data but before the Ethernet padding (e.g. if sizeof(ip) + sizeof(udp) +
udp.len < ip.len). However, we don't generate such packets.

Fixes: 91a7de8560 ("selftests/net: add csum offload test")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240906210743.627413-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10 16:33:31 -07:00
Ian Rogers
02b2705017 perf callchain: Allow symbols to be optional when resolving a callchain
In uses like 'perf inject' it is not necessary to gather the symbol for
each call chain location, the map for the sample IP is wanted so that
build IDs and the like can be injected. Make gathering the symbol in the
callchain_cursor optional.

For a 'perf inject -B' command this lowers the peak RSS from 54.1MB to
29.6MB by avoiding loading symbols.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Ian Rogers
64eed019f3 perf inject: Lazy build-id mmap2 event insertion
Add -B option that lazily inserts mmap2 events thereby dropping all
mmap events without samples. This is similar to the behavior of -b
where only build_id events are inserted when a dso is accessed in a
sample.

File size savings can be significant in system-wide mode, consider:

  $ perf record -g -a -o perf.data sleep 1
  $ perf inject -B -i perf.data -o perf.new.data
  $ ls -al perf.data perf.new.data
           5147049 perf.data
           2248493 perf.new.data

Give test coverage of the new option in pipe test.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Ian Rogers
d762ba020d perf inject: Add new mmap2-buildid-all option
Add an option that allows all mmap or mmap2 events to be rewritten as
mmap2 events with build IDs.

This is similar to the existing -b/--build-ids and --buildid-all options
except instead of adding a build_id event an existing mmap/mmap2 event
is used as a template and a new mmap2 event synthesized from it.

As mmap2 events are typical this avoids the insertion of build_id
events.

Add test coverage to the pipe test.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Ian Rogers
ae39ba1655 perf inject: Fix build ID injection
Build ID injection wasn't inserting a sample ID and aligning events to
64 bytes rather than 8. No sample ID means events are unordered and two
different build_id events for the same path, as happens when a file is
replaced, can't be differentiated.

Add in sample ID insertion for the build_id events alongside some
refactoring. The refactoring better aligns the function arguments for
different use cases, such as synthesizing build_id events without
needing to have a dso. The misc bits are explicitly passed as with
callchains the maps/dsos may span user and kernel land, so using
sample->cpumode isn't good enough.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Namhyung Kim
02648783c2 perf annotate-data: Add pr_debug_scope()
The pr_debug_scope() is to print more information about the scope DIE
during the instruction tracking so that it can help finding relevant
debug info and the source code like inlined functions more easily.

  $ perf --debug type-profile annotate --data-type
  ...
  -----------------------------------------------------------
  find data type for 0(reg0, reg12) at set_task_cpu+0xdd
  CU for kernel/sched/core.c (die:0x1268dae)
  frame base: cfa=1 fbreg=7
  scope: [3/3] (die:12b6d28) [inlined] set_task_rq       <<<--- (here)
  bb: [9f - dd]
  var [9f] reg3 type='struct task_struct*' size=0x8 (die:0x126aff0)
  var [9f] reg6 type='unsigned int' size=0x4 (die:0x1268e0d)

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240909214251.3033827-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Namhyung Kim
c8b9358778 perf annotate: Treat 'call' instruction as stack operation
I found some portion of mem-store events sampled on CALL instruction
which has no memory access.  But it actually saves a return address
into stack.  It should be considered as a stack operation like RET
instruction.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240909214251.3033827-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
James Clark
332f60ac05 perf build: Remove unused feature test target
llvm-version was removed in commit 56b11a2126 ("perf bpf: Remove
support for embedding clang for compiling BPF events (-e foo.c)") but
some parts were left in the Makefile so finish removing them.

Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Manu Bretelle <chantr4@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <qmo@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20240910140405.568791-2-james.clark@linaro.org
[ Removed one leftover, 'llvm-version' from FEATURE_TESTS_EXTRA ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
James Clark
206dcfca1f perf build: Autodetect minimum required llvm-dev version
The new LLVM addr2line feature requires a minimum version of 13 to
compile. Add a feature check for the version so that NO_LLVM=1 doesn't
need to be explicitly added. Leave the existing llvm feature check
intact because it's used by tools other than Perf.

This fixes the following compilation error when the llvm-dev version
doesn't match:

  util/llvm-c-helpers.cpp: In function 'char* llvm_name_for_code(dso*, const char*, u64)':
  util/llvm-c-helpers.cpp:178:21: error: 'std::remove_reference_t<llvm::DILineInfo>' {aka 'struct llvm::DILineInfo'} has no member named 'StartAddress'
    178 |   addr, res_or_err->StartAddress ? *res_or_err->StartAddress : 0);

Fixes: c3f8644c21 ("perf report: Support LLVM for addr2line()")
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Manu Bretelle <chantr4@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <qmo@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20240910140405.568791-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:46 -03:00
Arnaldo Carvalho de Melo
375f9262ac perf trace: Mark the rlim arg in the prlimit64 and setrlimit syscalls as coming from user space
With that it uses the generic BTF based pretty printer:

  root@number:~# perf trace -e prlimit64
       0.000 ( 0.004 ms): :3417020/3417020 prlimit64(resource: NOFILE, old_rlim: 0x7fb8842fe3b0)      = 0
       0.126 ( 0.003 ms): Chroot Helper/3417022 prlimit64(resource: NOFILE, old_rlim: 0x7fb8842fdfd0) = 0
      12.557 ( 0.005 ms): firefox/3417020 prlimit64(resource: STACK, old_rlim: 0x7ffe9ade1b80)        = 0
      26.640 ( 0.006 ms): MainThread/3417020 prlimit64(resource: STACK, old_rlim: 0x7ffe9ade1780)     = 0
      27.553 ( 0.002 ms): Web Content/3417020 prlimit64(resource: AS, old_rlim: 0x7ffe9ade1660)       = 0
      29.405 ( 0.003 ms): Web Content/3417020 prlimit64(resource: NOFILE, old_rlim: 0x7ffe9ade0c80)   = 0
      30.471 ( 0.002 ms): Web Content/3417020 prlimit64(resource: RTTIME, old_rlim: 0x7ffe9ade1370)   = 0
      30.485 ( 0.001 ms): Web Content/3417020 prlimit64(resource: RTTIME, new_rlim: (struct rlimit64){.rlim_cur = (__u64)50000,.rlim_max = (__u64)200000,}) = 0
      31.779 ( 0.001 ms): Web Content/3417020 prlimit64(resource: STACK, old_rlim: 0x7ffe9ade1670)    = 0
  ^Croot@number:~#

Better than before, still needs improvements in the configurability of
the libbpf BTF dumper to get it to the strace output standard.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuBQI-f8CGpuhIdH@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:46 -03:00
Arnaldo Carvalho de Melo
f3f16112c6 perf trace: Support collecting 'union's with the BPF augmenter
And reuse the BTF based struct pretty printer, with that we can offer
initial support for the 'bpf' syscall's second argument, a 'union
bpf_attr' pointer.

But this is not that satisfactory as the libbpf btf dumper will pretty
print _all_ the union, we need to have a way to say that the first arg
selects the type for the union member to be pretty printed, something
like what pahole does translating the PERF_RECORD_ selector into a name,
and using that name to find a matching struct.

In the case of 'union bpf_attr' it would map PROG_LOAD to one of the
union members, but unfortunately there is no such mapping:

  root@number:~# pahole bpf_attr
  union bpf_attr {
  	struct {
  		__u32              map_type;           /*     0     4 */
  		__u32              key_size;           /*     4     4 */
  		__u32              value_size;         /*     8     4 */
  		__u32              max_entries;        /*    12     4 */
  		__u32              map_flags;          /*    16     4 */
  		__u32              inner_map_fd;       /*    20     4 */
  		__u32              numa_node;          /*    24     4 */
  		char               map_name[16];       /*    28    16 */
  		__u32              map_ifindex;        /*    44     4 */
  		__u32              btf_fd;             /*    48     4 */
  		__u32              btf_key_type_id;    /*    52     4 */
  		__u32              btf_value_type_id;  /*    56     4 */
  		__u32              btf_vmlinux_value_type_id; /*    60     4 */
  		/* --- cacheline 1 boundary (64 bytes) --- */
  		__u64              map_extra;          /*    64     8 */
  		__s32              value_type_btf_obj_fd; /*    72     4 */
  		__s32              map_token_fd;       /*    76     4 */
  	};                                             /*     0    80 */
  	struct {
  		__u32              map_fd;             /*     0     4 */

  		/* XXX 4 bytes hole, try to pack */

  		__u64              key;                /*     8     8 */
  		union {
  			__u64      value;              /*    16     8 */
  			__u64      next_key;           /*    16     8 */
  		};                                     /*    16     8 */
  		__u64              flags;              /*    24     8 */
  	};                                             /*     0    32 */
  	struct {
  		__u64              in_batch;           /*     0     8 */
  		__u64              out_batch;          /*     8     8 */
  		__u64              keys;               /*    16     8 */
  		__u64              values;             /*    24     8 */
  		__u32              count;              /*    32     4 */
  		__u32              map_fd;             /*    36     4 */
  		__u64              elem_flags;         /*    40     8 */
  		__u64              flags;              /*    48     8 */
  	} batch;                                       /*     0    56 */
  	struct {
  		__u32              prog_type;          /*     0     4 */
  		__u32              insn_cnt;           /*     4     4 */
  		__u64              insns;              /*     8     8 */
  		__u64              license;            /*    16     8 */
  		__u32              log_level;          /*    24     4 */
  		__u32              log_size;           /*    28     4 */
  		__u64              log_buf;            /*    32     8 */
  		__u32              kern_version;       /*    40     4 */
  		__u32              prog_flags;         /*    44     4 */
  		char               prog_name[16];      /*    48    16 */
  		/* --- cacheline 1 boundary (64 bytes) --- */
  		__u32              prog_ifindex;       /*    64     4 */
  		__u32              expected_attach_type; /*    68     4 */
  		__u32              prog_btf_fd;        /*    72     4 */
  		__u32              func_info_rec_size; /*    76     4 */
  		__u64              func_info;          /*    80     8 */
  		__u32              func_info_cnt;      /*    88     4 */
  		__u32              line_info_rec_size; /*    92     4 */
  		__u64              line_info;          /*    96     8 */
  		__u32              line_info_cnt;      /*   104     4 */
  		__u32              attach_btf_id;      /*   108     4 */
  		union {
  			__u32      attach_prog_fd;     /*   112     4 */
  			__u32      attach_btf_obj_fd;  /*   112     4 */
  		};                                     /*   112     4 */
  		__u32              core_relo_cnt;      /*   116     4 */
  		__u64              fd_array;           /*   120     8 */
  		/* --- cacheline 2 boundary (128 bytes) --- */
  		__u64              core_relos;         /*   128     8 */
  		__u32              core_relo_rec_size; /*   136     4 */
  		__u32              log_true_size;      /*   140     4 */
  		__s32              prog_token_fd;      /*   144     4 */
  	};                                             /*     0   152 */
  	struct {
  		__u64              pathname;           /*     0     8 */
  		__u32              bpf_fd;             /*     8     4 */
  		__u32              file_flags;         /*    12     4 */
  		__s32              path_fd;            /*    16     4 */
  	};                                             /*     0    24 */
  	struct {
  		union {
  			__u32      target_fd;          /*     0     4 */
  			__u32      target_ifindex;     /*     0     4 */
  		};                                     /*     0     4 */
  		__u32              attach_bpf_fd;      /*     4     4 */
  		__u32              attach_type;        /*     8     4 */
  		__u32              attach_flags;       /*    12     4 */
  		__u32              replace_bpf_fd;     /*    16     4 */
  		union {
  			__u32      relative_fd;        /*    20     4 */
  			__u32      relative_id;        /*    20     4 */
  		};                                     /*    20     4 */
  		__u64              expected_revision;  /*    24     8 */
  	};                                             /*     0    32 */
  	struct {
  		__u32              prog_fd;            /*     0     4 */
  		__u32              retval;             /*     4     4 */
  		__u32              data_size_in;       /*     8     4 */
  		__u32              data_size_out;      /*    12     4 */
  		__u64              data_in;            /*    16     8 */
  		__u64              data_out;           /*    24     8 */
  		__u32              repeat;             /*    32     4 */
  		__u32              duration;           /*    36     4 */
  		__u32              ctx_size_in;        /*    40     4 */
  		__u32              ctx_size_out;       /*    44     4 */
  		__u64              ctx_in;             /*    48     8 */
  		__u64              ctx_out;            /*    56     8 */
  		/* --- cacheline 1 boundary (64 bytes) --- */
  		__u32              flags;              /*    64     4 */
  		__u32              cpu;                /*    68     4 */
  		__u32              batch_size;         /*    72     4 */
  	} test;                                        /*     0    80 */
  	struct {
  		union {
  			__u32      start_id;           /*     0     4 */
  			__u32      prog_id;            /*     0     4 */
  			__u32      map_id;             /*     0     4 */
  			__u32      btf_id;             /*     0     4 */
  			__u32      link_id;            /*     0     4 */
  		};                                     /*     0     4 */
  		__u32              next_id;            /*     4     4 */
  		__u32              open_flags;         /*     8     4 */
  	};                                             /*     0    12 */
  	struct {
  		__u32              bpf_fd;             /*     0     4 */
  		__u32              info_len;           /*     4     4 */
  		__u64              info;               /*     8     8 */
  	} info;                                        /*     0    16 */
  	struct {
  		union {
  			__u32      target_fd;          /*     0     4 */
  			__u32      target_ifindex;     /*     0     4 */
  		};                                     /*     0     4 */
  		__u32              attach_type;        /*     4     4 */
  		__u32              query_flags;        /*     8     4 */
  		__u32              attach_flags;       /*    12     4 */
  		__u64              prog_ids;           /*    16     8 */
  		union {
  			__u32      prog_cnt;           /*    24     4 */
  			__u32      count;              /*    24     4 */
  		};                                     /*    24     4 */

  		/* XXX 4 bytes hole, try to pack */

  		__u64              prog_attach_flags;  /*    32     8 */
  		__u64              link_ids;           /*    40     8 */
  		__u64              link_attach_flags;  /*    48     8 */
  		__u64              revision;           /*    56     8 */
  	} query;                                       /*     0    64 */
  	struct {
  		__u64              name;               /*     0     8 */
  		__u32              prog_fd;            /*     8     4 */

  		/* XXX 4 bytes hole, try to pack */

  		__u64              cookie;             /*    16     8 */
  	} raw_tracepoint;                              /*     0    24 */
  	struct {
  		__u64              btf;                /*     0     8 */
  		__u64              btf_log_buf;        /*     8     8 */
  		__u32              btf_size;           /*    16     4 */
  		__u32              btf_log_size;       /*    20     4 */
  		__u32              btf_log_level;      /*    24     4 */
  		__u32              btf_log_true_size;  /*    28     4 */
  		__u32              btf_flags;          /*    32     4 */
  		__s32              btf_token_fd;       /*    36     4 */
  	};                                             /*     0    40 */
  	struct {
  		__u32              pid;                /*     0     4 */
  		__u32              fd;                 /*     4     4 */
  		__u32              flags;              /*     8     4 */
  		__u32              buf_len;            /*    12     4 */
  		__u64              buf;                /*    16     8 */
  		__u32              prog_id;            /*    24     4 */
  		__u32              fd_type;            /*    28     4 */
  		__u64              probe_offset;       /*    32     8 */
  		__u64              probe_addr;         /*    40     8 */
  	} task_fd_query;                               /*     0    48 */
  	struct {
  		union {
  			__u32      prog_fd;            /*     0     4 */
  			__u32      map_fd;             /*     0     4 */
  		};                                     /*     0     4 */
  		union {
  			__u32      target_fd;          /*     4     4 */
  			__u32      target_ifindex;     /*     4     4 */
  		};                                     /*     4     4 */
  		__u32              attach_type;        /*     8     4 */
  		__u32              flags;              /*    12     4 */
  		union {
  			__u32      target_btf_id;      /*    16     4 */
  			struct {
  				__u64 iter_info;       /*    16     8 */
  				__u32 iter_info_len;   /*    24     4 */
  			};                             /*    16    16 */
  			struct {
  				__u64 bpf_cookie;      /*    16     8 */
  			} perf_event;                  /*    16     8 */
  			struct {
  				__u32 flags;           /*    16     4 */
  				__u32 cnt;             /*    20     4 */
  				__u64 syms;            /*    24     8 */
  				__u64 addrs;           /*    32     8 */
  				__u64 cookies;         /*    40     8 */
  			} kprobe_multi;                /*    16    32 */
  			struct {
  				__u32 target_btf_id;   /*    16     4 */

  				/* XXX 4 bytes hole, try to pack */

  				__u64 cookie;          /*    24     8 */
  			} tracing;                     /*    16    16 */
  			struct {
  				__u32 pf;              /*    16     4 */
  				__u32 hooknum;         /*    20     4 */
  				__s32 priority;        /*    24     4 */
  				__u32 flags;           /*    28     4 */
  			} netfilter;                   /*    16    16 */
  			struct {
  				union {
  					__u32  relative_fd; /*    16     4 */
  					__u32  relative_id; /*    16     4 */
  				};                     /*    16     4 */

  				/* XXX 4 bytes hole, try to pack */

  				__u64 expected_revision; /*    24     8 */
  			} tcx;                         /*    16    16 */
  			struct {
  				__u64 path;            /*    16     8 */
  				__u64 offsets;         /*    24     8 */
  				__u64 ref_ctr_offsets; /*    32     8 */
  				__u64 cookies;         /*    40     8 */
  				__u32 cnt;             /*    48     4 */
  				__u32 flags;           /*    52     4 */
  				__u32 pid;             /*    56     4 */
  			} uprobe_multi;                /*    16    48 */
  			struct {
  				union {
  					__u32  relative_fd; /*    16     4 */
  					__u32  relative_id; /*    16     4 */
  				};                     /*    16     4 */

  				/* XXX 4 bytes hole, try to pack */

  				__u64 expected_revision; /*    24     8 */
  			} netkit;                      /*    16    16 */
  		};                                     /*    16    48 */
  	} link_create;                                 /*     0    64 */
  	struct {
  		__u32              link_fd;            /*     0     4 */
  		union {
  			__u32      new_prog_fd;        /*     4     4 */
  			__u32      new_map_fd;         /*     4     4 */
  		};                                     /*     4     4 */
  		__u32              flags;              /*     8     4 */
  		union {
  			__u32      old_prog_fd;        /*    12     4 */
  			__u32      old_map_fd;         /*    12     4 */
  		};                                     /*    12     4 */
  	} link_update;                                 /*     0    16 */
  	struct {
  		__u32              link_fd;            /*     0     4 */
  	} link_detach;                                 /*     0     4 */
  	struct {
  		__u32              type;               /*     0     4 */
  	} enable_stats;                                /*     0     4 */
  	struct {
  		__u32              link_fd;            /*     0     4 */
  		__u32              flags;              /*     4     4 */
  	} iter_create;                                 /*     0     8 */
  	struct {
  		__u32              prog_fd;            /*     0     4 */
  		__u32              map_fd;             /*     4     4 */
  		__u32              flags;              /*     8     4 */
  	} prog_bind_map;                               /*     0    12 */
  	struct {
  		__u32              flags;              /*     0     4 */
  		__u32              bpffs_fd;           /*     4     4 */
  	} token_create;                                /*     0     8 */
  };

  root@number:~#

So this is one case where BTF gets us only that far, not getting all
the way to automate the pretty printing of unions designed like 'union
bpf_attr', we will need a custom pretty printer for this union, as using
the libbpf union BTF dumper is way too verbose:

  root@number:~# perf trace --max-events 1 -e bpf bpftool map
       0.000 ( 0.054 ms): bpftool/3409073 bpf(cmd: PROG_LOAD, uattr: (union bpf_attr){(struct){.map_type = (__u32)1,.key_size = (__u32)2,.value_size = (__u32)2755142048,.max_entries = (__u32)32764,.map_flags = (__u32)150263906,.inner_map_fd = (__u32)21920,},(struct){.map_fd = (__u32)1,.key = (__u64)140723063628192,(union){.value = (__u64)94145833392226,.next_key = (__u64)94145833392226,},},.batch = (struct){.in_batch = (__u64)8589934593,.out_batch = (__u64)140723063628192,.keys = (__u64)94145833392226,},(struct){.prog_type = (__u32)1,.insn_cnt = (__u32)2,.insns = (__u64)140723063628192,.license = (__u64)94145833392226,},(struct){.pathname = (__u64)8589934593,.bpf_fd = (__u32)2755142048,.file_flags = (__u32)32764,.path_fd = (__s32)150263906,},(struct){(union){.target_fd = (__u32)1,.target_ifindex = (__u32)1,},.attach_bpf_fd = (__u32)2,.attach_type = (__u32)2755142048,.attach_flags = (__u32)32764,.replace_bpf_fd = (__u32)150263906,(union){.relative_fd = (__u32)21920,.relative_id = (__u32)21920,},},.test = (struct){.prog_fd = (__u32)1,.retval = (__u32)2,.data_size_in = (__u32)2755142048,.data_size_out = (__u32)32764,.data_in = (__u64)94145833392226,},(struct){(union){.start_id = (__u32)1,.prog_id = (__u32)1,.map_id = (__u32)1,.btf_id = (__u32)1,.link_id = (__u32)1,},.next_id = (__u32)2,.open_flags = (__u32)2755142048,},.info = (struct){.bpf_fd = (__u32)1,.info_len = (__u32)2,.info = (__u64)140723063628192,},.query = (struct){(union){.target_fd = (__u32)1,.target_ifindex = (__u32)1,},.attach_type = (__u32)2,.query_flags = (__u32)2755142048,.attach_flags = (__u32)32764,.prog_ids = (__u64)94145833392226,},.raw_tracepoint = (struct){.name = (__u64)8589934593,.prog_fd = (__u32)2755142048,.cookie = (__u64)94145833392226,},(struct){.btf = (__u64)8589934593,.btf_log_buf = (__u64)140723063628192,.btf_size = (__u32)150263906,.btf_log_size = (__u32)21920,},.task_fd_query = (struct){.pid = (__u32)1,.fd = (__u32)2,.flags = (__u32)2755142048,.buf_len = (__u32)32764,.buf = (__u64)94145833392226,},.link_create = (struct){(union){.prog_fd = (__u32)1,.map_fd = (__u32)1,},(u) = 3
  root@number:~# 2: prog_array  name hid_jmp_table  flags 0x0
  	key 4B  value 4B  max_entries 1024  memlock 8440B
  	owner_prog_type tracing  owner jited
  13: hash_of_maps  name cgroup_hash  flags 0x0
  	key 8B  value 4B  max_entries 2048  memlock 167584B
  	pids systemd(1)
  960: array  name libbpf_global  flags 0x0
  	key 4B  value 32B  max_entries 1  memlock 280B
  961: array  name pid_iter.rodata  flags 0x480
  	key 4B  value 4B  max_entries 1  memlock 8192B
  	btf_id 1846  frozen
  	pids bpftool(3409073)
  962: array  name libbpf_det_bind  flags 0x0
  	key 4B  value 32B  max_entries 1  memlock 280B

  root@number:~#

For simpler unions this may be better than not seeing any payload, so
keep it there.

Acked-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuBLat8cbadILNLA@x1
[ Removed needless parenteses in the if block leading to the trace__btf_scnprintf() call, as per Howard's review comments ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:31:51 -03:00
Kuan-Wei Chiu
f04e2ad394 bpftool: Fix undefined behavior in qsort(NULL, 0, ...)
When netfilter has no entry to display, qsort is called with
qsort(NULL, 0, ...). This results in undefined behavior, as UBSan
reports:

net.c:827:2: runtime error: null pointer passed as argument 1, which is declared to never be null

Although the C standard does not explicitly state whether calling qsort
with a NULL pointer when the size is 0 constitutes undefined behavior,
Section 7.1.4 of the C standard (Use of library functions) mentions:

"Each of the following statements applies unless explicitly stated
otherwise in the detailed descriptions that follow: If an argument to a
function has an invalid value (such as a value outside the domain of
the function, or a pointer outside the address space of the program, or
a null pointer, or a pointer to non-modifiable storage when the
corresponding parameter is not const-qualified) or a type (after
promotion) not expected by a function with variable number of
arguments, the behavior is undefined."

To avoid this, add an early return when nf_link_info is NULL to prevent
calling qsort with a NULL pointer.

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240910150207.3179306-1-visitorckw@gmail.com
2024-09-10 11:40:55 -07:00
Jiri Olsa
8c8b475974 libbpf: Fix uretprobe.multi.s programs auto attachment
As reported by Andrii we don't currently recognize uretprobe.multi.s
programs as return probes due to using (wrong) strcmp function.

Using str_has_pfx() instead to match uretprobe.multi prefix.

Tests are passing, because the return program was executed
as entry program and all counts were incremented properly.

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240910125336.3056271-1-jolsa@kernel.org
2024-09-10 11:35:13 -07:00
Rafael J. Wysocki
ffa1f26d3d linux-cpupower-6.12-rc1-2
This cpupower second update for Linux 6.12-rc1 consists of a fix
 and a new feature.
 
 -- adds missing powercap_set_enabled() stub function
 -- adds SWIG bindings files for libcpupower
 
 SWIG is a tool packaged in Fedora and other distros that can generate
 bindings from C and C++ code for several languages including Python,
 Perl, and Go.
 
 These bindings allows users to easily write scripts that use and extend
 libcpupower's functionality. Currently, only Python is provided in the
 makefile, but additional languages may be added if there is demand.
 
 Note that while SWIG itself is GPL v3+ licensed; the resulting output,
 the bindings code, is permissively licensed + the license of the .o
 files. Please see the following for more details.
 
 - https://swig.org/legal.html.
 - https://lore.kernel.org/linux-pm/Zqv9BOjxLAgyNP5B@hatbackup
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmbgg2gACgkQCwJExA0N
 QxxNPw//b3viBhRCf4Ir0nq/3buJtLr6EWKmhEp/nbsYomCeCCttb6JQBgoLr91T
 iemjU8Hy6RvhFPM/Pqj2CbdriG123F7rN9WinZL4Un855heSZM38CKQxZAtvfRc3
 +lMYolsoJYk+JiumNDU+UZNsUoAI4x8JHYh2Q1XO0VhkFsfre7pfEEQZ70/egM/z
 esqMB0GpFmtJOfHSLu/vzwseIjQfHCTcf94cTDysFjV4NzmNix43yTBX53yKCyan
 sVR6S9rXUlFwr042fA9R63vlKVHnc1RdmQvUbaCt488rYgIzKhE4eJICRZrM4WEL
 qudhvMHeqjd5uukTBGCldtYWy/9TTRv0BB8a3lvffoligTkOQ2k/WDpazgFRayuE
 zngzGKgpNyF7NGG40bGx+jooyw/ilrQ6qIEOXUP3Hr58h88yi5re7yLcleBpPvmd
 qZIBkpDeiLNCidDrInGWwiwMEYvnwPa6KmC1hNeAFsd8zQ/9H3VrVyMHAe3NXqj5
 JYtx+5DgZ1EXYRKj2bom0ydSNPjfEAAu+wxhQzdBugYtQxt1aeR5nYYpAbZ6Fz9L
 59XON1FQzpfD6k6G3389fkFQ5St5HAdVabEcBgMYYS1yvSxqi3pJMNDOiJH/aBVe
 poFEUNL20ZWLaKq2EyFuUOSOFf96KUO2rOvYS4zl/u3QtliilsA=
 =1xxI
 -----END PGP SIGNATURE-----

Merge tag 'linux-cpupower-6.12-rc1-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shuah/linux

Merge the second round of cpupower utility updates for 6.12-rc1 from
Shuah Khan:

"This cpupower second update for Linux 6.12-rc1 consists of a fix
 and a new feature.

 -- adds missing powercap_set_enabled() stub function
 -- adds SWIG bindings files for libcpupower

 SWIG is a tool packaged in Fedora and other distros that can generate
 bindings from C and C++ code for several languages including Python,
 Perl, and Go.

 These bindings allows users to easily write scripts that use and extend
 libcpupower's functionality. Currently, only Python is provided in the
 makefile, but additional languages may be added if there is demand.

 Note that while SWIG itself is GPL v3+ licensed; the resulting output,
 the bindings code, is permissively licensed + the license of the .o
 files. Please see the following for more details.

 - https://swig.org/legal.html.
 - https://lore.kernel.org/linux-pm/Zqv9BOjxLAgyNP5B@hatbackup"

* tag 'linux-cpupower-6.12-rc1-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
  pm:cpupower: Add error warning when SWIG is not installed
  MAINTAINERS: Add Maintainers for SWIG Python bindings
  pm:cpupower: Include test_raw_pylibcpupower.py
  pm:cpupower: Add SWIG bindings files for libcpupower
  pm:cpupower: Add missing powercap_set_enabled() stub function
2024-09-10 19:59:16 +02:00
Howard Chu
3278024540 perf trace: Add --force-btf for debugging
If --force-btf is enabled, prefer btf_dump general pretty printer to
perf trace's customized pretty printers.

Mostly for debug purposes.

Committer testing:

diff before/after shows we need several improvements to be able to
compare the changes, first we need to cut off/disable mutable data such
as pids and timestamps, then what is left are the buffer addresses
passed from userspace, returned from kernel space, maybe we can ask
'perf trace' to go on making those reproducible.

That would entail a Pointer Address Translation (PAT) like for
networking, that would, for simple, reproducible if not for these
details, workloads, that we would then use in our regression tests.

Enough digression, this is one such diff:

   openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY|CLOEXEC) = 3
  -fstat(fd: 3, statbuf: 0x7fff01f212a0)                                 = 0
  -read(fd: 3, buf: 0x5596bab2d630, count: 4096)                         = 2998
  -read(fd: 3, buf: 0x5596bab2d630, count: 4096)                         = 0
  +fstat(fd: 3, statbuf: 0x7ffc163cf0e0)                                 = 0
  +read(fd: 3, buf: 0x55b4e0631630, count: 4096)                         = 2998
  +read(fd: 3, buf: 0x55b4e0631630, count: 4096)                         = 0
   close(fd: 3)                                                          = 0
   openat(dfd: CWD, filename: "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
   openat(dfd: CWD, filename: "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
  @@ -45,7 +45,7 @@
   openat(dfd: CWD, filename: "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
   openat(dfd: CWD, filename: "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
   openat(dfd: CWD, filename: "/usr/share/locale/en/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
  -{ .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7fff01f21990) = 0
  +(struct __kernel_timespec){.tv_sec = (__kernel_time64_t)1,}, rmtp: 0x7ffc163cf7d0) =

The problem more close to our hands is to make the libbpf BTF pretty
printer to have a mode that closely resembles what we're trying to
resemble: strace output.

Being able to run something with 'perf trace' and with 'strace' and get
the exact same output should be of interest of anybody wanting to have
strace and 'perf trace' regression tested against each other.

That last part is 'perf trace' shot at being something so useful as
strace... ;-)

Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240824163322.60796-8-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:27 -03:00
Howard Chu
a68fd6a6cd perf trace: Collect augmented data using BPF
Include trace_augment.h for TRACE_AUG_MAX_BUF, so that BPF reads
TRACE_AUG_MAX_BUF bytes of buffer maximum.

Determine what type of argument and how many bytes to read from user space, us ing the
value in the beauty_map. This is the relation of parameter type and its corres ponding
value in the beauty map, and how many bytes we read eventually:

string: 1                          -> size of string (till null)
struct: size of struct             -> size of struct
buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_ MAX_BUF)

After reading from user space, we output the augmented data using
bpf_perf_event_output().

If the struct augmenter, augment_sys_enter() failed, we fall back to
using bpf_tail_call().

I have to make the payload 6 times the size of augmented_arg, to pass the
BPF verifier.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-10-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-7-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:20 -03:00
Howard Chu
b257fac12f perf trace: Pretty print buffer data
Define TRACE_AUG_MAX_BUF in trace_augment.h data, which is the maximum
buffer size we can augment. BPF will include this header too.

Print buffer in a way that's different than just printing a string, we
print all the control characters in \digits (such as \0 for null, and
\10 for newline, LF).

For character that has a bigger value than 127, we print the digits
instead of the character itself as well.

Committer notes:

Simplified the buffer scnprintf to avoid using multiple buffers as
discussed in the patch review thread.

We can't really all 'buf' args to SCA_BUF as we're collecting so far
just on the sys_enter path, so we would be printing the previous 'read'
arg buffer contents, not what the kernel puts there.

So instead of:
   static int syscall_fmt__cmp(const void *name, const void *fmtp)
  @@ -1987,8 +1989,6 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field
  -               else if (strstr(field->type, "char *") && strstr(field->name, "buf"))
  -                       arg->scnprintf = SCA_BUF;

Do:

static const struct syscall_fmt syscall_fmts[] = {
  +       { .name     = "write",      .errpid = true,
  +         .arg = { [1] = { .scnprintf = SCA_BUF /* buf */, from_user = true, }, }, },

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-8-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-6-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:13 -03:00
Howard Chu
cb32035214 perf trace: Pretty print struct data
Change the arg->augmented.args to arg->augmented.args->value to skip the
header for customized pretty printers, since we collect data in BPF
using the general augment_sys_enter(), which always adds the header.

Use btf_dump API to pretty print augmented struct pointer.

Prefer existed pretty-printer than btf general pretty-printer.

set compact = true and skip_names = true, so that no newline character
and argument name are printed.

Committer notes:

Simplified the btf_dump_snprintf callback to avoid using multiple
buffers, as discussed in the thread accessible via the Link tag below.

Also made it do:

  dump_data_opts.skip_names = !arg->trace->show_arg_names;

I.e. show the type and struct field names according to that tunable, we
probably need another tunable just for this, but for now if the user
wants to see syscall names in addition to its value, it makes sense to
see the struct field names according to that tunable.

Committer testing:

The following have explicitely set beautifiers (SCA_FILENAME,
SCA_SOCKADDR and SCA_PERF_ATTR), SCA_FILENAME is here just because we
have been wiring up the "renameat2" ("renameat" until recently), so it
doesn't use the introduced generic fallback (btf_struct_scnprintf(), see
the definition of SCA_PERF_ATTR, SCA_SOCKADDR to see the more feature
rich beautifiers, that are not using BTF):

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654
       0.000 ( 0.039 ms): mv/258478 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com
       0.000 ( 0.014 ms): ping/258481 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
       0.040 ( 0.003 ms): ping/258481 sendto(fd: 5, buff: 0x55bc317a6980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97
      18.742 ( 0.020 ms): ping/258481 sendto(fd: 5, buff: 0x7ffc04768df0, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20
  PING www.google.com (142.251.129.68) 56(84) bytes of data.
      18.783 ( 0.012 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0
      18.797 ( 0.001 ms): ping/258481 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      18.800 ( 0.004 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.129.68 }, addrlen: 16) = 0
      18.815 ( 0.002 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.129.68 }, addrlen: 16) = 0
      18.862 ( 0.023 ms): ping/258481 sendto(fd: 3, buff: 0x55bc317a0ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.129.68 }, addr_len: 0x10) = 64
      63.330 ( 0.038 ms): ping/258481 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
      63.435 ( 0.010 ms): ping/258481 sendto(fd: 5, buff: 0x55bc317a8340, len: 110, flags: DONTWAIT|NOSIGNAL) = 110
  64 bytes from rio07s07-in-f4.1e100.net (142.251.129.68): icmp_seq=1 ttl=49 time=44.2 ms

  --- www.google.com ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 44.158/44.158/44.158/0.000 ms
  root@number:~# perf trace -e perf_event_open perf stat -e instructions,cache-misses,syscalls:sys_enter*sleep* sleep 1.23456789
       0.000 ( 0.010 ms): :258487/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0xa00000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER|COUNTERS, sample_regs_user: 0x3f1b7ffffffff, sample_stack_user: 258487, clockid: -599052088, sample_regs_intr: 0x60a000003eb, sample_max_stack: 14, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
       0.016 ( 0.002 ms): :258487/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0x400000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER|COUNTERS, sample_regs_user: 0x3f1b7ffffffff, sample_stack_user: 258487, clockid: -599044082, sample_regs_intr: 0x60a000003eb, sample_max_stack: 14, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
       1.838 ( 0.006 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000001, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5
       1.846 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000001, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 6
       1.849 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 7
       1.851 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9
       1.853 ( 0.600 ms): perf/258487 perf_event_open(attr_uptr: { type: 2 (tracepoint), size: 136, config: 0x190 (syscalls:sys_enter_nanosleep), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 10
       2.456 ( 0.016 ms): perf/258487 perf_event_open(attr_uptr: { type: 2 (tracepoint), size: 136, config: 0x196 (syscalls:sys_enter_clock_nanosleep), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 11

   Performance counter stats for 'sleep 1.23456789':

           1,402,839      cpu_atom/instructions/
       <not counted>      cpu_core/instructions/                                                  (0.00%)
              11,066      cpu_atom/cache-misses/
       <not counted>      cpu_core/cache-misses/                                                  (0.00%)
                   0      syscalls:sys_enter_nanosleep
                   1      syscalls:sys_enter_clock_nanosleep

         1.236246714 seconds time elapsed

         0.000000000 seconds user
         0.001308000 seconds sys

  root@number:~#

Now if we use it even for the ones we have a specific beautifier in
tools/perf/trace/beauty, i.e. use btf_struct_scnprintf() for all
structs, by adding the following patch:

  @@ -2316,7 +2316,7 @@ static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size,

   			default_scnprintf = sc->arg_fmt[arg.idx].scnprintf;

  -			if (default_scnprintf == NULL || default_scnprintf == SCA_PTR) {
  +			if (1 || (default_scnprintf == NULL || default_scnprintf == SCA_PTR)) {
   				btf_printed = trace__btf_scnprintf(trace, &arg, bf + printed,
   								   size - printed, val, field->type);
   				if (btf_printed) {

We get:

  root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com
  PING www.google.com (142.251.129.68) 56(84) bytes of data.
       0.000 ( 0.015 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)1,(union){.sa_data_min = (char[14])['/','r','u','n','/','s','y','s','t','e','m','d','/','r',],},}, addrlen: 42) = 0
       0.046 ( 0.004 ms): ping/283259 sendto(fd: 5, buff: 0x559b008ae980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97
       0.353 ( 0.012 ms): ping/283259 sendto(fd: 5, buff: 0x7ffc01294960, len: 20, addr: (struct sockaddr){.sa_family = (sa_family_t)16,}, addr_len: 0xc) = 20
       0.377 ( 0.006 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,}, addrlen: 16) = 0
       0.388 ( 0.010 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)10,}, addrlen: 28) = 0
       0.402 ( 0.001 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,(union){.sa_data_min = (char[14])[4,1,142,251,129,'D',],},}, addrlen: 16) = 0
       0.425 ( 0.045 ms): ping/283259 sendto(fd: 3, buff: 0x559b008a8ac0, len: 64, addr: (struct sockaddr){.sa_family = (sa_family_t)2,}, addr_len: 0x10) = 64
  64 bytes from rio07s07-in-f4.1e100.net (142.251.129.68): icmp_seq=1 ttl=49 time=44.1 ms

  --- www.google.com ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 44.113/44.113/44.113/0.000 ms
      44.849 ( 0.038 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)1,(union){.sa_data_min = (char[14])['/','r','u','n','/','s','y','s','t','e','m','d','/','r',],},}, addrlen: 42) = 0
      44.927 ( 0.006 ms): ping/283259 sendto(fd: 5, buff: 0x559b008b03d0, len: 110, flags: DONTWAIT|NOSIGNAL) = 110
  root@number:~#

Which looks sane, i.e.:

  18.800 ( 0.004 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.129.68 }, addrlen: 16) = 0

Becomes:

   0.402 ( 0.001 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,(union){.sa_data_min = (char[14])[4,1,142,251,129,'D',],},}, addrlen: 16) = 0

And.

  #define AF_UNIX         1       /* Unix domain sockets          */
  #define AF_LOCAL        1       /* POSIX name for AF_UNIX       */
  #define AF_INET         2       /* Internet IP Protocol         */
  <SNIP>
  #define AF_INET6        10      /* IP version 6                 */

And 'D' == 68, so the preexisting sockaddr BPF collector is working with
the new generic BTF pretty printer (btf_struct_scnprintf()), its just
that it doesn't know about 'struct sockaddr' besides what is in BTF,
i.e. its an array of bytes, not an IPv4 address that needs extra
massaging.

Ditto for the 'struct perf_event_attr' case:

       1.851 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9

Becomes:

       2.081 ( 0.002 ms): :283304/283304 perf_event_open(attr_uptr: (struct perf_event_attr){.size = (__u32)136,.config = (__u64)17179869187,.sample_type = (__u64)65536,.read_format = (__u64)3,.disabled = (__u64)0x1,.inherit = (__u64)0x1,.enable_on_exec = (__u64)0x1,.exclude_guest = (__u64)0x1,}, pid: 283305 (sleep), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9

hex(17179869187) = 0x400000003, etc.

read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING is

enum perf_event_read_format {
        PERF_FORMAT_TOTAL_TIME_ENABLED          = 1U << 0,
        PERF_FORMAT_TOTAL_TIME_RUNNING          = 1U << 1,

and so on.

We need to work with the libbpf btf dump api to get one output that
matches the 'perf trace'/strace expectations/format, but having this in
this current form is already an improvement to 'perf trace', so lets
improve from what we have.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-7-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-5-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:07 -03:00
Howard Chu
7f40306728 perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF
Set up beauty_map, load it to BPF, in such format: if argument No.3 is a
struct of size 32 bytes (of syscall number 114) beauty_map[114][2] = 32;

if argument No.3 is a string (of syscall number 114) beauty_map[114][2] =
1;

if argument No.3 is a buffer, its size is indicated by argument No.4 (of
syscall number 114) beauty_map[114][2] = -4; /* -1 ~ -6, we'll read this
buffer size in BPF  */

Committer notes:

Moved syscall_arg_fmt__cache_btf_struct() from a ifdef
HAVE_LIBBPF_SUPPORT to closer to where it is used, that is ifdef'ed on
HAVE_BPF_SKEL and thus breaks the build when building with
BUILD_BPF_SKEL=0, as detected using 'make -C tools/perf build-test'.

Also add 'struct beauty_map_enter' to tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
as we're using it in this patch, otherwise we get this while trying to
build at this point in the original patch series:

  builtin-trace.c: In function ‘trace__init_syscalls_bpf_prog_array_maps’:
  builtin-trace.c:3725:58: error: ‘struct <anonymous>’ has no member named ‘beauty_map_enter’
   3725 |         int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
        |

We also have to take into account syscall_arg_fmt.from_user when telling
the kernel what to copy in the sys_enter generic collector, we don't
want to collect bogus data in buffers that will only be available to us
at sys_exit time, i.e. after the kernel has filled it, so leave this for
when we have such a sys_exit based collector.

Committer testing:

Not wired up yet, so all continues to work, using the existing BPF
collector and userspace beautifiers that are augmentation aware:

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654
       0.000 ( 0.031 ms): mv/20888 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com
       0.000 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
       0.040 ( 0.003 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff17980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97
       0.480 ( 0.017 ms): ping/20892 sendto(fd: 5, buff: 0x7ffd82d07150, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20
       0.526 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0
       0.542 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
       0.544 ( 0.004 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.135.100 }, addrlen: 16) = 0
       0.559 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.135.100 }, addrlen: 16PING www.google.com (142.251.135.100) 56(84) bytes of data.
  ) = 0
       0.589 ( 0.058 ms): ping/20892 sendto(fd: 3, buff: 0x560b4ff11ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.135.100 }, addr_len: 0x10) = 64
      45.250 ( 0.029 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
      45.344 ( 0.012 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff19340, len: 111, flags: DONTWAIT|NOSIGNAL) = 111
  64 bytes from rio09s08-in-f4.1e100.net (142.251.135.100): icmp_seq=1 ttl=49 time=44.4 ms

  --- www.google.com ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 44.361/44.361/44.361/0.000 ms
  root@number:~#

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-4-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-3-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:51:59 -03:00
Arnaldo Carvalho de Melo
d92f490cba perf trace: Mark bpf's attr as from_user
This one has no specific pretty printer right now, so will be handled by
the generic BTF based one later in this patch series.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:51:51 -03:00
Thomas Gleixner
2f7eedca6c Merge branch 'linus' into timers/core
To update with the latest fixes.
2024-09-10 13:49:53 +02:00
Zhu Jun
a8927f69e8 tools/virtio:Fix the wrong format specifier
The unsigned int should use "%u" instead of "%d".

Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com>
Message-Id: <20240724074108.9530-1-zhujun2@cmss.chinamobile.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
2024-09-10 02:51:48 -04:00
Sean Christopherson
c32e028057 KVM: selftests: Verify single-stepping a fastpath VM-Exit exits to userspace
In x86's debug_regs test, change the RDMSR(MISC_ENABLES) in the single-step
testcase to a WRMSR(TSC_DEADLINE) in order to verify that KVM honors
KVM_GUESTDBG_SINGLESTEP when handling a fastpath VM-Exit.

Note, the extra coverage is effectively Intel-only, as KVM only handles
TSC_DEADLINE in the fastpath when the timer is emulated via the hypervisor
timer, a.k.a. the VMX preemption timer.

Link: https://lore.kernel.org/r/20240830044448.130449-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09 20:12:12 -07:00
Willem de Bruijn
8a405552fd selftests/net: integrate packetdrill with ksft
Lay the groundwork to import into kselftests the over 150 packetdrill
TCP/IP conformance tests on github.com/google/packetdrill.

Florian recently added support for packetdrill tests in nf_conntrack,
in commit a8a388c2aa ("selftests: netfilter: add packetdrill based
conntrack tests").

This patch takes a slightly different approach. It relies on
ksft_runner.sh to run every *.pkt file in the directory.

Any future imports of packetdrill tests should require no additional
coding. Just add the *.pkt files.

Initially import only two features/directories from github. One with a
single script, and one with two. This was the only reason to pick
tcp/inq and tcp/md5.

The path replaces the directory hierarchy in github with a flat space
of files: $(subst /,_,$(wildcard tcp/**/*.pkt)). This is the most
straightforward option to integrate with kselftests. The Linked thread
reviewed two ways to maintain the hierarchy: TEST_PROGS_RECURSE and
PRESERVE_TEST_DIRS. But both introduce significant changes to
kselftest infra and with that risk to existing tests.

Implementation notes:
- restore alphabetical order when adding the new directory to
  tools/testing/selftests/Makefile
- imported *.pkt files and support verbatim from the github project,
  except for
    - update `source ./defaults.sh` path (to adjust for flat dir)
    - add SPDX headers
    - remove one author statement
    - Acknowledgment: drop an e (checkpatch)

Tested:
	make -C tools/testing/selftests \
	  TARGETS=net/packetdrill \
	  run_tests

	make -C tools/testing/selftests \
	  TARGETS=net/packetdrill \
	  install INSTALL_PATH=$KSFT_INSTALL_PATH

	# in virtme-ng
	./run_kselftest.sh -c net/packetdrill
	./run_kselftest.sh -t net/packetdrill:tcp_inq_client.pkt

Link: https://lore.kernel.org/netdev/20240827193417.2792223-1-willemdebruijn.kernel@gmail.com/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905231653.2427327-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09 17:38:02 -07:00
Willem de Bruijn
dbd61921a6 selftests: support interpreted scripts with ksft_runner.sh
Support testcases that are themselves not executable, but need an
interpreter to run them.

If a test file is not executable, but an executable file
ksft_runner.sh exists in the TARGET dir, kselftest will run

    ./ksft_runner.sh ./$BASENAME_TEST

Packetdrill may add hundreds of packetdrill scripts for testing. These
scripts must be passed to the packetdrill process.

Have kselftest run each test directly, as it already solves common
runner requirements like parallel execution and isolation (netns).
A previous RFC added a wrapper in between, which would have to
reimplement such functionality.

Link: https://lore.kernel.org/netdev/66d4d97a4cac_3df182941a@willemb.c.googlers.com.notmuch/T/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905231653.2427327-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09 17:38:02 -07:00
Kuniyuki Iwashima
5aa57d9f2d af_unix: Don't return OOB skb in manage_oob().
syzbot reported use-after-free in unix_stream_recv_urg(). [0]

The scenario is

  1. send(MSG_OOB)
  2. recv(MSG_OOB)
     -> The consumed OOB remains in recv queue
  3. send(MSG_OOB)
  4. recv()
     -> manage_oob() returns the next skb of the consumed OOB
     -> This is also OOB, but unix_sk(sk)->oob_skb is not cleared
  5. recv(MSG_OOB)
     -> unix_sk(sk)->oob_skb is used but already freed

The recent commit 8594d9b85c ("af_unix: Don't call skb_get() for OOB
skb.") uncovered the issue.

If the OOB skb is consumed and the next skb is peeked in manage_oob(),
we still need to check if the skb is OOB.

Let's do so by falling back to the following checks in manage_oob()
and add the test case in selftest.

Note that we need to add a similar check for SIOCATMARK.

[0]:
BUG: KASAN: slab-use-after-free in unix_stream_read_actor+0xa6/0xb0 net/unix/af_unix.c:2959
Read of size 4 at addr ffff8880326abcc4 by task syz-executor178/5235

CPU: 0 UID: 0 PID: 5235 Comm: syz-executor178 Not tainted 6.11.0-rc5-syzkaller-00742-gfbdaffe41adc #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:93 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 unix_stream_read_actor+0xa6/0xb0 net/unix/af_unix.c:2959
 unix_stream_recv_urg+0x1df/0x320 net/unix/af_unix.c:2640
 unix_stream_read_generic+0x2456/0x2520 net/unix/af_unix.c:2778
 unix_stream_recvmsg+0x22b/0x2c0 net/unix/af_unix.c:2996
 sock_recvmsg_nosec net/socket.c:1046 [inline]
 sock_recvmsg+0x22f/0x280 net/socket.c:1068
 ____sys_recvmsg+0x1db/0x470 net/socket.c:2816
 ___sys_recvmsg net/socket.c:2858 [inline]
 __sys_recvmsg+0x2f0/0x3e0 net/socket.c:2888
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5360d6b4e9
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff29b3a458 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 00007fff29b3a638 RCX: 00007f5360d6b4e9
RDX: 0000000000002001 RSI: 0000000020000640 RDI: 0000000000000003
RBP: 00007f5360dde610 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007fff29b3a628 R14: 0000000000000001 R15: 0000000000000001
 </TASK>

Allocated by task 5235:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 unpoison_slab_object mm/kasan/common.c:312 [inline]
 __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338
 kasan_slab_alloc include/linux/kasan.h:201 [inline]
 slab_post_alloc_hook mm/slub.c:3988 [inline]
 slab_alloc_node mm/slub.c:4037 [inline]
 kmem_cache_alloc_node_noprof+0x16b/0x320 mm/slub.c:4080
 __alloc_skb+0x1c3/0x440 net/core/skbuff.c:667
 alloc_skb include/linux/skbuff.h:1320 [inline]
 alloc_skb_with_frags+0xc3/0x770 net/core/skbuff.c:6528
 sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2815
 sock_alloc_send_skb include/net/sock.h:1778 [inline]
 queue_oob+0x108/0x680 net/unix/af_unix.c:2198
 unix_stream_sendmsg+0xd24/0xf80 net/unix/af_unix.c:2351
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x221/0x270 net/socket.c:745
 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
 ___sys_sendmsg net/socket.c:2651 [inline]
 __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 5235:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
 kasan_slab_free include/linux/kasan.h:184 [inline]
 slab_free_hook mm/slub.c:2252 [inline]
 slab_free mm/slub.c:4473 [inline]
 kmem_cache_free+0x145/0x350 mm/slub.c:4548
 unix_stream_read_generic+0x1ef6/0x2520 net/unix/af_unix.c:2917
 unix_stream_recvmsg+0x22b/0x2c0 net/unix/af_unix.c:2996
 sock_recvmsg_nosec net/socket.c:1046 [inline]
 sock_recvmsg+0x22f/0x280 net/socket.c:1068
 __sys_recvfrom+0x256/0x3e0 net/socket.c:2255
 __do_sys_recvfrom net/socket.c:2273 [inline]
 __se_sys_recvfrom net/socket.c:2269 [inline]
 __x64_sys_recvfrom+0xde/0x100 net/socket.c:2269
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff8880326abc80
 which belongs to the cache skbuff_head_cache of size 240
The buggy address is located 68 bytes inside of
 freed 240-byte region [ffff8880326abc80, ffff8880326abd70)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x326ab
ksm flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xfdffffff(slab)
raw: 00fff00000000000 ffff88801eaee780 ffffea0000b7dc80 dead000000000003
raw: 0000000000000000 00000000800c000c 00000001fdffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 4686, tgid 4686 (udevadm), ts 32357469485, free_ts 28829011109
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1493
 prep_new_page mm/page_alloc.c:1501 [inline]
 get_page_from_freelist+0x2e4c/0x2f10 mm/page_alloc.c:3439
 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4695
 __alloc_pages_node_noprof include/linux/gfp.h:269 [inline]
 alloc_pages_node_noprof include/linux/gfp.h:296 [inline]
 alloc_slab_page+0x5f/0x120 mm/slub.c:2321
 allocate_slab+0x5a/0x2f0 mm/slub.c:2484
 new_slab mm/slub.c:2537 [inline]
 ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3723
 __slab_alloc+0x58/0xa0 mm/slub.c:3813
 __slab_alloc_node mm/slub.c:3866 [inline]
 slab_alloc_node mm/slub.c:4025 [inline]
 kmem_cache_alloc_node_noprof+0x1fe/0x320 mm/slub.c:4080
 __alloc_skb+0x1c3/0x440 net/core/skbuff.c:667
 alloc_skb include/linux/skbuff.h:1320 [inline]
 alloc_uevent_skb+0x74/0x230 lib/kobject_uevent.c:289
 uevent_net_broadcast_untagged lib/kobject_uevent.c:326 [inline]
 kobject_uevent_net_broadcast+0x2fd/0x580 lib/kobject_uevent.c:410
 kobject_uevent_env+0x57d/0x8e0 lib/kobject_uevent.c:608
 kobject_synth_uevent+0x4ef/0xae0 lib/kobject_uevent.c:207
 uevent_store+0x4b/0x70 drivers/base/bus.c:633
 kernfs_fop_write_iter+0x3a1/0x500 fs/kernfs/file.c:334
 new_sync_write fs/read_write.c:497 [inline]
 vfs_write+0xa72/0xc90 fs/read_write.c:590
page last free pid 1 tgid 1 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1094 [inline]
 free_unref_page+0xd22/0xea0 mm/page_alloc.c:2612
 kasan_depopulate_vmalloc_pte+0x74/0x90 mm/kasan/shadow.c:408
 apply_to_pte_range mm/memory.c:2797 [inline]
 apply_to_pmd_range mm/memory.c:2841 [inline]
 apply_to_pud_range mm/memory.c:2877 [inline]
 apply_to_p4d_range mm/memory.c:2913 [inline]
 __apply_to_page_range+0x8a8/0xe50 mm/memory.c:2947
 kasan_release_vmalloc+0x9a/0xb0 mm/kasan/shadow.c:525
 purge_vmap_node+0x3e3/0x770 mm/vmalloc.c:2208
 __purge_vmap_area_lazy+0x708/0xae0 mm/vmalloc.c:2290
 _vm_unmap_aliases+0x79d/0x840 mm/vmalloc.c:2885
 change_page_attr_set_clr+0x2fe/0xdb0 arch/x86/mm/pat/set_memory.c:1881
 change_page_attr_set arch/x86/mm/pat/set_memory.c:1922 [inline]
 set_memory_nx+0xf2/0x130 arch/x86/mm/pat/set_memory.c:2110
 free_init_pages arch/x86/mm/init.c:924 [inline]
 free_kernel_image_pages arch/x86/mm/init.c:943 [inline]
 free_initmem+0x79/0x110 arch/x86/mm/init.c:970
 kernel_init+0x31/0x2b0 init/main.c:1476
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Memory state around the buggy address:
 ffff8880326abb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880326abc00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>ffff8880326abc80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff8880326abd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
 ffff8880326abd80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb

Fixes: 93c99f21db ("af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.")
Reported-by: syzbot+8811381d455e3e9ec788@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8811381d455e3e9ec788
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240905193240.17565-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09 17:14:27 -07:00
Matthieu Baerts (NGI0)
a92d1db0c9 selftests: mptcp: connect: remove duplicated spaces in TAP output
It is nice to have a visual alignment in the test output to present the
different results, but it makes less sense in the TAP output that is
there for computers.

It sounds then better to remove the duplicated whitespaces in the TAP
output, also because it can cause some issues with TAP parsers expecting
only one space around the directive delimiter (#).

While at it, change the variable name (result_msg) to something more
explicit.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-5-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09 16:52:04 -07:00
Matthieu Baerts (NGI0)
a5b6be42aa selftests: mptcp: diag: remove trailing whitespace
It doesn't need to be there, and it can cause some issues with TAP
parsers expecting only one space around the directive delimiter (#).

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-4-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09 16:52:04 -07:00
Matthieu Baerts (NGI0)
d4e192728e selftests: mptcp: reset the last TS before the first test
Just to slightly improve the precision of the duration of the first
test.

In mptcp_join.sh, the last append_prev_results is now done as soon as
the last test is over: this will add the last result in the list, and
get a more precise time for this last test.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-3-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09 16:52:04 -07:00
Matthieu Baerts (NGI0)
1a38cee4bb selftests: mptcp: connect: remote time in TAP output
It is now added by the MPTCP lib automatically, see the parent commit.

The time in the TAP output might be slightly different from the one
displayed before, but that's OK.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-2-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09 16:52:04 -07:00
Matthieu Baerts (NGI0)
f58817c852 selftests: mptcp: lib: add time per subtests in TAP output
It adds 'time=<N>ms' in the diagnostic data of the TAP output, e.g.

  ok 1 - pm_netlink: defaults addr list # time=9ms

This addition is useful to quickly identify which subtests are taking a
longer time than the others, or more than expected.

Note that there are no specific formats to follow to show this time
according to the TAP 13 [1], TAP 14 [2] and KTAP [3] specifications.
Let's then define this one here.

Link: https://testanything.org/tap-version-13-specification.html [1]
Link: https://testanything.org/tap-version-14-specification.html [2]
Link: https://docs.kernel.org/dev-tools/ktap.html [3]
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-1-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09 16:52:04 -07:00
zhangjiao
0aa75a2b3f tools/mm: rm thp_swap_allocator_test when make clean
rm thp_swap_allocator_test when make clean

Link: https://lkml.kernel.org/r/20240829042008.6937-1-zhangjiao2@cmss.chinamobile.com
Signed-off-by: zhangjiao <zhangjiao2@cmss.chinamobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:47:41 -07:00
Tejun Heo
2d285d5615 scx_qmap: Implement highpri boosting
Implement a silly boosting mechanism for nice -20 tasks. The only purpose is
demonstrating and testing scx_bpf_dispatch_from_dsq(). The boosting only
works within SHARED_DSQ and makes only minor differences with increased
dispatch batch (-b).

This exercises moving tasks to a user DSQ and all local DSQs from
ops.dispatch() and BPF timerfn.

v2: - Updated to use scx_bpf_dispatch_from_dsq_set_{slice|vtime}().

    - Drop the workaround for the iterated tasks not being trusted by the
      verifier. The issue is fixed from BPF side.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
Cc: David Vernet <void@manifault.com>
Cc: Changwoo Min <multics69@gmail.com>
Cc: Andrea Righi <andrea.righi@linux.dev>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
2024-09-09 13:42:47 -10:00
Tejun Heo
4c30f5ce4f sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()
Once a task is put into a DSQ, the allowed operations are fairly limited.
Tasks in the built-in local and global DSQs are executed automatically and,
ignoring dequeue, there is only one way a task in a user DSQ can be
manipulated - scx_bpf_consume() moves the first task to the dispatching
local DSQ. This inflexibility sometimes gets in the way and is an area where
multiple feature requests have been made.

Implement scx_bpf_dispatch[_vtime]_from_dsq(), which can be called during
DSQ iteration and can move the task to any DSQ - local DSQs, global DSQ and
user DSQs. The kfuncs can be called from ops.dispatch() and any BPF context
which dosen't hold a rq lock including BPF timers and SYSCALL programs.

This is an expansion of an earlier patch which only allowed moving into the
dispatching local DSQ:

  http://lkml.kernel.org/r/Zn4Cw4FDTmvXnhaf@slm.duckdns.org

v2: Remove @slice and @vtime from scx_bpf_dispatch_from_dsq[_vtime]() as
    they push scx_bpf_dispatch_from_dsq_vtime() over the kfunc argument
    count limit and often won't be needed anyway. Instead provide
    scx_bpf_dispatch_from_dsq_set_{slice|vtime}() kfuncs which can be called
    only when needed and override the specified parameter for the subsequent
    dispatch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
Cc: David Vernet <void@manifault.com>
Cc: Changwoo Min <multics69@gmail.com>
Cc: Andrea Righi <andrea.righi@linux.dev>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
2024-09-09 13:42:47 -10:00
Jason Xing
a7e387375f selftests: return failure when timestamps can't be reported
When I was trying to modify the tx timestamping feature, I found that
running "./txtimestamp -4 -C -L 127.0.0.1" didn't reflect the error:
I succeeded to generate timestamp stored in the skb but later failed
to report it to the userspace (which means failed to put css into cmsg).
It can happen when someone writes buggy codes in __sock_recv_timestamp(),
for example.

After adding the check so that running ./txtimestamp will reflect the
result correctly like this if there is a bug in the reporting phase:
protocol:     TCP
payload:      10
server port:  9000

family:       INET
test SND
    USR: 1725458477 s 667997 us (seq=0, len=0)
Failed to report timestamps
    USR: 1725458477 s 718128 us (seq=0, len=0)
Failed to report timestamps
    USR: 1725458477 s 768273 us (seq=0, len=0)
Failed to report timestamps
    USR: 1725458477 s 818416 us (seq=0, len=0)
Failed to report timestamps
...

In the future, it will help us detect whether the new coming patch has
bugs or not.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905160035.62407-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09 16:42:28 -07:00
Dev Jain
536ab838a5 selftests/mm: relax test to fail after 100 migration failures
It was recently observed at [1] that during the folio unmapping stage of
migration, when the PTEs are cleared, a racing thread faulting on that
folio may increase the refcount of the folio, sleep on the folio lock (the
migration path has the lock), and migration ultimately fails when
asserting the actual refcount against the expected.  Thereby, the
migration selftest fails on shared-anon mappings.  The above enforces the
fact that migration is a best-effort service, therefore, it is wrong to
fail the test for just a single failure; hence, fail the test after 100
consecutive failures (where 100 is still a subjective choice).  Note that,
this has no effect on the execution time of the test since that is
controlled by a timeout.

[1] https://lore.kernel.org/all/20240801081657.1386743-1-dev.jain@arm.com/

Link: https://lkml.kernel.org/r/20240830051609.4037834-1-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@gentwo.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:39:05 -07:00
Alexander Zhu
391e869711 mm: selftest to verify zero-filled pages are mapped to zeropage
When a THP is split, any subpage that is zero-filled will be mapped to the
shared zeropage, hence saving memory.  Add selftest to verify this by
allocating zero-filled THP and comparing RssAnon before and after split.

Link: https://lkml.kernel.org/r/20240830100438.3623486-4-usamaarif642@gmail.com
Signed-off-by: Alexander Zhu <alexlzhu@fb.com>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuang Zhai <zhais@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Shuang Zhai <szhai2@cs.rochester.edu>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:39:03 -07:00
Yusheng Zheng
41d0c4677f libbpf: Fix some typos in comments
Fix some spelling errors in the code comments of libbpf:

betwen -> between
paremeters -> parameters
knowning -> knowing
definiton -> definition
compatiblity -> compatibility
overriden -> overridden
occured -> occurred
proccess -> process
managment -> management
nessary -> necessary

Signed-off-by: Yusheng Zheng <yunwei356@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240909225952.30324-1-yunwei356@gmail.com
2024-09-09 16:05:40 -07:00
Maxim Mikityanskiy
bee109b7b3 bpf: Fix error message on kfunc arg type mismatch
When "arg#%d expected pointer to ctx, but got %s" error is printed, both
template parts actually point to the type of the argument, therefore, it
will also say "but got PTR", regardless of what was the actual register
type.

Fix the message to print the register type in the second part of the
template, change the existing test to adapt to the new format, and add a
new test to test the case when arg is a pointer to context, but reg is a
scalar.

Fixes: 00b85860fe ("bpf: Rewrite kfunc argument handling")
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240909133909.1315460-1-maxim@isovalent.com
2024-09-09 15:58:17 -07:00
Andrew Kreimer
f028d7716c bpftool: Fix typos
Fix typos in documentation.

Reported-by: Matthew Wilcox <willy@infradead.org>
Reported-by: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240909092452.4293-1-algonell@gmail.com
2024-09-09 15:57:52 -07:00
Kuan-Wei Chiu
4cdc0e4ce5 bpftool: Fix undefined behavior caused by shifting into the sign bit
Replace shifts of '1' with '1U' in bitwise operations within
__show_dev_tc_bpf() to prevent undefined behavior caused by shifting
into the sign bit of a signed integer. By using '1U', the operations
are explicitly performed on unsigned integers, avoiding potential
integer overflow or sign-related issues.

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240908140009.3149781-1-visitorckw@gmail.com
2024-09-09 15:57:09 -07:00
Shuyi Cheng
12707b9159 libbpf: Fixed getting wrong return address on arm64 architecture
ARM64 has a separate lr register to store the return address, so here
you only need to read the lr register to get the return address, no need
to dereference it again.

Signed-off-by: Shuyi Cheng <chengshuyi@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1725787433-77262-1-git-send-email-chengshuyi@linux.alibaba.com
2024-09-09 15:56:22 -07:00
Arnaldo Carvalho de Melo
c790f2bafb perf trace: Introduce SCA_TIMESPEC_FROM_USER() to set .from_user = true
Paving the way for the generic BPF BTF based syscall arg augmenter.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:23:04 -03:00
Arnaldo Carvalho de Melo
be14a71984 perf trace: Introduce SCA_SOCKADDR_FROM_USER() to set .from_user = true
Paving the way for the generic BPF BTF based syscall arg augmenter.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:23:04 -03:00
Arnaldo Carvalho de Melo
690eda6508 perf trace: Introduce SCA_PERF_ATTR_FROM_USER() to set .from_user = true
Paving the way for the generic BPF BTF based syscall arg augmenter.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:23:04 -03:00
Arnaldo Carvalho de Melo
2f2e439ba5 perf trace: Mark which syscall arguments go from user space to kernel space
We need to know where to collect it in the BPF augmenters, if in the
sys_enter hook or in the sys_exit hook.

Start with the SCA_FILENAME one, that is just from user to kernel space.

The alternative, better, but takes a bit more time than I have now, is
to use the __user information that is already in the syscall args and
encoded in BTF via a tag, do it later.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:23:03 -03:00
Arnaldo Carvalho de Melo
c90a88d33a perf trace: Use a common encoding for augmented arguments, with size + error + payload
We were using a more compact format, without explicitely encoding the
size and possible error in the payload for an argument.

To do it generically, at least as Howard Chu did in his GSoC activities,
it is more convenient to use the same model that was being used for
string arguments, passing { size, error, payload }.

So use that for the non string syscall args we have so far:

  struct timespec
  struct perf_event_attr
  struct sockaddr (this one has even a variable size)

With this in place we have the userspace pretty printers:

  perf_event_attr___scnprintf()
  syscall_arg__scnprintf_augmented_sockaddr()
  syscall_arg__scnprintf_augmented_timespec()

Ready to have the generic BPF collector in tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
sending its generic payload and thus we'll use them instead of a generic
libbpf btf_dump interface that doesn't know about about the sockaddr
mux, perf_event_attr non-trivial fields (sample_type, etc), leaving it
as a (useful) fallback that prints just basic types until we put in
place a more sophisticated pretty printer infrastructure that associates
synthesized enums to struct fields using the header scrapers we have in
tools/perf/trace/beauty/, some of them in this list:

  $ ls tools/perf/trace/beauty/*.sh
  tools/perf/trace/beauty/arch_errno_names.sh
  tools/perf/trace/beauty/kcmp_type.sh
  tools/perf/trace/beauty/perf_ioctl.sh
  tools/perf/trace/beauty/statx_mask.sh
  tools/perf/trace/beauty/clone.sh
  tools/perf/trace/beauty/kvm_ioctl.sh
  tools/perf/trace/beauty/pkey_alloc_access_rights.sh
  tools/perf/trace/beauty/sync_file_range.sh
  tools/perf/trace/beauty/drm_ioctl.sh
  tools/perf/trace/beauty/madvise_behavior.sh
  tools/perf/trace/beauty/prctl_option.sh
  tools/perf/trace/beauty/usbdevfs_ioctl.sh
  tools/perf/trace/beauty/fadvise.sh
  tools/perf/trace/beauty/mmap_flags.sh
  tools/perf/trace/beauty/rename_flags.sh
  tools/perf/trace/beauty/vhost_virtio_ioctl.sh
  tools/perf/trace/beauty/fs_at_flags.sh
  tools/perf/trace/beauty/mmap_prot.sh
  tools/perf/trace/beauty/sndrv_ctl_ioctl.sh
  tools/perf/trace/beauty/x86_arch_prctl.sh
  tools/perf/trace/beauty/fsconfig.sh
  tools/perf/trace/beauty/mount_flags.sh
  tools/perf/trace/beauty/sndrv_pcm_ioctl.sh
  tools/perf/trace/beauty/fsmount.sh
  tools/perf/trace/beauty/move_mount_flags.sh
  tools/perf/trace/beauty/sockaddr.sh
  tools/perf/trace/beauty/fspick.sh
  tools/perf/trace/beauty/mremap_flags.sh
  tools/perf/trace/beauty/socket.sh
  $

Testing it:

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654
     0.000 ( 0.031 ms): mv/1193096 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~# perf trace -e *nanosleep sleep 1.2345678901
       0.000 (1234.654 ms): sleep/1192697 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567891 }, rmtp: 0x7ffe1ea80460) = 0
  root@number:~# perf trace -e perf_event_open* perf stat -e cpu-clock sleep 1
       0.000 ( 0.011 ms): perf/1192701 perf_event_open(attr_uptr: { type: 1 (software), size: 136, config: 0 (PERF_COUNT_SW_CPU_CLOCK), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 1192702 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3

   Performance counter stats for 'sleep 1':

                0.51 msec cpu-clock                        #    0.001 CPUs utilized

         1.001242090 seconds time elapsed

         0.000000000 seconds user
         0.001010000 seconds sys

  root@number:~# perf trace -e connect* ping -c 1 bsky.app
       0.000 ( 0.130 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
      23.907 ( 0.006 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.20.108.158 }, addrlen: 16) = 0
      23.915 PING bsky.app (3.20.108.158) 56(84) bytes of data.
  ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.917 ( 0.002 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.12.170.30 }, addrlen: 16) = 0
      23.921 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.923 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 18.217.70.179 }, addrlen: 16) = 0
      23.925 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.927 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.132.20.46 }, addrlen: 16) = 0
      23.930 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.931 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.142.89.165 }, addrlen: 16) = 0
      23.934 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.935 ( 0.002 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 18.119.147.159 }, addrlen: 16) = 0
      23.938 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.940 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.22.38.164 }, addrlen: 16) = 0
      23.942 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.944 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.13.14.133 }, addrlen: 16) = 0
      23.956 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 3.20.108.158 }, addrlen: 16) = 0
  ^C
  --- bsky.app ping statistics ---
  1 packets transmitted, 0 received, 100% packet loss, time 0ms

  root@number:~#

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fW4=2GoP6foAN6qbrCiUzy0a_TzHbd8rvDsakTPfdzvfg@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:17:03 -03:00
Arnaldo Carvalho de Melo
c1632cc5ed perf trace augmented_syscalls.bpf: Move the renameat aumenter to renameat2, temporarily
While trying to shape Howard Chu's generic BPF augmenter transition into
the codebase I got stuck with the renameat2 syscall.

Until I noticed that the attempt at reusing augmenters were making it
use the 'openat' syscall augmenter, that collect just one string syscall
arg, for the 'renameat2' syscall, that takes two strings.

So, for the moment, just to help in this transition period, since
'renameat2' is what is used these days in the 'mv' utility, just make
the BPF collector be associated with the more widely used syscall,
hopefully the transition to Howard's generic BPF augmenter will cure
this, so get this out of the way for now!

So now we still have that odd "reuse", but for something we're not
testing so won't get in the way anymore:

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -vv -e rename* mv 123456 987654 |& grep renameat
  Reusing "openat" BPF sys_enter augmenter for "renameat"
       0.000 ( 0.079 ms): mv/1158612 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~#

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fXjGYs=tpBgETK-P9U-CuXssytk9pSnTXpfphrmmOydWA@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:16:26 -03:00
Yanfei Xu
5c6e3d5a5d cxl/pci: Remove duplicated implementation of waiting for memory_info_valid
commit ce17ad0d54 ("cxl: Wait Memory_Info_Valid before access memory
related info") added another implementation, which is
cxl_dvsec_mem_range_valid(), of waiting for memory_info_valid without
realizing it duplicated wait_for_valid(). Remove wait_for_valid() and
retain cxl_dvsec_mem_range_valid() as the former is hardcoded to check
only the Memory_Info_Valid bit of DVSEC range 1, while the latter allows
for selection between DVSEC range 1 or 2 via parameter.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/20240828084231.1378789-3-yanfei.xu@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-09 11:33:44 -07:00
Linus Torvalds
fb92a1ffc1 hyperv-fixes for 6.11-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmbeRpsTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXsDDB/4oL6ypxiF3/yo+xR6bt8HlzIfcVeTx
 EuDR+a/hDRQdShMbNtgaF2OxovMO1W5Se2hCoNrKbVxrPRHL6gUuZASdm93l75eh
 l8I0muQif1q9rEXNbwQxe/ydE0860OgmE/ZGv944BXBtirG1fGHei1DNKkdL6VJy
 iEmmURwz7Ykg5neqwzYBY9SV7P/wwWZNR8GIRTWHhWU+ok1cYpehAs1dpQleAxsz
 WZCQLfIMXdSJBSDB/YO7JAlykZ1DkkTkI8pfbe2diReaDSw2QYsnsPXD6MVZArLO
 73kDojwb0LitLyWYEjm07ipOApkzYrEGTXjlLNdUVVF1Fx20nohu8jRd
 =k5P6
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20240908' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Add a documentation overview of Confidential Computing VM support
   (Michael Kelley)

 - Use lapic timer in a TDX VM without paravisor (Dexuan Cui)

 - Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency
   (Michael Kelley)

 - Fix a kexec crash due to VP assist page corruption (Anirudh
   Rayabharam)

 - Python3 compatibility fix for lsvmbus (Anthony Nandaa)

 - Misc fixes (Rachel Menge, Roman Kisel, zhang jiao, Hongbo Li)

* tag 'hyperv-fixes-signed-20240908' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hv: vmbus: Constify struct kobj_type and struct attribute_group
  tools: hv: rm .*.cmd when make clean
  x86/hyperv: fix kexec crash due to VP assist page corruption
  Drivers: hv: vmbus: Fix the misplaced function description
  tools: hv: lsvmbus: change shebang to use python3
  x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency
  Documentation: hyperv: Add overview of Confidential Computing VM support
  clocksource: hyper-v: Use lapic timer in a TDX VM without paravisor
  Drivers: hv: Remove deprecated hv_fcopy declarations
2024-09-09 09:31:55 -07:00
Greg Kroah-Hartman
f299cd11f7 Merge 6.11-rc7 into usb-next
We need the USB fixes in here as well, and this also resolves the merge
conflict in:
	drivers/usb/typec/ucsi/ucsi.c

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-09 08:40:22 +02:00
Greg Kroah-Hartman
895b4fae93 Merge 6.11-rc7 into char-misc-next
We need the char-misc fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-09 08:36:23 +02:00
Madhavan Srinivasan
8c9c01ce69 selftests/powerpc: Allow building without static libc
Currently exec-target.c is linked statically with libc, which on Fedora
at least requires installing an additional package (glibc-static).

If that package is not installed the build fails with:

    CC       exec_target
  /usr/bin/ld: cannot find -lc: No such file or directory
  collect2: error: ld returned 1 exit status

All exec_target.c does is call sys_exit, which can be done easily enough
using inline assembly, and removes the requirement for a static libc to
be installed.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240812094152.418586-1-maddy@linux.ibm.com
2024-09-09 16:35:04 +10:00
Zhu Jun
94e86b174d tools/hv: Add memory allocation check in hv_fcopy_start
Added error handling for memory allocation failures
of file_name and path_name.

Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Link: https://lore.kernel.org/r/20240906091333.11419-1-zhujun2@cmss.chinamobile.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240906091333.11419-1-zhujun2@cmss.chinamobile.com>
2024-09-09 01:09:27 +00:00
Neeraj Upadhyay
355debb83b Merge branches 'context_tracking.15.08.24a', 'csd.lock.15.08.24a', 'nocb.09.09.24a', 'rcutorture.14.08.24a', 'rcustall.09.09.24a', 'srcu.12.08.24a', 'rcu.tasks.14.08.24a', 'rcu_scaling_tests.15.08.24a', 'fixes.12.08.24a' and 'misc.11.08.24a' into next.09.09.24a 2024-09-09 00:09:47 +05:30
Sam James
8a3f14bb1e libbpf: Workaround (another) -Wmaybe-uninitialized false positive
We get this with GCC 15 -O3 (at least):
```
libbpf.c: In function ‘bpf_map__init_kern_struct_ops’:
libbpf.c:1109:18: error: ‘mod_btf’ may be used uninitialized [-Werror=maybe-uninitialized]
 1109 |         kern_btf = mod_btf ? mod_btf->btf : obj->btf_vmlinux;
      |         ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
libbpf.c:1094:28: note: ‘mod_btf’ was declared here
 1094 |         struct module_btf *mod_btf;
      |                            ^~~~~~~
In function ‘find_struct_ops_kern_types’,
    inlined from ‘bpf_map__init_kern_struct_ops’ at libbpf.c:1102:8:
libbpf.c:982:21: error: ‘btf’ may be used uninitialized [-Werror=maybe-uninitialized]
  982 |         kern_type = btf__type_by_id(btf, kern_type_id);
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
libbpf.c: In function ‘bpf_map__init_kern_struct_ops’:
libbpf.c:967:21: note: ‘btf’ was declared here
  967 |         struct btf *btf;
      |                     ^~~
```

This is similar to the other libbpf fix from a few weeks ago for
the same modelling-errno issue (fab45b9627).

Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://bugs.gentoo.org/939106
Link: https://lore.kernel.org/bpf/f6962729197ae7cdf4f6d1512625bd92f2322d31.1725630494.git.sam@gentoo.org
2024-09-06 14:09:24 -07:00
Mykyta Yatsenko
f8c6b7913d bpftool: Improve btf c dump sorting stability
Existing algorithm for BTF C dump sorting uses only types and names of
the structs and unions for ordering. As dump contains structs with the
same names but different contents, relative to each other ordering of
those structs will be accidental.
This patch addresses this problem by introducing a new sorting field
that contains hash of the struct/union field names and types to
disambiguate comparison of the non-unique named structs.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240906132453.146085-1-mykyta.yatsenko5@gmail.com
2024-09-06 14:06:30 -07:00
Linus Torvalds
890daedec4 RISC-V Fixes for 6.11-rc7
* A revert for the mmap() change that ties the allocation range to the
   hint adress, as what we tried to do ended up regressing on other
   userspace workloads.
 * A fix to avoid a kernel memory leak when emulating misaligned accesses
   from userspace.
 * A Kconfig fix for toolchain vector detection, which now correctly
   detects vector support on toolchains where the V extension depends on
   the M extension.
 * A fix to avoid failing the linear mapping bootmem bounds check on
   NOMMU systems.
 * A fix for early alternatives on relocatable kernels.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmbbD44THHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiRR/EACqW46mbTmGrbDzbk2YcKbkc05djuB2
 +yorDaO6d188xmHM74zEvt1+X+Mxj18pMm+V02L+27JA7asv+JugXQVwfxtZ769w
 /XMKGrJTUCMSvFpsbhszbse3vXjc1F9uQ5wNa9o44MHAc2twSkJHtdhZJwkJJ9ru
 Od0m99VXWB1gbA1hvCpQBs2uMSzLoU5X2//AaAzVFK1pyskZ7HPqFX16eFcT0gpA
 GDNYIKLPVF1pcwS2gkQM7LAwveCsxuEdnLufJs5Coz9BZ/kQJPd3sK/z8Z58ghy2
 Db6XXtcYJs64Ndjv1MSowb4rIii/BN2vlMCCT95xHH+tuJR6flXuIZQPpI971V/A
 XOCglNQQkmzjJuFKn1/9ZJcVZGITOqDX37iMPW/3bQ/OFG0emBeGqYXKMmScI6f1
 TtqiByz2VXNEJBNkQVA37Cj42DVmRg3MCjwy0ACLbqBpMeSbGK7MRNUk258wOp4V
 ucmhf50D3a0w8y/3miaAH1Pk+tZz/rtVFkdbibDW3M91cOfdNoAYKhSJPEEnhaGm
 pVTvW+usKDdim3nqqTrlZTfFTNF7wFkvoDc11lStgYFK8VoZWuyoBcf1LQ2+ghv9
 qP/A5LRnWU4nXCxZG6dKRoZ/VvoGtsKdI6Iatnak4cAbsvXI+7foelgLgWY6aFzk
 /ZUtSmWDz1E21Q==
 =xYax
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A revert for the mmap() change that ties the allocation range to the
   hint adress, as what we tried to do ended up regressing on other
   userspace workloads.

 - A fix to avoid a kernel memory leak when emulating misaligned
   accesses from userspace.

 - A Kconfig fix for toolchain vector detection, which now correctly
   detects vector support on toolchains where the V extension depends on
   the M extension.

 - A fix to avoid failing the linear mapping bootmem bounds check on
   NOMMU systems.

 - A fix for early alternatives on relocatable kernels.

* tag 'riscv-for-linus-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix RISCV_ALTERNATIVE_EARLY
  riscv: Do not restrict memory size because of linear mapping on nommu
  riscv: Fix toolchain vector detection
  riscv: misaligned: Restrict user access to kernel memory
  riscv: mm: Do not restrict mmap address based on hint
  riscv: selftests: Remove mmap hint address checks
  Revert "RISC-V: mm: Document mmap changes"
2024-09-06 13:00:59 -07:00
zhang jiao
af1ec38c6c selftests/timers: Remove unused NSEC_PER_SEC macro
By reading the code, I found the macro NSEC_PER_SEC
is never referenced in the code. Just remove it.

Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: John Stultz <jstultz@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-09-06 13:37:41 -06:00
John B. Wyatt IV
80e67f1802 pm:cpupower: Add error warning when SWIG is not installed
Add error message to better explain to the user when SWIG and
python-config is missing from the path. Makefile was cleaned up
and unneeded elements were removed.

Suggested-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: John B. Wyatt IV <jwyatt@redhat.com>
Signed-off-by: John B. Wyatt IV <sageofredondo@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-09-06 10:58:35 -06:00
Shuah Khan
7beaf1da07 selftests:resctrl: Fix build failure on archs without __cpuid_count()
When resctrl is built on architectures without __cpuid_count()
support, build fails. resctrl uses __cpuid_count() defined in
kselftest.h.

Even though the problem is seen while building resctrl on aarch64,
this error can be seen on any platform that doesn't support CPUID.

CPUID is a x86/x86-64 feature and code paths with CPUID asm commands
will fail to build on all other architectures.

All others tests call __cpuid_count() do so from x86/x86_64 code paths
when _i386__ or __x86_64__ are defined. resctrl is an exception.

Fix the problem by defining __cpuid_count() only when __i386__ or
__x86_64__ are defined in kselftest.h and changing resctrl to call
__cpuid_count() only when __i386__ or __x86_64__ are defined.

In file included from resctrl.h:24,
                 from cat_test.c:11:
In function ‘arch_supports_noncont_cat’,
    inlined from ‘noncont_cat_run_test’ at cat_test.c:326:6:
../kselftest.h:74:9: error: impossible constraint in ‘asm’
   74 |         __asm__ __volatile__ ("cpuid\n\t"                               \
      |         ^~~~~~~
cat_test.c:304:17: note: in expansion of macro ‘__cpuid_count’
  304 |                 __cpuid_count(0x10, 1, eax, ebx, ecx, edx);
      |                 ^~~~~~~~~~~~~
../kselftest.h:74:9: error: impossible constraint in ‘asm’
   74 |         __asm__ __volatile__ ("cpuid\n\t"                               \
      |         ^~~~~~~
cat_test.c:306:17: note: in expansion of macro ‘__cpuid_count’
  306 |                 __cpuid_count(0x10, 2, eax, ebx, ecx, edx);

Fixes: ae638551ab ("selftests/resctrl: Add non-contiguous CBMs CAT test")
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Closes: https://lore.kernel.org/lkml/20240809071059.265914-1-usama.anjum@collabora.com/
Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-09-06 10:46:03 -06:00
Kan Liang
003265bb6f perf mem: Fix the wrong reference in parse_record_events()
A segmentation fault can be triggered when running
'perf mem record -e ldlat-loads'

The commit 35b38a71c9 ("perf mem: Rework command option handling")
moves the OPT_CALLBACK of event from __cmd_record() to cmd_mem().

When invoking the __cmd_record(), the 'mem' has been referenced (&).

So the &mem passed into the parse_record_events() is a double reference
(&&) of the original struct perf_mem mem.

But in the cmd_mem(), the &mem is the single reference (&) of the
original struct perf_mem mem.

Fixes: 35b38a71c9 ("perf mem: Rework command option handling")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240905170737.4070743-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06 11:45:28 -03:00
Kan Liang
5ad7db2c3f perf mem: Fix missed p-core mem events on ADL and RPL
The p-core mem events are missed when launching 'perf mem record' on ADL
and RPL.

  root@number:~# perf mem record sleep 1
  Memory events are enabled on a subset of CPUs: 16-27
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.032 MB perf.data ]
  root@number:~# perf evlist
  cpu_atom/mem-loads,ldlat=30/P
  cpu_atom/mem-stores/P
  dummy:u

A variable 'record' in the 'struct perf_mem_event' is to indicate
whether a mem event in a mem_events[] should be recorded. The current
code only configure the variable for the first eligible PMU.

It's good enough for a non-hybrid machine or a hybrid machine which has
the same mem_events[].

However, if a different mem_events[] is used for different PMUs on a
hybrid machine, e.g., ADL or RPL, the 'record' for the second PMU never
get a chance to be set.

The mem_events[] of the second PMU are always ignored.

'perf mem' doesn't support the per-PMU configuration now. A per-PMU
mem_events[] 'record' variable doesn't make sense. Make it global.

That could also avoid searching for the per-PMU mem_events[] via
perf_pmu__mem_events_ptr every time.

Committer testing:

  root@number:~# perf evlist -g
  cpu_atom/mem-loads,ldlat=30/P
  cpu_atom/mem-stores/P
  {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
  cpu_core/mem-stores/P
  dummy:u
  root@number:~#

The :S for '{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}' is
not being added by 'perf evlist -g', to be checked.

Fixes: abbdd79b78 ("perf mem: Clean up perf_mem_events__name()")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Closes: https://lore.kernel.org/lkml/Zthu81fA3kLC2CS2@x1/
Link: https://lore.kernel.org/r/20240905170737.4070743-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06 11:45:17 -03:00
Kan Liang
6e05d28ff2 perf mem: Check mem_events for all eligible PMUs
The current perf_pmu__mem_events_init() only checks the availability of
the mem_events for the first eligible PMU. It works for non-hybrid
machines and hybrid machines that have the same mem_events.

However, it may bring issues if a hybrid machine has a different
mem_events on different PMU, e.g., Alder Lake and Raptor Lake. A
mem-loads-aux event is only required for the p-core. The mem_events on
both e-core and p-core should be checked and marked.

The issue was not found, because it's hidden by another bug, which only
records the mem-events for the e-core. The wrong check for the p-core
events didn't yell.

Fixes: abbdd79b78 ("perf mem: Clean up perf_mem_events__name()")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240905170737.4070743-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06 11:45:07 -03:00
Andi Kleen
4bef6168c1 perf script python: Avoid buffer overflow in python PEBS register interface
Running a script that processes PEBS records gives buffer overflows
in valgrind.

The problem is that the allocation of the register string doesn't
include the terminating 0 byte. Fix this.

I also replaced the very magic "28" with a more reasonable larger buffer
that should fit all registers.  There's no need to conserve memory here.

  ==2106591== Memcheck, a memory error detector
  ==2106591== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
  ==2106591== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
  ==2106591== Command: ../perf script -i tcall.data gcov.py tcall.gcov
  ==2106591==
  ==2106591== Invalid write of size 1
  ==2106591==    at 0x713354: regs_map (trace-event-python.c:748)
  ==2106591==    by 0x7134EB: set_regs_in_dict (trace-event-python.c:784)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==    by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
  ==2106591==  Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
  ==2106591==    at 0x484280F: malloc (vg_replace_malloc.c:442)
  ==2106591==    by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==    by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
  ==2106591==
  ==2106591== Invalid read of size 1
  ==2106591==    at 0x484B6C6: strlen (vg_replace_strmem.c:502)
  ==2106591==    by 0x555D494: PyUnicode_FromString (unicodeobject.c:1899)
  ==2106591==    by 0x7134F7: set_regs_in_dict (trace-event-python.c:786)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==  Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
  ==2106591==    at 0x484280F: malloc (vg_replace_malloc.c:442)
  ==2106591==    by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==    by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
  ==2106591==
  ==2106591== Invalid write of size 1
  ==2106591==    at 0x713354: regs_map (trace-event-python.c:748)
  ==2106591==    by 0x713539: set_regs_in_dict (trace-event-python.c:789)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==    by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
  ==2106591==  Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
  ==2106591==    at 0x484280F: malloc (vg_replace_malloc.c:442)
  ==2106591==    by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==    by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
  ==2106591==
  ==2106591== Invalid read of size 1
  ==2106591==    at 0x484B6C6: strlen (vg_replace_strmem.c:502)
  ==2106591==    by 0x555D494: PyUnicode_FromString (unicodeobject.c:1899)
  ==2106591==    by 0x713545: set_regs_in_dict (trace-event-python.c:791)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==  Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
  ==2106591==    at 0x484280F: malloc (vg_replace_malloc.c:442)
  ==2106591==    by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==    by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
  ==2106591==
  73056 total, 29 ignored

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240905151058.2127122-2-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06 11:44:58 -03:00
Ian Rogers
f2dbc77909 perf jevents: Ignore sys when determining a model directory
Existing sys directories aren't placed under a model directory like
skylake.

Placing a sys directory there causes the `is_leaf_dir` test to fail and
consequently no events or metrics are generated for the model.

Ignore sys directories in this case and update the comments to
reflect why.

This change has no affect, but when testing with a sys directory for a
model people have reported running into the no event/metric issue.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240904211705.915101-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06 11:44:46 -03:00
Arnaldo Carvalho de Melo
92984e4468 Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up fixes from perf-tools/perf-tools, some of which were also in
perf-tools-next but were then indentified as being more appropriate to
go sooner, to fix regressions in v6.11.

Resolve a simple merge conflict in tools/perf/tests/pmu.c where a more
future proof approach to initialize all fields of a struct was used in
perf-tools-next, the one that is going into v6.11 is enough for the
segfault it addressed (using an uninitialized test_pmu.alias field).

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06 11:42:59 -03:00
Jakub Kicinski
502cc061de Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/phy/phy_device.c
  2560db6ede ("net: phy: Fix missing of_node_put() for leds")
  1dce520abd ("net: phy: Use for_each_available_child_of_node_scoped()")
https://lore.kernel.org/20240904115823.74333648@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/xilinx/xilinx_axienet.h
drivers/net/ethernet/xilinx/xilinx_axienet_main.c
  858430db28 ("net: xilinx: axienet: Fix race in axienet_stop")
  76abb5d675 ("net: xilinx: axienet: Add statistics support")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-05 20:37:20 -07:00
Linus Torvalds
b831f83e40 bpf-6.11-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmbaYFMACgkQ6rmadz2v
 bTq7JBAAipwHeOL3IYproQxGy+f0W3Uik9FNlavSQ3zpJHmTJcpf0ysXkqH23g2q
 26CF0R44gmGMkdbZsxbk3HLI2qRmzxmznYCDH0g7d9qwzQMhFHIiY7TW7UD/XbKx
 UHdHLb5PYrj+j94T1WGiQdvbZYDlpmdz5rFA9K/TBtBArqYp9mA4D/cIlTDBfFpk
 cjhSGVl9x/BKbiHKApxSGcR7Fh/+ux9mVdlssWQNhRfm3V2tbRSAw1i1/ydTG+4c
 bf/m0RSIDfPMxy1i7D0lNRbclzWVisTqNzDXHfQoRUJMuMDfsK4UZB/6gvh+2LKy
 D60vT8AfN5ygjJbLdFbwFGnEymjfsXWguyqfQB0d9Hj/2/EsZ01rI2ikJv9J+qKl
 wwZM3YeA3Q/V0mZ5wCONp2dn+s+82nga+fdvCRFz6SLkWQwgbW5BYHFF1c60V9MH
 Pbd9Y5VfCOEZRzR6RxbmguPrnoU1+BUwQeIAp9L73bllrzhtmh/aL/b03uw8/wUh
 I+peLxJ+DVp6wTudgvSMviMySWcztuz397G7TnFyG0V4nKe1+QxSaQWWw2HKvpy3
 i+m98qoWqbuJqz49FpEtX6x/17gZZNA0LK648D77nrOfsGWOLTKOZUDbNWbTPw9a
 Gojg5obJ8P82yO9UCYQLyGsAJxJrKZv3OEmqy0mRG1hrSMsozxg=
 =5Quw
 -----END PGP SIGNATURE-----

Merge tag 'bpf-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix crash when btf_parse_base() returns an error (Martin Lau)

 - Fix out of bounds access in btf_name_valid_section() (Jeongjun Park)

* tag 'bpf-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add a selftest to check for incorrect names
  bpf: add check for invalid name in btf_name_valid_section()
  bpf: Fix a crash when btf_parse_base() returns an error pointer
2024-09-05 20:10:53 -07:00
John B. Wyatt IV
660475266b pm:cpupower: Include test_raw_pylibcpupower.py
This script demonstrates how to make use of, and tests, the bindings.

In the future, this script could become part of a larger test suite to
test the bindings and libcpupower.

Signed-off-by: John B. Wyatt IV <jwyatt@redhat.com>
Signed-off-by: John B. Wyatt IV <sageofredondo@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-09-05 18:50:07 -06:00
John B. Wyatt IV
338f490e07 pm:cpupower: Add SWIG bindings files for libcpupower
SWIG is a tool packaged in Fedora and other distros that can generate
bindings from C and C++ code for several languages including Python,
Perl, and Go.

These bindings allows users to easily write scripts that use and extend
libcpupower's functionality. Currently, only Python is provided in the
makefile, but additional languages may be added if there is demand.

Added suggestions from Shuah Khan for the README and license discussion.

Note that while SWIG itself is GPL v3+ licensed; the resulting output,
the bindings code, is permissively licensed + the license of the .o
files. Please see
https://swig.org/legal.html and [1] for more details.

[1]
https://lore.kernel.org/linux-pm/Zqv9BOjxLAgyNP5B@hatbackup/

Suggested-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: John B. Wyatt IV <jwyatt@redhat.com>
Signed-off-by: John B. Wyatt IV <sageofredondo@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-09-05 18:50:07 -06:00
John B. Wyatt IV
4b80294fb5 pm:cpupower: Add missing powercap_set_enabled() stub function
There was a symbol listed in the powercap.h file that was not implemented.
Implement it with a stub return of 0.

Programs like SWIG require that functions that are defined in the
headers be implemented.

Fixes: c2294c1496 ("cpupower: Introduce powercap intel-rapl library and powercap-info command")
Suggested-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: John B. Wyatt IV <jwyatt@redhat.com>
Signed-off-by: John B. Wyatt IV <sageofredondo@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-09-05 18:48:57 -06:00
Linus Torvalds
d759ee240d Including fixes from can, bluetooth and wireless.
No known regressions at this point. Another calm week, but chances are
 that has more to do with vacation season than the quality of our work.
 
 Current release - new code bugs:
 
  - smc: prevent NULL pointer dereference in txopt_get
 
  - eth: ti: am65-cpsw: number of XDP-related fixes
 
 Previous releases - regressions:
 
  - Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over
    BREDR/LE", it breaks existing user space
 
  - Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid
    later problems with suspend
 
  - can: mcp251x: fix deadlock if an interrupt occurs during mcp251x_open
 
  - eth: r8152: fix the firmware communication error due to use
    of bulk write
 
  - ptp: ocp: fix serial port information export
 
  - eth: igb: fix not clearing TimeSync interrupts for 82580
 
  - Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo
 
 Previous releases - always broken:
 
  - eth: intel: fix crashes and bugs when reconfiguration and resets
    happening in parallel
 
  - wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power()
 
 Misc:
 
  - docs: netdev: document guidance on cleanup.h
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmbaMbsACgkQMUZtbf5S
 IrtpUg/+J6rNaZuGVTHJQAjdSlMx/HzpN3GIbhYyUSg+iHNclqtxJ706b2vyrG88
 Rw5a+f8aQueONNsoFfa/ooU4cGsdO1oYlch0Wtuj5taCqy2SvtVqJAyiuDyNNjU0
 BQ1Rf7aRLI7enmEpZJN2FFu106YTVccBcLqhPkx0CPcEjV+p5RvypfeQL72H6ZKx
 +7/HzEl4bagHIQW3W1uJGNUdwNP7fP2/Kg7TrTJ1t629nLiJCxKL7LrsmebO5o9a
 v+2NxAa1eujTZ1k7ITcM0wYlxKOaGNFF4sT+dA+GfMe+SFssdhGeZYyv1t0zm5VI
 3apJ/pSHza1a/hXFa6PaOSw5M5LWn4bJOqeZLl/yIV0upu5xadWqmT0gVay8V9lY
 +x/MURGr3seuNRSMsaToHDIq+Us45Dt/qkDDNO/P+9R/BsJKCW05Pfqx3Mr/OHzv
 eeCPbXRh4YYBdrUicBWo04gSD+BUA53vW8FC3pxU5ieLOOcX4kOPeb8wNPHcXjMU
 73D+kyO1ufsfsFMkd3VfgDI1mMz+xpEuZ6pxs33tJ/1Ny7DdG1Q49xlQVh4Wnobk
 uQqUSzdoelOROeg1rwmsIbfwIvj5a5dVIyBu8TDjHlb/rk1QNTkyu+fFmQRWEotL
 fQ7U62wXlpoCT8WchSMtiU32IDJ2+Lwhwecguy1Z7kLOLrtL8XU=
 =ju86
 -----END PGP SIGNATURE-----

Merge tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from can, bluetooth and wireless.

  No known regressions at this point. Another calm week, but chances are
  that has more to do with vacation season than the quality of our work.

  Current release - new code bugs:

   - smc: prevent NULL pointer dereference in txopt_get

   - eth: ti: am65-cpsw: number of XDP-related fixes

  Previous releases - regressions:

   - Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over
     BREDR/LE", it breaks existing user space

   - Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid
     later problems with suspend

   - can: mcp251x: fix deadlock if an interrupt occurs during
     mcp251x_open

   - eth: r8152: fix the firmware communication error due to use of bulk
     write

   - ptp: ocp: fix serial port information export

   - eth: igb: fix not clearing TimeSync interrupts for 82580

   - Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo

  Previous releases - always broken:

   - eth: intel: fix crashes and bugs when reconfiguration and resets
     happening in parallel

   - wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power()

  Misc:

   - docs: netdev: document guidance on cleanup.h"

* tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
  ila: call nf_unregister_net_hooks() sooner
  tools/net/ynl: fix cli.py --subscribe feature
  MAINTAINERS: fix ptp ocp driver maintainers address
  selftests: net: enable bind tests
  net: dsa: vsc73xx: fix possible subblocks range of CAPT block
  sched: sch_cake: fix bulk flow accounting logic for host fairness
  docs: netdev: document guidance on cleanup.h
  net: xilinx: axienet: Fix race in axienet_stop
  net: bridge: br_fdb_external_learn_add(): always set EXT_LEARN
  r8152: fix the firmware doesn't work
  fou: Fix null-ptr-deref in GRO.
  bareudp: Fix device stats updates.
  net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
  bpf, net: Fix a potential race in do_sock_getsockopt()
  net: dqs: Do not use extern for unused dql_group
  sch/netem: fix use after free in netem_dequeue
  usbnet: modern method to get random MAC
  MAINTAINERS: wifi: cw1200: add net-cw1200.h
  ice: do not bring the VSI up, if it was down before the XDP setup
  ice: remove ICE_CFG_BUSY locking from AF_XDP code
  ...
2024-09-05 17:08:01 -07:00
JP Kobryn
1b3bc648f5 bpf/selftests: coverage for tp and perf event progs using kfuncs
This coverage ensures that kfuncs are allowed within tracepoint and perf
event programs.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Link: https://lore.kernel.org/r/20240905223812.141857-3-inwardvessel@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 17:02:03 -07:00
Arkadiusz Kubalewski
6fda63c45f tools/net/ynl: fix cli.py --subscribe feature
Execution of command:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml /
	--subscribe "monitor" --sleep 10
fails with:
  File "/repo/./tools/net/ynl/cli.py", line 109, in main
    ynl.check_ntf()
  File "/repo/tools/net/ynl/lib/ynl.py", line 924, in check_ntf
    op = self.rsp_by_value[nl_msg.cmd()]
KeyError: 19

Parsing Generic Netlink notification messages performs lookup for op in
the message. The message was not yet decoded, and is not yet considered
GenlMsg, thus msg.cmd() returns Generic Netlink family id (19) instead of
proper notification command id (i.e.: DPLL_CMD_PIN_CHANGE_NTF=13).

Allow the op to be obtained within NetlinkProtocol.decode(..) itself if the
op was not passed to the decode function, thus allow parsing of Generic
Netlink notifications without causing the failure.

Suggested-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/netdev/m2le0n5xpn.fsf@gmail.com/
Fixes: 0a966d606c ("tools/net/ynl: Fix extack decoding for directional ops")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20240904135034.316033-1-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-05 14:56:45 -07:00
Jamie Bainbridge
e4af74a53b selftests: net: enable bind tests
bind_wildcard is compiled but not run, bind_timewait is not compiled.

These two tests complete in a very short time, use the test harness
properly, and seem reasonable to enable.

The author of the tests confirmed via email that these were
intended to be run.

Enable these two tests.

Fixes: 13715acf8a ("selftest: Add test for bind() conflicts.")
Fixes: 2c042e8e54 ("tcp: Add selftest for bind() and TIME_WAIT.")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/5a009b26cf5fb1ad1512d89c61b37e2fac702323.1725430322.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-05 14:38:15 -07:00
Pu Lehui
95b1c5d178 selftests/bpf: Add description for running vmtest on RV64
Add description in tools/testing/selftests/bpf/README.rst
for running vmtest on RV64.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-11-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:13:40 -07:00
Pu Lehui
b2bc9d5054 selftests/bpf: Add riscv64 configurations to local vmtest
Add riscv64 configurations to local vmtest.

We can now perform cross platform testing for riscv64 bpf using the
following command:

PLATFORM=riscv64 CROSS_COMPILE=riscv64-linux-gnu- vmtest.sh \
    -l ./libbpf-vmtest-rootfs-2024.08.30-noble-riscv64.tar.zst -- \
    ./test_progs -d \
        \"$(cat tools/testing/selftests/bpf/DENYLIST.riscv64 \
            | cut -d'#' -f1 \
            | sed -e 's/^[[:space:]]*//' \
                  -e 's/[[:space:]]*$//' \
            | tr -s '\n' ','\
        )\"

Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-10-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:13:40 -07:00
Pu Lehui
c402cb8580 selftests/bpf: Add DENYLIST.riscv64
This patch adds DENYLIST.riscv64 file for riscv64. It will help BPF CI
and local vmtest to mask failing and unsupported test cases.

We can use the following command to use deny list in local vmtest as
previously mentioned by Manu.

PLATFORM=riscv64 CROSS_COMPILE=riscv64-linux-gnu- vmtest.sh \
    -l ./libbpf-vmtest-rootfs-2024.08.30-noble-riscv64.tar.zst -- \
    ./test_progs -d \
        \"$(cat tools/testing/selftests/bpf/DENYLIST.riscv64 \
            | cut -d'#' -f1 \
            | sed -e 's/^[[:space:]]*//' \
                  -e 's/[[:space:]]*$//' \
            | tr -s '\n' ','\
        )\"

Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-9-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:13:40 -07:00
Pu Lehui
897b368048 selftests/bpf: Add config.riscv64
Add config.riscv64 for both BPF CI and local vmtest.

Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-8-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:13:40 -07:00
Pu Lehui
d95d565190 selftests/bpf: Enable cross platform testing for vmtest
Add support cross platform testing for vmtest. The variable $ARCH in the
current script is platform semantics, not kernel semantics. Rename it to
$PLATFORM so that we can easily use $ARCH in cross-compilation. And drop
`set -u` unbound variable check as we will use CROSS_COMPILE env
variable. For now, Using PLATFORM= and CROSS_COMPILE= options will
enable cross platform testing:

  PLATFORM=<platform> CROSS_COMPILE=<toolchain> vmtest.sh -- ./test_progs

Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-7-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:13:40 -07:00
Pu Lehui
2294073dce selftests/bpf: Support local rootfs image for vmtest
Support vmtest to use local rootfs image generated by [0] that is
consistent with BPF CI. Now we can specify the local rootfs image
through the `-l` parameter like as follows:

  vmtest.sh -l ./libbpf-vmtest-rootfs-2024.08.22-noble-amd64.tar.zst -- ./test_progs

Meanwhile, some descriptions have been flushed.

Link: https://github.com/libbpf/ci/blob/main/rootfs/mkrootfs_debian.sh [0]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-6-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:13:40 -07:00
Pu Lehui
0c3fc330be selftests/bpf: Limit URLS parsing logic to actual scope in vmtest
The URLS array is only valid in the download_rootfs function and does
not need to be parsed globally in advance. At the same time, the logic
of loading rootfs is refactored to prepare vmtest for supporting local
rootfs.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-5-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:13:40 -07:00
Eduard Zingerman
67ab80a018 selftests/bpf: Prefer static linking for LLVM libraries
It is not always convenient to have LLVM libraries installed inside CI
rootfs images, thus request static libraries from llvm-config.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20240905081401.1894789-4-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:13:39 -07:00
Pu Lehui
a48a43884c selftests/bpf: Rename fallback in bpf_dctcp to avoid naming conflict
Recently, when compiling bpf selftests on RV64, the following
compilation failure occurred:

progs/bpf_dctcp.c:29:21: error: redefinition of 'fallback' as different kind of symbol
   29 | volatile const char fallback[TCP_CA_NAME_MAX];
      |                     ^
/workspace/tools/testing/selftests/bpf/tools/include/vmlinux.h:86812:15: note: previous definition is here
 86812 | typedef u32 (*fallback)(u32, const unsigned char *, size_t);

The reason is that the `fallback` symbol has been defined in
arch/riscv/lib/crc32.c, which will cause symbol conflicts when vmlinux.h
is included in bpf_dctcp. Let we rename `fallback` string to
`fallback_cc` in bpf_dctcp to fix this compilation failure.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-3-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:13:39 -07:00
Pu Lehui
dc3a8804d7 selftests/bpf: Adapt OUTPUT appending logic to lower versions of Make
The $(let ...) function is only supported by GNU Make version 4.4 [0]
and above, otherwise the following exception file or directory will be
generated:

	tools/testing/selftests/bpfFEATURE-DUMP.selftests
	tools/testing/selftests/bpffeature/

Considering that the GNU Make version of most Linux distributions is
lower than 4.4, let us adapt the corresponding logic to it.

Link: https://lists.gnu.org/archive/html/info-gnu/2022-10/msg00008.html [0]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-2-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:13:39 -07:00
Lin Yikai
bd4d67f8ae libbpf: fix some typos in libbpf
Hi, fix some spelling errors in libbpf, the details are as follows:

-in the code comments:
	termintaing->terminating
	architecutre->architecture
	requring->requiring
	recored->recoded
	sanitise->sanities
	allowd->allowed
	abover->above
	see bpf_udst_arg()->see bpf_usdt_arg()

Signed-off-by: Lin Yikai <yikai.lin@vivo.com>
Link: https://lore.kernel.org/r/20240905110354.3274546-3-yikai.lin@vivo.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:07:47 -07:00
Lin Yikai
a86857d254 bpftool: fix some typos in bpftool
Hi, fix some spelling errors in bpftool, the details are as follows:

-in file "bpftool-gen.rst"
	libppf->libbpf
-in the code comments:
	ouptut->output

Signed-off-by: Lin Yikai <yikai.lin@vivo.com>
Link: https://lore.kernel.org/r/20240905110354.3274546-2-yikai.lin@vivo.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:07:47 -07:00
Lin Yikai
5db0ba6766 selftests/bpf: fix some typos in selftests
Hi, fix some spelling errors in selftest, the details are as follows:

-in the codes:
	test_bpf_sk_stoarge_map_iter_fd(void)
		->test_bpf_sk_storage_map_iter_fd(void)
	load BTF from btf_data.o->load BTF from btf_data.bpf.o

-in the code comments:
	preample->preamble
	multi-contollers->multi-controllers
	errono->errno
	unsighed/unsinged->unsigned
	egree->egress
	shoud->should
	regsiter->register
	assummed->assumed
	conditiona->conditional
	rougly->roughly
	timetamp->timestamp
	ingores->ignores
	null-termainted->null-terminated
	slepable->sleepable
	implemenation->implementation
	veriables->variables
	timetamps->timestamps
	substitue a costant->substitute a constant
	secton->section
	unreferened->unreferenced
	verifer->verifier
	libppf->libbpf
...

Signed-off-by: Lin Yikai <yikai.lin@vivo.com>
Link: https://lore.kernel.org/r/20240905110354.3274546-1-yikai.lin@vivo.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-05 13:07:47 -07:00
Jiri Olsa
d2520bdb19 selftests/bpf: Add uprobe multi pid filter test for clone-ed processes
The idea is to run same test as for test_pid_filter_process, but instead
of standard fork-ed process we create the process with clone(CLONE_VM..)
to make sure the thread leader process filter works properly in this case.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240905115124.1503998-5-jolsa@kernel.org
2024-09-05 12:43:23 -07:00
Jiri Olsa
8df43e8594 selftests/bpf: Add uprobe multi pid filter test for fork-ed processes
The idea is to create and monitor 3 uprobes, each trigered in separate
process and make sure the bpf program gets executed just for the proper
PID specified via pid filter.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240905115124.1503998-4-jolsa@kernel.org
2024-09-05 12:43:22 -07:00
Jiri Olsa
0b0bb45371 selftests/bpf: Add child argument to spawn_child function
Adding child argument to spawn_child function to allow
to create multiple children in following change.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240905115124.1503998-3-jolsa@kernel.org
2024-09-05 12:43:22 -07:00
zhangjiao
dfc58f467f tools: iio: rm .*.cmd when make clean
rm .*.cmd when make clean

Signed-off-by: zhangjiao <zhangjiao2@cmss.chinamobile.com>
Link: https://patch.msgid.link/20240829053309.10563-1-zhangjiao2@cmss.chinamobile.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-09-05 19:27:13 +01:00
Peter Zijlstra
04b01625da perf/uprobe: split uprobe_unregister()
With uprobe_unregister() having grown a synchronize_srcu(), it becomes
fairly slow to call. Esp. since both users of this API call it in a
loop.

Peel off the sync_srcu() and do it once, after the loop.

We also need to add uprobe_unregister_sync() into uprobe_register()'s
error handling path, as we need to be careful about returning to the
caller before we have a guarantee that partially attached consumer won't
be called anymore. This is an unlikely slow path and this should be
totally fine to be slow in the case of a failed attach.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20240903174603.3554182-6-andrii@kernel.org
2024-09-05 16:56:14 +02:00
Nícolas F. R. A. Prado
05144ab7b7 kselftest: dt: Ignore nodes that have ancestors disabled
Filter out nodes that have one of its ancestors disabled as they aren't
expected to probe.

This removes the following false-positive failures on the
sc7180-trogdor-lazor-limozeen-nots-r5 platform:

/soc@0/geniqup@8c0000/i2c@894000/proximity@28
/soc@0/geniqup@ac0000/spi@a90000/ec@0
/soc@0/remoteproc@62400000/glink-edge/apr
/soc@0/remoteproc@62400000/glink-edge/apr/service@3
/soc@0/remoteproc@62400000/glink-edge/apr/service@4
/soc@0/remoteproc@62400000/glink-edge/apr/service@4/clock-controller
/soc@0/remoteproc@62400000/glink-edge/apr/service@4/dais
/soc@0/remoteproc@62400000/glink-edge/apr/service@7
/soc@0/remoteproc@62400000/glink-edge/apr/service@7/dais
/soc@0/remoteproc@62400000/glink-edge/apr/service@8
/soc@0/remoteproc@62400000/glink-edge/apr/service@8/routing
/soc@0/remoteproc@62400000/glink-edge/fastrpc
/soc@0/remoteproc@62400000/glink-edge/fastrpc/compute-cb@3
/soc@0/remoteproc@62400000/glink-edge/fastrpc/compute-cb@4
/soc@0/remoteproc@62400000/glink-edge/fastrpc/compute-cb@5
/soc@0/spmi@c440000/pmic@0/pon@800/pwrkey

Fixes: 14571ab1ad ("kselftest: Add new test for detecting unprobed Devicetree devices")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Link: https://lore.kernel.org/r/20240729-dt-kselftest-parent-disabled-v2-1-d7a001c4930d@collabora.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-09-05 07:54:16 -05:00
Rafael J. Wysocki
6482439d3d linux-cpupower-6.12-rc1
This cpupower update for Linux 6.12-rc1 consists of an enhancement
 to cpuidle tool to display the residency value of cpuidle states.
 This addition provides a clearer and more detailed view of idle
 state information when using cpuidle-info.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmbY1h0ACgkQCwJExA0N
 QxwNfBAAoK+A2gM1n9ucR+zf3npyiM1RzSyf5EO7WDHeO1KRSE0t7+Vl0avE5Hx+
 45Ih8yigXWhVl05HKSo7uQK8ho2HWOLewvvM50/bFz8Id1vqQ8w46zzqS2Q8h4KG
 5fiKxlBWfhHzIwghTjBPpz4HhZlu70LvZ0zMjxQZZe0mrOTD9KcxpxbqQo77cSJo
 j4+B2yd8D4cQdEIYdeCns12g0Bxaip6ftTiiu3wIb0DGd6biUpcAgk8LlWV8kaFY
 BZ1JSYwSIz+hncetwxqanQmvxDOinbrWamEPpsQL7AfzX5cWmTCO9xVUojxjVgVe
 7NVFWEe86W9wzoorw96aC8J5ms6eXJzm3WRxeSX/p01FMOGIiFkfqyrF87QbKzop
 JMCLsEPHk3BhWWRAuW/5zClgkFEPSBQnXK6RJTtL8Z0hws5OaY1V/8jErTglwf/l
 L3YnAYufFTOtIAMlLY7R6FgqrsWZXXReWkZCzZ7C/VSBqdwIobCmVt/9+an8Y4Gc
 g/as1iqaVLokKZnS8esNTDsnp1Dh9yD41LpZ+aT8SIfLfae9PeIxj3OvRKvgxbbh
 y/aK02PZzDbuz3X2my5Gf5U8QE6stbncuUQcB5j8bSwwNn2yIi1jtAT+VCDt5Dnj
 Pg224O8AJjiCdy7zCCTt5Mgfu1O29uBKRnXZLlzir6w4um1L1O8=
 =fcIj
 -----END PGP SIGNATURE-----

Merge tag 'linux-cpupower-6.12-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shuah/linux

Merge a cpupower utility update for 6.12 from Shuah Khan:

"This cpupower update for Linux 6.12-rc1 consists of an enhancement
 to cpuidle tool to display the residency value of cpuidle states.
 This addition provides a clearer and more detailed view of idle
 state information when using cpuidle-info."

* tag 'linux-cpupower-6.12-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
  tools/cpupower: display residency value in idle-info
2024-09-05 13:07:49 +02:00
Ingo Molnar
95c13662b6 Merge branch 'perf/urgent' into perf/core, to pick up fixes
This also refreshes the -rc1 based branch to -rc5.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-09-05 11:17:43 +02:00
zhang jiao
5e5cc1eb65 tools: hv: rm .*.cmd when make clean
rm .*.cmd when make clean

Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Link: https://lore.kernel.org/r/20240902042103.5867-1-zhangjiao2@cmss.chinamobile.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240902042103.5867-1-zhangjiao2@cmss.chinamobile.com>
2024-09-05 07:23:08 +00:00
Pu Lehui
9985742233 libbpf: Fix accessing first syscall argument on RV64
On RV64, as Ilya mentioned before [0], the first syscall parameter should be
accessed through orig_a0 (see arch/riscv64/include/asm/syscall.h),
otherwise it will cause selftests like bpf_syscall_macro, vmlinux,
test_lsm, etc. to fail on RV64. Let's fix it by using the struct pt_regs
style CO-RE direct access.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-1-iii@linux.ibm.com [0]
Link: https://lore.kernel.org/bpf/20240831041934.1629216-5-pulehui@huaweicloud.com
2024-09-04 17:06:39 -07:00
Pu Lehui
4a4c4c0d0a selftests/bpf: Enable test_bpf_syscall_macro: Syscall_arg1 on s390 and arm64
Considering that CO-RE direct read access to the first system call
argument is already available on s390 and arm64, let's enable
test_bpf_syscall_macro:syscall_arg1 on these architectures.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240831041934.1629216-4-pulehui@huaweicloud.com
2024-09-04 17:06:09 -07:00
Pu Lehui
9ab94078e8 libbpf: Access first syscall argument with CO-RE direct read on arm64
Currently PT_REGS_PARM1 SYSCALL(x) is consistent with PT_REGS_PARM1_CORE
SYSCALL(x), which will introduce the overhead of BPF_CORE_READ(), taking
into account the read pt_regs comes directly from the context, let's use
CO-RE direct read to access the first system call argument.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/bpf/20240831041934.1629216-3-pulehui@huaweicloud.com
2024-09-04 17:06:06 -07:00
Pu Lehui
e4db2a821b libbpf: Access first syscall argument with CO-RE direct read on s390
Currently PT_REGS_PARM1 SYSCALL(x) is consistent with PT_REGS_PARM1_CORE
SYSCALL(x), which will introduce the overhead of BPF_CORE_READ(), taking
into account the read pt_regs comes directly from the context, let's use
CO-RE direct read to access the first system call argument.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240831041934.1629216-2-pulehui@huaweicloud.com
2024-09-04 17:05:30 -07:00
Chen Ni
6ffa72acc9 selftests: net: convert comma to semicolon
Replace comma between expressions with semicolons.

Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.

Found by inspection.
No functional change intended.
Compile tested only.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240904014441.1065753-1-nichen@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-04 16:55:49 -07:00
Yonghong Song
eff5b5fffc selftests/bpf: Add a selftest for x86 jit convergence issues
The core part of the selftest, i.e., the je <-> jmp cycle, mimics the
original sched-ext bpf program. The test will fail without the
previous patch.

I tried to create some cases for other potential cycles
(je <-> je, jmp <-> je and jmp <-> jmp) with similar pattern
to the test in this patch, but failed. So this patch
only contains one test for je <-> jmp cycle.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240904221256.37389-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-04 16:46:22 -07:00
Tejun Heo
649e980dad Merge branch 'bpf/master' into for-6.12
Pull bpf/master to receive baebe9aaba ("bpf: allow passing struct
bpf_iter_<type> as kfunc arguments") and related changes in preparation for
the DSQ iterator patchset.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-04 11:41:32 -10:00
Masami Hiramatsu (Google)
f0a6ecebd8 selftests/ftrace: Fix eventfs ownership testcase to find mount point
Fix eventfs ownership testcase to find mount point if stat -c "%m" failed.
This can happen on the system based on busybox. In this case, this will
try to use the current working directory, which should be a tracefs top
directory (and eventfs is mounted as a part of tracefs.)
If it does not work, the test is skipped as UNRESOLVED because of
the environmental problem.

Fixes: ee9793be08 ("tracing/selftests: Add ownership modification tests for eventfs")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-09-04 15:08:18 -06:00
Tejun Heo
a4103eacc2 sched_ext: Add a cgroup scheduler which uses flattened hierarchy
This patch adds scx_flatcg example scheduler which implements hierarchical
weight-based cgroup CPU control by flattening the cgroup hierarchy into a
single layer by compounding the active weight share at each level.

This flattening of hierarchy can bring a substantial performance gain when
the cgroup hierarchy is nested multiple levels. in a simple benchmark using
wrk[8] on apache serving a CGI script calculating sha1sum of a small file,
it outperforms CFS by ~3% with CPU controller disabled and by ~10% with two
apache instances competing with 2:1 weight ratio nested four level deep.

However, the gain comes at the cost of not being able to properly handle
thundering herd of cgroups. For example, if many cgroups which are nested
behind a low priority parent cgroup wake up around the same time, they may
be able to consume more CPU cycles than they are entitled to. In many use
cases, this isn't a real concern especially given the performance gain.
Also, there are ways to mitigate the problem further by e.g. introducing an
extra scheduling layer on cgroup delegation boundaries.

v5: - Updated to specify SCX_OPS_HAS_CGROUP_WEIGHT instead of
      SCX_OPS_KNOB_CGROUP_WEIGHT.

v4: - Revert reference counted kptr for cgv_node as the change caused easily
      reproducible stalls.

v3: - Updated to reflect the core API changes including ops.init/exit_task()
      and direct dispatch from ops.select_cpu(). Fixes and improvements
      including additional statistics.

    - Use reference counted kptr for cgv_node instead of xchg'ing against
      stash location.

    - Dropped '-p' option.

v2: - Use SCX_BUG[_ON]() to simplify error handling.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>
2024-09-04 10:24:59 -10:00
Tejun Heo
8195136669 sched_ext: Add cgroup support
Add sched_ext_ops operations to init/exit cgroups, and track task migrations
and config changes. A BPF scheduler may not implement or implement only
subset of cgroup features. The implemented features can be indicated using
%SCX_OPS_HAS_CGOUP_* flags. If cgroup configuration makes use of features
that are not implemented, a warning is triggered.

While a BPF scheduler is being enabled and disabled, relevant cgroup
operations are locked out using scx_cgroup_rwsem. This avoids situations
like task prep taking place while the task is being moved across cgroups,
making things easier for BPF schedulers.

v7: - cgroup interface file visibility toggling is dropped in favor just
      warning messages. Dynamically changing interface visiblity caused more
      confusion than helping.

v6: - Updated to reflect the removal of SCX_KF_SLEEPABLE.

    - Updated to use CONFIG_GROUP_SCHED_WEIGHT and fixes for
      !CONFIG_FAIR_GROUP_SCHED && CONFIG_EXT_GROUP_SCHED.

v5: - Flipped the locking order between scx_cgroup_rwsem and
      cpus_read_lock() to avoid locking order conflict w/ cpuset. Better
      documentation around locking.

    - sched_move_task() takes an early exit if the source and destination
      are identical. This triggered the warning in scx_cgroup_can_attach()
      as it left p->scx.cgrp_moving_from uncleared. Updated the cgroup
      migration path so that ops.cgroup_prep_move() is skipped for identity
      migrations so that its invocations always match ops.cgroup_move()
      one-to-one.

v4: - Example schedulers moved into their own patches.

    - Fix build failure when !CONFIG_CGROUP_SCHED, reported by Andrea Righi.

v3: - Make scx_example_pair switch all tasks by default.

    - Convert to BPF inline iterators.

    - scx_bpf_task_cgroup() is added to determine the current cgroup from
      CPU controller's POV. This allows BPF schedulers to accurately track
      CPU cgroup membership.

    - scx_example_flatcg added. This demonstrates flattened hierarchy
      implementation of CPU cgroup control and shows significant performance
      improvement when cgroups which are nested multiple levels are under
      competition.

v2: - Build fixes for different CONFIG combinations.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Andrea Righi <andrea.righi@canonical.com>
2024-09-04 10:24:59 -10:00
Feng Yang
23457b37ec selftests: bpf: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE
The ARRAY_SIZE macro is more compact and more formal in linux source.

Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240903072559.292607-1-yangfeng59949@163.com
2024-09-04 12:58:46 -07:00
Jeongjun Park
7430708947 selftests/bpf: Add a selftest to check for incorrect names
Add selftest for cases where btf_name_valid_section() does not properly
check for certain types of names.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Link: https://lore.kernel.org/r/20240831054742.364585-1-aha310510@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
2024-09-04 12:34:19 -07:00
Aditya Gupta
35439fe4e2 perf check: Fix inconsistencies in feature names
Fix two inconsistencies in feature names as discussed in [1]:

1. Rename "dwarf-unwind-support" to "dwarf-unwind"

2. 'get_cpuid' feature and 'HAVE_AUXTRACE_SUPPORT' names don't
   look related, change the feature name to 'auxtrace' to match the
   macro name, as 'get_cpuid' string is not used anywhere to check the
   feature presence

[1]: https://lore.kernel.org/linux-perf-users/ZoRw5we4HLSTZND6@x1/

Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904190132.415212-7-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04 16:19:53 -03:00
Athira Rajeev
512fcf7d9d perf tests probe_vfs_getname.sh: Update to use 'perf check feature'
In probe_vfs_getname.sh, current we use "perf record --dry-run"
to check for libtraceevent and skip the test if perf is not
build with libtraceevent. Change the check to use "perf check feature"
option

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904190132.415212-6-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04 16:19:52 -03:00
Aditya Gupta
8a028502b4 perf tools test_task_analyzer.sh: Update to use 'perf check feature'
Currently we use output of 'perf version --build-options', to check
whether perf was built with libtraceevent support.

Instead, use 'perf check feature libtraceevent' to check for
libtraceevent support.

Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904190132.415212-5-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04 16:19:33 -03:00
Aditya Gupta
6cdd7750de perf version: Update --build-options to use 'supported_features' array
Now that the feature list has been duplicated in a global
'supported_features' array, use that array instead of manually checking
status of built-in features.

This helps in being consistent with commands such as 'perf check feature',
so commands can use the same array, and any new feature can be added at
one place, in the 'supported_features' array

Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904190132.415212-4-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04 16:19:29 -03:00
Linus Torvalds
2adad548f7 Small perf tools fixes for v6.11
A number of small fixes for the late cycle:
 
 * Two more build fixes on 32-bit archs
 * Fixed a segfault during perf test
 * Fixed spinlock/rwlock accounting bug in perf lock contention
 
 Signed-off-by: Namhyung Kim <namhyung@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZtil/AAKCRCMstVUGiXM
 g1p5AQCmfXr4T/CCVX82ExpHYYtsrbcQrmqMolqsGlN4J3I9EwD9EnSNIEkxy44o
 bOrHRAzABpMTbhU8zVb2Mi+Gy+iccgc=
 =SFTw
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.11-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Namhyung Kim:
 "A number of small fixes for the late cycle:

   - Two more build fixes on 32-bit archs

   - Fixed a segfault during perf test

   - Fixed spinlock/rwlock accounting bug in perf lock contention"

* tag 'perf-tools-fixes-for-v6.11-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf daemon: Fix the build on more 32-bit architectures
  perf python: include "util/sample.h"
  perf lock contention: Fix spinlock and rwlock accounting
  perf test pmu: Set uninitialized PMU alias to null
2024-09-04 12:10:19 -07:00
Daniel Jordan
2351e8c654 ktest.pl: Avoid false positives with grub2 skip regex
Some distros have grub2 config files with the lines

    if [ x"${feature_menuentry_id}" = xy ]; then
      menuentry_id_option="--id"
    else
      menuentry_id_option=""
    fi

which match the skip regex defined for grub2 in get_grub_index():

    $skip = '^\s*menuentry';

These false positives cause the grub number to be higher than it
should be, and the wrong kernel can end up booting.

Grub documents the menuentry command with whitespace between it and the
title, so make the skip regex reflect this.

Link: https://lore.kernel.org/20240904175530.84175-1-daniel.m.jordan@oracle.com
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: John 'Warthog9' Hawley (Tenstorrent) <warthog9@eaglescrag.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2024-09-04 15:06:28 -04:00
Steven Rostedt
d441734d0c ktest.pl: Always warn on build warnings
If a warning happens at build, give a warning at the end:

  Build time:   1 minute 40 seconds
  Install time: 17 seconds
  Reboot time:  25 seconds

  *** WARNING found in build: 1 ***

  *******************************************
  *******************************************
  KTEST RESULT: TEST 1 SUCCESS!!!!   **
  *******************************************
  *******************************************

This way, even if the test isn't made to fail on warnings during the
build, a message is still displayed that warnings were found.

Link: https://lore.kernel.org/<20240819172028.3a7fae09@gandalf.local.home>
Acked-by: John 'Warthog9' Hawley (Tenstorrent) <warthog9@eaglescrag.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2024-09-04 15:05:48 -04:00
Yuan Chen
02baa0a2a6 selftests/bpf: Fix procmap_query()'s params mismatch and compilation warning
When the PROCMAP_QUERY is not defined, a compilation error occurs due to the
mismatch of the procmap_query()'s params, procmap_query() only be called in
the file where the function is defined, modify the params so they can match.

We get a warning when build samples/bpf:
    trace_helpers.c:252:5: warning: no previous prototype for ‘procmap_query’ [-Wmissing-prototypes]
      252 | int procmap_query(int fd, const void *addr, __u32 query_flags, size_t *start, size_t *offset, int *flags)
          |     ^~~~~~~~~~~~~
As this function is only used in the file, mark it as 'static'.

Fixes: 4e9e07603e ("selftests/bpf: make use of PROCMAP_QUERY ioctl if available")
Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
Link: https://lore.kernel.org/r/20240903012839.3178-1-chenyuan_fl@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-04 11:52:44 -07:00
zhang jiao
dce35dd276
spi: spidev_fdx: Fix the wrong format specifier
The unsigned int should use "%u" instead of "%d".

Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com>
Link: https://patch.msgid.link/20240904073550.103618-1-zhangjiao2@cmss.chinamobile.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-04 16:50:33 +01:00
Ian Rogers
9b2b9b66d5 perf jevents: Add cpuid to model lookup command
When restricting jevents generated json lookup code with JEVENTS_MODEL
a list of models must be provided. Some builds don't know model names
but know cpuids. Add a command that can convert a cpuid to a model
using mapfile.csv files. This can be used with JEVENTS_MODEL like:

  $ make JEVENTS_MODEL=`./pmu-events/models.py x86 'GenuineIntel-6-8D-1,AuthenticAMD-26-1' pmu-events/arch/`

Committer testing:

  $ tools/perf/pmu-events/models.py x86 'GenuineIntel-6-8D-1,AuthenticAMD-26-1' tools/perf/pmu-events/arch/
  tigerlake,amdzen5
  $ perf stat -v sleep 1 |& head -1
  Using CPUID GenuineIntel-6-B7-1
  $ tools/perf/pmu-events/models.py x86 'GenuineIntel-6-B7-1' tools/perf/pmu-events/arch/
  alderlake
  $

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240904044351.712080-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04 10:43:18 -03:00
Aditya Gupta
98ad0b7732 perf check: Introduce 'check' subcommand
Currently the presence of a feature is checked with a combination of
perf version --build-options and greps, such as:

    perf version --build-options | grep " on .* HAVE_FEATURE"

Instead of this, introduce a subcommand "perf check feature", with which
scripts can test for presence of a feature, such as:

    perf check feature HAVE_FEATURE

'perf check feature' command is expected to have exit status of 0 if
feature is built-in, and 1 if it's not built-in or if feature is not known.

Multiple features can also be passed as a comma-separated list, in which
case the exit status will be 1 only if all of the passed features are
built-in. For example, with below command, it will have exit status of 0
only if both libtraceevent and bpf are enabled, else 1 in all other cases

    perf check feature libtraceevent,bpf

The arguments are case-insensitive.
An array 'supported_features' has also been introduced that can be used by
other commands like 'perf version --build-options', so that new features
can be added in one place, with the array

Committer testing:

  $ perf check feature libtraceevent,bpf
           libtraceevent: [ on  ]  # HAVE_LIBTRACEEVENT
                     bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT
  $ perf check feature libtraceevent
           libtraceevent: [ on  ]  # HAVE_LIBTRACEEVENT
  $ perf check feature bpf
                     bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT
  $ perf check -q feature bpf && echo "BPF support is present"
  BPF support is present
  $ perf check -q feature Bogus && echo "Bogus support is present"
  $

Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904061836.55873-3-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04 09:56:05 -03:00
Aditya Gupta
1a5efc9e13 libsubcmd: Don't free the usage string
Currently, commands which depend on 'parse_options_subcommand()' don't
show the usage string, and instead show '(null)'

    $ ./perf sched
	Usage: (null)

    -D, --dump-raw-trace  dump raw trace in ASCII
    -f, --force           don't complain, do it
    -i, --input <file>    input file name
    -v, --verbose         be more verbose (show symbol address, etc)

'parse_options_subcommand()' is generally expected to initialise the usage
string, with information in the passed 'subcommands[]' array

This behaviour was changed in:

  230a7a71f9 ("libsubcmd: Fix parse-options memory leak")

Where the generated usage string is deallocated, and usage[0] string is
reassigned as NULL.

As discussed in [1], free the allocated usage string in the main
function itself, and don't reset usage string to NULL in
parse_options_subcommand

With this change, the behaviour is restored.

    $ ./perf sched
        Usage: perf sched [<options>] {record|latency|map|replay|script|timehist}

           -D, --dump-raw-trace  dump raw trace in ASCII
           -f, --force           don't complain, do it
           -i, --input <file>    input file name
           -v, --verbose         be more verbose (show symbol address, etc)

[1]: https://lore.kernel.org/linux-perf-users/htq5vhx6piet4nuq2mmhk7fs2bhfykv52dbppwxmo3s7du2odf@styd27tioc6e/

Fixes: 230a7a71f9 ("libsubcmd: Fix parse-options memory leak")
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904061836.55873-2-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04 09:54:24 -03:00
Ian Rogers
fa6cc3f932 perf parse-events: Vary default_breakpoint_len on i386 and arm64
On arm64 the breakpoint length should be 4-bytes but 8-bytes is
tolerated as perf passes that as sizeof(long). Just pass the correct
value.

On i386 the sizeof(long) check in the kernel needs to match the
kernel's long size. Check using an environment (uname checks) whether
4 or 8 bytes needs to be passed. Cache the value in a static.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240904050606.752788-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04 09:50:46 -03:00
Ian Rogers
70b27c756f perf parse-events: Add default_breakpoint_len helper
The default breakpoint length is "sizeof(long)" however this is
incorrect on platforms like Aarch64 where sizeof(long) is 8 but the
breakpoint length is 4. Add a helper function that can be used to
determine the correct breakpoint length, in this change it just
returns the existing default sizeof(long) value.

Use the helper in the bp_account test so that, when modifying the
event from a watchpoint to a breakpoint, the breakpoint length is
appropriate for the architecture and not just sizeof(long).

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240904050606.752788-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04 09:49:09 -03:00
Amit Vadhavana
387ce37ed5 pm-graph: Update directory handling and installation process in Makefile
- Standardize directory variables to support more flexible installations.
 - Add copyright and licensing information to the Makefile.
 - Introduce ".PHONY" declarations to ensure that specific targets are always
   executed, regardless of the presence of files with matching names.
 - Add a help target to provide usage instructions.

Signed-off-by: Amit Vadhavana <av2082000@gmail.com>
Acked-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Link: https://patch.msgid.link/Update directory handling and installation process in Makefile
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-09-04 14:34:54 +02:00
Yo-Jung (Leo) Lin
dd7c445beb pm-graph: Make git ignore sleepgraph.py artifacts
By default, sleepgraph.py creates suspend-{date}-{time} directories
to store artifacts, or suspend-{date}-{time}-xN if the --multi option
is used.

Ignore those directories by adding a .gitignore file.

Signed-off-by: Yo-Jung (Leo) Lin <0xff07@gmail.com>
Acked-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Link: https://patch.msgid.link/20240825095353.7578-1-0xff07@gmail.com
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-09-04 14:32:09 +02:00
Jason Xing
5c26516f09 selftests: add selftest for UDP SO_PEEK_OFF support
Add the SO_PEEK_OFF selftest for UDP. In this patch, I mainly do
three things:
1. rename tcp_so_peek_off.c
2. adjust for UDP protocol
3. add selftests into it

Suggested-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-04 13:10:43 +01:00
Joey Gouly
6a428d6371 kselftest/arm64: Add test case for POR_EL0 signal frame records
Ensure that we get signal context for POR_EL0 if and only if POE is present
on the system.

Copied from the TPIDR2 test.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20240822151113.1479789-30-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 12:54:07 +01:00
Joey Gouly
d3c6e5b109 kselftest/arm64: parse POE_MAGIC in a signal frame
Teach the signal frame parsing about the new POE frame, avoids warning when it
is generated.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240822151113.1479789-29-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 12:54:06 +01:00
Joey Gouly
fabf056278 kselftest/arm64: add HWCAP test for FEAT_S1POE
Check that when POE is enabled, the POR_EL0 register is accessible.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240822151113.1479789-28-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 12:54:06 +01:00
Joey Gouly
f5b5ea51f7 selftests: mm: make protection_keys test work on arm64
The encoding of the pkey register differs on arm64, than on x86/ppc. On those
platforms, a bit in the register is used to disable permissions, for arm64, a
bit enabled in the register indicates that the permission is allowed.

This drops two asserts of the form:
	 assert(read_pkey_reg() <= orig_pkey_reg);
Because on arm64 this doesn't hold, due to the encoding.

The pkey must be reset to both access allow and write allow in the signal
handler. pkey_access_allow() works currently for PowerPC as the
PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE have overlapping bits set.

Access to the uc_mcontext is abstracted, as arm64 has a different structure.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20240822151113.1479789-27-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 12:54:06 +01:00
Joey Gouly
41bbcf7b4b selftests: mm: move fpregs printing
arm64's fpregs are not at a constant offset from sigcontext. Since this is
not an important part of the test, don't print the fpregs pointer on arm64.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20240822151113.1479789-26-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 12:54:06 +01:00
Joey Gouly
6354a0184c kselftest/arm64: move get_header()
Put this function in the header so that it can be used by other tests, without
needing to link to testcases.c.

This will be used by selftest/mm/protection_keys.c

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240822151113.1479789-25-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 12:54:06 +01:00
Joey Gouly
487355f111 KVM: selftests: get-reg-list: add Permission Overlay registers
Add new system registers:
  - POR_EL1
  - POR_EL0

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240822151113.1479789-31-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 12:52:39 +01:00
Tejun Heo
7c65ae81ea sched_ext: Don't call put_prev_task_scx() before picking the next task
fd03c5b858 ("sched: Rework pick_next_task()") changed the definition of
pick_next_task() from:

  pick_next_task() := pick_task() + set_next_task(.first = true)

to:

  pick_next_task(prev) := pick_task() + put_prev_task() + set_next_task(.first = true)

making invoking put_prev_task() pick_next_task()'s responsibility. This
reordering allows pick_task() to be shared between regular and core-sched
paths and put_prev_task() to know the next task.

sched_ext depended on put_prev_task_scx() enqueueing the current task before
pick_next_task_scx() is called. While pulling sched/core changes,
70cc76aa0d80 ("Merge branch 'tip/sched/core' into for-6.12") added an
explicit put_prev_task_scx() call for SCX tasks in pick_next_task_scx()
before picking the first task as a workaround.

Clean it up and adopt the conventions that other sched classes are
following.

The operation of keeping running the current task was spread and required
the task to be put on the local DSQ before picking:

  - balance_one() used SCX_TASK_BAL_KEEP to indicate that the task is still
    runnable, hasn't exhausted its slice, and thus should keep running.

  - put_prev_task_scx() enqueued the task to local DSQ if SCX_TASK_BAL_KEEP
    is set. It also called do_enqueue_task() with SCX_ENQ_LAST if it is the
    only runnable task. do_enqueue_task() in turn decided whether to use the
    local DSQ depending on SCX_OPS_ENQ_LAST.

Consolidate the logic in balance_one() as it always knows whether it is
going to keep the current task. balance_one() now considers all conditions
where the current task should be kept and uses SCX_TASK_BAL_KEEP to tell
pick_next_task_scx() to keep the current task instead of picking one from
the local DSQ. Accordingly, SCX_ENQ_LAST handling is removed from
put_prev_task_scx() and do_enqueue_task() and pick_next_task_scx() is
updated to pick the current task if SCX_TASK_BAL_KEEP is set.

The workaround put_prev_task[_scx]() calls are replaced with
put_prev_set_next_task().

This causes two behavior changes observable from the BPF scheduler:

- When a task keep running, it no longer goes through enqueue/dequeue cycle
  and thus ops.stopping/running() transitions. The new behavior is better
  and all the existing schedulers should be able to handle the new behavior.

- The BPF scheduler cannot keep executing the current task by enqueueing
  SCX_ENQ_LAST task to the local DSQ. If SCX_OPS_ENQ_LAST is specified, the
  BPF scheduler is responsible for resuming execution after each
  SCX_ENQ_LAST. SCX_OPS_ENQ_LAST is mostly useful for cases where scheduling
  decisions are not made on the local CPU - e.g. central or userspace-driven
  schedulin - and the new behavior is more logical and shouldn't pose any
  problems. SCX_OPS_ENQ_LAST demonstration from scx_qmap is dropped as it
  doesn't fit that well anymore and the last task handling is moved to the
  end of qmap_dispatch().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Cc: Changwoo Min <multics69@gmail.com>
Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
2024-09-03 21:54:28 -10:00
SeongJae Park
8c211412c5 selftests/damon: add execute permissions to test scripts
Some test scripts are missing executable permissions.  It causes warnings
that make the test output unnecessarily verbose.  Add executable
permissions.

Link: https://lkml.kernel.org/r/20240827030336.7930-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:57 -07:00
SeongJae Park
582c04b07f selftests/damon: cleanup __pycache__/ with 'make clean'
Python-based tests creates __pycache__/ directory.  Remove it with 'make
clean' by defining it as EXTRA_CLEAN.

Link: https://lkml.kernel.org/r/20240827030336.7930-3-sj@kernel.org
Fixes: b5906f5f73 ("selftests/damon: add a test for update_schemes_tried_regions sysfs command")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:56 -07:00
SeongJae Park
9cb75552f4 selftests/damon: add access_memory_even to .gitignore
Patch series "misc fixups for DAMON {self,kunit} tests".

This patchset is for minor fixups of DAMON selftests and kunit tests. 
First three patches make DAMON selftests more cleanly maintained (patches
1 and 2) without unnecessary warnings (patch 3).  Following six patches
remove unnecessary test case (patch 4), handle configs combinations that
can make tests fail (patches 5-7), reorganize the test files following the
new guideline (patch 8), and add reference kunitconfig for DAMON kunit
tests (patch 9).


This patch (of 9):

DAMON selftests build access_memory_even, but its not on the .gitignore
list.  Add it to make 'git status' output cleaner.

Link: https://lkml.kernel.org/r/20240827030336.7930-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20240827030336.7930-2-sj@kernel.org
Fixes: c94df805c7 ("selftests/damon: implement a program for even-numbered memory regions access")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:56 -07:00
Lorenzo Stoakes
01c373e9a5 mm: rework vm_ops->close() handling on VMA merge
In commit 714965ca82 ("mm/mmap: start distinguishing if vma can be
removed in mergeability test") we relaxed the VMA merge rules for VMAs
possessing a vm_ops->close() hook, permitting this operation in instances
where we wouldn't delete the VMA as part of the merge operation.

This was later corrected in commit fc0c8f9089 ("mm, mmap: fix
vma_merge() case 7 with vma_ops->close") to account for a subtle case that
the previous commit had not taken into account.

In both instances, we first rely on is_mergeable_vma() to determine
whether we might be dealing with a VMA that might be removed, taking
advantage of the fact that a 'previous' VMA will never be deleted, only
VMAs that follow it.

The second patch corrects the instance where a merge of the previous VMA
into a subsequent one did not correctly check whether the subsequent VMA
had a vm_ops->close() handler.

Both changes prevent merge cases that are actually permissible (for
instance a merge of a VMA into a following VMA with a vm_ops->close(), but
with no previous VMA, which would result in the next VMA being extended,
not deleted).

In addition, both changes fail to consider the case where a VMA that would
otherwise be merged with the previous and next VMA might have
vm_ops->close(), on the assumption that for this to be the case, all three
would have to have the same vma->vm_file to be mergeable and thus the same
vm_ops.

And in addition both changes operate at 50,000 feet, trying to guess
whether a VMA will be deleted.

As we have majorly refactored the VMA merge operation and de-duplicated
code to the point where we know precisely where deletions will occur, this
patch removes the aforementioned checks altogether and instead explicitly
checks whether a VMA will be deleted.

In cases where a reduced merge is still possible (where we merge both
previous and next VMA but the next VMA has a vm_ops->close hook, meaning
we could just merge the previous and current VMA), we do so, otherwise the
merge is not permitted.

We take advantage of our userland testing to assert that this functions
correctly - replacing the previous limited vm_ops->close() tests with
tests for every single case where we delete a VMA.

We also update all testing for both new and modified VMAs to set
vma->vm_ops->close() in every single instance where this would not prevent
the merge, to assert that we never do so.

Link: https://lkml.kernel.org/r/9f96b8cfeef3d14afabddac3d6144afdfbef2e22.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:55 -07:00
Lorenzo Stoakes
cc8cb3697a mm: refactor vma_merge() into modify-only vma_merge_existing_range()
The existing vma_merge() function is no longer required to handle what
were previously referred to as cases 1-3 (i.e.  the merging of a new VMA),
as this is now handled by vma_merge_new_vma().

Additionally, simplify the convoluted control flow of the original,
maintaining identical logic only expressed more clearly and doing away
with a complicated set of cases, rather logically examining each possible
outcome - merging of both the previous and subsequent VMA, merging of the
previous VMA and merging of the subsequent VMA alone.

We now utilise the previously implemented commit_merge() function to share
logic with vma_expand() de-duplicating code and providing less surface
area for bugs and confusion.  In order to do so, we adjust this function
to accept parameters specific to merging existing ranges.

Link: https://lkml.kernel.org/r/2cf6016b7bfcc4965fc3cde10827560c42e4f12c.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:55 -07:00
Lorenzo Stoakes
cacded5e42 mm: avoid using vma_merge() for new VMAs
Abstract vma_merge_new_vma() to use vma_merge_struct and rename the
resultant function vma_merge_new_range() to be clear what the purpose of
this function is - a new VMA is desired in the specified range, and we
wish to see if it is possible to 'merge' surrounding VMAs into this range
rather than having to allocate a new VMA.

Note that this function uses vma_extend() exclusively, so adopts its
requirement that the iterator point at or before the gap.  We add an
assert to this effect.

This is as opposed to vma_merge_existing_range(), which will be introduced
in a subsequent commit, and provide the same functionality for cases in
which we are modifying an existing VMA.

In mmap_region() and do_brk_flags() we open code scenarios where we prefer
to use vma_expand() rather than invoke a full vma_merge() operation.

Abstract this logic and eliminate all of the open-coding, and also use the
same logic for all cases where we add new VMAs to, rather than ultimately
use vma_merge(), rather use vma_expand().

Doing so removes duplication and simplifies VMA merging in all such cases,
laying the ground for us to eliminate the merging of new VMAs in
vma_merge() altogether.

Also add the ability for the vmg to track state, and able to report
errors, allowing for us to differentiate a failed merge from an inability
to allocate memory in callers.

This makes it far easier to understand what is happening in these cases
avoiding confusion, bugs and allowing for future optimisation.

Also introduce vma_iter_next_rewind() to allow for retrieval of the next,
and (optionally) the prev VMA, rewinding to the start of the previous gap.

Introduce are_anon_vmas_compatible() to abstract individual VMA anon_vma
comparison for the case of merging on both sides where the anon_vma of the
VMA being merged maybe compatible with prev and next, but prev and next's
anon_vma's may not be compatible with each other.

Finally also introduce can_vma_merge_left() / can_vma_merge_right() to
check adjacent VMA compatibility and that they are indeed adjacent.

Link: https://lkml.kernel.org/r/49d37c0769b6b9dc03b27fe4d059173832556392.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:54 -07:00
Lorenzo Stoakes
fc21959f74 mm: abstract vma_expand() to use vma_merge_struct
The purpose of the vmg is to thread merge state through functions and
avoid egregious parameter lists.  We expand this to vma_expand(), which is
used for a number of merge cases.

Accordingly, adjust its callers, mmap_region() and relocate_vma_down(), to
use a vmg.

An added purpose of this change is the ability in a future commit to
perform all new VMA range merging using vma_expand().

Link: https://lkml.kernel.org/r/4bc8c9dbc9ca52452ef8e587b28fe555854ceb38.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:54 -07:00
Lorenzo Stoakes
2f1c6611b0 mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()
Rather than passing around huge numbers of parameters to numerous helper
functions, abstract them into a single struct that we thread through the
operation, the vma_merge_struct ('vmg').

Adjust vma_merge() and vma_modify() to accept this parameter, as well as
predicate functions can_vma_merge_before(), can_vma_merge_after(), and the
vma_modify_...() helper functions.

Also introduce VMG_STATE() and VMG_VMA_STATE() helper macros to allow for
easy vmg declaration.

We additionally remove the requirement that vma_merge() is passed a VMA
object representing the candidate new VMA.  Previously it used this to
obtain the mm_struct, file and anon_vma properties of the proposed range
(a rather confusing state of affairs), which are now provided by the vmg
directly.

We also remove the pgoff calculation previously performed vma_modify(),
and instead calculate this in VMG_VMA_STATE() via the vma_pgoff_offset()
helper.

Link: https://lkml.kernel.org/r/a955aad09d81329f6fbeb636b2dd10cde7b73dab.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:53 -07:00
Lorenzo Stoakes
955db39676 tools: add VMA merge tests
Add a variety of VMA merge unit tests to assert that the behaviour of VMA
merge is correct at an abstract level and VMAs are merged or not merged as
expected.

These are intentionally added _before_ we start refactoring vma_merge() in
order that we can continually assert correctness throughout the rest of
the series.

In order to reduce churn going forward, we backport the vma_merge_struct
data type to the test code which we introduce and use in a future commit,
and add wrappers around the merge new and existing VMA cases.

Link: https://lkml.kernel.org/r/1c7a0b43cfad2c511a6b1b52f3507696478ff51a.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:53 -07:00
Lorenzo Stoakes
4e52a60ac5 tools: improve vma test Makefile
Patch series "mm: remove vma_merge()", v3.

The infamous vma_merge() function has been the cause of a great deal of
pain, bugs and confusion for a very long time.

It is subtle, contains many corner cases, tries to do far too much and is
as a result very fragile.

The fact that the function requires there to be a numbering system to
cover each possible eventuality with references to each in the many
branches of its implementation as to which case you are looking at speaks
to all this.

Some of this complexity is inherent - unfortunately there is no getting
away from the need to figure out precisely how to execute the merge,
whether we need to remove VMAs, whether it is safe to do so, what
constitutes a mergeable VMA and so on.

However, a lot of the complexity is not inherent but instead a product of
the function's 'organic' development.

Liam has gone to great lengths to improve the situation as a part of his
maple tree implementation, greatly improving the readability of the code,
and Vlastimil and myself have additionally gone to lengths to try to
improve things further.

However, with the availability of userland VMA testing, it now becomes
possible to perform a rather more significant refactoring while
maintaining confidence in its correct operation.

An attempt was previously made by Vlastimil [0] to eliminate vma_merge(),
however it was rather - brutal - and an astute reader might refer to the
date of that patch for insight as to its intent.

This series instead divides merge operations into two natural kinds -
merges which occur when a NEW vma is being added to the address space, and
merges which occur when a vma is being MODIFIED.

Happily, the vma_expand() function introduced by Liam, which has the
capacity for also deleting a subsequent VMA, covers each of the NEW vma
cases.

By abstracting the actual final commit of changes to a VMA to its own
function, commit_merge() and writing a wrapper around vma_expand() for new
VMA cases vma_merge_new_range(), we can avoid having to use vma_merge()
for these instances altogether.

By doing so we are also able to then de-duplicate all existing merge logic
in mmap_region() and do_brk_flags() and have everything invoke this new
function, so we universally take the same approach to merging new VMAs.

Having done so, we can then completely rework vma_merge() into
vma_merge_existing_range() and use this for the instances where a merge is
proposed for a region of an existing VMA.

This eliminates vma_merge() and its numbered cases and instead divides
things into logical cases - merge both, merge left, merge right (the
latter 2 being either partial or full merges).

The code is heavily annotated with ASCII diagrams and greatly simplified
in comparison to the existing vma_merge() function.

Having made this change, we take the opportunity to address an issue with
merging VMAs possessing a vm_ops->close() hook - commit 714965ca82
("mm/mmap: start distinguishing if vma can be removed in mergeability
test") and commit fc0c8f9089 ("mm, mmap: fix vma_merge() case 7 with
vma_ops->close") make efforts to relax how we handle these, making
assumptions about which VMAs might end up deleted (and thus, if possessing
a vm_ops->close() hook, cannot be).

This refactor means we do not need to guess, so instead explicitly only
disallow merge in instances where a VMA with a vm_ops->close() hook would
be deleted (and try a smaller merge in cases where this is possible).

In addition to these changes, we introduce a new vma_merge_struct
abstraction to allow VMA merge state to be threaded through the operation
neatly.

There is heavy unit testing provided for all merge functionality, added
prior to the refactoring, allowing for before/after testing.

The vm_ops->close() change also introduces exhaustive testing to
demonstrate that this functions as expected, and in addition to this the
reproduction code from commit fc0c8f9089 ("mm, mmap: fix vma_merge()
case 7 with vma_ops->close") was tested and confirmed passing.

[0]:https://lore.kernel.org/linux-mm/20240401192623.18575-2-vbabka@suse.cz/


This patch (of 10):

Have vma.o depend on its source dependencies explicitly, as previously
these were simply being ignored as existing object files were up to date.

This now correctly re-triggers the build if mm/ source is changed as well
as local source code.

Also set clean as a phony rule.

Link: https://lkml.kernel.org/r/cover.1725040657.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/e3ea58f08364ae5432c9a074de0195a7c7e0b04a.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:53 -07:00
Mike Yuan
fd06ce2ce4 selftests: test_zswap: add test for hierarchical zswap.writeback
Ensure that zswap.writeback check goes up the cgroup tree, i.e.  is
hierarchical.  Create a subcgroup which has zswap.writeback set to 1, and
the upper hierarchy's restrictions shall apply.

Link: https://lkml.kernel.org/r/20240823162506.12117-2-me@yhndnzj.com
Signed-off-by: Mike Yuan <me@yhndnzj.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:47 -07:00
David Hildenbrand
c41a701d18 selftests/mm: fix charge_reserved_hugetlb.sh test
Currently, running the charge_reserved_hugetlb.sh selftest we can
sometimes observe something like:

  $ ./charge_reserved_hugetlb.sh -cgroup-v2
  ...
  write_result is 0
  After write:
  hugetlb_usage=0
  reserved_usage=10485760
  killing write_to_hugetlbfs
  Received 2.
  Deleting the memory
  Detach failure: Invalid argument
  umount: /mnt/huge: target is busy.

Both cases are issues in the test.

While the unmount error seems to be racy, it will make the test fail:
	$ ./run_vmtests.sh -t hugetlb
	...
	# [FAIL]
	not ok 10 charge_reserved_hugetlb.sh -cgroup-v2 # exit=32

The issue is that we are not waiting for the write_to_hugetlbfs process to
quit.  So it might still have a hugetlbfs file open, about which umount is
not happy.  Fix that by making "killall" wait for the process to quit.

The other error ("Detach failure: Invalid argument") does not seem to
result in a test error, but is misleading.  Turns out write_to_hugetlbfs.c
unconditionally tries to cleanup using shmdt(), even when we only
mmap()'ed a hugetlb file.  Even worse, shmaddr is never even set for the
SHM case.  Fix that as well.

With this change it seems to work as expected.

Link: https://lkml.kernel.org/r/20240821123115.2068812-1-david@redhat.com
Fixes: 29750f71a9 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:46 -07:00
Matthew Wilcox (Oracle)
7a87225ae2 x86: remove PG_uncached
Convert x86 to use PG_arch_2 instead of PG_uncached and remove
PG_uncached.

Link: https://lkml.kernel.org/r/20240821193445.2294269-11-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:46 -07:00
Matthew Wilcox (Oracle)
02e1960aaf mm: rename PG_mappedtodisk to PG_owner_2
This flag has similar constraints to PG_owner_priv_1 -- it is ignored by
core code, and is entirely for the use of the code which allocated the
folio.  Since the pagecache does not use it, individual filesystems can
use it.  The bufferhead code does use it, so filesystems which use the
buffer cache must not use it for another purpose.

Link: https://lkml.kernel.org/r/20240821193445.2294269-10-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:45 -07:00
Pedro Falcato
f28bdd1b17 selftests/mm: add more mseal traversal tests
Add more mseal traversal tests across VMAs, where we could possibly screw
up sealing checks.  These test more across-vma traversal for mprotect,
munmap and madvise.  Particularly, we test for the case where a regular
VMA is followed by a sealed VMA.

[akpm@linux-foundation.org: remove incorrect comment, per review]
[akpm@linux-foundation.org: remove the correct comment, per Pedro]
[pedro.falcato@gmail.com: fix mseal's length]
  Link: https://lkml.kernel.org/r/vc4czyuemmu3kylqb4ctaga6y5yvondlyabimx6jvljlw2fkea@djawlllf45xa
Link: https://lkml.kernel.org/r/20240817-mseal-depessimize-v3-7-d8d2e037df30@gmail.com
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:42 -07:00
Baolin Wang
2e6d88e9d4 selftests: mm: support shmem mTHP collapse testing
Add shmem mTHP collpase testing.  Similar to the anonymous page, users can
use the '-s' parameter to specify the shmem mTHP size for testing.

Link: https://lkml.kernel.org/r/fa44bfa20ca5b9fd6f9163a048f3d3c1e53cd0a8.1724140601.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:39 -07:00
Jinjiang Tu
2f4db28610 selftests/mm: remove unnecessary ia64 code and comment
IA64 has gone with commit cf8e865810 ("arch: Remove Itanium (IA-64)
architecture"), so remove unnecessary ia64 special mm code and comment in
selftests too.

Link: https://lkml.kernel.org/r/20240819130609.3386195-1-tujinjiang@huawei.com
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:38 -07:00
Li Ming
577a67662f cxl/pci: Rename cxl_setup_parent_dport() and cxl_dport_map_regs()
The name of cxl_setup_parent_dport() function is not clear, the function
is used to initialize AER and RAS capabilities on a dport, therefore,
rename the function to cxl_dport_init_ras_reporting(), it is easier for
user to understand what the function does. Besides, adjust the order of
the function parameters, the subject of cxl_dport_init_ras_reporting()
is a cxl dport, so a struct cxl_dport as the first parameter of the
function should be better.

cxl_dport_map_regs() is used to map CXL RAS capability on a cxl dport,
using cxl_dport_map_ras() as the function name.

Signed-off-by: Li Ming <ming4.li@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20240830061308.2327065-1-ming4.li@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-03 15:29:33 -07:00
Matthieu Baerts (NGI0)
38dc0708bc selftests: mptcp: pm_nl_ctl: remove re-definition
'MPTCP_PM_NAME' is defined in 'linux/mptcp_pm.h', included in
'linux/mptcp.h', no need to re-define it.

'MPTCP_PM_EVENTS' is not defined in 'linux/mptcp.h', but
'MPTCP_PM_EV_GRP_NAME' is, with the same value. We can then use the
latter, and drop the other one.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-11-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03 15:25:43 -07:00
Geliang Tang
0e2b4584d6 selftests: mptcp: join: simplify checksum_tests
The four checksum tests are similar, only one line is different. So
a for-loop can be used to simplify these tests.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-10-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03 15:25:43 -07:00
Matthieu Baerts (NGI0)
08eecd7e7f selftests: mptcp: join: mute errors when ran in the background
The test is supposed to be killed before the end, which will likely
cause "Connection reset by peer" errors. It is confusing, especially
because in case of real transfer errors, the test will not be marked as
failed. But that's OK, there are many other tests checking that.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-9-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03 15:25:43 -07:00
Matthieu Baerts (NGI0)
8d328dbcf6 selftests: mptcp: join: specify host being checked
Instead of displaying 'invert' when looking at some events like MP_FAIL,
MP_FASTCLOSE, MP_RESET, RM_ADDR, which is a bit vague because they are
not traditionnaly sent from one side, the host being checked is now
printed.

For the ADD_ADDR, only display the host when it is the client sending
it, which is more unusual.

Also before, the 'invert' message was printed after a few checks, but it
was not clear which ones exactly.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-8-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03 15:25:43 -07:00
Matthieu Baerts (NGI0)
6ed495345b selftests: mptcp: join: more explicit check name
Before, the check names had to be very short. It is no longer the case
now that these checks are printed on a dedicated line.

Then, it looks better to have more explicit names.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-7-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03 15:25:42 -07:00
Matthieu Baerts (NGI0)
004125c251 selftests: mptcp: join: validate MPJ SYN TX MIB counters
A few new MPJoinSynTx MIB counters have been added in a previous commit.
They are being validated here in mptcp_join.sh selftest, each time the
number of received MPJ are checked.

Most of the time, the number of sent SYN+MPJ is the same as the received
ones. But sometimes, there are more, because there are dropped, or there
are errors.

While at it, the "no MPC reuse with single endpoint" subtest has been
modified to force a bind() error.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-6-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03 15:25:42 -07:00
Matthieu Baerts (NGI0)
ba8a664004 selftests: mptcp: join: one line for join check
Most tests are checking if the expected number of SYN/SYN+ACK/ACK JOINs
have been received, each of them on one line.

More Join related tests are going to be checked soon, no need to add 5
new lines per test in case of success, just one is enough. In case of
issue, the errors will still be reported like before.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-5-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03 15:25:42 -07:00
Matthieu Baerts (NGI0)
1b2965a8cd selftests: mptcp: join: reduce join_nr params
chk_join_nr() currently takes 9 positional parameters, 6 of them are
optional. It makes it hard to read:

  chk_join_nr 1 1 1 1 0 1 1 0 4

Naming these vars helps to make it easier to read:

  join_csum_ns1=1 join_csum_ns2=0 \
    join_fail_nr=1 join_rst_nr=1 join_infi_nr=0 \
    join_corrupted_pkts=4 \
    chk_join_nr 1 1 1

It will then be easier to add new optional parameters.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-4-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03 15:25:42 -07:00
Kunwu Chan
1c1d348b08 tools/testing/cxl: Use dev_is_platform()
Use dev_is_platform() instead of checking bus type directly.

Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20240827095123.168696-1-kunwu.chan@linux.dev
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-03 14:40:41 -07:00
Ian Rogers
f76e3525ac perf parse-events: Pass cpu_list as a perf_cpu_map in __add_event()
Previously the cpu_list is a string and typically no cpu_list is
passed to __add_event().

Wanting to make events have their cpus distinct from the PMU means that
in more occassions we want to pass a cpu_list.

If we're reading this from sysfs it is easier to read a perf_cpu_map
than allocate and pass around strings that will later be parsed.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Gautham Shenoy <gautham.shenoy@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20240718003025.1486232-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 16:45:34 -03:00
Ian Rogers
beef8fb2af perf pmu: Merge boolean sysfs event option parsing
Merge perf_pmu__parse_per_pkg() and perf_pmu__parse_snapshot() that do the
same parsing except for the file suffix used.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Gautham Shenoy <gautham.shenoy@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20240718003025.1486232-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 16:42:22 -03:00
Yang Jihong
9b3a48bbe2 perf sched timehist: Add --prio option
The --prio option is used to only show events for the given task priority(ies).
The default is to show events for all priority tasks, which is consistent with
the previous behavior.

Testcase:
  # perf sched record nice -n 9 perf bench sched messaging -l 10000
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 3.435 [sec]
  [ perf record: Woken up 270 times to write data ]
  [ perf record: Captured and wrote 618.688 MB perf.data (5729036 samples) ]

  # perf sched timehist -h

   Usage: perf sched timehist [<options>]

      -C, --cpu <cpu>       list of cpus to profile
      -D, --dump-raw-trace  dump raw trace in ASCII
      -f, --force           don't complain, do it
      -g, --call-graph      Display call chains if present (default on)
      -I, --idle-hist       Show idle events only
      -i, --input <file>    input file name
      -k, --vmlinux <file>  vmlinux pathname
      -M, --migrations      Show migration events
      -n, --next            Show next task
      -p, --pid <pid[,pid...]>
                            analyze events only for given process id(s)
      -s, --summary         Show only syscall summary with statistics
      -S, --with-summary    Show all syscalls and summary with statistics
      -t, --tid <tid[,tid...]>
                            analyze events only for given thread id(s)
      -V, --cpu-visual      Add CPU visual
      -v, --verbose         be more verbose (show symbol address, etc)
      -w, --wakeups         Show wakeup events
          --kallsyms <file>
                            kallsyms pathname
          --max-stack <n>   Maximum number of functions to display backtrace.
          --prio <prio>     analyze events only for given task priority(ies)
          --show-prio       Show task priority
          --state           Show task state when sched-out
          --symfs <directory>
                            Look for files with symbols relative to this directory
          --time <str>      Time span for analysis (start,stop)

  # perf sched timehist --prio 140
  Samples of sched_switch event do not have callchains.
  Invalid prio string

  # perf sched timehist --show-prio --prio 129
  Samples of sched_switch event do not have callchains.
             time    cpu  task name                       prio      wait time  sch delay   run time
                          [tid/pid]                                    (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  --------  ---------  ---------  ---------
   2090450.765421 [0002]  sched-messaging[1229618]        129           0.000      0.000      0.029
   2090450.765445 [0007]  sched-messaging[1229616]        129           0.000      0.062      0.043
   2090450.765448 [0014]  sched-messaging[1229619]        129           0.000      0.000      0.032
   2090450.765478 [0013]  sched-messaging[1229617]        129           0.000      0.065      0.048
   2090450.765503 [0014]  sched-messaging[1229622]        129           0.000      0.000      0.017
   2090450.765550 [0002]  sched-messaging[1229624]        129           0.000      0.000      0.021
   2090450.765562 [0007]  sched-messaging[1229621]        129           0.000      0.071      0.028
   2090450.765570 [0005]  sched-messaging[1229620]        129           0.000      0.064      0.066
   2090450.765583 [0001]  sched-messaging[1229625]        129           0.000      0.001      0.031
   2090450.765595 [0013]  sched-messaging[1229623]        129           0.000      0.060      0.028
   2090450.765637 [0014]  sched-messaging[1229628]        129           0.000      0.000      0.019
   2090450.765665 [0007]  sched-messaging[1229627]        129           0.000      0.038      0.030
  <SNIP>

  # perf sched timehist --show-prio --prio 0,120-129
  Samples of sched_switch event do not have callchains.
             time    cpu  task name                       prio      wait time  sch delay   run time
                          [tid/pid]                                    (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  --------  ---------  ---------  ---------
   2090450.763231 [0000]  perf[1229608]                   120           0.000      0.000      0.000
   2090450.763235 [0000]  migration/0[15]                 0             0.000      0.001      0.003
   2090450.763263 [0001]  perf[1229608]                   120           0.000      0.000      0.000
   2090450.763268 [0001]  migration/1[21]                 0             0.000      0.001      0.004
   2090450.763302 [0002]  perf[1229608]                   120           0.000      0.000      0.000
   2090450.763309 [0002]  migration/2[27]                 0             0.000      0.001      0.007
   2090450.763338 [0003]  perf[1229608]                   120           0.000      0.000      0.000
   2090450.763343 [0003]  migration/3[33]                 0             0.000      0.001      0.004
   2090450.763459 [0004]  perf[1229608]                   120           0.000      0.000      0.000
   2090450.763469 [0004]  migration/4[39]                 0             0.000      0.002      0.010
   2090450.763496 [0005]  perf[1229608]                   120           0.000      0.000      0.000
   2090450.763501 [0005]  migration/5[45]                 0             0.000      0.001      0.004
   2090450.763613 [0006]  perf[1229608]                   120           0.000      0.000      0.000
   2090450.763622 [0006]  migration/6[51]                 0             0.000      0.001      0.008
   2090450.763652 [0007]  perf[1229608]                   120           0.000      0.000      0.000
   2090450.763660 [0007]  migration/7[57]                 0             0.000      0.001      0.008
  <SNIP>
   2090450.765665 [0001]  <idle>                          120           0.031      0.031      0.081
   2090450.765665 [0007]  sched-messaging[1229627]        129           0.000      0.038      0.030
   2090450.765667 [0000]  s1-perf[8235/7168]              120           0.008      0.000      0.004
   2090450.765684 [0013]  <idle>                          120           0.028      0.028      0.088
   2090450.765685 [0001]  sched-messaging[1229630]        129           0.000      0.001      0.020
   2090450.765688 [0000]  <idle>                          120           0.004      0.004      0.020
   2090450.765689 [0002]  <idle>                          120           0.021      0.021      0.138
   2090450.765691 [0005]  sched-messaging[1229626]        129           0.000      0.085      0.029

Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819033016.2427235-3-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 15:45:59 -03:00
Yang Jihong
3fcd740990 perf sched timehist: Add --show-prio option
The --show-prio option is used to display the priority of task.

It is disabled by default, which is consistent with original behavior.

The display format is xxx (priority does not change during task running)
or xxx->yyy (priority changes during task running)

Testcase:

  # perf sched record nice -n 9 true
  [ perf record: Woken up 0 times to write data ]
  [ perf record: Captured and wrote 0.497 MB perf.data ]

  # perf sched timehist -h

   Usage: perf sched timehist [<options>]

      -C, --cpu <cpu>       list of cpus to profile
      -D, --dump-raw-trace  dump raw trace in ASCII
      -f, --force           don't complain, do it
      -g, --call-graph      Display call chains if present (default on)
      -I, --idle-hist       Show idle events only
      -i, --input <file>    input file name
      -k, --vmlinux <file>  vmlinux pathname
      -M, --migrations      Show migration events
      -n, --next            Show next task
      -p, --pid <pid[,pid...]>
                            analyze events only for given process id(s)
      -s, --summary         Show only syscall summary with statistics
      -S, --with-summary    Show all syscalls and summary with statistics
      -t, --tid <tid[,tid...]>
                            analyze events only for given thread id(s)
      -V, --cpu-visual      Add CPU visual
      -v, --verbose         be more verbose (show symbol address, etc)
      -w, --wakeups         Show wakeup events
          --kallsyms <file>
                            kallsyms pathname
          --max-stack <n>   Maximum number of functions to display backtrace.
          --show-prio       Show task priority
          --state           Show task state when sched-out
          --symfs <directory>
                            Look for files with symbols relative to this directory
          --time <str>      Time span for analysis (start,stop)

  # perf sched timehist
  Samples of sched_switch event do not have callchains.
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------
     23952.006537 [0000]  perf[534]                           0.000      0.000      0.000
     23952.006593 [0000]  migration/0[19]                     0.000      0.014      0.056
     23952.006899 [0001]  perf[534]                           0.000      0.000      0.000
     23952.006947 [0001]  migration/1[22]                     0.000      0.015      0.047
     23952.007138 [0002]  perf[534]                           0.000      0.000      0.000
  <SNIP>

  # perf sched timehist --show-prio
  Samples of sched_switch event do not have callchains.
             time    cpu  task name                       prio      wait time  sch delay   run time
                          [tid/pid]                                    (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  --------  ---------  ---------  ---------
     23952.006537 [0000]  perf[534]                       120           0.000      0.000      0.000
     23952.006593 [0000]  migration/0[19]                 0             0.000      0.014      0.056
     23952.006899 [0001]  perf[534]                       120           0.000      0.000      0.000
  <SNIP>
     23952.034843 [0003]  nice[535]                       120->129      0.189      0.024     23.314
  <SNIP>
     23952.053838 [0005]  rcu_preempt[16]                 120           3.993      0.000      0.023
     23952.053990 [0005]  <idle>                          120           0.023      0.023      0.152
     23952.054137 [0006]  <idle>                          120           1.427      1.427     17.855
     23952.054278 [0007]  <idle>                          120           0.506      0.506      1.650

Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819033016.2427235-2-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 15:45:34 -03:00
Yang Jihong
b93fb9cf45 perf sched timehist: Remove redundant BUG_ON in timehist_sched_change_event()
The BUG_ON(thread__tid(thread) != 0) in timehist_sched_change_event() is
redundant, remove it.

No functional change.

Fixes: 07235f84ec ("perf sched timehist: Add -I/--idle-hist option")
Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812132606.3126490-2-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 15:44:22 -03:00
Yang Jihong
575eec2180 perf sched timehist: Skip print non-idle task samples when only show idle events
when only show idle events, runtime stats of non-idle tasks is not updated,
and the value is 0, there is no need to print non-idle samples.

Before:

  # perf sched timehist -I
  Samples of sched_switch event do not have callchains.
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------
   2090450.763235 [0000]  migration/0[15]                     0.000      0.000      0.000
   2090450.763268 [0001]  migration/1[21]                     0.000      0.000      0.000
   2090450.763309 [0002]  migration/2[27]                     0.000      0.000      0.000
   2090450.763343 [0003]  migration/3[33]                     0.000      0.000      0.000
   2090450.763469 [0004]  migration/4[39]                     0.000      0.000      0.000
   2090450.763501 [0005]  migration/5[45]                     0.000      0.000      0.000
   2090450.763622 [0006]  migration/6[51]                     0.000      0.000      0.000
   2090450.763660 [0007]  migration/7[57]                     0.000      0.000      0.000
   2090450.763741 [0009]  migration/9[69]                     0.000      0.000      0.000
   2090450.763862 [0010]  migration/10[75]                    0.000      0.000      0.000
   2090450.763894 [0011]  migration/11[81]                    0.000      0.000      0.000
   2090450.764021 [0012]  migration/12[87]                    0.000      0.000      0.000
   2090450.764056 [0013]  migration/13[93]                    0.000      0.000      0.000
   2090450.764135 [0014]  migration/14[99]                    0.000      0.000      0.000
   2090450.764163 [0015]  migration/15[105]                   0.000      0.000      0.000
   2090450.764292 [0016]  migration/16[111]                   0.000      0.000      0.000
   2090450.764371 [0017]  migration/17[117]                   0.000      0.000      0.000
   2090450.764422 [0018]  migration/18[123]                   0.000      0.000      0.000
   2090450.764490 [0000]  <idle>                              0.000      0.000      1.255
   2090450.764505 [0000]  s1-perf[8235/7168]                  0.000      0.000      0.000
   2090450.764571 [0016]  <idle>                              0.000      0.000      0.278
   2090450.764588 [0010]  <idle>                              0.000      0.000      0.725
   2090450.764590 [0016]  s1-agent[7179/7162]                 0.000      0.000      0.000
   2090450.764635 [0000]  <idle>                              0.015      0.015      0.129
   2090450.764637 [0017]  <idle>                              0.000      0.000      0.266
   2090450.764639 [0000]  s1-perf[8235/7168]                  0.000      0.000      0.000
   2090450.764668 [0017]  s1-agent[7180/7162]                 0.000      0.000      0.000
   2090450.764669 [0000]  <idle>                              0.003      0.003      0.029
   2090450.764672 [0000]  s1-perf[8235/7168]                  0.000      0.000      0.000
   2090450.764683 [0000]  <idle>                              0.003      0.003      0.010

After:

  # perf sched timehist -I
  Samples of sched_switch event do not have callchains.
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------
   2090450.764490 [0000]  <idle>                              0.000      0.000      1.255
   2090450.764571 [0016]  <idle>                              0.000      0.000      0.278
   2090450.764588 [0010]  <idle>                              0.000      0.000      0.725
   2090450.764635 [0000]  <idle>                              0.015      0.015      0.129
   2090450.764637 [0017]  <idle>                              0.000      0.000      0.266
   2090450.764669 [0000]  <idle>                              0.003      0.003      0.029
   2090450.764683 [0000]  <idle>                              0.003      0.003      0.010
   2090450.764688 [0016]  <idle>                              0.019      0.019      0.097
   2090450.764694 [0000]  <idle>                              0.001      0.001      0.009
   2090450.764706 [0000]  <idle>                              0.001      0.001      0.010
   2090450.764725 [0002]  <idle>                              0.000      0.000      1.415
   2090450.764728 [0000]  <idle>                              0.002      0.002      0.019
   2090450.764823 [0000]  <idle>                              0.003      0.003      0.091
   2090450.764838 [0019]  <idle>                              0.000      0.000      0.154
   2090450.764865 [0002]  <idle>                              0.109      0.109      0.029
   2090450.764866 [0000]  <idle>                              0.012      0.012      0.030
   2090450.764880 [0002]  <idle>                              0.013      0.013      0.001
   2090450.764880 [0000]  <idle>                              0.002      0.002      0.011
   2090450.764896 [0000]  <idle>                              0.001      0.001      0.013
   2090450.764903 [0019]  <idle>                              0.063      0.063      0.002
   2090450.764908 [0019]  <idle>                              0.003      0.003      0.001

Fixes: 07235f84ec ("perf sched timehist: Add -I/--idle-hist option")
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812132606.3126490-1-yangjihong@bytedance.com
Reviewed-and-tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 15:43:03 -03:00
Zhu Jun
3c6b818b09 tools/iio: Add memory allocation failure check for trigger_name
Added a check to handle memory allocation failure for `trigger_name`
and return `-ENOMEM`.

Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com>
Link: https://patch.msgid.link/20240828093129.3040-1-zhujun2@cmss.chinamobile.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-09-03 18:49:44 +01:00
Abhinav Jain
b4bcdff7e8 selftests: filesystems: fix warn_unused_result build warnings
Add return value checks for read & write calls in test_listmount_ns
function. This patch resolves below compilation warnings:

```
statmount_test_ns.c: In function ‘test_listmount_ns’:

statmount_test_ns.c:322:17: warning: ignoring return value of ‘write’
declared with attribute ‘warn_unused_result’ [-Wunused-result]

statmount_test_ns.c:323:17: warning: ignoring return value of ‘read’
declared with attribute ‘warn_unused_result’ [-Wunused-result]
```

Signed-off-by: Abhinav Jain <jain.abhinav177@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-09-03 11:42:55 -06:00
Andi Kleen
bf0db8c759 perf script: Minimize "not reaching sample" for '-F +brstackinsn'
In some situations 'perf script -F +brstackinsn' sees a lot of "not
reaching sample" messages.

This happens when the last LBR block before the sample contains a branch
that is not in the LBR, and the instruction dumping stops.

  $ perf record -b  emacs -Q --batch '()'
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.396 MB perf.data (443 samples) ]
  $ perf script -F +brstackinsn
  ...
          00007f0ab2d171a4        insn: 41 0f 94 c0
          00007f0ab2d171a8        insn: 83 fa 01
          00007f0ab2d171ab        insn: 74 d3                     # PRED 6 cycles [313] 1.00 IPC
          00007f0ab2d17180        insn: 45 84 c0
          00007f0ab2d17183        insn: 74 28
          ... not reaching sample ...

  $ perf script -F +brstackinsn | grep -c reach
  136
  $

This is a problem for further analysis that wants to see the full code
upto the sample.

There are two common cases where the message is bogus:

- The LBR only logs taken branches, but the branch might be a
  conditional branch that is not taken (that is the most common case
  actually)

- The LBR sampling uses a filter ignoring some branches, but the perf
  script check checks for all branches.

This patch fixes these two conditions, by only checking for conditional
branches, as well as checking the perf_event_attr's branch filter
attributes.

For the test case above it fixes all the messages:

  $ ./perf script -F +brstackinsn | grep -c reach
  0

Note that there are still conditions when the message is hit --
sometimes there can be a unconditional branch that misses the LBR update
before the sample -- but they are much more rare now.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20240229161828.386397-1-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 12:22:01 -03:00
Namhyung Kim
8b3b1bb3ea perf record offcpu: Constify control data for BPF
The control knobs set before loading BPF programs should be declared as
'const volatile' so that it can be optimized by the BPF core.

Committer testing:

  root@x1:~# perf record --off-cpu
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.807 MB perf.data (5645 samples) ]

  root@x1:~# perf evlist
  cpu_atom/cycles/P
  cpu_core/cycles/P
  offcpu-time
  dummy:u
  root@x1:~# perf evlist -v
  cpu_atom/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000000, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
  cpu_core/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000000, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
  offcpu-time: type: 1 (software), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1
  dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|IDENTIFIER, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
  root@x1:~# perf trace -e bpf --max-events 5 perf record --off-cpu
       0.000 ( 0.015 ms): :2949124/2949124 bpf(cmd: 36, uattr: 0x7ffefc6dbe30, size: 8)          = -1 EOPNOTSUPP (Operation not supported)
       0.031 ( 0.115 ms): :2949124/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbb60, size: 148) = 14
       0.159 ( 0.037 ms): :2949124/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbc20, size: 148) = 14
      23.868 ( 0.144 ms): perf/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbad0, size: 148)     = 14
      24.027 ( 0.014 ms): perf/2949124 bpf(uattr: 0x7ffefc6dbc80, size: 80)                      = 14
  root@x1:~#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240902200515.2103769-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 11:54:47 -03:00
Namhyung Kim
4afdc00c37 perf lock contention: Constify control data for BPF
The control knobs set before loading BPF programs should be declared as
'const volatile' so that it can be optimized by the BPF core.

Committer testing:

  root@x1:~# perf lock contention --use-bpf
   contended   total wait     max wait     avg wait         type   caller

           5     31.57 us     14.93 us      6.31 us        mutex   btrfs_delayed_update_inode+0x43
           1     16.91 us     16.91 us     16.91 us      rwsem:R   btrfs_tree_read_lock_nested+0x1b
           1     15.13 us     15.13 us     15.13 us     spinlock   btrfs_getattr+0xd1
           1      6.65 us      6.65 us      6.65 us      rwsem:R   btrfs_tree_read_lock_nested+0x1b
           1      4.34 us      4.34 us      4.34 us     spinlock   process_one_work+0x1a9
  root@x1:~#
  root@x1:~# perf trace -e bpf --max-events 10 perf lock contention --use-bpf
       0.000 ( 0.013 ms): :2948281/2948281 bpf(cmd: 36, uattr: 0x7ffd5f12d730, size: 8)          = -1 EOPNOTSUPP (Operation not supported)
       0.024 ( 0.120 ms): :2948281/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d460, size: 148) = 16
       0.158 ( 0.034 ms): :2948281/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d520, size: 148) = 16
      26.653 ( 0.154 ms): perf/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d3d0, size: 148)     = 16
      26.825 ( 0.014 ms): perf/2948281 bpf(uattr: 0x7ffd5f12d580, size: 80)                      = 16
      87.924 ( 0.038 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d400, size: 40)       = 16
      87.988 ( 0.006 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d470, size: 40)       = 16
      88.019 ( 0.006 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d250, size: 40)       = 16
      88.029 ( 0.172 ms): perf/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d320, size: 148)     = 17
      88.217 ( 0.005 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d4d0, size: 40)       = 16
  root@x1:~#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240902200515.2103769-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 11:53:15 -03:00
Namhyung Kim
066fd84087 perf kwork: Constify control data for BPF
The control knobs set before loading BPF programs should be declared as
'const volatile' so that it can be optimized by the BPF core.

Committer testing:

  root@x1:~# perf kwork report --use-bpf
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    (w)intel_atomic_commit_work [  | 0009 |     18.680 ms |         2 |     18.553 ms |     362410.681580 s |     362410.700133 s |
    (w)pm_runtime_work             | 0007 |     13.300 ms |         1 |     13.300 ms |     362410.254996 s |     362410.268295 s |
    (w)intel_atomic_commit_work [  | 0009 |      9.846 ms |         2 |      9.717 ms |     362410.172352 s |     362410.182069 s |
    (w)acpi_ec_event_processor     | 0002 |      8.106 ms |         1 |      8.106 ms |     362410.463187 s |     362410.471293 s |
    (s)SCHED:7                     | 0000 |      1.351 ms |       106 |      0.063 ms |     362410.658017 s |     362410.658080 s |
    i915:157                       | 0008 |      0.994 ms |        13 |      0.361 ms |     362411.222125 s |     362411.222486 s |
    (s)SCHED:7                     | 0001 |      0.703 ms |        98 |      0.047 ms |     362410.245004 s |     362410.245051 s |
    (s)SCHED:7                     | 0005 |      0.674 ms |        42 |      0.074 ms |     362411.483039 s |     362411.483113 s |
    (s)NET_RX:3                    | 0001 |      0.556 ms |        10 |      0.079 ms |     362411.066388 s |     362411.066467 s |
  <SNIP>

  root@x1:~# perf trace -e bpf --max-events 5 perf kwork report --use-bpf
       0.000 ( 0.016 ms): perf/2948007 bpf(cmd: 36, uattr: 0x7ffededa6660, size: 8)          = -1 EOPNOTSUPP (Operation not supported)
       0.026 ( 0.106 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6390, size: 148) = 12
       0.152 ( 0.032 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6450, size: 148) = 12
      26.247 ( 0.138 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6300, size: 148) = 12
      26.396 ( 0.012 ms): perf/2948007 bpf(uattr: 0x7ffededa64b0, size: 80)                  = 12
  Starting trace, Hit <Ctrl+C> to stop and report
  root@x1:~#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240902200515.2103769-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 11:50:20 -03:00
Namhyung Kim
ac5a23b2f2 perf ftrace latency: Constify control data for BPF
The control knobs set before loading BPF programs should be declared as
'const volatile' so that it can be optimized by the BPF core.

Committer testing:

  root@x1:~# perf ftrace latency --use-bpf -T schedule
  ^C#   DURATION     |      COUNT | GRAPH                                          |
       0 - 1    us |          0 |                                                |
       1 - 2    us |          0 |                                                |
       2 - 4    us |          0 |                                                |
       4 - 8    us |          0 |                                                |
       8 - 16   us |          1 |                                                |
      16 - 32   us |          5 |                                                |
      32 - 64   us |          2 |                                                |
      64 - 128  us |          6 |                                                |
     128 - 256  us |          7 |                                                |
     256 - 512  us |          5 |                                                |
     512 - 1024 us |         22 | #                                              |
       1 - 2    ms |         36 | ##                                             |
       2 - 4    ms |         68 | #####                                          |
       4 - 8    ms |         22 | #                                              |
       8 - 16   ms |         91 | #######                                        |
      16 - 32   ms |         11 |                                                |
      32 - 64   ms |         26 | ##                                             |
      64 - 128  ms |        213 | #################                              |
     128 - 256  ms |         19 | #                                              |
     256 - 512  ms |         14 | #                                              |
     512 - 1024 ms |          5 |                                                |
       1 - ...   s |          8 |                                                |
  root@x1:~#

  root@x1:~# perf trace -e bpf perf ftrace latency --use-bpf -T schedule
     0.000 ( 0.015 ms): perf/2944525 bpf(cmd: 36, uattr: 0x7ffe80de7b40, size: 8)                          = -1 EOPNOTSUPP (Operation not supported)
     0.025 ( 0.102 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7870, size: 148)                 = 8
     0.136 ( 0.026 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7930, size: 148)                 = 8
     0.174 ( 0.026 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de77e0, size: 148)                 = 8
     0.205 ( 0.010 ms): perf/2944525 bpf(uattr: 0x7ffe80de7990, size: 80)                                  = 8
     0.227 ( 0.011 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7810, size: 40)                   = 8
     0.244 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7880, size: 40)                   = 8
     0.257 ( 0.006 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7660, size: 40)                   = 8
     0.265 ( 0.058 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7730, size: 148)                 = 9
     0.330 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de78e0, size: 40)                   = 8
     0.337 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7890, size: 40)                   = 8
     0.343 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7880, size: 40)                   = 8
     0.349 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de78b0, size: 40)                   = 8
     0.355 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7890, size: 40)                   = 8
     0.361 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de78b0, size: 40)                   = 8
     0.367 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7880, size: 40)                   = 8
     0.373 ( 0.014 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7a00, size: 40)                   = 8
     0.390 ( 0.358 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80)                                  = 9
     0.763 ( 0.014 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80)                                  = 9
     0.783 ( 0.011 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80)                                  = 9
     0.798 ( 0.017 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80)                                  = 9
     0.819 ( 0.003 ms): perf/2944525 bpf(uattr: 0x7ffe80de7700, size: 80)                                  = 9
     0.824 ( 0.047 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de76c0, size: 148)                 = 10
     0.878 ( 0.008 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80)                                  = 9
     0.891 ( 0.014 ms): perf/2944525 bpf(cmd: MAP_UPDATE_ELEM, uattr: 0x7ffe80de79e0, size: 32)            = 0
     0.910 ( 0.103 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7880, size: 148)                 = 9
     1.016 ( 0.143 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7880, size: 148)                 = 10
     3.777 ( 0.068 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7570, size: 148)                 = 12
     3.848 ( 0.003 ms): perf/2944525 bpf(cmd: LINK_CREATE, uattr: 0x7ffe80de7550, size: 64)                = -1 EBADF (Bad file descriptor)
     3.859 ( 0.006 ms): perf/2944525 bpf(cmd: LINK_CREATE, uattr: 0x7ffe80de77c0, size: 64)                = 12
     6.504 ( 0.010 ms): perf/2944525 bpf(cmd: LINK_CREATE, uattr: 0x7ffe80de77c0, size: 64)                = 14
^C#   DURATION     |      COUNT | GRAPH                                          |
     0 - 1    us |          0 |                                                |
     1 - 2    us |          0 |                                                |
     2 - 4    us |          1 |                                                |
     4 - 8    us |          3 |                                                |
     8 - 16   us |          3 |                                                |
    16 - 32   us |         11 |                                                |
    32 - 64   us |          9 |                                                |
    64 - 128  us |         17 |                                                |
   128 - 256  us |         30 | #                                              |
   256 - 512  us |         20 |                                                |
   512 - 1024 us |         42 | #                                              |
     1 - 2    ms |        151 | ######                                         |
     2 - 4    ms |        106 | ####                                           |
     4 - 8    ms |         18 |                                                |
     8 - 16   ms |        149 | ######                                         |
    16 - 32   ms |         30 | #                                              |
    32 - 64   ms |         17 |                                                |
    64 - 128  ms |        360 | ###############                                |
   128 - 256  ms |         52 | ##                                             |
   256 - 512  ms |         18 |                                                |
   512 - 1024 ms |         28 | #                                              |
     1 - ...   s |          5 |                                                |
  root@x1:~#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240902200515.2103769-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 11:47:02 -03:00
Namhyung Kim
76d3685400 perf stat: Constify control data for BPF
The control knobs set before loading BPF programs should be declared as
'const volatile' so that it can be optimized by the BPF core.

Committer testing:

  root@x1:~# perf stat --bpf-counters -e cpu_core/cycles/,cpu_core/instructions/ sleep 1

   Performance counter stats for 'sleep 1':

           2,442,583      cpu_core/cycles/
           2,494,425      cpu_core/instructions/

         1.002687372 seconds time elapsed

         0.001126000 seconds user
         0.001166000 seconds sys

  root@x1:~# perf trace -e bpf --max-events 10 perf stat --bpf-counters -e cpu_core/cycles/,cpu_core/instructions/ sleep 1
       0.000 ( 0.019 ms): perf/2944119 bpf(cmd: OBJ_GET, uattr: 0x7fffdf5cdd40, size: 20)            = 5
       0.021 ( 0.002 ms): perf/2944119 bpf(cmd: OBJ_GET_INFO_BY_FD, uattr: 0x7fffdf5cdcd0, size: 16) = 0
       0.030 ( 0.005 ms): perf/2944119 bpf(cmd: MAP_LOOKUP_ELEM, uattr: 0x7fffdf5ceda0, size: 32)    = 0
       0.037 ( 0.004 ms): perf/2944119 bpf(cmd: LINK_GET_FD_BY_ID, uattr: 0x7fffdf5ced80, size: 12)  = -1 ENOENT (No such file or directory)
       0.189 ( 0.004 ms): perf/2944119 bpf(cmd: 36, uattr: 0x7fffdf5cec10, size: 8)                  = -1 EOPNOTSUPP (Operation not supported)
       0.201 ( 0.095 ms): perf/2944119 bpf(cmd: PROG_LOAD, uattr: 0x7fffdf5ce940, size: 148)         = 10
       0.305 ( 0.026 ms): perf/2944119 bpf(cmd: PROG_LOAD, uattr: 0x7fffdf5cea00, size: 148)         = 10
       0.347 ( 0.012 ms): perf/2944119 bpf(cmd: BTF_LOAD, uattr: 0x7fffdf5ce8e0, size: 40)           = 10
       0.364 ( 0.004 ms): perf/2944119 bpf(cmd: BTF_LOAD, uattr: 0x7fffdf5ce950, size: 40)           = 10
       0.376 ( 0.006 ms): perf/2944119 bpf(cmd: BTF_LOAD, uattr: 0x7fffdf5ce730, size: 40)           = 10
  root@x1:~#
   Performance counter stats for 'sleep 1':

             271,221      cpu_core/cycles/
             139,150      cpu_core/instructions/

         1.002881677 seconds time elapsed

         0.001318000 seconds user
         0.001314000 seconds sys

  root@x1:~#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240902200515.2103769-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 11:43:16 -03:00
Ian Rogers
18f41f1ba5 perf test: Make watchpoint data 32-bits on i386
i386 only supports watchpoints up to size 4, 8 bytes causes extra
counts and test failures.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240831070415.506194-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 11:26:53 -03:00
Ian Rogers
91235380e5 perf test: Skip uprobe test if probe command isn't present
The probe command is dependent on libelf. Skip the test if the
required probe command isn't present.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240831070415.506194-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 11:23:01 -03:00
Ian Rogers
38e2648a81 perf time-utils: Fix 32-bit nsec parsing
The "time utils" test fails in 32-bit builds:
  ...
  parse_nsec_time("18446744073.709551615")
  Failed. ptime 4294967295709551615 expected 18446744073709551615
  ...

Switch strtoul to strtoull as an unsigned long in 32-bit build isn't
64-bits.

Fixes: c284d669a2 ("perf tools: Move parse_nsec_time to time-utils.c")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240831070415.506194-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 11:21:55 -03:00
Ian Rogers
6c99903e08 perf pmus: Fix name comparisons on 32-bit systems
The hex PMU suffix maybe 64-bit but the comparisons were "unsigned
long" or 32-bit on 32-bit systems. This was causing the "PMU name
comparison" test to fail in a 32-bit build.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240831070415.506194-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 11:21:09 -03:00
Steinar H. Gunderson
0488568178 perf annotate: LLVM-based disassembler
Support using LLVM as a disassembler method, allowing helperless
annotation in non-distro builds. (It is also much faster than
using libbfd or bfd objdump on binaries with a lot of debug
information.)

This is nearly identical to the output of llvm-objdump; there are
some very rare whitespace differences, some minor changes to demangling
(since we use perf's regular demangling and not LLVM's own) and
the occasional case where llvm-objdump makes a different choice
when multiple symbols share the same address.

It should work across all of LLVM's supported architectures, although
I've only tested 64-bit x86, and finding the right triple from perf's
idea of machine architecture can sometimes be a bit tricky. Ideally, we
should have some way of finding the triplet just from the file itself.

Committer notes:

Address this on 32-bit systems by using PRIu64 from inttypes.h

     3    17.58 almalinux:9-i386              : FAIL gcc version 11.4.1 20231218 (Red Hat 11.4.1-3) (GCC)
      util/llvm-c-helpers.cpp: In function ‘char* make_symbol_relative_string(dso*, const char*, u64, u64)’:
      util/llvm-c-helpers.cpp:150:52: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘u64’ {aka
  +‘long long unsigned int’} [-Werror=format=]
        150 |                 snprintf(buf, sizeof(buf), "%s+0x%lx",
            |                                                  ~~^
            |                                                    |
            |                                                    long unsigned int
            |                                                  %llx
        151 |                          demangled ? demangled : sym_name, addr - base_addr);
            |                                                            ~~~~~~~~~~~~~~~~
            |                                                                 |
            |                                                                 u64 {aka long long unsigned int}
      cc1plus: all warnings being treated as errors

Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240803152008.2818485-3-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 10:39:20 -03:00
Steinar H. Gunderson
6eca7c5ac2 perf annotate: Split out read_symbol()
The Capstone disassembler code has a useful code snippet to read the
bytes for a given code symbol into memory. Split it out into its own
function, so that the LLVM disassembler can use it in the next patch.

Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240803152008.2818485-2-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 10:15:55 -03:00
Steinar H. Gunderson
c3f8644c21 perf report: Support LLVM for addr2line()
In addition to the existing support for libbfd and calling out to
an external addr2line command, add support for using libllvm directly.

This is both faster than libbfd, and can be enabled in distro builds
(the LLVM license has an explicit provision for GPLv2 compatibility).

Thus, it is set as the primary choice if available.

As an example, running 'perf report' on a medium-size profile with
DWARF-based backtraces took 58 seconds with LLVM, 78 seconds with
libbfd, 153 seconds with external llvm-addr2line, and I got tired and
aborted the test after waiting for 55 minutes with external bfd
addr2line (which is the default for perf as compiled by distributions
today).

Evidently, for this case, the bfd addr2line process needs 18 seconds (on
a 5.2 GHz Zen 3) to load the .debug ELF in question, hits the 1-second
timeout and gets killed during initialization, getting restarted anew
every time. Having an in-process addr2line makes this much more robust.

As future extensions, libllvm can be used in many other places where
we currently use libbfd or other libraries:

 - Symbol enumeration (in particular, for PE binaries).
 - Demangling (including non-Itanium demangling, e.g. Microsoft
   or Rust).
 - Disassembling (perf annotate).

However, these are much less pressing; most people don't profile PE
binaries, and perf has non-bfd paths for ELF. The same with demangling;
the default _cxa_demangle path works fine for most users, and while bfd
objdump can be slow on large binaries, it is possible to use
--objdump=llvm-objdump to get the speed benefits.  (It appears
LLVM-based demangling is very simple, should we want that.)

Tested with LLVM 14, 15, 16, 18 and 19. For some reason, LLVM 12 was not
correctly detected using feature_check, and thus was not tested.

Committer notes:

 Added the name and a __maybe_unused to address:

   1    13.50 almalinux:8                   : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-22) (GCC)
    util/srcline.c: In function 'dso__free_a2l':
    util/srcline.c:184:20: error: parameter name omitted
     void dso__free_a2l(struct dso *)
                        ^~~~~~~~~~~~
    make[3]: *** [/git/perf-6.11.0-rc3/tools/build/Makefile.build:158: util] Error 2

Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240803152008.2818485-1-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03 10:15:16 -03:00
Florian Westphal
5ceb87dc76 selftests: netfilter: nft_queue.sh: fix spurious timeout on debug kernel
The sctp selftest is very slow on debug kernels.

Its possible that the nf_queue listener program exits due to timeout
before first sctp packet is processed.

In this case socat hangs until script times out.
Fix this by removing the -t option where possible and kill the test
program once the file transfer/socat has exited.

-t sets SO_RCVTIMEO, its inteded for the 'ping' part of the selftest
where we want to make sure that packets get reinjected properly without
skipping a second queue request.

While at it, add a helper to compare the (binary) files instead of diff.
The 'diff' part was copied from a another sub-test that compares text.

Let helper dump file sizes on error so we can see the progress made.

Tested on an old 2010-ish box with a debug kernel and 100 iterations.

This is a followup to the earlier filesize reduction change.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20240829080109.GB30766@breakpoint.cc/
Fixes: 0a8b08c554 ("selftests: netfilter: nft_queue.sh: reduce test file size for debug build")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20240830092254.8029-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-03 13:04:09 +02:00
Alexander Lobakin
05c1280a2b netdev_features: convert NETIF_F_NETNS_LOCAL to dev->netns_local
"Interface can't change network namespaces" is rather an attribute,
not a feature, and it can't be changed via Ethtool.
Make it a "cold" private flag instead of a netdev_feature and free
one more bit.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-03 11:36:43 +02:00
Michael Grzeschik
673f0c3ffc tools: usb: p9_fwd: add usb gadget packet forwarder script
This patch is adding an small python tool to forward 9pfs requests
from the USB gadget to an existing 9pfs TCP server. Since currently all
9pfs servers lack support for the usb transport this tool is an useful
helper to get started.

Refer the Documentation section "USBG Example" in
Documentation/filesystems/9p.rst on how to use it.

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Tested-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Link: https://lore.kernel.org/r/20240116-ml-topic-u9p-v12-3-9a27de5160e0@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-03 09:57:08 +02:00
Mykyta Yatsenko
b0222d1d9e bpftool: Fix handling enum64 in btf dump sorting
Wrong function is used to access the first enum64 element. Substituting btf_enum(t)
with btf_enum64(t) for BTF_KIND_ENUM64.

Fixes: 94133cf24b ("bpftool: Introduce btf c dump sorting")
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240902171721.105253-1-mykyta.yatsenko5@gmail.com
2024-09-02 22:44:51 +02:00
Arnaldo Carvalho de Melo
e162cb25c4 perf daemon: Fix the build on more 32-bit architectures
FYI: I'm carrying this on perf-tools-next.

The previous attempt fixed the build on debian:experimental-x-mipsel,
but when building on a larger set of containers I noticed it broke the
build on some other 32-bit architectures such as:

  42     7.87 ubuntu:18.04-x-arm            : FAIL gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)
    builtin-daemon.c: In function 'cmd_session_list':
    builtin-daemon.c:692:16: error: format '%llu' expects argument of type 'long long unsigned int', but argument 4 has type 'long int' [-Werror=format=]
       fprintf(out, "%c%" PRIu64,
                    ^~~~~
    builtin-daemon.c:694:13:
        csv_sep, (curr - daemon->start) / 60);
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In file included from builtin-daemon.c:3:0:
    /usr/arm-linux-gnueabihf/include/inttypes.h:105:34: note: format string is defined here
     # define PRIu64  __PRI64_PREFIX "u"

So lets cast that time_t (32-bit/64-bit) to uint64_t to make sure it
builds everywhere.

Fixes: 4bbe600293 ("perf daemon: Fix the build on 32-bit architectures")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/ZsPmldtJ0D9Cua9_@x1
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-09-02 11:59:24 -07:00
Xu Yang
aee1d55922 perf python: include "util/sample.h"
The 32-bit arm build system will complain:

tools/perf/util/python.c:75:28: error: field ‘sample’ has incomplete type
   75 |         struct perf_sample sample;

However, arm64 build system doesn't complain this.

The root cause is arm64 define "HAVE_KVM_STAT_SUPPORT := 1" in
tools/perf/arch/arm64/Makefile, but arm arch doesn't define this.
This will lead to kvm-stat.h include other header files on arm64 build
system, especially "util/sample.h" for util/python.c.

This will try to directly include "util/sample.h" for "util/python.c" to
avoid such build issue on arm platform.

Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Cc: imx@lists.linux.dev
Link: https://lore.kernel.org/r/20240819023403.201324-1-xu.yang_2@nxp.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-09-02 11:59:24 -07:00
Namhyung Kim
287bd5cf06 perf lock contention: Fix spinlock and rwlock accounting
The spinlock and rwlock use a single-element per-cpu array to track
current locks due to performance reason.  But this means the key is
always available and it cannot simply account lock stats in the array
because some of them are invalid.

In fact, the contention_end() program in the BPF invalidates the entry
by setting the 'lock' value to 0 instead of deleting the entry for the
hashmap.  So it should skip entries with the lock value of 0 in the
account_end_timestamp().

Otherwise, it'd have spurious high contention on an idle machine:

  $ sudo perf lock con -ab -Y spinlock sleep 3
   contended   total wait     max wait     avg wait         type   caller

           8      4.72 s       1.84 s     590.46 ms     spinlock   rcu_core+0xc7
           8      1.87 s       1.87 s     233.48 ms     spinlock   process_one_work+0x1b5
           2      1.87 s       1.87 s     933.92 ms     spinlock   worker_thread+0x1a2
           3      1.81 s       1.81 s     603.93 ms     spinlock   tmigr_update_events+0x13c
           2      1.72 s       1.72 s     861.98 ms     spinlock   tick_do_update_jiffies64+0x25
           6     42.48 us     13.02 us      7.08 us     spinlock   futex_q_lock+0x2a
           1     13.03 us     13.03 us     13.03 us     spinlock   futex_wake+0xce
           1     11.61 us     11.61 us     11.61 us     spinlock   rcu_core+0xc7

I don't believe it has contention on a spinlock longer than 1 second.
After this change, it only reports some small contentions.

  $ sudo perf lock con -ab -Y spinlock sleep 3
   contended   total wait     max wait     avg wait         type   caller

           4    133.51 us     43.29 us     33.38 us     spinlock   tick_do_update_jiffies64+0x25
           4     69.06 us     31.82 us     17.27 us     spinlock   process_one_work+0x1b5
           2     50.66 us     25.77 us     25.33 us     spinlock   rcu_core+0xc7
           1     28.45 us     28.45 us     28.45 us     spinlock   rcu_core+0xc7
           1     24.77 us     24.77 us     24.77 us     spinlock   tmigr_update_events+0x13c
           1     23.34 us     23.34 us     23.34 us     spinlock   raw_spin_rq_lock_nested+0x15

Fixes: b5711042a1 ("perf lock contention: Use per-cpu array map for spinlocks")
Reported-by: Xi Wang <xii@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240828052953.1445862-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-09-02 11:59:24 -07:00
Veronika Molnarova
1c7fb536e8 perf test pmu: Set uninitialized PMU alias to null
Commit 3e0bf9fde2 ("perf pmu: Restore full PMU name wildcard
support") adds a test case "PMU cmdline match" that covers PMU name
wildcard support provided by function perf_pmu__match(). The test works
with a wide range of supported combinations of PMU name matching but
omits the case that if the perf_pmu__match() cannot match the PMU name
to the wildcard, it tries to match its alias. However, this variable is
not set up, causing the test case to fail when run with subprocesses or
to segfault if run as a single process.

  ./perf test -vv 9
    9: Sysfs PMU tests                                                 :
    9.1: Parsing with PMU format directory                             : Ok
    9.2: Parsing with PMU event                                        : Ok
    9.3: PMU event names                                               : Ok
    9.4: PMU name combining                                            : Ok
    9.5: PMU name comparison                                           : Ok
    9.6: PMU cmdline match                                             : FAILED!

  ./perf test -F 9
    9.1: Parsing with PMU format directory                             : Ok
    9.2: Parsing with PMU event                                        : Ok
    9.3: PMU event names                                               : Ok
    9.4: PMU name combining                                            : Ok
    9.5: PMU name comparison                                           : Ok
  Segmentation fault (core dumped)

Initialize the PMU alias to null for all tests of perf_pmu__match()
as this functionality is not being tested and the alias matching works
exactly the same as the matching of the PMU name.

  ./perf test -F 9
    9.1: Parsing with PMU format directory                             : Ok
    9.2: Parsing with PMU event                                        : Ok
    9.3: PMU event names                                               : Ok
    9.4: PMU name combining                                            : Ok
    9.5: PMU name comparison                                           : Ok
    9.6: PMU cmdline match                                             : Ok

Fixes: 3e0bf9fde2 ("perf pmu: Restore full PMU name wildcard support")
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: james.clark@arm.com
Cc: mpetlan@redhat.com
Cc: rstoyano@redhat.com
Link: https://lore.kernel.org/r/20240808103749.9356-1-vmolnaro@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-09-02 11:57:46 -07:00
Quentin Monnet
5d2784e25d bpftool: Add missing blank lines in bpftool-net doc example
In bpftool-net documentation, two blank lines are missing in a
recently added example, causing docutils to complain:

    $ cd tools/bpf/bpftool
    $ make doc
      DESCEND Documentation
      GEN     bpftool-btf.8
      GEN     bpftool-cgroup.8
      GEN     bpftool-feature.8
      GEN     bpftool-gen.8
      GEN     bpftool-iter.8
      GEN     bpftool-link.8
      GEN     bpftool-map.8
      GEN     bpftool-net.8
    <stdin>:189: (INFO/1) Possible incomplete section title.
    Treating the overline as ordinary text because it's so short.
    <stdin>:192: (INFO/1) Blank line missing before literal block (after the "::")? Interpreted as a definition list item.
    <stdin>:199: (INFO/1) Possible incomplete section title.
    Treating the overline as ordinary text because it's so short.
    <stdin>:201: (INFO/1) Blank line missing before literal block (after the "::")? Interpreted as a definition list item.
      GEN     bpftool-perf.8
      GEN     bpftool-prog.8
      GEN     bpftool.8
      GEN     bpftool-struct_ops.8

Add the missing blank lines.

Fixes: 0d7c06125c ("bpftool: Add document for net attach/detach on tcx subcommand")
Signed-off-by: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240901210742.25758-1-qmo@kernel.org
2024-09-02 18:25:16 +02:00
Arnaldo Carvalho de Melo
0e7eb23668 perf tools: Build x86 32-bit syscall table from arch/x86/entry/syscalls/syscall_32.tbl
To remove one more use of the audit libs and address a problem reported
with a recent change where a function isn't available when using the
audit libs method, that should really go away, this being one step in
that direction.

The script used to generate the 64-bit syscall table was already
parametrized to generate for both 64-bit and 32-bit, so just use it and
wire the generated table to the syscalltbl.c routines.

Reported-by: Jiri Slaby <jirislaby@kernel.org>
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Jiri Slaby <jirislaby@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/6fe63fa3-6c63-4b75-ac09-884d26f6fb95@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-02 11:13:40 -03:00
zhangjiao
931a36c413 tools: gpio: rm .*.cmd on make clean
rm .*.cmd when calling make clean

Signed-off-by: zhangjiao <zhangjiao2@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240829062942.11487-1-zhangjiao2@cmss.chinamobile.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-09-02 12:27:35 +02:00
Breno Leitao
8af2caf730 failcmd: make failcmd.sh executable
Change the file permissions of tools/testing/fault-injection/failcmd.sh to
allow execution.  This ensures the script can be run directly without
explicitly invoking a shell.

Link: https://lkml.kernel.org/r/20240729085215.3403417-1-leitao@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:43:32 -07:00
Breno Leitao
11ee88a0f9 fault-injection: enhance failcmd to exit on non-hex address input
The failcmd.sh script in the fault-injection toolkit does not currently
validate whether the provided address is in hexadecimal format.  This can
lead to silent failures if the address is sourced from places like
`/proc/kallsyms`, which omits the '0x' prefix, potentially causing users
to operate under incorrect assumptions.

Introduce a new function, `exit_if_not_hex`, which checks the format of
the provided address and exits with an error message if the address is not
a valid hexadecimal number.

This enhancement prevents users from running the command with improperly
formatted addresses, thus improving the robustness and usability of the
failcmd tool.

Link: https://lkml.kernel.org/r/20240729084512.3349928-1-leitao@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:43:31 -07:00
Sidhartha Kumar
3cd9e92e00 maple_tree: remove mas_destroy() from mas_nomem()
Separate call to mas_destroy() from mas_nomem() so we can check for no
memory errors without destroying the current maple state in
mas_store_gfp().  We then add calls to mas_destroy() to callers of
mas_nomem().

Link: https://lkml.kernel.org/r/20240814161944.55347-6-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:15 -07:00
Sidhartha Kumar
5d659bbb52 maple_tree: introduce mas_wr_store_type()
Introduce mas_wr_store_type() which will set the correct store type based
on a walk of the tree.  In mas_wr_node_store() the <= min_slots condition
is changed to < as if new_end is = to mt_min_slots then there is not
enough room.

mas_prealloc_calc() is also introduced to abstract the calculation used to
determine the number of nodes needed for a store operation.

In this change a call to mas_reset() is removed in the error case of
mas_prealloc().  This is only needed in the MA_STATE_REBALANCE case of
mas_destroy().  We can move the call to mas_reset() directly to
mas_destroy().

Also, add a test case to validate the order that we check the store type
in is correct.  This test models a vma expanding and then shrinking which
is part of the boot process.

Link: https://lkml.kernel.org/r/20240814161944.55347-5-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:15 -07:00
Sidhartha Kumar
617f8e4d76 maple_tree: add test to replicate low memory race conditions
Add new callback fields to the userspace implementation of struct
kmem_cache.  This allows for executing callback functions in order to
further test low memory scenarios where node allocation is retried.

This callback can help test race conditions by calling a function when a
low memory event is tested.

Link: https://lkml.kernel.org/r/20240812190543.71967-2-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:11 -07:00
Kirill A. Shutemov
5adfeaecc4 mm: rework accept memory helpers
Make accept_memory() and range_contains_unaccepted_memory() take 'start'
and 'size' arguments instead of 'start' and 'end'.

Remove accept_page(), replacing it with direct calls to accept_memory(). 
The accept_page() name is going to be used for a different function.

Link: https://lkml.kernel.org/r/20240809114854.3745464-6-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:07 -07:00
Jeff Xu
072cd213b7 selftest mm/mseal: fix test_seal_mremap_move_dontunmap_anyaddr
the syscall remap accepts following:

mremap(src, size, size, MREMAP_MAYMOVE | MREMAP_DONTUNMAP, dst)

when the src is sealed, the call will fail with error code:
EPERM

Previously, the test uses hard-coded 0xdeaddead as dst, and it
will fail on the system with newer glibc installed.

This patch removes test's dependency on glibc for mremap(), also
fix the test and remove the hardcoded address.

Link: https://lkml.kernel.org/r/20240807212320.2831848-1-jeffxu@chromium.org
Fixes: 4926c7a52d ("selftest mm/mseal memory sealing")
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reported-by: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:05 -07:00
Pedro Falcato
67203f3f2a selftests/mm: add mseal test for no-discard madvise
Add an mseal test for madvise() operations that aren't considered
"discard" (e.g purely advisory ops such as MADV_RANDOM).

[pedro.falcato@gmail.com: adjust the mseal test's plan]
  Link: https://lkml.kernel.org/r/20240807203724.2686144-1-pedro.falcato@gmail.com
Link: https://lkml.kernel.org/r/20240807173336.2523757-3-pedro.falcato@gmail.com
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Tested-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:03 -07:00
Lorenzo Stoakes
9325b8b5a1 tools: add skeleton code for userland testing of VMA logic
Establish a new userland VMA unit testing implementation under
tools/testing which utilises existing logic providing maple tree support
in userland utilising the now-shared code previously exclusive to radix
tree testing.

This provides fundamental VMA operations whose API is defined in mm/vma.h,
while stubbing out superfluous functionality.

This exists as a proof-of-concept, with the test implementation functional
and sufficient to allow userland compilation of vma.c, but containing only
cursory tests to demonstrate basic functionality.

Link: https://lkml.kernel.org/r/533ffa2eec771cbe6b387dd049a7f128a53eb616.1722251717.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: SeongJae Park <sj@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Gow <davidgow@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kees Cook <kees@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Rae Moar <rmoar@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:55 -07:00
Lorenzo Stoakes
74579d8dab tools: separate out shared radix-tree components
The core components contained within the radix-tree tests which provide
shims for kernel headers and access to the maple tree are useful for
testing other things, so separate them out and make the radix tree tests
dependent on the shared components.

This lays the groundwork for us to add VMA tests of the newly introduced
vma.c file.

Link: https://lkml.kernel.org/r/1ee720c265808168e0d75608e687607d77c36719.1722251717.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Gow <davidgow@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kees Cook <kees@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Rae Moar <rmoar@google.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:55 -07:00
David Finkel
d075bccec0 mm, memcg: cg2 memory{.swap,}.peak write tests
Extend two existing tests to cover extracting memory usage through the
newly mutable memory.peak and memory.swap.peak handlers.

In particular, make sure to exercise adding and removing watchers with
overlapping lifetimes so the less-trivial logic gets tested.

The new/updated tests attempt to detect a lack of the write handler by
fstat'ing the memory.peak and memory.swap.peak files and skip the tests if
that's the case.  Additionally, skip if the file doesn't exist at all.

[davidf@vimeo.com: update tests]
  Link: https://lkml.kernel.org/r/20240730231304.761942-3-davidf@vimeo.com
Link: https://lkml.kernel.org/r/20240729143743.34236-3-davidf@vimeo.com
Signed-off-by: David Finkel <davidf@vimeo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:53 -07:00
Muhammad Usama Anjum
b808f62921 selftests: mm: fix build errors on armhf
The __NR_mmap isn't found on armhf.  The mmap() is commonly available
system call and its wrapper is present on all architectures.  So it should
be used directly.  It solves problem for armhf and doesn't create problem
for other architectures.

Remove sys_mmap() functions as they aren't doing anything else other than
calling mmap().  There is no need to set errno = 0 manually as glibc
always resets it.

For reference errors are as following:

  CC       seal_elf
seal_elf.c: In function 'sys_mmap':
seal_elf.c:39:33: error: '__NR_mmap' undeclared (first use in this function)
   39 |         sret = (void *) syscall(__NR_mmap, addr, len, prot,
      |                                 ^~~~~~~~~

mseal_test.c: In function 'sys_mmap':
mseal_test.c:90:33: error: '__NR_mmap' undeclared (first use in this function)
   90 |         sret = (void *) syscall(__NR_mmap, addr, len, prot,
      |                                 ^~~~~~~~~

Link: https://lkml.kernel.org/r/20240809082511.497266-1-usama.anjum@collabora.com
Fixes: 4926c7a52d ("selftest mm/mseal memory sealing")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:58:59 -07:00
Chen Ridong
3f9319c691 cgroup/cpuset: add sefltest for cpuset v1
There is only hotplug test for cpuset v1, just add base read/write test
for cpuset v1.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-30 10:00:17 -10:00
Waiman Long
43a17fcfcc selftest/cgroup: Make test_cpuset_prs.sh deal with pre-isolated CPUs
Since isolated CPUs can be reserved at boot time via the "isolcpus"
boot command line option, these pre-isolated CPUs may interfere with
testing done by test_cpuset_prs.sh.

With the previous commit that incorporates those boot time isolated CPUs
into "cpuset.cpus.isolated", we can check for those before testing is
started to make sure that there will be no interference.  Otherwise,
this test will be skipped if incorrect test failure can happen.

As "cpuset.cpus.isolated" is now available in a non cgroup_debug kernel,
we don't need to check for its existence anymore.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-30 09:23:39 -10:00
Ihor Solodrai
2ad6d23f46 selftests/bpf: Do not update vmlinux.h unnecessarily
%.bpf.o objects depend on vmlinux.h, which makes them transitively
dependent on unnecessary libbpf headers. However vmlinux.h doesn't
actually change as often.

When generating vmlinux.h, compare it to a previous version and update
it only if there are changes.

Example of build time improvement (after first clean build):
  $ touch ../../../lib/bpf/bpf.h
  $ time make -j8
Before: real  1m37.592s
After:  real  0m27.310s

Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed.

Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20240828174608.377204-2-ihor.solodrai@pm.me
2024-08-30 11:54:14 -07:00
Ihor Solodrai
38960ac8f9 selftests/bpf: Specify libbpf headers required for %.bpf.o progs
Test %.bpf.o objects actually depend only on some libbpf headers.
Define a list of required headers and use it as TRUNNER_BPF_OBJS
dependency.

bpf_*.h list was determined by:

    $ grep -rh 'include <bpf/bpf_' progs | sort -u

Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link:
Link: https://lore.kernel.org/bpf/20240828174608.377204-1-ihor.solodrai@pm.me

https://lore.kernel.org/bpf/CAEf4BzYQ-j2i_xjs94Nn=8+FVfkWt51mLZyiYKiz9oA4Z=pCeA@mail.gmail.com/
2024-08-30 11:54:14 -07:00
Linus Torvalds
13c6bba601 IOMMU Fixes for Linux v6.11-rc5
Including:
 
 	- Fix a device-stall problem in bad io-page-fault setups (faults
 	  received from devices with no supporting domain attached).
 
 	- Context flush fix for Intel VT-d.
 
 	- Do not allow non-read+non-write mapping through iommufd as most
 	  implementations can not handle that.
 
 	- Fix a possible infinite-loop issue in map_pages() path.
 
 	- Add Jean-Philippe as reviewer for SMMUv3 SVA support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmbRvfEACgkQK/BELZcB
 GuOB8w//WLapQpxMw9w+4l3Z3SqxB5gSPF6pdCJwYRrpFGBX1yNZ0vWtF2TpKtOC
 NaMa/EC1C2FWjcArCP21uFtDvN04FgXSVl6sjFUHsUf+YALrUfljQk/XFI4SenTq
 PtvPv8PVGbhqLtdJDXMlQWBN3RX0qK/PIFmuUX5ySBk7J7k5QyBi2HEuK2DbPM7j
 +LMnyTHj5Aa2jRz/NSCDIRKbSFJKgvd8apval2VX0zljjpyqk5KmHHjkLtiOiTTI
 G6ZJlRYCn98eTLU2ww8b7/y0vVYop7C1Q7Cyds/72xvW+a3jbSRIGf6yqtmdbMYd
 faxRng5rWHWsq3XMZC+Ts9k2FA3pUIvOmfptCFfrQYYXvZI6dD6o7uMko6SF82n4
 xEy+H6AEWZXF70xaJDp1cn1PpURJgJly/l/6qAIB746qNT7j/CcOOha1bpbCy81x
 EIOl0B4wyJGjQnxjKsH01K9ec3uT6rugbpFEE9PL8l25khhyweBwuQWc2EVxRZgH
 ICH4pCmvU9Wy6mpXL2R/SyzECWjgg0oJr+pq3Yxv7xufSGQswWJ/StFozSBHnH01
 OGGA/2xMrNeRzlm4PZfRzdAiCfYX9kEodiF1jGLA4B1V5Tx/y1LSX7W/nCeZmlRz
 /OhEC07DWZumeSCTe5I+BmZwiXh/DEAlUypDQkVKaaeGltlyvl8=
 =8XuD
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:

 - Fix a device-stall problem in bad io-page-fault setups (faults
   received from devices with no supporting domain attached).

 - Context flush fix for Intel VT-d.

 - Do not allow non-read+non-write mapping through iommufd as most
   implementations can not handle that.

 - Fix a possible infinite-loop issue in map_pages() path.

 - Add Jean-Philippe as reviewer for SMMUv3 SVA support

* tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  MAINTAINERS: Add Jean-Philippe as SMMUv3 SVA reviewer
  iommu: Do not return 0 from map_pages if it doesn't do anything
  iommufd: Do not allow creating areas without READ or WRITE
  iommu/vt-d: Fix incorrect domain ID in context flush helper
  iommu: Handle iommu faults for a bad iopf setup
2024-08-31 06:11:34 +12:00
Eduard Zingerman
181b0d1af5 selftests/bpf: Check if distilled base inherits source endianness
Create a BTF with endianness different from host, make a distilled
base/split BTF pair from it, dump as raw bytes, import again and
verify that endianness is preserved.

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240830173406.1581007-1-eddyz87@gmail.com
2024-08-30 11:03:16 -07:00
Tony Ambardar
da18bfa59d libbpf: Ensure new BTF objects inherit input endianness
New split BTF needs to preserve base's endianness. Similarly, when
creating a distilled BTF, we need to preserve original endianness.

Fix by updating libbpf's btf__distill_base() and btf_new_empty() to retain
the byte order of any source BTF objects when creating new ones.

Fixes: ba451366bf ("libbpf: Implement basic split BTF support")
Fixes: 58e185a0dc ("libbpf: Add btf__distill_base() creating split BTF with distilled base BTF")
Reported-by: Song Liu <song@kernel.org>
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/6358db36c5f68b07873a0a5be2d062b1af5ea5f8.camel@gmail.com/
Link: https://lore.kernel.org/bpf/20240830095150.278881-1-tony.ambardar@gmail.com
2024-08-30 10:41:01 -07:00
Jason A. Donenfeld
33ffa2dd0d selftests: vDSO: quash clang omitted parameter warning in getrandom test
When building with clang, there's this warning:

vdso_test_getrandom.c:145:40: warning: omitting the parameter name in a function definition is a C23 extension [-Wc23-extensions]
  145 | static void *test_vdso_getrandom(void *)
      |                                        ^
vdso_test_getrandom.c:155:40: warning: omitting the parameter name in a function definition is a C23 extension [-Wc23-extensions]
  155 | static void *test_libc_getrandom(void *)
      |                                        ^
vdso_test_getrandom.c:165:43: warning: omitting the parameter name in a function definition is a C23 extension [-Wc23-extensions]
  165 | static void *test_syscall_getrandom(void *)

Add the named ctx parameter to quash it.

Reported-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 17:55:41 +02:00
Dev Jain
d736d4fc76 kselftest/arm64: Fix build warnings for ptrace
A "%s" is missing in ksft_exit_fail_msg(); instead, use the newly
introduced ksft_exit_fail_perror().

Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240830052911.4040970-1-dev.jain@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30 16:26:47 +01:00
Christophe Leroy
f0d0dbbc10 selftests: vDSO: use parse_vdso.h in vdso_test_abi
Don't duplicate parse_vdso function prototypes, include
the header instead.

Fixes: 693f5ca08c ("kselftest: Extend vDSO selftest")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:48:45 +02:00
Christophe Leroy
6eda706a53 selftests: vDSO: fix the way vDSO functions are called for powerpc
vdso_test_correctness test fails on powerpc:

~ # ./vdso_test_correctness
...
[RUN]	Testing clock_gettime for clock CLOCK_REALTIME_ALARM (8)...
[FAIL]	No such clock, but __vdso_clock_gettime returned 22
[RUN]	Testing clock_gettime for clock CLOCK_BOOTTIME_ALARM (9)...
[FAIL]	No such clock, but __vdso_clock_gettime returned 22
[RUN]	Testing clock_gettime for clock CLOCK_SGI_CYCLE (10)...
[FAIL]	No such clock, but __vdso_clock_gettime returned 22
...
[RUN]	Testing clock_gettime for clock invalid (-1)...
[FAIL]	No such clock, but __vdso_clock_gettime returned 22
[RUN]	Testing clock_gettime for clock invalid (-2147483648)...
[FAIL]	No such clock, but __vdso_clock_gettime returned 22
[RUN]	Testing clock_gettime for clock invalid (2147483647)...
[FAIL]	No such clock, but __vdso_clock_gettime returned 22

On powerpc, a call to a VDSO function is not an ordinary C function
call. Unlike several architectures which returns a negative error code
in case of an error, powerpc sets CR[SO] and returns the error code
as a positive value.

Define and use a macro called VDSO_CALL() which takes a pointer
to the function to call, the number of arguments and the arguments.

Also update ABI vdso documentation to reflect this subtlety.

Provide a specific version of VDSO_CALL() for powerpc that negates
the error code on return when CR[SO] is set.

Fixes: c7e5789b24 ("kselftest: Move test_vdso to the vDSO test suite")
Fixes: 2e9a972566 ("selftests: vdso: Add a selftest for vDSO getcpu()")
Fixes: 693f5ca08c ("kselftest: Extend vDSO selftest")
Fixes: b2f1c3db28 ("kselftest: Extend vdso correctness test to clock_gettime64")
Fixes: 4920a2590e ("selftests/vDSO: add tests for vgetrandom")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:48:45 +02:00
Christophe Leroy
ba83b3239e selftests: vDSO: fix vDSO symbols lookup for powerpc64
On powerpc64, following tests fail locating vDSO functions:

  ~ # ./vdso_test_abi
  TAP version 13
  1..16
  # [vDSO kselftest] VDSO_VERSION: LINUX_2.6.15
  # Couldn't find __kernel_gettimeofday
  ok 1 # SKIP __kernel_gettimeofday
  # clock_id: CLOCK_REALTIME
  # Couldn't find __kernel_clock_gettime
  ok 2 # SKIP __kernel_clock_gettime CLOCK_REALTIME
  # Couldn't find __kernel_clock_getres
  ok 3 # SKIP __kernel_clock_getres CLOCK_REALTIME
  ...
  # Couldn't find __kernel_time
  ok 16 # SKIP __kernel_time
  # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:16 error:0

  ~ # ./vdso_test_getrandom
  __kernel_getrandom is missing!

  ~ # ./vdso_test_gettimeofday
  Could not find __kernel_gettimeofday

  ~ # ./vdso_test_getcpu
  Could not find __kernel_getcpu

On powerpc64, as shown below by readelf, vDSO functions symbols have
type NOTYPE, so also accept that type when looking for symbols.

$ powerpc64-linux-gnu-readelf -a arch/powerpc/kernel/vdso/vdso64.so.dbg
ELF Header:
  Magic:   7f 45 4c 46 02 02 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, big endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           PowerPC64
  Version:                           0x1
...

Symbol table '.dynsym' contains 12 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000524    84 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     2: 00000000000005f0    36 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     3: 0000000000000578    68 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     4: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6.15
     5: 00000000000006c0    48 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     6: 0000000000000614   172 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     7: 00000000000006f0    84 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     8: 000000000000047c    84 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     9: 0000000000000454    12 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
    10: 00000000000004d0    84 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
    11: 00000000000005bc    52 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15

Symbol table '.symtab' contains 56 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
...
    45: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6.15
    46: 00000000000006c0    48 NOTYPE  GLOBAL DEFAULT    8 __kernel_getcpu
    47: 0000000000000524    84 NOTYPE  GLOBAL DEFAULT    8 __kernel_clock_getres
    48: 00000000000005f0    36 NOTYPE  GLOBAL DEFAULT    8 __kernel_get_tbfreq
    49: 000000000000047c    84 NOTYPE  GLOBAL DEFAULT    8 __kernel_gettimeofday
    50: 0000000000000614   172 NOTYPE  GLOBAL DEFAULT    8 __kernel_sync_dicache
    51: 00000000000006f0    84 NOTYPE  GLOBAL DEFAULT    8 __kernel_getrandom
    52: 0000000000000454    12 NOTYPE  GLOBAL DEFAULT    8 __kernel_sigtram[...]
    53: 0000000000000578    68 NOTYPE  GLOBAL DEFAULT    8 __kernel_time
    54: 00000000000004d0    84 NOTYPE  GLOBAL DEFAULT    8 __kernel_clock_g[...]
    55: 00000000000005bc    52 NOTYPE  GLOBAL DEFAULT    8 __kernel_get_sys[...]

Fixes: 98eedc3a9d ("Document the vDSO and add a reference parser")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:48:45 +02:00
Christophe Leroy
7d297c419b selftests: vDSO: fix vdso_config for powerpc
Running vdso_test_correctness on powerpc64 gives the following warning:

  ~ # ./vdso_test_correctness
  Warning: failed to find clock_gettime64 in vDSO

This is because vdso_test_correctness was built with VDSO_32BIT defined.

__powerpc__ macro is defined on both powerpc32 and powerpc64 so
__powerpc64__ needs to be checked first in vdso_config.h

Fixes: 693f5ca08c ("kselftest: Extend vDSO selftest")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:48:45 +02:00
Christophe Leroy
59eb856c3e selftests: vDSO: fix vDSO name for powerpc
Following error occurs when running vdso_test_correctness on powerpc:

~ # ./vdso_test_correctness
[WARN]	failed to find vDSO
[SKIP]	No vDSO, so skipping clock_gettime() tests
[SKIP]	No vDSO, so skipping clock_gettime64() tests
[RUN]	Testing getcpu...
[OK]	CPU 0: syscall: cpu 0, node 0

On powerpc, vDSO is neither called linux-vdso.so.1 nor linux-gate.so.1
but linux-vdso32.so.1 or linux-vdso64.so.1.

Also search those two names before giving up.

Fixes: c7e5789b24 ("kselftest: Move test_vdso to the vDSO test suite")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:48:45 +02:00
Jason A. Donenfeld
f78280b1a3 selftests: vDSO: skip getrandom test if architecture is unsupported
If the getrandom test compiles for an arch, don't exit fatally if the
actual cpu it's running on is unsupported.

Suggested-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:44:38 +02:00
Xi Ruoyao
b90eeff1ba selftests: vDSO: use KHDR_INCLUDES for UAPI headers for getrandom test
Building test_vdso_getrandom currently leads to following issue:

    In file included from /home/xry111/git-repos/linux/tools/include/linux/compiler_types.h:36,
                     from /home/xry111/git-repos/linux/include/uapi/linux/stddef.h:5,
                     from /home/xry111/git-repos/linux/include/uapi/linux/posix_types.h:5,
                     from /usr/include/asm/sigcontext.h:12,
                     from /usr/include/bits/sigcontext.h:30,
                     from /usr/include/signal.h:301,
                     from vdso_test_getrandom.c:14:
    /home/xry111/git-repos/linux/tools/include/linux/compiler-gcc.h:3:2: error: #error "Please don't include <linux/compiler-gcc.h> directly, include <linux/compiler.h> instead."
        3 | #error "Please don't include <linux/compiler-gcc.h> directly, include <linux/compiler.h> instead."
          |  ^~~~~

It's because the compiler_types.h inclusion in
include/uapi/linux/stddef.h is expected to be removed by the
header_install.sh script, as compiler_types.h shouldn't be used from
user space.

Add KHDR_INCLUDES before the existing include/uapi inclusion so that
usr/include takes precedence if it's populated.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:44:38 +02:00
Jason A. Donenfeld
be9155154b selftests: vDSO: remove unnecessary command line defs from chacha test
CONFIG_FUNCTION_ALIGNMENT=0 is no longer necessary and BULID_VDSO wasn't
spelled right while BUILD_VDSO isn't necessary, so just remove these.

Reported-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:44:38 +02:00
Jason A. Donenfeld
a5330eb3bc selftests: vDSO: separate LDLIBS from CFLAGS for libsodium
On systems that set -Wl,--as-needed, putting the -lsodium in the wrong
place on the command line means we get a linker error:

      CC       vdso_test_chacha
    /usr/bin/ld: /tmp/ccKpjnSM.o: in function `main':
    vdso_test_chacha.c:(.text+0x276): undefined reference to `crypto_stream_chacha20'
    collect2: error: ld returned 1 exit status

Fix this by passing pkg-config's --libs output to the LDFLAGS field
instead of the CFLAGS field, as is customary.

Reported-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:44:38 +02:00
Xi Ruoyao
1e661b3490 selftests: vDSO: add --cflags for pkg-config command querying libsodium
When libsodium is installed into its own prefix, the --cflags output is
needed for the compiler to find libsodium headers.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:44:38 +02:00
Christophe Leroy
e1bbcab496 selftests: vDSO: look for arch-specific function name in getrandom test
Don't hard-code x86 specific names. Rather, use vdso_config definitions
to find the correct function matching the architecture.

Add random VDSO function names in names[][]. Remove the #ifdef
CONFIG_VDSO32, as having the name there all the time is harmless and
guaranties a steady index for following strings.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[Jason: add [6] to variable declaration rather than each usage site.]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:44:38 +02:00
Christophe Leroy
f8d92fc527 selftests: vDSO: fix include order in build of test_vdso_chacha
Building test_vdso_chacha currently leads to following issue:

  In file included from /home/chleroy/linux-powerpc/include/linux/limits.h:7,
                   from /opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/local_lim.h:38,
                   from /opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/posix1_lim.h:161,
                   from /opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/limits.h:195,
                   from /opt/powerpc64-e5500--glibc--stable-2024.02-1/lib/gcc/powerpc64-buildroot-linux-gnu/12.3.0/include-fixed/limits.h:203,
                   from /opt/powerpc64-e5500--glibc--stable-2024.02-1/lib/gcc/powerpc64-buildroot-linux-gnu/12.3.0/include-fixed/syslimits.h:7,
                   from /opt/powerpc64-e5500--glibc--stable-2024.02-1/lib/gcc/powerpc64-buildroot-linux-gnu/12.3.0/include-fixed/limits.h:34,
                   from /tmp/sodium/usr/local/include/sodium/export.h:7,
                   from /tmp/sodium/usr/local/include/sodium/crypto_stream_chacha20.h:14,
                   from vdso_test_chacha.c:6:
  /opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/xopen_lim.h:99:6: error: missing binary operator before token "("
     99 | # if INT_MAX == 32767
        |      ^~~~~~~
  /opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/xopen_lim.h:102:7: error: missing binary operator before token "("
    102 | #  if INT_MAX == 2147483647
        |       ^~~~~~~
  /opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/xopen_lim.h:126:6: error: missing binary operator before token "("
    126 | # if LONG_MAX == 2147483647
        |      ^~~~~~~~

This is due to kernel include/linux/limits.h being included instead of
libc's limits.h.

This is because directory include/ is added through option -isystem so
it goes prior to glibc's include directory.

Replace -isystem by -idirafter.

But this implies that now tools/include/linux/linkage.h is included
instead of include/linux/linkage.h, so define a stub for
SYM_FUNC_START() and SYM_FUNC_END().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:44:38 +02:00
Christophe Leroy
20a9af057c selftests: vDSO: don't hard-code location of vDSO sources
Architectures use different location for vDSO sources:

	arch/mips/vdso
	arch/sparc/vdso
	arch/arm64/kernel/vdso
	arch/riscv/kernel/vdso
	arch/csky/kernel/vdso
	arch/x86/um/vdso
	arch/x86/entry/vdso
	arch/powerpc/kernel/vdso
	arch/arm/vdso
	arch/loongarch/vdso

Don't hard-code vdso sources location in selftest Makefile. Instead
create a vdso/ symbolic link in tools/arch/$arch/ and update Makefile
accordingly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:44:33 +02:00
Jason A. Donenfeld
01b52f01c5 selftests: vDSO: simplify getrandom thread local storage and structs
Rather than using pthread_get/set_specific, just use gcc's __thread
annotation, which is noticeably faster and makes the code more obvious.

Also, just have one simplified struct called vgrnd, instead of trying to
split things up semantically. Those divisions were useful when this code
was split across several commit *messages*, but doesn't make as much
sense within a single file. This should make the code more clear and
provide a better example for implementers.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-30 15:43:11 +02:00
Yang Jihong
39c243411b perf sched timehist: Fixed timestamp error when unable to confirm event sched_in time
If sched_in event for current task is not recorded, sched_in timestamp
will be set to end_time of time window interest, causing an error in
timestamp show. In this case, we choose to ignore this event.

Test scenario:

  perf[1229608] does not record the first sched_in event, run time and sch delay are both 0

  # perf sched timehist
  Samples of sched_switch event do not have callchains.
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------
   2090450.763231 [0000]  perf[1229608]                       0.000      0.000      0.000
   2090450.763235 [0000]  migration/0[15]                     0.000      0.001      0.003
   2090450.763263 [0001]  perf[1229608]                       0.000      0.000      0.000
   2090450.763268 [0001]  migration/1[21]                     0.000      0.001      0.004
   2090450.763302 [0002]  perf[1229608]                       0.000      0.000      0.000
   2090450.763309 [0002]  migration/2[27]                     0.000      0.001      0.007
   2090450.763338 [0003]  perf[1229608]                       0.000      0.000      0.000
   2090450.763343 [0003]  migration/3[33]                     0.000      0.001      0.004

Before:

  arbitrarily specify a time window of interest, timestamp will be set to an incorrect value

  # perf sched timehist --time 100,200
  Samples of sched_switch event do not have callchains.
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------
       200.000000 [0000]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0001]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0002]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0003]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0004]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0005]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0006]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0007]  perf[1229608]                       0.000      0.000      0.000

 After:

  # perf sched timehist --time 100,200
  Samples of sched_switch event do not have callchains.
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------

Fixes: 853b740711 ("perf sched timehist: Add option to specify time window of interest")
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819024720.2405244-1-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30 10:31:57 -03:00
Namhyung Kim
74fd69a35c perf lock contention: Fix spinlock and rwlock accounting
The spinlock and rwlock use a single-element per-cpu array to track
current locks due to performance reason.  But this means the key is
always available and it cannot simply account lock stats in the array
because some of them are invalid.

In fact, the contention_end() program in the BPF invalidates the entry
by setting the 'lock' value to 0 instead of deleting the entry for the
hashmap.  So it should skip entries with the lock value of 0 in the
account_end_timestamp().

Otherwise, it'd have spurious high contention on an idle machine:

  $ sudo perf lock con -ab -Y spinlock sleep 3
   contended   total wait     max wait     avg wait         type   caller

           8      4.72 s       1.84 s     590.46 ms     spinlock   rcu_core+0xc7
           8      1.87 s       1.87 s     233.48 ms     spinlock   process_one_work+0x1b5
           2      1.87 s       1.87 s     933.92 ms     spinlock   worker_thread+0x1a2
           3      1.81 s       1.81 s     603.93 ms     spinlock   tmigr_update_events+0x13c
           2      1.72 s       1.72 s     861.98 ms     spinlock   tick_do_update_jiffies64+0x25
           6     42.48 us     13.02 us      7.08 us     spinlock   futex_q_lock+0x2a
           1     13.03 us     13.03 us     13.03 us     spinlock   futex_wake+0xce
           1     11.61 us     11.61 us     11.61 us     spinlock   rcu_core+0xc7

I don't believe it has contention on a spinlock longer than 1 second.
After this change, it only reports some small contentions.

  $ sudo perf lock con -ab -Y spinlock sleep 3
   contended   total wait     max wait     avg wait         type   caller

           4    133.51 us     43.29 us     33.38 us     spinlock   tick_do_update_jiffies64+0x25
           4     69.06 us     31.82 us     17.27 us     spinlock   process_one_work+0x1b5
           2     50.66 us     25.77 us     25.33 us     spinlock   rcu_core+0xc7
           1     28.45 us     28.45 us     28.45 us     spinlock   rcu_core+0xc7
           1     24.77 us     24.77 us     24.77 us     spinlock   tmigr_update_events+0x13c
           1     23.34 us     23.34 us     23.34 us     spinlock   raw_spin_rq_lock_nested+0x15

Fixes: b5711042a1 ("perf lock contention: Use per-cpu array map for spinlocks")
Reported-by: Xi Wang <xii@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: bpf@vger.kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240828052953.1445862-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30 10:22:50 -03:00
Namhyung Kim
36cddd1056 perf lock contention: Do not fail EEXIST for update
When it updates the lock stat for the first time, it needs to create an
element in the BPF hash map.

But if there's a concurrent thread waiting for the same lock (like for
rwsem or rwlock), it might race with the thread and possibly fail to
update with -EEXIST.

In that case, it can lookup the map again and put the data there instead
of failing.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240830065150.1758962-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30 10:03:39 -03:00
Namhyung Kim
05a5dd1dfd perf lock contention: Simplify spinlock check
The LCB_F_SPIN bit is used for spinlock, rwlock and optimistic spinning
in mutex.  In get_tstamp_elem() it needs to check spinlock and rwlock
only.  As mutex sets the LCB_F_MUTEX, it can check those two bits and
reduce the number of operations.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240830065150.1758962-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30 09:57:47 -03:00
Namhyung Kim
10d6c57c82 perf lock contention: Handle error in a single place
It has some duplicate codes to do the same job.  Let's add a label and
goto there to handle errors in a single place.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240830065150.1758962-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30 09:57:19 -03:00
Ian Rogers
ccb9004656 perf test: Additional pipe tests with pipe output written to a file
Additional pipe tests where piped files are written to disk. This
means that spotting a file name of "-" isn't a sufficient "is pipe?"
test.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30 09:24:27 -03:00
Ian Rogers
2d57c32b32 perf header: Remove repipe option
No longer used by `perf inject` the repipe_fd is always -1 and repipe
is always false. Remove the options and associated code knowing the
constant values of the removed variables.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30 09:24:27 -03:00
Ian Rogers
89d64e7273 perf inject: Overhaul handling of pipe files
Previously inject->is_pipe was set if the input or output were a
pipe. Determining the input was a pipe had to be done prior to
starting the session and opening the file. This was done by comparing
the input file name with '-' but it fails if the pipe file is written
to disk.

Opening a pipe file from disk will correctly set perf_data.is_pipe, but
this is too late for 'perf inject' and results in a broken file. A
workaround is 'cat pipe_perf|perf inject -i - ...'.

This change removes inject->is_pipe and changes the dependent
conditions to use the is_pipe flag on the input
(inject->session->data) and output files (inject->output). This
ensures the is_pipe condition reflects things like the header being
read.

The change removes the use of perf file header repiping, that is
writing the file header out while reading it in. The case of input
pipe and output file cannot repipe as the attributes for the file are
unknown. To resolve this, write the file header when writing to disk
and as the attributes may be unknown, write them after the data.

Update sessions repipe variable to be trace_event_repipe as those are
the only events now impacted by it. Update __perf_session__new as the
repipe_fd no longer needs passing. Fully removing repipe from session
header reading will be done in a later change.

Committer testing:

  root@number:~# perf record -e syscalls:sys_enter_*sleep/max-stack=4/ -o - sleep 0.01 | perf report -i -
  # To display the perf.data header info, please use --header/--header-only options.
  #
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.050 MB - ]
  #
  # Total Lost Samples: 0
  #
  # Samples: 1  of event 'syscalls:sys_enter_clock_nanosleep'
  # Event count (approx.): 1
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ...............................
  #
     100.00%  sleep    libc.so.6      [.] clock_nanosleep@GLIBC_2.2.5
              |
              ---__libc_start_main@@GLIBC_2.34
                 __libc_start_call_main
                 0x562fc2560a9f
                 clock_nanosleep@GLIBC_2.2.5

  #
  # (Tip: Create an archive with symtabs to analyse on other machine: perf archive)
  #
  root@number:~# perf record -e syscalls:sys_enter_*sleep/max-stack=4/ -o - sleep 0.01 > pipe.data
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.050 MB - ]
  root@number:~# perf report --stdio -i pipe.data
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 1  of event 'syscalls:sys_enter_clock_nanosleep'
  # Event count (approx.): 1
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ...............................
  #
     100.00%  sleep    libc.so.6      [.] clock_nanosleep@GLIBC_2.2.5
              |
              ---__libc_start_main@@GLIBC_2.34
                 __libc_start_call_main
                 0x55f775975a9f
                 clock_nanosleep@GLIBC_2.2.5

  #
  # (Tip: To set sampling period of individual events use perf record -e cpu/cpu-cycles,period=100001/,cpu/branches,period=10001/ ...)
  #
  root@number:~#

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30 09:23:51 -03:00
Mark Brown
6f0315330a kselftest/arm64: Actually test SME vector length changes via sigreturn
The test case for SME vector length changes via sigreturn use a bit too
much cut'n'paste and only actually changed the SVE vector length in the
test itself. Andre's recent factoring out of the initialisation code caused
this to be exposed and the test to start failing. Fix the test to actually
cover the thing it's supposed to test.

Fixes: 4963aeb35a ("kselftest/arm64: signal: Add SME signal handling tests")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20240829-arm64-sme-signal-vl-change-test-v1-1-42d7534cb818@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30 12:53:04 +01:00
Zhu Jun
9a7db819a1 crypto: tools/ccp - Remove unused variable
the variable is never referenced in the code, just remove them.

Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-08-30 18:22:30 +08:00
Colton Lewis
54306f5644 KVM: arm64: selftests: Add arch_timer_edge_cases selftest
Add a new arch_timer_edge_cases selftests that validates:

* timers above the max TVAL value
* timers in the past
* moving counters ahead and behind pending timers
* reprograming timers
* timers fired multiple times
* masking/unmasking using the timer control mask

These are intentionally unusual scenarios to stress compliance with
the arm architecture.

Co-developed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Link: https://lore.kernel.org/r/20240823175836.2798235-3-coltonlewis@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30 09:04:16 +01:00
Colton Lewis
ca1a18368d KVM: arm64: selftests: Ensure pending interrupts are handled in arch_timer test
Break up the asm instructions poking daifclr and daifset to handle
interrupts. R_RBZYL specifies pending interrupts will be handle after
context synchronization events such as an ISB.

Introduce a function wrapper for the WFI instruction.

Signed-off-by: Colton Lewis <coltonlewis@google.com>
Link: https://lore.kernel.org/r/20240823175836.2798235-2-coltonlewis@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30 09:03:45 +01:00
Sean Christopherson
9d15171f39 KVM: selftests: Explicitly include committed one-off assets in .gitignore
Add KVM selftests' one-off assets, e.g. the Makefile, to the .gitignore so
that they are explicitly included.  The justification for omitting the
one-offs was that including them wouldn't help prevent mistakes:

  Deliberately do not include the one-off assets, e.g. config, settings,
  .gitignore itself, etc as Git doesn't ignore files that are already in
  the repository.  Adding the one-off assets won't prevent mistakes where
  developers forget to --force add files that don't match the "allowed".

Turns out that's not the case, as W=1 will generate warnings, and the
amazing-as-always kernel test bot reports new warnings:

   tools/testing/selftests/kvm/.gitignore: warning: ignored by one of the .gitignore files
   tools/testing/selftests/kvm/Makefile: warning: ignored by one of the .gitignore files
>> tools/testing/selftests/kvm/Makefile.kvm: warning: ignored by one of the .gitignore files
   tools/testing/selftests/kvm/config: warning: ignored by one of the .gitignore files
   tools/testing/selftests/kvm/settings: warning: ignored by one of the .gitignore files

Fixes: 43e96957e8 ("KVM: selftests: Use pattern matching in .gitignore")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408211818.85zIkDEK-lkp@intel.com
Link: https://lore.kernel.org/r/20240828215800.737042-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 19:38:56 -07:00
Sean Christopherson
215b3cb7a8 KVM: selftests: Add a test for coalesced MMIO (and PIO on x86)
Add a test to verify that KVM correctly exits (or not) when a vCPU's
coalesced I/O ring is full (or isn't).  Iterate over all legal starting
points in the ring (with an empty ring), and verify that KVM doesn't exit
until the ring is full.

Opportunistically verify that KVM exits immediately on non-coalesced I/O,
either because the MMIO/PIO region was never registered, or because a
previous region was unregistered.

This is a regression test for a KVM bug where KVM would prematurely exit
due to bad math resulting in a false positive if the first entry in the
ring was before the halfway mark.  See commit 92f6d41304 ("KVM: Fix
coalesced_mmio_has_room() to avoid premature userspace exit").

Enable the test for x86, arm64, and risc-v, i.e. all architectures except
s390, which doesn't have MMIO.

On x86, which has both MMIO and PIO, interleave MMIO and PIO into the same
ring, as KVM shouldn't exit until a non-coalesced I/O is encountered,
regardless of whether the ring is filled with MMIO, PIO, or both.

Lastly, wrap the coalesced I/O ring in a structure to prepare for a
potential future where KVM supports multiple ring buffers beyond KVM's
"default" built-in buffer.

Link: https://lore.kernel.org/all/20240820133333.1724191-1-ilstam@amazon.com
Cc: Ilias Stamatis <ilstam@amazon.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20240828181446.652474-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 19:38:33 -07:00
Peter Gonda
2f6fcfa1f4 KVM: selftests: Add SEV-ES shutdown test
Regression test for ae20eef5 ("KVM: SVM: Update SEV-ES shutdown intercepts
with more metadata"). Test confirms userspace is correctly indicated of
a guest shutdown not previous behavior of an EINVAL from KVM_RUN.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Alper Gun <alpergun@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: kvm@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Peter Gonda <pgonda@google.com>
Tested-by: Pratik R. Sampat <pratikrajesh.sampat@amd.com>
Link: https://lore.kernel.org/r/20240709182936.146487-1-pgonda@google.com
[sean: clobber IDT to ensure #UD leads to SHUTDOWN]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 19:37:11 -07:00
Sean Christopherson
c0d1a39d1d KVM: selftests: Always unlink memory regions when deleting (VM free)
Unlink memory regions when freeing a VM, even though it's not strictly
necessary since all tracking structures are freed soon after.  The time
spent deleting entries is negligible, and not unlinking entries is
confusing, e.g. it's easy to overlook that the tree structures are
freed by the caller.

Link: https://lore.kernel.org/r/20240802201429.338412-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 19:01:42 -07:00
Sean Christopherson
ce3b90bd0a KVM: selftests: Remove unused kvm_memcmp_hva_gva()
Remove sefltests' kvm_memcmp_hva_gva(), which has literally never had a
single user since it was introduced by commit 783e9e5126 ("kvm:
selftests: add API testing infrastructure").

Link: https://lore.kernel.org/r/20240802200853.336512-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 19:01:22 -07:00
Juntong Deng
7c5f7b16fe selftests/bpf: Add tests for iter next method returning valid pointer
This patch adds test cases for iter next method returning valid
pointer, which can also used as usage examples.

Currently iter next method should return valid pointer.

iter_next_trusted is the correct usage and test if iter next method
return valid pointer. bpf_iter_task_vma_next has KF_RET_NULL flag,
so the returned pointer may be NULL. We need to check if the pointer
is NULL before using it.

iter_next_trusted_or_null is the incorrect usage. There is no checking
before using the pointer, so it will be rejected by the verifier.

iter_next_rcu and iter_next_rcu_or_null are similar test cases for
KF_RCU_PROTECTED iterators.

iter_next_rcu_not_trusted is used to test that the pointer returned by
iter next method of KF_RCU_PROTECTED iterator cannot be passed in
KF_TRUSTED_ARGS kfuncs.

iter_next_ptr_mem_not_trusted is used to test that base type
PTR_TO_MEM should not be combined with type flag PTR_TRUSTED.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB5848709758F6922F02AF9F1F99962@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-29 18:52:16 -07:00
Martin KaFai Lau
cada0bdcc4 selftests/bpf: Test epilogue patching when the main prog has multiple BPF_EXIT
This patch tests the epilogue patching when the main prog has
multiple BPF_EXIT. The verifier should have patched the 2nd (and
later) BPF_EXIT with a BPF_JA that goes back to the earlier
patched epilogue instructions.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-10-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-29 18:15:45 -07:00
Martin KaFai Lau
42fdbbde6c selftests/bpf: A pro/epilogue test when the main prog jumps back to the 1st insn
This patch adds a pro/epilogue test when the main prog has a goto insn
that goes back to the very first instruction of the prog. It is
to test the correctness of the adjust_jmp_off(prog, 0, delta)
after the verifier has applied the prologue and/or epilogue patch.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-29 18:15:45 -07:00
Martin KaFai Lau
b191b0fd74 selftests/bpf: Add tailcall epilogue test
This patch adds a gen_epilogue test to test a main prog
using a bpf_tail_call.

A non test_loader test is used. The tailcall target program,
"test_epilogue_subprog", needs to be used in a struct_ops map
before it can be loaded. Another struct_ops map is also needed
to host the actual "test_epilogue_tailcall" struct_ops program
that does the bpf_tail_call. The earlier test_loader patch
will attach all struct_ops maps but the bpf_testmod.c does
not support >1 attached struct_ops.

The earlier patch used the test_loader which has already covered
checking for the patched pro/epilogue instructions. This is done
by the __xlated tag.

This patch goes for the regular skel load and syscall test to do
the tailcall test that can also allow to directly pass the
the "struct st_ops_args *args" as ctx_in to the
SEC("syscall") program.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-8-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-29 18:15:45 -07:00
Martin KaFai Lau
47e69431b5 selftests/bpf: Test gen_prologue and gen_epilogue
This test adds a new struct_ops "bpf_testmod_st_ops" in bpf_testmod.
The ops of the bpf_testmod_st_ops is triggered by new kfunc calls
"bpf_kfunc_st_ops_test_*logue". These new kfunc calls are
primarily used by the SEC("syscall") program. The test triggering
sequence is like:
    SEC("syscall")
    syscall_prologue(struct st_ops_args *args)
        bpf_kfunc_st_op_test_prologue(args)
	    st_ops->test_prologue(args)

.gen_prologue adds 1000 to args->a
.gen_epilogue adds 10000 to args->a
.gen_epilogue will also set the r0 to 2 * args->a.

The .gen_prologue and .gen_epilogue of the bpf_testmod_st_ops
will test the prog->aux->attach_func_name to decide if
it needs to generate codes.

The main programs of the pro_epilogue.c will call a
new kfunc bpf_kfunc_st_ops_inc10 which does "args->a += 10".
It will also call a subprog() which does "args->a += 1".

This patch uses the test_loader infra to check the __xlated
instructions patched after gen_prologue and/or gen_epilogue.
The __xlated check is based on Eduard's example (Thanks!) in v1.

args->a is returned by the struct_ops prog (either the main prog
or the epilogue). Thus, the __retval of the SEC("syscall") prog
is checked. For example, when triggering the ops in the
'SEC("struct_ops/test_epilogue") int test_epilogue'
The expected args->a is +1 (subprog call) + 10 (kfunc call)
    	     	     	+ 10000 (.gen_epilogue) = 10011.
The expected return value is 2 * 10011 (.gen_epilogue).

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-7-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-29 18:15:45 -07:00
Eduard Zingerman
a0dbf6d0b2 selftests/bpf: attach struct_ops maps before test prog runs
In test_loader based tests to bpf_map__attach_struct_ops()
before call to bpf_prog_test_run_opts() in order to trigger
bpf_struct_ops->reg() callbacks on kernel side.
This allows to use __retval macro for struct_ops tests.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-29 18:15:45 -07:00
Sean Christopherson
5a7c7d148e KVM: selftests: Play nice with AMD's AVIC errata
When AVIC, and thus IPI virtualization on AMD, is enabled, the CPU will
virtualize ICR writes.  Unfortunately, the CPU doesn't do a very good job,
as it fails to clear the BUSY bit and also allows writing ICR2[23:0],
despite them being "RESERVED MBZ".  Account for the quirky behavior in
the xapic_state test to avoid failures in a configuration that likely has
no hope of ever being enabled in production.

Link: https://lore.kernel.org/r/20240719235107.3023592-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 16:25:06 -07:00
Sean Christopherson
0cb26ec320 KVM: selftests: Verify the guest can read back the x2APIC ICR it wrote
Now that the BUSY bit mess is gone (for x2APIC), verify that the *guest*
can read back the ICR value that it wrote.  Due to the divergent
behavior between AMD and Intel with respect to the backing storage of the
ICR in the vAPIC page, emulating a seemingly simple MSR write is quite
complex.

Link: https://lore.kernel.org/r/20240719235107.3023592-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 16:25:06 -07:00
Sean Christopherson
3426cb48ad KVM: selftests: Test x2APIC ICR reserved bits
Actually test x2APIC ICR reserved bits instead of deliberately skipping
them.  The behavior that is observed when IPI virtualization is enabled is
the architecturally correct behavior, KVM is the one who was wrong, i.e.
KVM was missing reserved bit checks.

Fixes: 4b88b1a518 ("KVM: selftests: Enhance handling WRMSR ICR register in x2APIC mode")
Link: https://lore.kernel.org/r/20240719235107.3023592-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 16:25:06 -07:00
Sean Christopherson
faf06a2382 KVM: selftests: Skip ICR.BUSY test in xapic_state_test if x2APIC is enabled
Don't test the ICR BUSY bit when x2APIC is enabled as AMD and Intel have
different behavior (AMD #GPs, Intel ignores), and the fact that the CPU
performs the reserved bit checks when IPI virtualization is enabled makes
it impossible for KVM to precisely emulate one or the other.

Link: https://lore.kernel.org/r/20240719235107.3023592-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 16:25:06 -07:00
Sean Christopherson
f2e91e8741 KVM: selftests: Add x86 helpers to play nice with x2APIC MSR #GPs
Add helpers to allow and expect #GP on x2APIC MSRs, and opportunistically
have the existing helper spit out a more useful error message if an
unexpected exception occurs.

Link: https://lore.kernel.org/r/20240719235107.3023592-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 16:25:06 -07:00
Sean Christopherson
ed24ba6c2c KVM: selftests: Report unhandled exceptions on x86 as regular guest asserts
Now that selftests support printf() in the guest, report unexpected
exceptions via the regular assertion framework.  Exceptions were special
cased purely to provide a better error message.  Convert only x86 for now,
as it's low-hanging fruit (already formats the assertion in the guest),
and converting x86 will allow adding asserts in x86 library code without
needing to update multiple tests.

Once all other architectures are converted, this will allow moving the
reporting to common code, which will in turn allow adding asserts in
common library code, and will also allow removing UCALL_UNHANDLED.

Link: https://lore.kernel.org/r/20240719235107.3023592-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 16:25:06 -07:00
Sean Christopherson
d1c2cdca5a KVM: selftests: Open code vcpu_run() equivalent in guest_printf test
Open code a version of vcpu_run() in the guest_printf test in anticipation
of adding UCALL_ABORT handling to _vcpu_run().  The guest_printf test
intentionally generates asserts to verify the output, and thus needs to
bypass common assert handling.

Open code a helper in the guest_printf test, as it's not expected that any
other test would want to skip _only_ the UCALL_ABORT handling.

Link: https://lore.kernel.org/r/20240719235107.3023592-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-29 16:25:06 -07:00
Jon Maloy
b24d22ac74 selftests: add selftest for tcp SO_PEEK_OFF support
We add a selftest to check that the new feature added in
commit 05ea491641 ("tcp: add support for SO_PEEK_OFF socket option")
works correctly.

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Link: https://patch.msgid.link/20240828183752.660267-3-jmaloy@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-29 13:00:37 -07:00
Jakub Kicinski
3d8806f37d tools: ynl: error check scanf() in a sample
Someone reported on GitHub that the YNL NIPA test is failing
when run locally. The test builds the tools, and it hits:

  netdev.c:82:9: warning: ignoring return value of ‘scanf’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
  82 | scanf("%d", &ifindex);

I can't repro this on my setups but error seems clear enough.

Link: https://github.com/linux-netdev/nipa/discussions/37
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://patch.msgid.link/20240828173609.2951335-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-29 12:43:29 -07:00
Amery Hung
bd0b4836a2 selftests/bpf: Make sure stashed kptr in local kptr is freed recursively
When dropping a local kptr, any kptr stashed into it is supposed to be
freed through bpf_obj_free_fields->__bpf_obj_drop_impl recursively. Add a
test to make sure it happens.

The test first stashes a referenced kptr to "struct task" into a local
kptr and gets the reference count of the task. Then, it drops the local
kptr and reads the reference count of the task again. Since
bpf_obj_free_fields and __bpf_obj_drop_impl will go through the local kptr
recursively during bpf_obj_drop, the dtor of the stashed task kptr should
eventually be called. The second reference count should be one less than
the first one.

Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240827011301.608620-1-amery.hung@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-29 12:18:26 -07:00
Ian Rogers
e9a7053da3 perf header: Allow attributes to be written after data
With a file, to write data an offset needs to be known. Typically data
follows the event attributes in a file.

However, if processing a pipe the number of event attributes may not be
known.

It is convenient in that case to write the attributes after the data.

Expand perf_session__do_write_header() to allow this when the data
offset and size are known.

This approach may be useful for more than just taking a pipe file to
write into a data file, `perf inject --itrace` will reserve and
additional 8kb for attributes, which would be unnecessary if the
attributes were written after the data.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29 16:17:43 -03:00
Ian Rogers
10df481fda perf header: Fail read if header sections overlap
Buggy perf.data files can have the attributes and data
overlapping.

For example, when processing pipe data the attributes aren't known and
so file offset header calculations can consider them not present.

Later this can cause the attributes to overwrite the data. This can be
seen in:

  $ perf record -o - true > a.data
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.059 MB - ]
  $ perf inject -i a.data -o b.data
  $ perf report --stats -i b.data
  0x68 [0]: failed to process type: 510379 [Invalid argument]
  Error:
  failed to process sample
  $

This change makes reading the corrupt file fail:

  $ perf report --stats -i b.data
  Perf file header corrupt: Attributes and data overlap
  incompatible file format (rerun with -v to learn more)
  $

Which is more informative.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29 16:15:29 -03:00
Ian Rogers
d71bbe799c perf header: Add kerneldoc to 'struct perf_file_header'
Some of the values are a little strange so add documentation to
resolve ambiguity.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29 16:14:24 -03:00
Ian Rogers
d9c993100e perf session: Document 'struct perf_session' and constify its 'auxtrace' member
perf_session is a central data structure to the tool so let's comment
it. The auxtrace callbacks are never modified in session so constify.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29 16:13:26 -03:00
James Clark
022aa67b5a perf: cs-etm: Print queue number in raw trace dump
Now that we have overlapping trace IDs it's also useful to know what the
queue number is to be able to distinguish the source of the trace so
print it inline. Hide it behind the -v option because it might not be
obvious to users what the queue number is.

Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-8-james.clark@linaro.org
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29 15:56:37 -03:00
James Clark
1506af6db8 perf: cs-etm: Support version 0.1 of HW_ID packets
v0.1 HW_ID packets have a new field that describes which sink each CPU
writes to. Use the sink ID to link trace ID maps to each other so that
mappings are shared wherever the sink is shared.

Also update the error message to show that overlapping IDs aren't an
error in per-thread mode, just not supported. In the future we can
use the CPU ID from the AUX records, or watch for changing sink IDs on
HW_ID packets to use the correct decoders.

Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-7-james.clark@linaro.org
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29 15:56:13 -03:00
James Clark
940007cee5 perf: cs-etm: Only save valid trace IDs into files
This isn't a bug because Perf always masks with
CORESIGHT_TRACE_ID_VAL_MASK before using these values, but to avoid it
looking like it could be, make an effort to not save bad values.

Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-6-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29 15:55:52 -03:00
James Clark
19c3e4db38 perf: cs-etm: Create decoders based on the trace ID mappings
Now that each queue has a unique set of trace ID mappings, use this
list to create the decoders. In unformatted mode just add a single
mapping so only one decoder is made.

Previously each queue would have a decoder created for each traced CPU
on the system but this won't work anymore because CPUs can have
overlapping trace IDs.

This also means that the CORESIGHT_TRACE_ID_UNUSED_FLAG isn't needed
any more. If mappings aren't added then decoders aren't created, rather
than needing a flag to suppress creation.

Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-5-james.clark@linaro.org
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29 15:55:24 -03:00
James Clark
77c123f53e perf: cs-etm: Move traceid_list to each queue
The global list won't work for per-sink trace ID allocations, so put a
list in each queue where the IDs will be unique to that queue.

To keep the same behavior as before, for version 0 of the HW_ID packets,
copy all the HW_ID mappings into all queues.

This change doesn't effect the decoders, only trace ID lookups on the
Perf side. The decoders are still created with global mappings which
will be fixed in a later commit.

Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-4-james.clark@linaro.org
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29 15:54:40 -03:00
Jakub Kicinski
3cbd2090d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/faraday/ftgmac100.c
  4186c8d9e6 ("net: ftgmac100: Ensure tx descriptor updates are visible")
  e24a6c8746 ("net: ftgmac100: Get link speed and duplex for NC-SI")
https://lore.kernel.org/0b851ec5-f91d-4dd3-99da-e81b98c9ed28@kernel.org

net/ipv4/tcp.c
  bac76cf898 ("tcp: fix forever orphan socket caused by tcp_abort")
  edefba66d9 ("tcp: rstreason: introduce SK_RST_REASON_TCP_STATE for active reset")
https://lore.kernel.org/20240828112207.5c199d41@canb.auug.org.au

No adjacent changes.

Link: https://patch.msgid.link/20240829130829.39148-1-pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-29 11:49:10 -07:00
Linus Torvalds
0dd5dd63ba Including fixes from bluetooth, wireless and netfilter.
No known outstanding regressions.
 
 Current release - regressions:
 
   - wifi: iwlwifi: fix hibernation
 
   - eth: ionic: prevent tx_timeout due to frequent doorbell ringing
 
 Previous releases - regressions:
 
   - sched: fix sch_fq incorrect behavior for small weights
 
   - wifi:
     - iwlwifi: take the mutex before running link selection
     - wfx: repair open network AP mode
 
   - netfilter: restore IP sanity checks for netdev/egress
 
   - tcp: fix forever orphan socket caused by tcp_abort
 
   - mptcp: close subflow when receiving TCP+FIN
 
   - bluetooth: fix random crash seen while removing btnxpuart driver
 
 Previous releases - always broken:
 
   - mptcp: more fixes for the in-kernel PM
 
   - eth: bonding: change ipsec_lock from spin lock to mutex
 
   - eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response
 
 Misc:
 
   - documentation: drop special comment style for net code
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmbQcqISHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkOJkP+QHZx2LCilc0uvrYkqWBz7aYEigISK+6
 NdGiF/c9FO/dvmisUbs7i48TXKplHu56bR0YTVm2pdKUNcXO5jUgy+s4n9uncsCF
 /Cq8WnaXJ3THqKKNlMnSeTJ1URE47iagI+LdX4g9a5HE5GgrORcHm4mfcn7m68EP
 pZ+TaPDw9jp+o+1nkpqgPe8Vdz1dPlqC1S2KQMl0S60WcSlYgDpVUtVU5m2mitJ1
 giNHXcU2UXxFFyvqhHXyQIFIkKU6sNbD32cm9VXDMw680KjmBM63Fz5EIiZospkz
 efQWHn/xwqZNEOLdN5sPUtKLv02D8sTusfTaGWaNulmNd346ABrkS+fDjHqRxLFb
 OBKXOn4AG1OPCs4MgrgtJf7mrcwJErbE/21dLqGPZWkOzqbe+7r8pXn0XtS9m+gR
 V0IkSpwrS1D394nP2TVmw0B+LwXMA3ItzAjNa8Lemr7TVEpAmuCdAz2XiR13KZfw
 B0DBtWCWr5skgWZdlhofi71OOXC9bWIq75i+aSZR/KXqRj4I38OsargSVNQve/H4
 OmOitt1w0ZRhmb+HmW+xgIffikGsi9YEO165UUzUQCQVn00r0EHP8T3Q2d1/2Hzx
 FtnaYFsBAI2TV2m2DTuJNWc0Fa3G08tJWkoEqbrWeAuebOk0oKWExsXzVfMOrig1
 a9nNA98DmlN3
 =yJIE
 -----END PGP SIGNATURE-----

Merge tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth, wireless and netfilter.

  No known outstanding regressions.

  Current release - regressions:

   - wifi: iwlwifi: fix hibernation

   - eth: ionic: prevent tx_timeout due to frequent doorbell ringing

  Previous releases - regressions:

   - sched: fix sch_fq incorrect behavior for small weights

   - wifi:
      - iwlwifi: take the mutex before running link selection
      - wfx: repair open network AP mode

   - netfilter: restore IP sanity checks for netdev/egress

   - tcp: fix forever orphan socket caused by tcp_abort

   - mptcp: close subflow when receiving TCP+FIN

   - bluetooth: fix random crash seen while removing btnxpuart driver

  Previous releases - always broken:

   - mptcp: more fixes for the in-kernel PM

   - eth: bonding: change ipsec_lock from spin lock to mutex

   - eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response

  Misc:

   - documentation: drop special comment style for net code"

* tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  nfc: pn533: Add poll mod list filling check
  mailmap: update entry for Sriram Yagnaraman
  selftests: mptcp: join: check re-re-adding ID 0 signal
  mptcp: pm: ADD_ADDR 0 is not a new address
  selftests: mptcp: join: validate event numbers
  mptcp: avoid duplicated SUB_CLOSED events
  selftests: mptcp: join: check re-re-adding ID 0 endp
  mptcp: pm: fix ID 0 endp usage after multiple re-creations
  mptcp: pm: do not remove already closed subflows
  selftests: mptcp: join: no extra msg if no counter
  selftests: mptcp: join: check re-adding init endp with != id
  mptcp: pm: reset MPC endp ID when re-added
  mptcp: pm: skip connecting to already established sf
  mptcp: pm: send ACK on an active subflow
  selftests: mptcp: join: check removing ID 0 endpoint
  mptcp: pm: fix RM_ADDR ID for the initial subflow
  mptcp: pm: reuse ID 0 after delete and re-add
  net: busy-poll: use ktime_get_ns() instead of local_clock()
  sctp: fix association labeling in the duplicate COOKIE-ECHO case
  mptcp: pr_debug: add missing \n at the end
  ...
2024-08-30 06:14:39 +12:00
Andrii Nakryiko
c634d6f4e1 libbpf: Fix bpf_object__open_skeleton()'s mishandling of options
We do an ugly copying of options in bpf_object__open_skeleton() just to
be able to set object name from skeleton's recorded name (while still
allowing user to override it through opts->object_name).

This is not just ugly, but it also is broken due to memcpy() that
doesn't take into account potential skel_opts' and user-provided opts'
sizes differences due to backward and forward compatibility. This leads
to copying over extra bytes and then failing to validate options
properly. It could, technically, lead also to SIGSEGV, if we are unlucky.

So just get rid of that memory copy completely and instead pass
default object name into bpf_object_open() directly, simplifying all
this significantly. The rule now is that obj_name should be non-NULL for
bpf_object_open() when called with in-memory buffer, so validate that
explicitly as well.

We adopt bpf_object__open_mem() to this as well and generate default
name (based on buffer memory address and size) outside of bpf_object_open().

Fixes: d66562fba1 ("libbpf: Add BPF object skeleton support")
Reported-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Daniel Müller <deso@posteo.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240827203721.1145494-1-andrii@kernel.org
2024-08-29 17:47:27 +02:00
James Clark
57880a7966 perf: cs-etm: Allocate queues for all CPUs
Make cs_etm__setup_queue() setup a queue even if it's empty, and
pre-allocate queues based on the max CPU that was recorded. In per-CPU
mode aux queues are indexed based on CPU ID even if all CPUs aren't
recorded, sparse queue arrays aren't used.

This will allow HW_IDs to be saved even if no aux data was received in
that queue without having to call cs_etm__setup_queue() from two
different places.

Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-3-james.clark@linaro.org
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29 12:34:55 -03:00
James Clark
b6aa0de9a5 perf cs-etm: Create decoders after both AUX and HW_ID search passes
Both of these passes gather information about how to create the
decoders. AUX records determine formatted/unformatted, and the HW_IDs
determine the traceID/metadata mappings.

Therefore it makes sense to cache the information and wait until both
passes are over until creating the decoders, rather than creating them
at the first HW_ID found.

This will allow a simplification of the creation process where
cs_etm_queue->traceid_list will exclusively used to create the decoders,
rather than the current two methods depending on whether the trace is
formatted or not.

Previously the sample CPU from the AUX record was used to initialize
the decoder CPU, but actually sample CPU == AUX queue index in per-CPU
mode, so saving the sample CPU isn't required.

Similarly formatted/unformatted was used upfront to create the decoders,
but now it's cached until later.

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-2-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29 12:33:02 -03:00
Arnaldo Carvalho de Melo
0fd77ae4a3 Revert "tools build: Remove leftover libcap tests that prevents fast path feature detection from working"
Ian pointed out that the libcap feature test is also used by bpftool, so
we can't remove it just because perf stopped using it, revert the
removal of the feature test.

Since both perf and libcap uses the fast path feature detection
(tools/build/feature/test-all.c), probably the best thing is to keep
libcap-devel when building perf even it not being used there.

This reverts commit 47b3b6435e.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29 11:51:43 -03:00
Palmer Dabbelt
84cfab9a18
Merge patch series "riscv: mm: Do not restrict mmap address based on hint"
Charlie Jenkins <charlie@rivosinc.com> says:

There have been a couple of reports that using the hint address to
restrict the address returned by mmap hint address has caused issues in
applications. A different solution for restricting addresses returned by
mmap is necessary to avoid breakages.

[Palmer: This also just wasn't doing the right thing in the first place,
as it didn't handle the sv39 cases we were trying to deal with.]

* b4-shazam-merge:
  riscv: mm: Do not restrict mmap address based on hint
  riscv: selftests: Remove mmap hint address checks
  Revert "RISC-V: mm: Document mmap changes"

Link: https://lore.kernel.org/r/20240826-riscv_mmap-v1-0-cd8962afe47f@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-29 06:22:51 -07:00
Charlie Jenkins
83dae72ac0
riscv: selftests: Remove mmap hint address checks
The mmap behavior that restricts the addresses returned by mmap caused
unexpected behavior, so get rid of the test cases that check that
behavior.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Fixes: 73d05262a2 ("selftests: riscv: Generalize mm selftests")
Link: https://lore.kernel.org/r/20240826-riscv_mmap-v1-2-cd8962afe47f@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-29 06:03:28 -07:00
Matthieu Baerts (NGI0)
f18fa2abf8 selftests: mptcp: join: check re-re-adding ID 0 signal
This test extends "delete re-add signal" to validate the previous
commit: when the 'signal' endpoint linked to the initial subflow (ID 0)
is re-added multiple times, it will re-send the ADD_ADDR with id 0. The
client should still be able to re-create this subflow, even if the
add_addr_accepted limit has been reached as this special address is not
considered as a new address.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: d0876b2284 ("mptcp: add the incoming RM_ADDR support")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-29 10:39:50 +02:00
Matthieu Baerts (NGI0)
20ccc7c5f7 selftests: mptcp: join: validate event numbers
This test extends "delete and re-add" and "delete re-add signal" to
validate the previous commit: the number of MPTCP events are checked to
make sure there are no duplicated or unexpected ones.

A new helper has been introduced to easily check these events. The
missing events have been added to the lib.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: b911c97c7d ("mptcp: add netlink event support")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-29 10:39:50 +02:00
Matthieu Baerts (NGI0)
d397d7246c selftests: mptcp: join: check re-re-adding ID 0 endp
This test extends "delete and re-add" to validate the previous commit:
when the endpoint linked to the initial subflow (ID 0) is re-added
multiple times, it was no longer being used, because the internal linked
counters are not decremented for this special endpoint: it is not an
additional endpoint.

Here, the "del/add id 0" steps are done 3 times to unsure this case is
validated.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 3ad14f54bd ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-29 10:39:50 +02:00
Matthieu Baerts (NGI0)
76a2d8394c selftests: mptcp: join: no extra msg if no counter
The checksum and fail counters might not be available. Then no need to
display an extra message with missing info.

While at it, fix the indentation around, which is wrong since the same
commit.

Fixes: 47867f0a7e ("selftests: mptcp: join: skip check if MIB counter not supported")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-29 10:39:50 +02:00
Matthieu Baerts (NGI0)
1c2326fcae selftests: mptcp: join: check re-adding init endp with != id
The initial subflow has a special local ID: 0. It is specific per
connection.

When a global endpoint is deleted and re-added later, it can have a
different ID, but the kernel should still use the ID 0 if it corresponds
to the initial address.

This test validates this behaviour: the endpoint linked to the initial
subflow is removed, and re-added with a different ID.

Note that removing the initial subflow will not decrement the 'subflows'
counters, which corresponds to the *additional* subflows. On the other
hand, when the same endpoint is re-added, it will increment this
counter, as it will be seen as an additional subflow this time.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 3ad14f54bd ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-29 10:39:50 +02:00
Matthieu Baerts (NGI0)
5f94b08c00 selftests: mptcp: join: check removing ID 0 endpoint
Removing the endpoint linked to the initial subflow should trigger a
RM_ADDR for the right ID, and the removal of the subflow. That's what is
now being verified in the "delete and re-add" test.

Note that removing the initial subflow will not decrement the 'subflows'
counters, which corresponds to the *additional* subflows. On the other
hand, when the same endpoint is re-added, it will increment this
counter, as it will be seen as an additional subflow this time.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 3ad14f54bd ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-29 10:39:49 +02:00
Benjamin Tissoires
321f7798cf selftests/hid: Add HIDIOCREVOKE tests
Add 4 tests for the new revoke ioctl, for read/write/ioctl and poll.

Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Link: https://patch.msgid.link/20240827-hidraw-revoke-v5-4-d004a7451aea@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-08-29 10:39:37 +02:00
Benjamin Tissoires
8163892a62 selftests/hid: Add initial hidraw tests skeleton
Largely inspired from hid_bpf.c for the fixture setup.

Create a couple of tests for hidraw:
- create a uhid device and check if the fixture is working properly
- inject one uhid event and read it through hidraw

These tests are not that useful for now, but will be once we start adding
the ioctl and BPFs to revoke the hidraw node.

Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Link: https://patch.msgid.link/20240827-hidraw-revoke-v5-3-d004a7451aea@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-08-29 10:39:37 +02:00
Benjamin Tissoires
375e9bde9f selftests/hid: extract the utility part of hid_bpf.c into its own header
When adding new tests programs, we need the same mechanics to create
new virtual devices, and read from their matching hidraw node.

Extract the common part into its own header so we can easily add new
tests C-files.

Link: https://patch.msgid.link/20240827-hidraw-revoke-v5-2-d004a7451aea@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-08-29 10:39:37 +02:00
Florian Westphal
0a8b08c554 selftests: netfilter: nft_queue.sh: reduce test file size for debug build
The sctp selftest is very slow on debug kernels.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20240826192500.32efa22c@kernel.org/
Fixes: 4e97d521c2 ("selftests: netfilter: nft_queue.sh: sctp coverage")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://patch.msgid.link/20240827090023.8917-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-29 10:35:35 +02:00
Oliver Upton
4641c7ea88 KVM: arm64: selftests: Cope with lack of GICv3 in set_id_regs
Broonie reports that the set_id_regs test is failing as of commit
5cb57a1aff ("KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is
presented to the guest"). The test does not anticipate the 'late' ID
register fixup where KVM clobbers the GIC field in absence of GICv3.

While the field technically has FTR_LOWER_SAFE behavior, fix the issue
by setting it to an exact value of 0, matching the effect of the 'late'
fixup.

Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240829004622.3058639-1-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-29 08:34:03 +01:00
Juntong Deng
6db59c4935 selftests/bpf: Add test for zero offset or non-zero offset pointers as KF_ACQUIRE kfuncs argument
This patch adds test cases for zero offset (implicit cast) or non-zero
offset pointer as KF_ACQUIRE kfuncs argument. Currently KF_ACQUIRE
kfuncs should support passing in pointers like &sk->sk_write_queue
(non-zero offset) or &sk->__sk_common (zero offset) and not be rejected
by the verifier.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB5848CB6F0D4D9068669A905B99952@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-28 17:11:54 -07:00
Arnaldo Carvalho de Melo
47b3b6435e tools build: Remove leftover libcap tests that prevents fast path feature detection from working
I noticed that the fast path feature detection was failing:

  $ cat /tmp/build/perf-tools-next/feature/test-all.make.output
  /usr/bin/ld: cannot find -lcap: No such file or directory
  collect2: error: ld returned 1 exit status
  $

The patch removing the dependency (Fixes tag below) didn't remove the
detection of libcap, and as the fast path feature detection (test-all.c)
had -lcap in its Makefile link list of libraries to link, it was failing
when libcap-devel is not available, fix it by removing those leftover
files.

Fixes: e25ebda78e ("perf cap: Tidy up and improve capability testing")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zs-gjOGFWtAvIZit@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-28 19:11:36 -03:00
Namhyung Kim
d56a4d56a2 perf test: Add 'perf record cgroup' filtering test
$ sudo ./perf test filtering -vv
   96: perf record sample filtering (by BPF) tests:
  --- start ---
  test child forked, pid 2966908
  Checking BPF-filter privilege
  Basic bpf-filter test
  Basic bpf-filter test [Success]
  Failing bpf-filter test
  Failing bpf-filter test [Success]
  Group bpf-filter test
  Group bpf-filter test [Success]
  Multiple bpf-filter test
  Multiple bpf-filter test [Success]
  Cgroup bpf-filter test
  Cgroup bpf-filter test [Success]
  ---- end(0) ----
   96: perf record sample filtering (by BPF) tests                     : Ok

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-28 18:22:27 -03:00
Namhyung Kim
91e88437d5 perf bpf-filter: Support filtering on cgroups
The new cgroup filter can take either of '==' or '!=' operator and a
pathname for the target cgroup.

  $ perf record -a --all-cgroups -e cycles --filter 'cgroup == /abc/def' -- sleep 1

Users should have --all-cgroups option in the command line to enable
cgroup filtering.  Technically it doesn't need to have the option as
it can get the current task's cgroup info directly from BPF.  But I want
to follow the convention for the other sample info.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-28 18:21:49 -03:00
Namhyung Kim
591156f25f perf bpf-filter: Add build dependency to header files
The flex and bison files need to be recompiled when one of these header
filters are changed.

 * util/bpf-filter.h
 * util/bpf_skel/sample-filter.h

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-28 18:21:24 -03:00
Namhyung Kim
9af2efee41 perf report: Fix segfault when 'sym' sort key is not used
The fields in the hist_entry are filled on-demand which means they only
have meaningful values when relevant sort keys are used.

So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in
the hist entry can be garbage.  So it shouldn't access it
unconditionally.

I got a segfault, when I wanted to see cgroup profiles.

  $ sudo perf record -a --all-cgroups --synth=cgroup true

  $ sudo perf report -s cgroup

  Program received signal SIGSEGV, Segmentation fault.
  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  48		return RC_CHK_ACCESS(map)->dso;
  (gdb) bt
  #0  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  #1  0x00005555557aa39b in map__load (map=0x0) at util/map.c:344
  #2  0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385
  #3  0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true)
      at util/hist.c:644
  #4  0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761
  #5  0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779
  #6  0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015
  #7  0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0)
      at util/hist.c:1260
  #8  0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0,
      machine=0x5555560388e8) at builtin-report.c:334
  #9  0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232
  #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271
  #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0,
      file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354
  #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132
  #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245
  #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324
  #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342
  #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60)
      at util/session.c:780
  #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688,
      file_path=0x555556038ff0 "perf.data") at util/session.c:1406

As you can see the entry->ms.map was NULL even if he->ms.map has a
value.  This is because 'sym' sort key is not given, so it cannot assume
whether he->ms.sym and entry->ms.sym is the same.  I only checked the
'sym' sort key here as it implies 'dso' behavior (so maps are the same).

Fixes: ac01c8c424 ("perf hist: Update hist symbol when updating maps")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Matt Fleming <matt@readmodwrite.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-28 18:20:38 -03:00
James Clark
6f87543c74 perf test trace_btf_enum: Fix shellcheck warning
Shellcheck versions < v0.7.2 can't follow this path so add the helper to
fix the following warning:

  In tests/shell/trace_btf_enum.sh line 13:
  . "$(dirname $0)"/lib/probe.sh
  ^--------------------------^ SC1090: Can't follow non-constant source.
  Use a directive to specify location.

Fixes: d66763fed3 ("perf test trace_btf_enum: Add regression test for the BTF augmentation of enums in 'perf trace'")
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240809095426.3065163-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-28 18:18:33 -03:00
Leo Yan
d5726f1c8d perf auxtrace: Remove unused 'pmu' pointer from struct auxtrace_record
The 'pmu' pointer in the auxtrace_record structure is not used after
support multiple AUX events, remove it.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240806204130.720977-3-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-28 18:15:16 -03:00
Leo Yan
c87826ddce perf auxtrace: Use evsel__is_aux_event() for checking AUX event
Use evsel__is_aux_event() to decide if an event is a AUX event, this is
a refactoring to replace comparing the PMU type.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240806204130.720977-2-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-28 18:14:42 -03:00