Commit Graph

15 Commits

Author SHA1 Message Date
Howard Chu
395d38419f perf trace augmented_raw_syscalls: Add more checks to pass the verifier
Add some more checks to pass the verifier in more kernels.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241011021403.4089793-3-howardchu95@gmail.com
[ Reduced the patch removing things that can be done later ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23 11:34:56 -03:00
Arnaldo Carvalho de Melo
ecabac70ff perf trace augmented_raw_syscalls: Add extra array index bounds checking to satisfy some BPF verifiers
In a RHEL8 kernel (4.18.0-513.11.1.el8_9.x86_64), that, as enterprise
kernels go, have backports from modern kernels, the verifier complains
about lack of bounds check for the index into the array of syscall
arguments, on a BPF bytecode generated by clang 17, with:

  ; } else if (size < 0 && size >= -6) { /* buffer */
  116: (b7) r1 = -6
  117: (2d) if r1 > r6 goto pc-30
   R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv-6 R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6=inv(id=0,umin_value=18446744073709551610,var_off=(0xffffffff00000000; 0xffffffff)) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value
  ; index = -(size + 1);
  118: (a7) r6 ^= -1
  119: (67) r6 <<= 32
  120: (77) r6 >>= 32
  ; aug_size = args->args[index];
  121: (67) r6 <<= 3
  122: (79) r1 = *(u64 *)(r10 -24)
  123: (0f) r1 += r6
  last_idx 123 first_idx 116
  regs=40 stack=0 before 122: (79) r1 = *(u64 *)(r10 -24)
  regs=40 stack=0 before 121: (67) r6 <<= 3
  regs=40 stack=0 before 120: (77) r6 >>= 32
  regs=40 stack=0 before 119: (67) r6 <<= 32
  regs=40 stack=0 before 118: (a7) r6 ^= -1
  regs=40 stack=0 before 117: (2d) if r1 > r6 goto pc-30
  regs=42 stack=0 before 116: (b7) r1 = -6
   R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv1 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_w=inv(id=0) R5_w=inv40 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=0) R7_w=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8_w=invP6 R9_w=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_w=map_value fp-24_r=map_value fp-32_w=inv40 fp-40=ctx fp-48=map_value fp-56_w=inv1 fp-64_w=map_value fp-72=map_value fp-80=map_value
  parent didn't have regs=40 stack=0 marks
  last_idx 110 first_idx 98
  regs=40 stack=0 before 110: (6d) if r1 s> r6 goto pc+5
  regs=42 stack=0 before 109: (b7) r1 = 1
  regs=40 stack=0 before 108: (65) if r6 s> 0x1000 goto pc+7
  regs=40 stack=0 before 98: (55) if r6 != 0x1 goto pc+9
   R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=invP12 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_rw=inv(id=0) R5_w=inv24 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=2147483647) R7_w=map_value(id=0,off=40,ks=4,vs=8272,imm=0) R8_rw=invP4 R9_w=map_value(id=0,off=12,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_rw=map_value fp-24_r=map_value fp-32_rw=invP24 fp-40_r=ctx fp-48_r=map_value fp-56_w=invP1 fp-64_rw=map_value fp-72_r=map_value fp-80_r=map_value
  parent already had regs=40 stack=0 marks
  124: (79) r6 = *(u64 *)(r1 +16)
   R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=map_value(id=0,off=0,ks=4,vs=8272,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6_w=invP(id=0,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value
  R1 unbounded memory access, make sure to bounds check any such access
  processed 466 insns (limit 1000000) max_states_per_insn 2 total_states 20 peak_states 20 mark_read 3

If we add this line, as used in other BPF programs, to cap that index:

   index &= 7;

The generated BPF program is considered safe by that version of the BPF
verifier, allowing perf to collect the syscall args in one more kernel
using the BPF based pointer contents collector.

With the above one-liner it works with that kernel:

  [root@dell-per740-01 ~]# uname -a
  Linux dell-per740-01.khw.eng.rdu2.dc.redhat.com 4.18.0-513.11.1.el8_9.x86_64 #1 SMP Thu Dec 7 03:06:13 EST 2023 x86_64 x86_64 x86_64 GNU/Linux
  [root@dell-per740-01 ~]# ~acme/bin/perf trace -e *sleep* sleep 1.234567890
       0.000 (1234.704 ms): sleep/3863610 nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 })                  = 0
  [root@dell-per740-01 ~]#

As well as with the one in Fedora 40:

  root@number:~# uname -a
  Linux number 6.11.3-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 10 22:31:19 UTC 2024 x86_64 GNU/Linux
  root@number:~# perf trace -e *sleep* sleep 1.234567890
       0.000 (1234.722 ms): sleep/14873 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x7ffe87311a40) = 0
  root@number:~#

Song Liu reported that this one-liner was being optimized out by clang
18, so I suggested and he tested that adding a compiler barrier before
it made clang v18 to keep it and the verifier in the kernel in Song's
case (Meta's 5.12 based kernel) also was happy with the resulting
bytecode.

I'll investigate using virtme-ng[1] to have all the perf BPF based
functionality thoroughly tested over multiple kernels and clang
versions.

[1] https://kernel-recipes.org/en/2024/virtme-ng/

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrea Righi <andrea.righi@linux.dev>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/lkml/Zw7JgJc0LOwSpuvx@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23 11:34:56 -03:00
Howard Chu
a68fd6a6cd perf trace: Collect augmented data using BPF
Include trace_augment.h for TRACE_AUG_MAX_BUF, so that BPF reads
TRACE_AUG_MAX_BUF bytes of buffer maximum.

Determine what type of argument and how many bytes to read from user space, us ing the
value in the beauty_map. This is the relation of parameter type and its corres ponding
value in the beauty map, and how many bytes we read eventually:

string: 1                          -> size of string (till null)
struct: size of struct             -> size of struct
buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_ MAX_BUF)

After reading from user space, we output the augmented data using
bpf_perf_event_output().

If the struct augmenter, augment_sys_enter() failed, we fall back to
using bpf_tail_call().

I have to make the payload 6 times the size of augmented_arg, to pass the
BPF verifier.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-10-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-7-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:20 -03:00
Howard Chu
7f40306728 perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF
Set up beauty_map, load it to BPF, in such format: if argument No.3 is a
struct of size 32 bytes (of syscall number 114) beauty_map[114][2] = 32;

if argument No.3 is a string (of syscall number 114) beauty_map[114][2] =
1;

if argument No.3 is a buffer, its size is indicated by argument No.4 (of
syscall number 114) beauty_map[114][2] = -4; /* -1 ~ -6, we'll read this
buffer size in BPF  */

Committer notes:

Moved syscall_arg_fmt__cache_btf_struct() from a ifdef
HAVE_LIBBPF_SUPPORT to closer to where it is used, that is ifdef'ed on
HAVE_BPF_SKEL and thus breaks the build when building with
BUILD_BPF_SKEL=0, as detected using 'make -C tools/perf build-test'.

Also add 'struct beauty_map_enter' to tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
as we're using it in this patch, otherwise we get this while trying to
build at this point in the original patch series:

  builtin-trace.c: In function ‘trace__init_syscalls_bpf_prog_array_maps’:
  builtin-trace.c:3725:58: error: ‘struct <anonymous>’ has no member named ‘beauty_map_enter’
   3725 |         int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
        |

We also have to take into account syscall_arg_fmt.from_user when telling
the kernel what to copy in the sys_enter generic collector, we don't
want to collect bogus data in buffers that will only be available to us
at sys_exit time, i.e. after the kernel has filled it, so leave this for
when we have such a sys_exit based collector.

Committer testing:

Not wired up yet, so all continues to work, using the existing BPF
collector and userspace beautifiers that are augmentation aware:

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654
       0.000 ( 0.031 ms): mv/20888 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com
       0.000 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
       0.040 ( 0.003 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff17980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97
       0.480 ( 0.017 ms): ping/20892 sendto(fd: 5, buff: 0x7ffd82d07150, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20
       0.526 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0
       0.542 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
       0.544 ( 0.004 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.135.100 }, addrlen: 16) = 0
       0.559 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.135.100 }, addrlen: 16PING www.google.com (142.251.135.100) 56(84) bytes of data.
  ) = 0
       0.589 ( 0.058 ms): ping/20892 sendto(fd: 3, buff: 0x560b4ff11ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.135.100 }, addr_len: 0x10) = 64
      45.250 ( 0.029 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
      45.344 ( 0.012 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff19340, len: 111, flags: DONTWAIT|NOSIGNAL) = 111
  64 bytes from rio09s08-in-f4.1e100.net (142.251.135.100): icmp_seq=1 ttl=49 time=44.4 ms

  --- www.google.com ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 44.361/44.361/44.361/0.000 ms
  root@number:~#

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-4-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-3-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:51:59 -03:00
Arnaldo Carvalho de Melo
c90a88d33a perf trace: Use a common encoding for augmented arguments, with size + error + payload
We were using a more compact format, without explicitely encoding the
size and possible error in the payload for an argument.

To do it generically, at least as Howard Chu did in his GSoC activities,
it is more convenient to use the same model that was being used for
string arguments, passing { size, error, payload }.

So use that for the non string syscall args we have so far:

  struct timespec
  struct perf_event_attr
  struct sockaddr (this one has even a variable size)

With this in place we have the userspace pretty printers:

  perf_event_attr___scnprintf()
  syscall_arg__scnprintf_augmented_sockaddr()
  syscall_arg__scnprintf_augmented_timespec()

Ready to have the generic BPF collector in tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
sending its generic payload and thus we'll use them instead of a generic
libbpf btf_dump interface that doesn't know about about the sockaddr
mux, perf_event_attr non-trivial fields (sample_type, etc), leaving it
as a (useful) fallback that prints just basic types until we put in
place a more sophisticated pretty printer infrastructure that associates
synthesized enums to struct fields using the header scrapers we have in
tools/perf/trace/beauty/, some of them in this list:

  $ ls tools/perf/trace/beauty/*.sh
  tools/perf/trace/beauty/arch_errno_names.sh
  tools/perf/trace/beauty/kcmp_type.sh
  tools/perf/trace/beauty/perf_ioctl.sh
  tools/perf/trace/beauty/statx_mask.sh
  tools/perf/trace/beauty/clone.sh
  tools/perf/trace/beauty/kvm_ioctl.sh
  tools/perf/trace/beauty/pkey_alloc_access_rights.sh
  tools/perf/trace/beauty/sync_file_range.sh
  tools/perf/trace/beauty/drm_ioctl.sh
  tools/perf/trace/beauty/madvise_behavior.sh
  tools/perf/trace/beauty/prctl_option.sh
  tools/perf/trace/beauty/usbdevfs_ioctl.sh
  tools/perf/trace/beauty/fadvise.sh
  tools/perf/trace/beauty/mmap_flags.sh
  tools/perf/trace/beauty/rename_flags.sh
  tools/perf/trace/beauty/vhost_virtio_ioctl.sh
  tools/perf/trace/beauty/fs_at_flags.sh
  tools/perf/trace/beauty/mmap_prot.sh
  tools/perf/trace/beauty/sndrv_ctl_ioctl.sh
  tools/perf/trace/beauty/x86_arch_prctl.sh
  tools/perf/trace/beauty/fsconfig.sh
  tools/perf/trace/beauty/mount_flags.sh
  tools/perf/trace/beauty/sndrv_pcm_ioctl.sh
  tools/perf/trace/beauty/fsmount.sh
  tools/perf/trace/beauty/move_mount_flags.sh
  tools/perf/trace/beauty/sockaddr.sh
  tools/perf/trace/beauty/fspick.sh
  tools/perf/trace/beauty/mremap_flags.sh
  tools/perf/trace/beauty/socket.sh
  $

Testing it:

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654
     0.000 ( 0.031 ms): mv/1193096 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~# perf trace -e *nanosleep sleep 1.2345678901
       0.000 (1234.654 ms): sleep/1192697 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567891 }, rmtp: 0x7ffe1ea80460) = 0
  root@number:~# perf trace -e perf_event_open* perf stat -e cpu-clock sleep 1
       0.000 ( 0.011 ms): perf/1192701 perf_event_open(attr_uptr: { type: 1 (software), size: 136, config: 0 (PERF_COUNT_SW_CPU_CLOCK), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 1192702 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3

   Performance counter stats for 'sleep 1':

                0.51 msec cpu-clock                        #    0.001 CPUs utilized

         1.001242090 seconds time elapsed

         0.000000000 seconds user
         0.001010000 seconds sys

  root@number:~# perf trace -e connect* ping -c 1 bsky.app
       0.000 ( 0.130 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
      23.907 ( 0.006 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.20.108.158 }, addrlen: 16) = 0
      23.915 PING bsky.app (3.20.108.158) 56(84) bytes of data.
  ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.917 ( 0.002 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.12.170.30 }, addrlen: 16) = 0
      23.921 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.923 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 18.217.70.179 }, addrlen: 16) = 0
      23.925 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.927 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.132.20.46 }, addrlen: 16) = 0
      23.930 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.931 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.142.89.165 }, addrlen: 16) = 0
      23.934 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.935 ( 0.002 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 18.119.147.159 }, addrlen: 16) = 0
      23.938 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.940 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.22.38.164 }, addrlen: 16) = 0
      23.942 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.944 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.13.14.133 }, addrlen: 16) = 0
      23.956 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 3.20.108.158 }, addrlen: 16) = 0
  ^C
  --- bsky.app ping statistics ---
  1 packets transmitted, 0 received, 100% packet loss, time 0ms

  root@number:~#

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fW4=2GoP6foAN6qbrCiUzy0a_TzHbd8rvDsakTPfdzvfg@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:17:03 -03:00
Arnaldo Carvalho de Melo
c1632cc5ed perf trace augmented_syscalls.bpf: Move the renameat aumenter to renameat2, temporarily
While trying to shape Howard Chu's generic BPF augmenter transition into
the codebase I got stuck with the renameat2 syscall.

Until I noticed that the attempt at reusing augmenters were making it
use the 'openat' syscall augmenter, that collect just one string syscall
arg, for the 'renameat2' syscall, that takes two strings.

So, for the moment, just to help in this transition period, since
'renameat2' is what is used these days in the 'mv' utility, just make
the BPF collector be associated with the more widely used syscall,
hopefully the transition to Howard's generic BPF augmenter will cure
this, so get this out of the way for now!

So now we still have that odd "reuse", but for something we're not
testing so won't get in the way anymore:

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -vv -e rename* mv 123456 987654 |& grep renameat
  Reusing "openat" BPF sys_enter augmenter for "renameat"
       0.000 ( 0.079 ms): mv/1158612 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~#

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fXjGYs=tpBgETK-P9U-CuXssytk9pSnTXpfphrmmOydWA@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:16:26 -03:00
Arnaldo Carvalho de Melo
a9f4c6c999 perf trace: Collect sys_nanosleep first argument
That is a 'struct timespec' passed from userspace to the kernel as we
can see with a system wide syscall tracing:

  root@number:~# perf trace -e nanosleep
       0.000 (10.102 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
      38.924 (10.077 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     100.177 (10.107 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     139.171 (10.063 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     200.603 (10.105 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     239.399 (10.064 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     300.994 (10.096 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     339.584 (10.067 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     401.335 (10.057 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     439.758 (10.166 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     501.814 (10.110 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     539.983 (10.227 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     602.284 (10.199 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     640.208 (10.105 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     702.662 (10.163 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     740.440 (10.107 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
     802.993 (10.159 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 })                   = 0
  ^Croot@number:~# strace -p 9150 -e nanosleep

If we then use the ptrace method to look at that podman process:

  root@number:~# strace -p 9150 -e nanosleep
  strace: Process 9150 attached
  nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
  nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
  nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
  nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
  nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
  nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
  nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
  ^Cstrace: Process 9150 detached
  root@number:~#

With some changes we can get something closer to the strace output,
still in system wide mode:

  root@number:~# perf config trace.show_arg_names=false
  root@number:~# perf config trace.show_duration=false
  root@number:~# perf config trace.show_timestamp=false
  root@number:~# perf config trace.show_zeros=true
  root@number:~# perf config trace.args_alignment=0
  root@number:~# perf trace -e nanosleep --max-events=10
  podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
  podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
  podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
  podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
  podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
  podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
  podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
  podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
  podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
  podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
  root@number:~#
  root@number:~# perf config
  trace.show_arg_names=false
  trace.show_duration=false
  trace.show_timestamp=false
  trace.show_zeros=true
  trace.args_alignment=0
  root@number:~# cat ~/.perfconfig
  # this file is auto-generated.
  [trace]
  	show_arg_names = false
  	show_duration = false
  	show_timestamp = false
  	show_zeros = true
  	args_alignment = 0
  root@number:~#

This will not get reused by any other syscall as nanosleep is the only
one to have as its first argument a 'struct timespec" pointer argument
passed from userspace to the kernel:

  root@number:~# grep timespec /sys/kernel/tracing/events/syscalls/sys_enter_*/format | grep offset:16
  /sys/kernel/tracing/events/syscalls/sys_enter_nanosleep/format:	field:struct __kernel_timespec * rqtp;	offset:16;	size:8;	signed:0;
  root@number:~#

BTF based pretty printing will simplify all this, but then lets just get
the low hanging fruits first.

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zbq72dJRpOlfRWnf@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:26 -03:00
Arnaldo Carvalho de Melo
29d16de26d perf augmented_raw_syscalls.bpf: Move 'struct timespec64' to vmlinux.h
If we instead decide to generate vmlinux.h from BTF info, it will be
there:

  $ pahole timespec64
  struct timespec64 {
  	time64_t                   tv_sec;               /*     0     8 */
  	long int                   tv_nsec;              /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* last cacheline: 16 bytes */
  };

  $

pahole manages to find it from /sys/kernel/btf/vmlinux, that is
generated from the kernel types.

With this linux/bpf.h doesn't need to be included, as its already in the
minimalistic tools/perf/util/bpf_skel/vmlinux/vmlinux.h file or what we
need comes when generating a vmlinux.h file from BTF info, i.e. when
using GEN_VMLINUX_H=1, as noticed by Namyung in a build break before
removing linux/bpf.h.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/Zc_fp6CgDClPhS_O@x1
2024-02-16 15:19:57 -08:00
Thomas Richter
5069211e2f perf trace: Use the right bpf_probe_read(_str) variant for reading user data
Perf test case 111 Check open filename arg using perf trace + vfs_getname
fails on s390. This is caused by a failing function
bpf_probe_read() in file util/bpf_skel/augmented_raw_syscalls.bpf.c.

The root cause is the lookup by address. Function bpf_probe_read()
is used. This function works only for architectures
with ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

On s390 is not possible to determine from the address to which
address space the address belongs to (user or kernel space).

Replace bpf_probe_read() by bpf_probe_read_kernel()
and bpf_probe_read_str() by bpf_probe_read_user_str() to
explicity specify the address space the address refers to.

Output before:
 # ./perf trace -eopen,openat -- touch /tmp/111
 libbpf: prog 'sys_enter': BPF program load failed: Invalid argument
 libbpf: prog 'sys_enter': -- BEGIN PROG LOAD LOG --
 reg type unsupported for arg#0 function sys_enter#75
 0: R1=ctx(off=0,imm=0) R10=fp0
 ; int sys_enter(struct syscall_enter_args *args)
 0: (bf) r6 = r1           ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0)
 ; return bpf_get_current_pid_tgid();
 1: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
 2: (63) *(u32 *)(r10 -8) = r0 ; R0_w=scalar() R10=fp0 fp-8=????mmmm
 3: (bf) r2 = r10              ; R2_w=fp0 R10=fp0
 ;
 .....
 lines deleted here
 .....
 23: (bf) r3 = r6              ; R3_w=ctx(off=0,imm=0) R6=ctx(off=0,imm=0)
 24: (85) call bpf_probe_read#4
 unknown func bpf_probe_read#4
 processed 23 insns (limit 1000000) max_states_per_insn 0 \
	 total_states 2 peak_states 2 mark_read 2
 -- END PROG LOAD LOG --
 libbpf: prog 'sys_enter': failed to load: -22
 libbpf: failed to load object 'augmented_raw_syscalls_bpf'
 libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -22
 ....

Output after:
 # ./perf test -Fv 111
 111: Check open filename arg using perf trace + vfs_getname          :
 --- start ---
     1.085 ( 0.011 ms): touch/320753 openat(dfd: CWD, filename: \
	"/tmp/temporary_file.SWH85", \
	flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
 ---- end ----
 Check open filename arg using perf trace + vfs_getname: Ok
 #

Test with the sleep command shows:
Output before:
 # ./perf trace -e *sleep sleep 1.234567890
     0.000 (1234.681 ms): sleep/63114 clock_nanosleep(rqtp: \
         { .tv_sec: 0, .tv_nsec: 0 }, rmtp: 0x3ffe0979720) = 0
 #

Output after:
 # ./perf trace -e *sleep sleep 1.234567890
     0.000 (1234.686 ms): sleep/64277 clock_nanosleep(rqtp: \
         { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x3fff3df9ea0) = 0
 #

Fixes: 14e4b9f428 ("perf trace: Raw augmented syscalls fix libbpf 1.0+ compatibility")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Cc: sumanthk@linux.ibm.com
Cc: svens@linux.ibm.com
Link: https://lore.kernel.org/r/20231019082642.3286650-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-19 22:41:46 -07:00
Ian Rogers
33b725ce7b perf trace: Avoid compile error wrt redefining bool
Make part of an existing TODO conditional to avoid the following build
error:
```
tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c:26:14: error: cannot combine with previous 'char' declaration specifier
   26 | typedef char bool;
      |              ^
include/stdbool.h:20:14: note: expanded from macro 'bool'
   20 | #define bool _Bool
      |              ^
tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c:26:1: error: typedef requires a name [-Werror,-Wmissing-declarations]
   26 | typedef char bool;
      | ^~~~~~~~~~~~~~~~~
2 errors generated.
```

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230913184957.230076-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-17 15:51:32 -07:00
Arnaldo Carvalho de Melo
7d9642311b perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(augmented_arg->value) is a power of two.
Similar to what was done in the previous cset for sizeof(saddr), we need
to make sure sizeof(augmented_arg->value) is a power of two to do bounds
checking using &=:

  augmented_len &= sizeof(augmented_arg->value) - 1;

Suggested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZONrPo0NSqdbXiGx@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-21 10:49:31 -03:00
Arnaldo Carvalho de Melo
262b54b6c9 perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(saddr) is a power of two.
We're using the BPF verifier suggestion:

    22: (85) call bpf_probe_read#4
    R2 min value is negative, either use unsigned or 'var &= const'

That works only when const is a (power of two - 1) so add an assert to
make sure that that is the case.

Suggested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZONrFmJBNlQpSpZj@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-21 10:48:58 -03:00
Arnaldo Carvalho de Melo
1836480429 perf bpf_skel augmented_raw_syscalls: Cap the socklen parameter using &= sizeof(saddr)
This works with:

  $ clang -v
  clang version 14.0.5 (Fedora 14.0.5-2.fc36)
  $

But not with:

  $ clang -v
  clang version 16.0.6 (Fedora 16.0.6-2.fc38)
  $

  [root@quaco ~]# perf trace -e connect*,sendto* ping -c 10 localhost
  libbpf: prog 'sys_enter_sendto': BPF program load failed: Permission denied
  libbpf: prog 'sys_enter_sendto': -- BEGIN PROG LOAD LOG --
  reg type unsupported for arg#0 function sys_enter_sendto#59
  0: R1=ctx(off=0,imm=0) R10=fp0
  ; int sys_enter_sendto(struct syscall_enter_args *args)
  0: (bf) r6 = r1                       ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0)
  1: (b7) r1 = 0                        ; R1_w=0
  ; int key = 0;
  2: (63) *(u32 *)(r10 -4) = r1         ; R1_w=0 R10=fp0 fp-8=0000????
  3: (bf) r2 = r10                      ; R2_w=fp0 R10=fp0
  ;
  4: (07) r2 += -4                      ; R2_w=fp-4
  ; return bpf_map_lookup_elem(&augmented_args_tmp, &key);
  5: (18) r1 = 0xffff8de5a5b8bc00       ; R1_w=map_ptr(off=0,ks=4,vs=8272,imm=0)
  7: (85) call bpf_map_lookup_elem#1    ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=8272,imm=0)
  8: (bf) r7 = r0                       ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=8272,imm=0) R7_w=map_value_or_null(id=1,off=0,ks=4,vs=8272,imm=0)
  9: (b7) r0 = 1                        ; R0_w=1
  ; if (augmented_args == NULL)
  10: (15) if r7 == 0x0 goto pc+25      ; R7_w=map_value(off=0,ks=4,vs=8272,imm=0)
  ; unsigned int socklen = args->args[5];
  11: (79) r1 = *(u64 *)(r6 +56)        ; R1_w=scalar() R6_w=ctx(off=0,imm=0)
  ;
  12: (bf) r2 = r1                      ; R1_w=scalar(id=2) R2_w=scalar(id=2)
  13: (67) r2 <<= 32                    ; R2_w=scalar(smax=9223372032559808512,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s32_max=0,u32_max=0)
  14: (77) r2 >>= 32                    ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
  15: (b7) r8 = 128                     ; R8=128
  ; if (socklen > sizeof(augmented_args->saddr))
  16: (25) if r2 > 0x80 goto pc+1       ; R2=scalar(umax=128,var_off=(0x0; 0xff))
  17: (bf) r8 = r1                      ; R1=scalar(id=2) R8_w=scalar(id=2)
  ; const void *sockaddr_arg = (const void *)args->args[4];
  18: (79) r3 = *(u64 *)(r6 +48)        ; R3_w=scalar() R6=ctx(off=0,imm=0)
  ; bpf_probe_read(&augmented_args->saddr, socklen, sockaddr_arg);
  19: (bf) r1 = r7                      ; R1_w=map_value(off=0,ks=4,vs=8272,imm=0) R7=map_value(off=0,ks=4,vs=8272,imm=0)
  20: (07) r1 += 64                     ; R1_w=map_value(off=64,ks=4,vs=8272,imm=0)
  ; bpf_probe_read(&augmented_args->saddr, socklen, sockaddr_arg);
  21: (bf) r2 = r8                      ; R2_w=scalar(id=2) R8_w=scalar(id=2)
  22: (85) call bpf_probe_read#4
  R2 min value is negative, either use unsigned or 'var &= const'
  processed 22 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
  -- END PROG LOAD LOG --
  libbpf: prog 'sys_enter_sendto': failed to load: -13
  libbpf: failed to load object 'augmented_raw_syscalls_bpf'
  libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -13

So use the suggested &= variant since sizeof(saddr) == 128 bytes.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-16 14:02:38 -03:00
Ian Rogers
cd2cece61a perf trace: Tidy comments related to BPF + syscall augmentation
Now tools/perf/examples/bpf/augmented_syscalls.c is
tools/perf/util/bpf_skel/augmented_syscalls.bpf.c and not enabled as a
BPF event, tidy the comments to reflect this.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Fangrui Song <maskray@google.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230810184853.2860737-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-15 16:41:48 -03:00
Ian Rogers
5e6da6be30 perf trace: Migrate BPF augmentation to use a skeleton
Previously a BPF event of augmented_raw_syscalls.c could be used to
enable augmentation of syscalls by perf trace. As BPF events are no
longer supported, switch to using a BPF skeleton which when attached
explicitly opens the sysenter and sysexit tracepoints.

The dump map is removed as debugging wasn't supported by the
augmentation and bpf_printk can be used when necessary.

Remove tools/perf/examples/bpf/augmented_raw_syscalls.c so that the
rename/migration to a BPF skeleton captures that this was the source.

Committer notes:

Some minor stylistic changes to help visualizing the diff.

Use libbpf_strerror when failing to load the augmented raw syscalls BPF.

Use  bpf_object__for_each_program(prog, trace.skel->obj) to disable auto
attachment for all but the sys_enter, sys_exit tracepoints, to avoid
having to add extra lines as we go adding support for more pointer
receiving syscalls.

Committer testing:

  # perf trace -e open*  --max-events=10
     0.000 ( 0.022 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC)    = 11
   208.833 (         ): gnome-terminal/3223 openat(dfd: CWD, filename: "/proc/51250/cmdline")                  ...
   249.993 ( 0.024 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC)    = 11
   250.118 ( 0.030 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.pressure", flags: RDONLY|CLOEXEC) = 11
   250.205 ( 0.016 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.current", flags: RDONLY|CLOEXEC) = 11
   250.244 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.min", flags: RDONLY|CLOEXEC) = 11
   250.282 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.low", flags: RDONLY|CLOEXEC) = 11
   250.320 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.swap.current", flags: RDONLY|CLOEXEC) = 11
   250.355 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.stat", flags: RDONLY|CLOEXEC) = 11
   250.717 ( 0.016 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.pressure", flags: RDONLY|CLOEXEC) = 11
  #
  # perf trace -e *nanosleep*  --max-events=10
         ? (         ): SCTP timer/28304  ... [continued]: clock_nanosleep())                                  = 0
     0.007 (10.058 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0
    10.069 (         ): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) ...
    10.069 (10.056 ms): SCTP timer/28304  ... [continued]: clock_nanosleep())                                  = 0
    17.059 (         ): podman/3572 nanosleep(rqtp: 0x7fc4f4d75be0)                                    ...
    17.059 (10.061 ms): podman/3572  ... [continued]: nanosleep())                                        = 0
    20.131 (10.059 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0
    30.195 (10.038 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0
    40.238 (10.057 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0
    50.301 (         ): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) ...
  #

  # perf trace -e perf_event*  -- perf stat -e instructions,cycles,cache-misses sleep 0.1
     0.000 ( 0.011 ms): perf/51331 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 51332 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
     0.013 ( 0.003 ms): perf/51331 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 51332 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
     0.017 ( 0.002 ms): perf/51331 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x3 (PERF_COUNT_HW_CACHE_MISSES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 51332 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5

 Performance counter stats for 'sleep 0.1':

         1,495,051      instructions                     #    1.11  insn per cycle
         1,347,641      cycles
            35,424      cache-misses

       0.100935279 seconds time elapsed

       0.000924000 seconds user
       0.000000000 seconds sys

  #

  # perf trace -e connect*  ssh localhost
       0.000 ( 0.012 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
       0.118 ( 0.004 ms): ssh/51346 connect(fd: 6, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
       0.399 ( 0.007 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
       0.426 ( 0.003 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
       0.754 ( 0.009 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: INET, port: 22, addr: 127.0.0.1 }, addrlen: 16) = 0
       0.771 ( 0.010 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: INET6, port: 22, addr: ::1 }, addrlen: 28) = 0
       0.798 ( 0.053 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: INET6, port: 22, addr: ::1 }, addrlen: 28) = 0
       0.870 ( 0.004 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
       0.904 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
       0.930 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
       0.957 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
       0.981 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
       1.006 ( 0.004 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
       1.036 ( 0.005 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
      65.077 ( 0.022 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/run/.heim_org.h5l.kcm-socket }, addrlen: 110) = 0
      66.608 ( 0.014 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/run/.heim_org.h5l.kcm-socket }, addrlen: 110) = 0
  root@localhost's password:
  #

  # perf trace -e sendto*  ping -c 2 localhost
  PING localhost(localhost (::1)) 56 data bytes
  64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.024 ms
       0.000 ( 0.011 ms): ping/51357 sendto(fd: 5, buff: 0x7ffcca35e620, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20
       0.135 ( 0.026 ms): ping/51357 sendto(fd: 4, buff: 0x5601398f7b20, len: 64, addr: { .family: INET6, port: 58, addr: ::1 }, addr_len: 0x1c) = 64
    1014.929 ( 0.050 ms): ping/51357 sendto(fd: 4, buff: 0x5601398f7b20, len: 64, flags: CONFIRM, addr: { .family: INET6, port: 58, addr: ::1 }, addr_len: 0x1c) = 64
  64 bytes from localhost (::1): icmp_seq=2 ttl=64 time=0.046 ms

  --- localhost ping statistics ---
  2 packets transmitted, 2 received, 0% packet loss, time 1015ms
  rtt min/avg/max/mdev = 0.024/0.035/0.046/0.011 ms
  #

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Fangrui Song <maskray@google.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230810184853.2860737-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-15 16:41:48 -03:00